US6385665B1 - System and method for managing faults in a data transmission system - Google Patents


Info

Publication number
US6385665B1
Authority
US
United States
Prior art keywords
fault
data transmission
reenabled
manager
particular type
Prior art date
Legal status
Expired - Fee Related
Application number
US09/216,568
Inventor
Kevin E. Canady
Byron T. Butterfield
Dwight W. Doss
Dennis C. Dupont
Mark C. Tindall
Richard S. Weldon, Jr.
Current Assignee
Alcatel USA Sourcing Inc
Original Assignee
Alcatel USA Sourcing Inc
Priority date
Filing date
Publication date
Application filed by Alcatel USA Sourcing Inc
Priority to US09/216,568
Assigned to DSC TELECOM L.P. reassignment DSC TELECOM L.P. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: WELDON, RICHARD S., JR., BUTTERFIELD, BYRON T., DOSS, DWIGHT W., CANADY, KEVIN E., DUPONT, DENNIS C., TINDALL, MARK C.
Assigned to ALCATEL USA SOURCING, L.P. reassignment ALCATEL USA SOURCING, L.P. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: DSC TELECOM L.P.
Application granted
Publication of US6385665B1
Anticipated expiration
Expired - Fee Related

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0686 Additional information in the notification, e.g. enhancement of specific meta-data
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning

Definitions

  • FIG. 2 is a block diagram illustrating the software control hierarchy in an exemplary embodiment of a fault management system;
  • FIG. 3 is a block diagram illustrating the interaction between software building blocks in one embodiment of the fault management system;
  • FIG. 4 is a block diagram illustrating the interaction of the Fault Management software building block with other building blocks;
  • FIG. 5 is a flow chart illustrating fault report filtering by fault type; and
  • FIG. 6 is a flow chart illustrating fault report filtering by fault priority level.
  • The fault management system and method of the present invention can be implemented in any type of data transmission system.
  • A particularly suitable application is a telecommunications switching system, where data integrity and reliable transmission are especially important.
  • Although the present invention will be described below in relation to a telecommunications switching system, it is to be understood that the invention is not so limited.
  • As shown in FIG. 1, telecommunications signals, such as digitally encoded optical telecommunications data, are carried over a conductor, such as an optical conductor 2.
  • These signals are received by a delivery unit 3 and are processed by a series of different application cards 5 that each contain application circuitry that contributes to converting the optical signals into electrical signals and further processing these signals.
  • The electrical signals are then transmitted to a switch 6 for switching.
  • The various application circuits on application cards 5 may include optical terminator circuits that receive the optical signals and convert them into electrical signals, and various other circuits that receive and terminate those electrical signals and perform the multiplexing and demultiplexing to the signal levels required for switching by switch 6.
  • The series of application cards 5 that form the data transmission path between the incoming optical signals and switch 6 constitutes a “shelf”. Although only one shelf is shown in FIG. 1, there may be multiple shelves within delivery unit 3, each controlled and managed by a unit controller 8.
  • Unit controller 8 provides administration and maintenance for application cards 5 within delivery unit 3 by sending control data to, and receiving status information from, application cards 5.
  • A service unit 10 is coupled to delivery unit 3 and includes one or more system managers 12.
  • System manager 12 provides centralized control, administration, and maintenance for delivery unit 3. Although only one delivery unit is shown in FIG. 1, the telecommunications switching system may include multiple delivery units, each under the common control of service unit 10.
  • Delivery unit 3 also includes the software necessary to multiplex and demultiplex optical signals to the appropriate signal levels, and to interface these signals to switch 6.
  • Unit controller 8 and service unit 10 include unit controller software 21 and system manager software 22, respectively, which carry out the control, administration, and maintenance functions that these devices perform.
  • Application card software 20 residing on application cards 5, together with the unit controller software and the system manager software, forms part of the fault management system described below.
  • The software is composed of software building blocks in an object-oriented programming environment. Each building block is a software product composed of objects that interface with other building blocks.
  • FIG. 2 illustrates the software management and control hierarchy for an exemplary telecommunication switching system in which the system and method of the present invention may be employed.
  • At the bottom of the hierarchy is application card software 20.
  • Software at this level communicates with product-specific interfaces, such as the individual application circuits.
  • Unit controller software 21, associated with unit controller 8, manages the application cards and provides their maintenance and administration functions.
  • System manager software 22 in system manager 12 of service unit 10 provides centralized control and administration over delivery units 3.
  • The fault management system and method of the present invention occupies all architectural layers and is primarily divided into two major building blocks: Fault Detection and Fault Management.
  • The Fault Management building block includes a Fault Routing subroutine.
  • The Fault Detection building block provides constant monitor tests, or “heartbeat” tests, at the service unit 10 and unit controller 8 layers. As shown in FIG. 3, Fault Detection on unit controller 8 periodically sends heartbeat tests 301 to selected application cards 5. If the application cards are operating properly, each will return an acknowledgment response 302. If an expected acknowledgment response is not received, Fault Detection may issue a fault report on the malfunctioning application card 5 (see FIG. 4). Similarly, Fault Detection on system manager 12 will periodically conduct heartbeat tests 303 and 304 on unit controllers 8 and on selected application cards 5. The unit controllers and application cards receiving the heartbeat messages will likewise acknowledge them (305, 306) to indicate that they are functioning properly.
  • Fault Detection at the unit controller and system manager layers also periodically conducts self-tests, as indicated by 307 and 308 in FIG. 3.
  • These self-tests may include, for example, memory integrity tests, I/O module functionality tests, and processor status tests.
  • Application cards 5 and unit controllers 8 may also issue fault reports themselves that are not prompted by Fault Detection but are nevertheless received and managed by the fault management system.
  • Application cards 5 are each capable of detecting device faults, timing faults, and/or path faults.
  • Device faults include those faults that can be directly attributed to the failure of a device, and timing faults are faults that result from a failure of the timing network.
  • Path faults are faults that result from erroneous and degraded connections in the signal data paths of the telecommunications switching system.
  • Path faults may include parity errors, cyclic redundancy check (CRC) errors, and path verification errors. These types of faults may be detected at various points along the data transmission path, as described above, and by any of various means well known in the art. For example, appropriate means may be employed so that these faults are detected at two points on each application card in the data transmission path, such as at both the receiving end and the transmitting end of the card.
  • The fault reports referred to above include the data and information necessary to allow the Fault Management building block to assess and isolate the detected fault, and to take the steps necessary to correct the problem. Information such as the time of the event, the fault type, the identification of the failed device, the identification of the component where the failed device resides, the identification of the fault detector, the priority of the fault (described below), and the destination of the fault report may all be included within the fault report. As shown in FIG. 4, fault reports issued by application cards 5 or by Fault Detection 300 are forwarded to Fault Management 400, which occupies both the system manager and unit controller architectural layers.
  • When a fault report is issued by unit controller 8, application card 5, or Fault Detection, the destination is specified as the Fault Management building block in the next higher architectural layer; the destination of a fault report issued at the system manager layer is the Fault Management building block in the same architectural layer.
  • A Fault Routing subroutine 401 may be included within the Fault Management 400 building block. It allows any building block in the same architectural layer, or in a higher architectural layer, to register to receive fault reports generated by specified devices.
  • Fault reports are received by Fault Routing 401. If a destination is not designated in a fault report, Fault Routing will consult a registration table 402 to determine whether any building block has registered to receive that unit's fault reports. If so, the fault report will be routed to the destination specified in the registration table; otherwise, a default destination, such as that described above, will be assigned.
  • Fault reports are sent to Fault Management at either the unit controller 8 layer or the system manager 12 layer.
  • When Fault Management 400 receives a fault report, it attempts to isolate the cause of the fault to a particular device, removes the faulty device from service, and later returns the device to service when the problem has been corrected.
  • Fault Management interfaces with several other building blocks to accomplish these tasks. As shown in FIG. 4, Fault Management 400 may interface with a test management building block 403 to make test requests to gather information for the purpose of isolating the faulty device. It may also interface with a configuration management building block 404 to request removal of faulty devices from service, and to restore those devices once any problems have been corrected. Finally, Fault Management may interface with a fault history database 405 that includes the status of the current fault state of each device, as well as the last fault isolated to each device.
  • The system described above is exemplary; the inventive aspect of the fault management system described below may be implemented in such a system or in any fault management system having a similar architecture.
  • When an application card detects a fault, it sets a latched fault flag for that fault type. This status will be maintained on the card until Fault Management instructs the card to clear the flag, for example after Fault Management has isolated and corrected the problem that caused the fault report to be generated. Further, once a latched fault flag has been set, the application card software will suppress all subsequent fault reports relating to that fault type until the flag is cleared, thereby ensuring that only the first fault of a given type is reported to and addressed by Fault Management.
  • The fault management system may interrogate any card at any time as to its latched fault status by examining its latched fault flags, thereby providing an additional way to isolate faults.
  • FIG. 5 illustrates an exemplary procedure by which an application card may filter fault reports of a specified type to make the fault management system of the present invention more efficient and more reliable. Although described in conjunction with a particular type of fault, it is to be understood that the procedure described below applies independently to monitoring faults of all types.
  • In step 500, the application card detects the occurrence of a particular type of fault, such as a data fault. This may be achieved, for example, either by periodically monitoring the status of a detection device or circuit, or by receiving a signal, such as an interrupt signal, in response to the detection of a fault by the detection device.
  • Upon detecting the fault, the card will set the latched fault status flag in step 503 and then determine in step 501 whether fault reports of that type are currently being suppressed. As described above, fault reports of that type will be suppressed if a fault of that type was previously detected and reported and the corresponding latched fault status flag is still set. If such fault reports are currently being suppressed, the card will wait until another fault is detected and then return to step 500.
  • When reports are being suppressed, the fault will be recorded in step 502 for record-keeping purposes but will not be forwarded to Fault Management to be acted upon. If fault reports of that type are not currently being suppressed, the card will proceed in step 504 to generate a fault report and forward it to Fault Management. The application card software will then suppress the generation of subsequent fault reports in step 505, and will continue to suppress them until Fault Management instructs it to clear the latched fault status flag. It should be understood that the latched fault status flag may alternatively be set after the fault report has been generated, or after the steps to suppress the generation of subsequent reports have been taken.
  • When an instruction to clear the latched fault status flag is received from Fault Management, the specified flag will be cleared and the application card software will reenable generation of fault reports for the corresponding fault type. Thus, once reporting is reenabled, on a subsequent pass through the steps of FIG. 5, when the next fault of that type is detected, the application card will determine in step 501 that fault reports are not currently being suppressed and will generate a fault report for that fault.
  • The types of faults may also be prioritized so that more critical, higher-priority faults are addressed first.
  • For example, data faults may be considered more critical than timing faults and thus be assigned a higher priority in the fault management system.
  • In the embodiment of FIG. 6, when a fault is detected, the latched fault status flag is set in step 607, followed by determination of the priority level of that fault in step 601. Once the priority is determined, step 602 determines whether fault reports of that priority level are being suppressed.
  • If they are not, the appropriate latched fault status flag is set in step 603, a fault report is generated in step 604, and generation of subsequent fault reports relating to faults of that priority level and lower priority levels is then suppressed in step 605.
  • Fault reports of the appropriate priority levels will be suppressed until the application card receives an instruction from Fault Management to clear the appropriate latched fault status flag, at which time the flag will be cleared and fault report generation will be reenabled for the corresponding fault priority levels. Once reporting is reenabled, on a subsequent pass through the steps of FIG. 6, the determination made at step 602 will result in a report being generated for the next occurring fault.
  • If fault reports of that priority level are being suppressed, the application card software will determine in step 606 whether the latched fault status flag for that fault type has been set and, if not, will record the fault in step 608. This is done to ensure that the latched fault status flags accurately reflect whether each type of fault has occurred since the flags were last cleared. Because fault reports of a given priority level and lower may be suppressed in this embodiment, the fact that a type of fault is not reported does not necessarily mean that that type of fault has not occurred.
  • Unit controllers 8 may similarly filter fault reports by, for example, processing and generating a fault report for only the first path fault received from the application cards, or for the first fault of a given priority level that is received. Further, at the highest level the fault management system will act only on the first fault of a designated type or priority level that is reported. Thus, the present invention, by introducing fault report filtering at each level, will enable the fault management system to more efficiently isolate and address faults by focusing on faults that are more likely to be at the root of the problem, rather than those that likely have resulted from or have been spawned by initial faults.
  • Other modifications of the invention described above will be obvious to those skilled in the art, and it is intended that the scope of the invention be limited only as set forth in the appended claims.
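The heartbeat test, fault report contents, and Fault Routing described above can be illustrated with a minimal Python sketch. It assumes a three-layer hierarchy labeled `application_card`, `unit_controller`, and `system_manager`, models each heartbeat as a callable that returns `True` when acknowledged, and keys the registration table on the reporting unit; every class, function, and field name here (`FaultReport`, `FaultRouting`, `heartbeat_round`, `default_destination`) is hypothetical and not taken from the patent. The default destination follows the rule stated above: the Fault Management block one layer up, except at the system manager layer, which reports to itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional

# Hypothetical labels for the hierarchy of FIG. 2, lowest layer first.
LAYERS = ["application_card", "unit_controller", "system_manager"]


@dataclass
class FaultReport:
    # Fields follow the list in the description; the names are illustrative.
    fault_type: str
    failed_device: str
    component: str          # where the failed device resides
    detector: str           # unit or building block that generated the report
    priority: int
    origin_layer: str       # architectural layer that issued the report
    destination: Optional[str] = None
    time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def default_destination(origin_layer: str) -> str:
    """Fault Management one layer up; the top layer reports to itself."""
    idx = LAYERS.index(origin_layer)
    return "FaultManagement@" + LAYERS[min(idx + 1, len(LAYERS) - 1)]


class FaultRouting:
    def __init__(self) -> None:
        # Stand-in for registration table 402: reporting unit -> destination.
        self.registrations: Dict[str, str] = {}

    def register(self, reporter: str, destination: str) -> None:
        self.registrations[reporter] = destination

    def route(self, report: FaultReport) -> str:
        if report.destination is not None:           # explicit destination wins
            return report.destination
        if report.detector in self.registrations:    # a block registered for this unit
            return self.registrations[report.detector]
        return default_destination(report.origin_layer)


def heartbeat_round(cards: Dict[str, Callable[[], bool]], layer: str) -> List[FaultReport]:
    """Send one heartbeat per card; a missing acknowledgment yields a fault report."""
    reports = []
    for card_id, send_heartbeat in cards.items():
        try:
            acked = send_heartbeat()
        except Exception:   # treat a failed send like a missing acknowledgment
            acked = False
        if not acked:
            reports.append(FaultReport(
                fault_type="heartbeat_timeout",   # hypothetical type name
                failed_device=card_id,
                component="shelf-1",              # hypothetical location
                detector="FaultDetection@" + layer,
                priority=1,                       # hypothetical priority value
                origin_layer=layer,
            ))
    return reports


if __name__ == "__main__":
    routing = FaultRouting()
    cards = {"card-1": lambda: True, "card-2": lambda: False}
    for report in heartbeat_round(cards, "unit_controller"):
        print(report.failed_device, "->", routing.route(report))
```

With no registrations, the report about `card-2` goes to the system manager's Fault Management under the default rule; registering `"FaultDetection@unit_controller"` against the unit controller's Fault Management would keep it at that layer instead.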

Abstract

A system and method for managing faults in a data transmission system that includes a data path for transmitting signals containing data, and a plurality of application cards located along the data path for processing the signals. The method includes the steps of detecting the occurrence of a fault of a particular type at one of a plurality of points along the data transmission path; in response, the application card generating a fault report for the purpose of identifying the cause of the detected fault; and, preventing the generation by the application card of subsequent fault reports relating to that particular type of fault until receiving a signal indicating that fault report generation may be reenabled. In the fault management system, the data transmission system also includes at least one unit controller for controlling the application cards, and at least one system manager for controlling the at least one unit controller. The system further includes application card software residing on the plurality of application cards, unit controller software residing on the at least one unit controller, and system manager software residing on the at least one system manager. The application card software is capable of generating a first fault report in response to detecting that a first fault of a particular type has occurred in the data transmission system, and suppressing the generation of subsequent fault reports relating to that particular type of fault until receiving a signal indicating that fault report generation may be reenabled.

Description

TECHNICAL FIELD OF THE INVENTION
The present invention relates in general to the field of telecommunications switching equipment. More particularly, the invention relates to a system and method for managing faults in a data transmission system.
BACKGROUND OF THE INVENTION
In any type of data transmission system, the ability to reliably transmit data without interruption is of the utmost importance. Data transmission, however, is always subject to error or faults due to signal integrity problems and/or failure of the physical devices or elements that form the data transmission path. To address these inevitable faults, most data transmission systems will include a subsystem or process by which data and device faults are detected and corrected. Such “fault management” systems are intended to locate and correct system faults in the most efficient manner so that service disruptions are minimized.
Adequate fault management systems must not only be able to detect faults, but also determine the cause of each fault, both to ensure that the same type of fault does not continue to occur and to ensure that it does not subsequently cause other types of faults. To do this, the fault must be “isolated” so that the physical device or element responsible for causing the fault can be identified, and the proper steps taken to ensure that the faulty device is repaired and returned to operation. Fault isolation is often achieved by providing fault detection at various points along the data transmission path. For example, if data passes through three separate processing circuits, each coupled to the next by a separate communication link, fault detection may be provided at each of the three circuits, or even at multiple points along each circuit. In this manner, when a fault is detected, the system can readily determine which circuit or communication link is faulty.
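Fault isolation through detection points can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's mechanism: the payload is assumed to pass through each circuit and link unchanged and to carry a CRC-32 (computed with Python's `zlib.crc32`) taken where it enters the path, with a checkpoint after every element. Real circuits transform signals, so each checkpoint would in practice use its own parity or CRC. The element names and the simulated defect are hypothetical.

```python
import zlib
from typing import Callable, List, Tuple

# One element of the path (a circuit or a link), modeled as a function on the payload.
Element = Tuple[str, Callable[[bytes], bytes]]


def failing_checkpoints(payload: bytes, path: List[Element]) -> List[str]:
    """Check a CRC-32 after every element; return the names whose checkpoint fails."""
    expected = zlib.crc32(payload)   # check value taken where data enters the path
    data = payload
    failures = []
    for name, element in path:
        data = element(data)
        if zlib.crc32(data) != expected:
            failures.append(name)
    return failures


def passthrough(data: bytes) -> bytes:
    return data


def flip_first_bit(data: bytes) -> bytes:
    # Simulated defect: corrupts the lowest bit of the first byte.
    return bytes([data[0] ^ 0x01]) + data[1:]


if __name__ == "__main__":
    path = [
        ("circuit A", passthrough),
        ("link A-B", passthrough),
        ("circuit B", flip_first_bit),
        ("link B-C", passthrough),
        ("circuit C", passthrough),
    ]
    failures = failing_checkpoints(b"frame", path)
    print(failures)   # ['circuit B', 'link B-C', 'circuit C']
    print("first failing element:", failures[0])
```

Every checkpoint downstream of the defect also fails, so only the first failure identifies the faulty element; the later ones are the kind of spawned faults that the background goes on to describe.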
In known fault management systems, each time a fault is detected anywhere in the data transmission system a fault report identifying the fault is generated and forwarded to a centralized fault management node. This central node will then attempt to isolate the fault, and perform the steps necessary to correct the problem. Thus, every fault that is detected is reported, and is individually addressed by this centralized fault management node.
Individually addressing each and every fault, however, is inefficient and has many drawbacks that adversely affect system performance. The system is inefficient because not all faults need to be reported and addressed. Often, a single initial fault will spawn many subsequent faults, but if the underlying fault is isolated and corrected, the subsequently spawned faults will correct themselves. For example, a timing fault caused by a defective timing circuit may appear as a data integrity fault at various places along the data transmission path, and be detected as such at each of these places. Thus, one timing fault leads to multiple subsequent faults. Of all these detected faults, however, only the very first fault report generated is helpful in isolating and correcting the source of the problem. It is only the initial fault that is critical to isolate and address, and once it is corrected, the subsequent resulting faults will be eliminated automatically. Thus, under many circumstances, subsequent fault reports are superfluous, and processing them consumes fault management resources that could be better spent on more urgent or more critical fault reports. Accordingly, known fault management systems unnecessarily address each and every fault and therefore do not provide the most effective manner of managing faults.
SUMMARY OF THE INVENTION
Accordingly, a need currently exists for a method for managing faults in a data transmission system that is more efficient in managing faults, and that reduces the burden on the fault management system of addressing each fault that is detected.
In accordance with the present invention, a system and method for managing faults is provided in a data transmission system having a data path for transmitting signals containing data, and a plurality of application cards along the data path for processing the signals. The method includes the steps of detecting the occurrence of a first fault of a particular type by one of the application cards, and in response to detecting this fault, generating a fault report for the purpose of identifying the cause of the fault. Next, the application card is prevented from generating subsequent fault reports relating to that particular type of fault until a signal is received indicating that fault report generation may be reenabled. Subsequent steps may include receiving this signal and reenabling fault report generation, and generating a subsequent fault report in response to detecting a subsequent fault of that particular type. Further steps may also include, in response to detecting the first fault of the particular type, setting a fault status indicator associated with the application card that represents the particular fault type, and, in response to receiving the signal indicating that fault report generation may be reenabled, clearing the fault status indicator.
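The per-type filtering method summarized above can be sketched as a small state machine on the application card. This is a minimal Python sketch under assumptions: fault types are plain strings, the set of latched types doubles as both the fault status indicators and the suppression state, and the "signal" to reenable is a direct method call. The class and method names (`ApplicationCardFaultFilter`, `on_fault`, `reenable`, `latched_status`) are hypothetical.

```python
from typing import List, Set


class ApplicationCardFaultFilter:
    """Per-type latched fault flags: only the first fault of a type is reported."""

    def __init__(self) -> None:
        self.latched: Set[str] = set()   # fault types whose status indicator is set
        self.recorded: List[str] = []    # suppressed faults kept for record keeping
        self.sent: List[str] = []        # reports forwarded to the fault manager

    def on_fault(self, fault_type: str) -> bool:
        """Handle one detected fault; return True if a fault report was generated."""
        if fault_type in self.latched:
            self.recorded.append(fault_type)   # recorded locally, not forwarded
            return False
        self.latched.add(fault_type)           # set the fault status indicator
        self.sent.append(fault_type)           # generate and forward the report
        return True

    def reenable(self, fault_type: str) -> None:
        """Reenable signal from the fault manager: clear the indicator for this type."""
        self.latched.discard(fault_type)

    def latched_status(self) -> Set[str]:
        """Answer an interrogation of the card's latched fault status."""
        return set(self.latched)


if __name__ == "__main__":
    card = ApplicationCardFaultFilter()
    for fault in ["data", "data", "timing", "data"]:
        card.on_fault(fault)
    print(card.sent)       # ['data', 'timing']
    print(card.recorded)   # ['data', 'data']
    card.reenable("data")
    card.on_fault("data")
    print(card.sent)       # ['data', 'timing', 'data']
```

Repeated data faults, such as those spawned by a single upstream problem, produce one report until the fault manager clears the flag; the suppressed occurrences are still kept locally.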
In an alternate embodiment of the present invention, the method includes detecting the occurrence of a first fault of a particular type by one of the plurality of application cards and determining a priority level of the detected fault in response to its detection. The application card generates a fault report for the purpose of identifying the cause of the detected fault. Subsequently, the application card prevents the generation of subsequent fault reports relating to faults of the determined priority level and lower until receiving a signal indicating that fault report generation may be reenabled. Subsequent steps may include receiving this signal and reenabling fault report generation, and generating a subsequent fault report in response to detecting a subsequent fault of the determined priority level or lower. Further steps may also include in response to detecting the first fault of the particular type, setting a fault status indicator associated with the application card that represents the particular fault type, and in response to receiving the signal indicating that fault report generation may be reenabled, clearing the fault status indicator.
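The priority-level variant can be sketched the same way. This minimal Python sketch assumes integer priority levels where a larger number means more critical, and a hypothetical mapping in which data faults outrank timing faults (in line with the example given later in the description). A report at a given level is suppressed while any latched level at or above it is still set, so one reported fault silences its own level and every lower one. Per-type flags are also updated on every occurrence, so they reflect which fault types have occurred even when the reports were suppressed. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical priority map: a larger number means a more critical fault.
PRIORITY: Dict[str, int] = {"data": 2, "timing": 1}


class PriorityFaultFilter:
    def __init__(self) -> None:
        self.latched_levels: Set[int] = set()   # levels whose reporting is suppressed
        self.latched_types: Set[str] = set()    # fault types seen since last clear
        self.sent: List[Tuple[str, int]] = []
        self.recorded: List[Tuple[str, int]] = []

    def suppressed(self, level: int) -> bool:
        # A latched level suppresses reports at that level and every lower level.
        return any(latched >= level for latched in self.latched_levels)

    def on_fault(self, fault_type: str) -> bool:
        """Handle one detected fault; return True if a fault report was generated."""
        level = PRIORITY[fault_type]
        self.latched_types.add(fault_type)      # type flag tracks occurrence
        if self.suppressed(level):
            self.recorded.append((fault_type, level))
            return False
        self.latched_levels.add(level)
        self.sent.append((fault_type, level))
        return True

    def reenable(self, level: int) -> None:
        """Reenable signal from the fault manager for one priority level."""
        self.latched_levels.discard(level)


if __name__ == "__main__":
    card = PriorityFaultFilter()
    for fault in ["timing", "data", "timing", "data"]:
        card.on_fault(fault)
    print(card.sent)       # [('timing', 1), ('data', 2)]
    print(card.recorded)   # [('timing', 1), ('data', 2)]
```

A higher-priority fault is still reported while a lower level is latched, which lets the more critical fault reach the fault manager first.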
In another embodiment according to the present invention, a fault management system for managing faults in a data transmission system is provided, where the data transmission system includes a data transmission path for transmitting signals containing data, a plurality of application cards along the data path for processing the signals, at least one unit controller for controlling the application cards, and at least one system manager for controlling the at least one unit controller. The system includes application card software residing on the plurality of application cards, unit controller software residing on the at least one unit controller, and system manager software residing on the at least one system manager. The application card software is capable of generating a first fault report in response to detecting that a first fault of a particular type has occurred in the data transmission system, and of suppressing the generation of subsequent fault reports relating to faults of that particular type until receiving a signal indicating that fault report generation may be reenabled. In one embodiment, the fault report is sent to a fault management subroutine within the unit controller software, and the signal indicating that fault report generation may be reenabled is received from that fault management subroutine. In an alternate embodiment, the fault report is sent to a fault management subroutine within the system manager software, and the signal indicating that fault report generation may be reenabled is received from that fault management subroutine.
BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying drawings in which like reference numbers indicate like features wherein:
FIG. 1 is a system block diagram of an embodiment of a telecommunications switching system in which the fault management system and method of the present invention may be employed;
FIG. 2 is a block diagram illustrating the software control hierarchy in an exemplary embodiment of a fault management system;
FIG. 3 is a block diagram illustrating the interaction between software building blocks in one embodiment of the fault management system;
FIG. 4 is a block diagram illustrating the interaction of the Fault Management software building block with other building blocks;
FIG. 5 is a flow chart illustrating fault report filtering by fault type; and
FIG. 6 is a flow chart illustrating fault report filtering by fault priority level.
DETAILED DESCRIPTION OF THE INVENTION
The fault management system and method of the present invention can be implemented in any type of data transmission system. A particularly suitable application is in a telecommunication switching system where data integrity and reliable transmission are particularly important. Although the present invention will be described below in relation to a telecommunications switching system, it is to be understood that the invention is not so limited.
Referring now to FIG. 1, in an exemplary telecommunications switching system 1, telecommunications signals, such as digitally encoded optical telecommunications data, are transmitted over a conductor, such as an optical conductor 2. These signals are received by a delivery unit 3 and are processed by a series of different application cards 5 that each contain application circuitry that contributes to converting the optical signals into electrical signals and further processing these signals. The electrical signals are then transmitted to a switch 6 for switching the electrical signals. The various application circuits on application cards 5 may include optical terminator circuits that receive and convert the optical signals into electrical signals, and various other circuits that receive and terminate those electrical signals and perform the necessary multiplexing and demultiplexing to the appropriate signal levels for switching by switch 6. The series of application cards 5 that form the data transmission path between the incoming optical signals and switch 6 constitute a “shelf”. Although only one shelf is shown in FIG. 1, there may be multiple shelves within delivery unit 3, each being controlled and managed by a unit controller 8.
Unit controller 8 provides administration and maintenance for application cards 5 within delivery unit 3 by sending control data to and receiving status information from application cards 5. A service unit 10 is coupled to delivery unit 3 and includes one or more system managers 12. System manager 12 provides centralized control, administration operations, and maintenance for delivery unit 3. Although only one delivery unit is shown in FIG. 1, the telecommunications switching system may include multiple delivery units, each under the common control of service unit 10.
In addition to the hardware elements described above, delivery unit 3 also includes the software necessary to multiplex and demultiplex optical signals to the appropriate signal levels, and to interface these signals to switch 6. In particular, unit controller 8 and service unit 10 include unit controller software 21 and system manager software 22, respectively, which are necessary to the control, administration and maintenance functions that these devices perform. Further, application card software 20 residing on application cards 5, together with the unit controller software and the system manager software, forms part of the fault management system, which will be described below. In an exemplary embodiment, the software is composed of software building blocks in an object oriented programming environment. Each building block is a software product comprised of objects that interface with other building blocks.
FIG. 2 illustrates the software management and control hierarchy for an exemplary telecommunication switching system in which the system and method of the present invention may be employed. At the bottom of the hierarchy is application card software 20. Software at this level communicates with product-specific interfaces, such as the individual application circuits. Unit controller software 21, which is associated with unit controller 8, manages the application cards and provides the maintenance and administration functions for these application cards. Finally, system manager software 22 in system manager 12 of service unit 10 provides centralized control and administration over delivery units 3.
The fault management system and method of the present invention occupies all architectural layers and is primarily divided into two major building blocks: Fault Detection and Fault Management. According to one embodiment, the Fault Management building block includes a Fault Routing subroutine.
The Fault Detection building block provides continuous monitoring tests, or “heartbeat” tests, at the service unit 10 and unit controller 8 layers. As shown in FIG. 3, Fault Detection on unit controller 8 periodically sends out heartbeat tests 301 to selected application cards 5. If the application cards are operating properly, each will return an acknowledgment response 302. If an expected acknowledgment response is not received, Fault Detection may issue a fault report on the malfunctioning application card 5 (see FIG. 4). Similarly, Fault Detection on system manager 12 will periodically conduct heartbeat tests 303 and 304 on unit controllers 8 and on selected application cards 5. The unit controllers and application cards receiving the heartbeat messages will likewise acknowledge the messages with responses 305, 306 to indicate that they are functioning properly. Otherwise, a fault report may be issued by the Fault Detection building block that did not receive the expected acknowledgment. Fault Detection at the unit controller and system manager layers also periodically conducts self-tests, as indicated by 307 and 308 in FIG. 3. These self-tests may include, for example, memory integrity tests, I/O module functionality tests, and processor status tests.
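The heartbeat exchange of FIG. 3 can be illustrated with a minimal Python sketch. It assumes a hypothetical `send_heartbeat` callable that returns True when the target acknowledges within its timeout; `HeartbeatFault` and `run_heartbeat_cycle` are illustrative names, not part of the described system.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class HeartbeatFault:
    """Hypothetical fault report raised when a heartbeat goes unacknowledged."""
    detector: str  # e.g. the unit controller or system manager running Fault Detection
    target: str    # the application card or unit controller that failed to acknowledge


def run_heartbeat_cycle(
    detector: str,
    targets: Iterable[str],
    send_heartbeat: Callable[[str], bool],
) -> List[HeartbeatFault]:
    """Send one heartbeat to each target and report every target that did not acknowledge."""
    faults: List[HeartbeatFault] = []
    for target in targets:
        # Assumed contract: True means an acknowledgment arrived in time.
        if not send_heartbeat(target):
            faults.append(HeartbeatFault(detector=detector, target=target))
    return faults
```

Each unacknowledged target yields one candidate report; whether it is actually issued is left to the caller, mirroring the “may issue” wording above.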
In addition to the fault reports that may be generated in response to the heartbeat and self-tests initiated by the Fault Detection building block, application cards 5 and unit controllers 8 may also issue various fault reports themselves that are not prompted by Fault Detection, but are nevertheless received and managed by the fault management system. In one embodiment, application cards 5 are each capable of detecting device faults, timing faults, and/or path faults. Device faults include those faults that can be directly attributed to the failure of a device, and timing faults are faults that result from a failure of the timing network. Finally, path faults are faults that result from erroneous and degraded connections in the signal data paths of the telecommunications switching system. Path faults may include parity errors, cyclic redundancy check errors, and path verification errors. These types of faults may be detected at various points along the data transmission path, as was described above, and by any of various means that are well known in the art. For example, the appropriate means may be employed so that these types of faults are detected at two points on each application card in the data transmission path, such as at both the receiving end and the transmitting end of the card.
The fault reports referred to above include the data and information necessary to allow the Fault Management building block to assess and isolate the detected fault, and to take the steps necessary to correct the problem. Information such as the time of the event, the fault type, the identification of the failed device, the identification of the component where the failed device resides, the identification of the fault detector, the priority of the fault (described below), and the destination of the fault report may all be included within the fault report. As shown in FIG. 4, fault reports that are issued by application cards 5 or by Fault Detection 300 are forwarded to Fault Management 400, which occupies both the system manager and unit controller architectural layers. In one embodiment of the invention, when a fault report is issued by unit controller 8, application card 5, or Fault Detection, the destination is specified as the Fault Management building block in the next higher architectural layer; the destination of a fault report issued at the system manager layer is the Fault Management building block in the same architectural layer.
In an alternate embodiment, however, a Fault Routing 401 subroutine may be included within the Fault Management 400 building block, which allows any building block in the same architectural layer, or any building block in a higher architectural layer, to register to receive fault reports generated by specified devices. In this alternate embodiment, fault reports are received by Fault Routing 401, and if a destination is not designated in the fault report, Fault Routing will consult a registration table 402 to determine whether any building block has registered to receive that device's fault reports. If so, the fault report will be routed to the destination specified in the registration table. Otherwise, a default destination, such as the one described above, will be assigned.
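A fault report carrying the fields listed above, together with the default destination rule, might be modeled as follows. This is a sketch under assumptions: the field names, the integer priority, and the `Layer` ordering (taken from the FIG. 2 hierarchy) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Optional


class FaultType(Enum):
    DEVICE = "device"
    TIMING = "timing"
    PATH = "path"


class Layer(IntEnum):
    # Ordered from the bottom to the top of the FIG. 2 control hierarchy.
    APPLICATION_CARD = 0
    UNIT_CONTROLLER = 1
    SYSTEM_MANAGER = 2


@dataclass
class FaultReport:
    timestamp: float        # time of the event
    fault_type: FaultType
    failed_device: str      # identification of the failed device
    component: str          # component where the failed device resides
    detector: str           # identification of the fault detector
    priority: int           # priority of the fault
    origin_layer: Layer     # layer at which the report was issued
    destination: Optional[Layer] = None  # None means "use the default rule"


def default_destination(origin: Layer) -> Layer:
    """Fault Management in the next higher layer; the top layer reports to itself."""
    if origin == Layer.SYSTEM_MANAGER:
        return Layer.SYSTEM_MANAGER
    return Layer(origin + 1)
```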
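The registration-table lookup performed by Fault Routing 401 can be sketched as below. The `FaultRouter` class and its methods are hypothetical; the sketch assumes a report goes to every building block registered for the device, and it does not enforce the same-or-higher-layer restriction on who may register.

```python
from typing import Dict, List, Optional


class FaultRouter:
    """Illustrative Fault Routing subroutine backed by a registration table."""

    def __init__(self, default_destination: str) -> None:
        self._default = default_destination
        # Registration table: device id -> building blocks registered for its reports.
        self._registrations: Dict[str, List[str]] = {}

    def register(self, building_block: str, device: str) -> None:
        """Record that `building_block` wants fault reports generated by `device`."""
        self._registrations.setdefault(device, []).append(building_block)

    def route(self, device: str, destination: Optional[str] = None) -> List[str]:
        """Return the destination(s) for a fault report generated by `device`."""
        if destination is not None:
            return [destination]          # a destination designated in the report wins
        registered = self._registrations.get(device)
        if registered:
            return list(registered)       # route to the registered building blocks
        return [self._default]            # otherwise assign the default destination
```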
As indicated above, in one embodiment fault reports are sent to Fault Management at either the unit controller 8 layer or the system manager 12 layer. Once Fault Management 400 receives a fault report it attempts to isolate the cause of the fault to a particular device, then removes the faulty device from service, and later returns the device to service when the problem has been corrected. At both the system manager and the unit controller level, Fault Management interfaces with several other building blocks to accomplish these tasks. As shown in FIG. 4, Fault Management 400 may interface with a test management building block 403 to make test requests to gather information for the purpose of isolating the faulty device. It may also interface with a configuration management building block 404 to request removal of faulty devices from service, and to restore those devices once any problems have been corrected. Finally, Fault Management may interface with a fault history database 405 that includes the status of the current fault state of each device, as well as the last fault isolated to each device.
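The isolate, remove, and restore sequence can be sketched as below. The sketch assumes a hypothetical `isolate` callable standing in for test requests to test management 403, records service state directly instead of calling configuration management 404, and uses two dictionaries as a stand-in for fault history database 405; none of these names come from the described system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class FaultHistory:
    """Minimal stand-in for fault history database 405."""
    current_state: Dict[str, str] = field(default_factory=dict)  # device -> service state
    last_fault: Dict[str, str] = field(default_factory=dict)     # device -> last isolated fault


class FaultManager:
    def __init__(self, isolate: Callable[[str], Optional[str]]) -> None:
        # `isolate` maps the device named in a report to the device actually
        # at fault, or None if the fault could not be isolated.
        self._isolate = isolate
        self.history = FaultHistory()

    def handle_report(self, reported_device: str, fault: str) -> Optional[str]:
        faulty = self._isolate(reported_device)
        if faulty is None:
            return None
        # Remove the faulty device from service and remember the isolated fault.
        self.history.current_state[faulty] = "out_of_service"
        self.history.last_fault[faulty] = fault
        return faulty

    def restore(self, device: str) -> None:
        """Return a device to service once its problem has been corrected."""
        self.history.current_state[device] = "in_service"
```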
As indicated, the system described above is an exemplary system, and the inventive aspect of the fault management system as described below may be implemented in such a system, or in any fault management system having a similar architecture.
The fault management system of the present invention includes a system and method for filtering fault reports, thereby overcoming the above-described disadvantages of known systems in which all detected faults are reported and addressed.
In one embodiment of the present invention, fault reports are filtered by allowing each card, either an application card or a unit controller card, to report only the first fault of a given type, e.g., a data, path, or device fault, that is received. The card is then precluded from issuing subsequent fault reports of that type until instructed otherwise by Fault Management. In one embodiment, each card will include multiple “latched fault” flags, each of which corresponds to a different fault type. The status of these flags represents the card's “latched fault status.” When a particular latched fault flag is set, it indicates that a fault of the corresponding fault type has occurred since the last time the flag was reset. Once any of these flags has been set, this status will be maintained on the card until the card is instructed by Fault Management to clear the flag, such as after Fault Management has isolated and corrected the problem that caused the fault report to be generated. Further, once a latched fault flag has been set, the application card software will cause all subsequent fault reports relating to that fault type to be suppressed until the latched fault flag is cleared, thereby ensuring that only the first fault of a given type is reported and subsequently addressed by Fault Management. The fault management system may interrogate any card at any time as to its latched fault status by examining the status of the latched fault flags, thereby providing an additional way in which to isolate faults.
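The latched fault flags map naturally onto a bit mask. In this sketch (hypothetical names and bit assignments), `latch` returns True only for the first fault of a type since the last clear, which is the condition under which a report is allowed, and `interrogate` returns the card's latched fault status.

```python
from enum import IntFlag


class LatchedFault(IntFlag):
    # One flag per fault type; the bit assignments are illustrative.
    NONE = 0
    DEVICE = 1
    TIMING = 2
    PATH = 4


class CardLatchStatus:
    """Per-card latched fault status that Fault Management may interrogate."""

    def __init__(self) -> None:
        self.flags = LatchedFault.NONE

    def latch(self, fault: LatchedFault) -> bool:
        """Set the flag for `fault`; return True only if it was not already set."""
        first = not (self.flags & fault)
        self.flags |= fault
        return first

    def clear(self, fault: LatchedFault) -> None:
        """Clear the flag on instruction from Fault Management."""
        self.flags &= ~fault

    def interrogate(self) -> LatchedFault:
        return self.flags
```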
FIG. 5 illustrates an exemplary procedure by which an application card may filter fault reports of a specified type to make the fault management system of the present invention more efficient and more reliable. Although described in conjunction with a particular type of fault, it is to be understood that the procedure described below applies independently to monitoring faults of all types.
In step 500, the application card detects the occurrence of a particular type of fault, such as a data fault. This may be achieved, for example, either by periodically monitoring the status of a detection device or circuit, or by receiving a signal, such as an interrupt signal, in response to the detection of a fault by the detection device. Once the fault is detected, the card will set the latched fault status flag in step 503 and then determine in step 501 whether fault reports of that type are currently being suppressed. As described above, fault reports of that type will be suppressed if a fault of that type was previously detected and reported, and the appropriate latched fault status flag is still set. If such fault reports are currently being suppressed, the card will wait until another fault is detected, and then return to step 500. In one embodiment, however, before returning to wait until another fault is detected, the fault will be recorded in step 502 for record keeping purposes, but will not be forwarded to Fault Management for the purpose of being acted upon by Fault Management. If fault reports of that type are not currently being suppressed, the card will proceed in step 504 to generate a fault report and forward it to Fault Management. The application card software will then proceed to suppress the generation of subsequent fault reports in step 505, and will continue to suppress these fault reports until it has been instructed by Fault Management to clear the latched fault status flag. It should be understood that the latched fault status flag may instead be set either after the fault report has been generated, or after the appropriate steps have been taken to suppress the generation of subsequent reports. When an instruction to clear the latched fault status flag is received from Fault Management, the specified flag will be cleared and the application card software will reenable fault report generation for fault reports relating to the corresponding type of fault.
Thus, once reenabled, on a subsequent pass through the steps of FIG. 5, when the next fault of that type is detected, the application card will determine in step 501 that fault reports are not currently being suppressed, and will proceed to generate a fault report for that fault.
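The FIG. 5 procedure can be sketched as follows. Step numbers in the comments follow the description above; `TypeFilteringCard`, `send_report`, and the string fault types are illustrative assumptions. Because the flag is set (step 503) before the suppression check (step 501), the sketch tracks suppression separately from the latched flags.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class TypeFilteringCard:
    """Sketch of fault report filtering by type on one application card."""
    send_report: Callable[[str, str], None]              # forwards (type, detail) to Fault Management
    latched: Set[str] = field(default_factory=set)       # latched fault status flags
    suppressed: Set[str] = field(default_factory=set)    # types whose reports are suppressed
    local_log: List[str] = field(default_factory=list)   # step 502 record keeping

    def on_fault(self, fault_type: str, detail: str) -> bool:
        """Steps 500 to 505; returns True if a fault report was generated."""
        self.latched.add(fault_type)                          # step 503: set latched flag
        if fault_type in self.suppressed:                     # step 501: suppressed?
            self.local_log.append(f"{fault_type}: {detail}")  # step 502: record only
            return False
        self.send_report(fault_type, detail)                  # step 504: report
        self.suppressed.add(fault_type)                       # step 505: suppress later reports
        return True

    def clear_latched(self, fault_type: str) -> None:
        """Instruction from Fault Management: clear the flag and reenable reporting."""
        self.latched.discard(fault_type)
        self.suppressed.discard(fault_type)
```

Only the first path fault reaches Fault Management; a second one is logged locally until `clear_latched("path")` is called.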
In an alternate embodiment of the present invention, the types of faults may also be prioritized to allow more critical, higher priority faults to be addressed first. For example, data faults may be considered more critical than timing faults, and thus assigned a higher priority in the fault management system. In this embodiment, as shown in FIG. 6, in response to detecting in step 600 that a fault of a particular type has occurred, the latched fault status flag is set in step 607, followed by the determination of the priority level of that fault in step 601. Once the priority is determined, it will be determined in step 602 whether or not fault reports of that priority level are currently being suppressed. If they are not being suppressed, then the appropriate latched fault status flag is set in step 603, a fault report is generated in step 604, and subsequent fault report generation for fault reports relating to faults of that priority level and lower priority levels will then be suppressed in step 605. In a manner similar to that described above for filtering fault reports by type, fault reports of the appropriate priority level will be suppressed until the application card receives an instruction from Fault Management to clear the appropriate latched fault status flag, at which time the flag will be cleared and fault report generation will be reenabled for the corresponding fault priority levels. Once reenabled, on a subsequent pass through the steps of FIG. 6, the determination made at step 602 will result in a report being generated for the next occurring fault.
In one embodiment, if it is determined at step 602 that fault reports for that priority level are currently being suppressed, the application card software will determine in step 606 whether the latched fault status flag for that fault type has been set and, if not, it will record the fault in step 608. This is done to ensure that the latched fault status flags accurately reflect whether each type of fault has occurred since the last time the flags were cleared. Since fault reports of a given priority level and lower may be suppressed in this embodiment, the fact that a fault of one type is not reported does not necessarily mean that no fault of that type has occurred. Thus, by performing steps 606 and 608, although a fault report will not be generated and forwarded to Fault Management, the fault will be recorded and the state of the latched fault status flags will accurately reflect whether faults of the various types have actually occurred since the last time these flags were cleared.
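A sketch of the FIG. 6 priority filter follows. It assumes that larger numbers denote higher priority and that clearing a type's flag lifts only the suppression imposed by that type's own report; the class and parameter names are hypothetical. Suppression is derived from the highest priority among reported, not-yet-cleared fault types, so a later, more critical fault is still reported.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class PriorityFilteringCard:
    priorities: Dict[str, int]                       # fault type -> priority (larger = more critical)
    send_report: Callable[[str, int], None]          # forwards (type, priority) to Fault Management
    latched: Set[str] = field(default_factory=set)   # latched fault status flags
    reported: Dict[str, int] = field(default_factory=dict)  # reported, uncleared types
    local_log: List[str] = field(default_factory=list)      # step 608 records

    def _suppressed(self, priority: int) -> bool:
        # Reports at or below the highest outstanding reported priority are suppressed.
        threshold = max(self.reported.values(), default=None)
        return threshold is not None and priority <= threshold

    def on_fault(self, fault_type: str) -> bool:
        """Steps 600 to 608; returns True if a fault report was generated."""
        priority = self.priorities[fault_type]           # step 601
        newly_latched = fault_type not in self.latched   # step 606 check
        self.latched.add(fault_type)                     # steps 607/603
        if self._suppressed(priority):                   # step 602
            if newly_latched:
                self.local_log.append(fault_type)        # step 608
            return False
        self.send_report(fault_type, priority)           # step 604
        self.reported[fault_type] = priority             # step 605
        return True

    def clear_latched(self, fault_type: str) -> None:
        """Instruction from Fault Management: clear the flag, lift its suppression."""
        self.latched.discard(fault_type)
        self.reported.pop(fault_type, None)
```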
The system and method for filtering faults described above with respect to application cards apply equally well at the unit controller level. Unit controllers 8 may similarly filter fault reports by, for example, processing and generating a fault report for only the first path fault received from the application cards, or for the first fault of a given priority level that is received. Further, at the highest level the fault management system will act only on the first fault of a designated type or priority level that is reported. Thus, the present invention, by introducing fault report filtering at each level, will enable the fault management system to more efficiently isolate and address faults by focusing on faults that are more likely to be at the root of the problem, rather than those that likely have resulted from or have been spawned by initial faults. Other modifications of the invention described above will be obvious to those skilled in the art, and it is intended that the scope of the invention be limited only as set forth in the appended claims.
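Filtering at the unit controller level can reuse the same idea. In this hypothetical sketch, a path fault on one card spawns path faults on the cards downstream of it; assuming the reports reach the unit controller in upstream-to-downstream order, only the first is forwarded to the system manager. `UnitControllerFilter` and `forward` are illustrative names.

```python
from typing import Callable, List, Set, Tuple


class UnitControllerFilter:
    """Forwards only the first report of each fault type received from the cards."""

    def __init__(self, forward: Callable[[str, str], None]) -> None:
        self._forward = forward                 # link to Fault Management at the system manager
        self._suppressed: Set[str] = set()
        self.held: List[Tuple[str, str]] = []   # reports retained locally, not forwarded

    def receive(self, card: str, fault_type: str) -> bool:
        """Handle a card's report; return True if it was forwarded upward."""
        if fault_type in self._suppressed:
            self.held.append((card, fault_type))
            return False
        self._forward(card, fault_type)
        self._suppressed.add(fault_type)
        return True

    def reenable(self, fault_type: str) -> None:
        """Called when the system manager signals that reporting may resume."""
        self._suppressed.discard(fault_type)
```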

Claims (18)

What is claimed is:
1. A method for managing faults in a data transmission system, said data transmission system including a data transmission path for transmitting signals containing data, and a plurality of application cards along said data path for processing said signals, comprising the steps of:
detecting an occurrence of a first fault of a particular type at one of a plurality of detection points along said data transmission path;
in response to detecting said first fault of said particular type, one of said application cards generating a fault report for the purpose of identifying the cause of said detected fault; and
preventing generation by said application card of subsequent fault reports relating to said particular type of fault until receiving a signal indicating that fault report generation may be reenabled.
2. The method according to claim 1, further comprising the steps of:
said application card receiving said signal indicating that fault report generation may be reenabled, and in response, reenabling fault report generation; and
said application card generating a subsequent fault report in response to detection of a subsequent fault of said particular type.
3. The method for managing faults according to claim 2, further comprising the steps of:
in response to detecting said first fault of said particular type, setting a fault status indicator associated with said application card that represents said particular fault type; and
in response to receiving said signal indicating that fault report generation may be reenabled, clearing said fault status indicator.
4. The method according to claim 3, wherein said data transmission system further includes a unit controller for controlling said application cards, said unit controller including unit controller software, said method further comprising the step of sending said fault report to a fault manager subroutine of said unit controller software, said signal indicating that fault report generation may be reenabled being received from said fault manager.
5. The method according to claim 3, wherein said data transmission system further includes a unit controller for controlling said application cards, and a system manager for controlling said unit controller, said system manager including system manager software, said method further comprising the step of sending said fault report to a fault manager subroutine of said system manager software, said signal indicating that fault report generation may be reenabled being received from said fault manager.
6. The method according to claim 2, wherein said detected fault is a path fault.
7. The method according to claim 2, wherein said detected fault is a device fault.
8. The method according to claim 2, wherein said detected fault is a timing fault.
9. A method for managing faults in a data transmission system, said data transmission system including a data path for transmitting signals containing data, and a plurality of application cards along said data path for processing said signals, comprising the steps of:
detecting an occurrence of a first fault of a particular type at one of a plurality of detection points along said data transmission path;
determining a priority level of said detected fault;
in response to detecting said first fault of said particular type, one of said plurality of application cards generating a fault report for the purpose of identifying the cause of said detected fault; and
preventing generation by said application card of subsequent fault reports relating to faults of said determined priority level or lower until receiving a signal indicating that fault report generation may be reenabled.
10. The method according to claim 9, further comprising the steps of:
said application card receiving said signal, and in response, reenabling fault report generation; and
said application card generating a subsequent fault report in response to detection of a subsequent fault of said determined priority level or lower.
11. The method for managing faults according to claim 10, further comprising the steps of:
in response to detection of said fault of said particular type, setting a fault status indicator associated with said application card that represents said particular fault type; and
in response to receiving said signal indicating that fault report generation may be reenabled, clearing said fault status indicator.
12. The method according to claim 11, further comprising the step of sending said fault report to a fault manager within said data transmission system, wherein said signal indicating that fault report generation may be reenabled is received from said fault manager.
13. A method for managing faults in a data transmission system, said data transmission system including a data path for transmitting signals containing data, and a plurality of application cards along said data path for processing said signals, comprising the steps of:
detecting an occurrence of a first fault of a particular type at one of a plurality of detection points along said data transmission path;
in response to detecting said fault of said particular type, determining whether fault reports relating to said particular fault type are being suppressed;
if fault reports of said particular type are being suppressed, then waiting until another fault occurs and then returning to the detecting step;
if fault reports of said particular type are not being suppressed, then
setting a fault status indicator associated with said application card that represents said particular fault type;
said application card generating a fault report for the purpose of identifying the cause of said detected fault;
preventing generation by said application card of subsequent fault reports relating to said particular fault type until receiving a signal indicating that fault report generation may be reenabled; and
waiting until another fault occurs and then returning to the detecting step.
14. The method according to claim 13, further comprising the steps of:
said application card receiving said signal indicating that fault report generation should be reenabled, and in response, reenabling said fault report generation; and
said application card generating a subsequent fault report in response to detection of a subsequent fault of said particular type.
15. The method according to claim 14, further comprising the step of sending said fault report to a fault manager within said data transmission system, wherein said signal indicating that fault report generation may be reenabled is received from said fault manager.
16. A fault management system for managing faults in a data transmission system, said data transmission system including a data transmission path for transmitting signals containing data, a plurality of application cards along said data path for processing said signals, at least one unit controller for controlling said application cards, and at least one system manager for controlling said at least one unit controller, said system comprising:
application card software residing on said plurality of application cards;
unit controller software residing on said at least one unit controller; and
system manager software residing on said at least one system manager, said application card software being capable of generating a first fault report in response to detecting that a first fault of a particular type has occurred in the data transmission system, and suppressing the generation of subsequent fault reports relating to said particular type of fault until receiving a signal indicating that fault report generation may be reenabled.
17. The system according to claim 16, wherein said fault report is sent to a fault management subroutine within said unit controller software, and wherein said signal indicating that fault report generation may be reenabled is received from said fault manager.
18. The system according to claim 16, wherein said fault report is sent to a fault management subroutine within said system manager software, and wherein said signal indicating that fault report generation may be reenabled is received from said fault manager.
US09/216,568 (US6385665B1), priority date 1998-12-18, filed 1998-12-18: System and method for managing faults in a data transmission system. Status: Expired - Fee Related.

Publications (1)

Publication Number: US6385665B1    Publication Date: 2002-05-07

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4554661A (en) * 1983-10-31 1985-11-19 Burroughs Corporation Generalized fault reporting system
US4633467A (en) * 1984-07-26 1986-12-30 At&T Bell Laboratories Computer system fault recovery based on historical analysis
US4727548A (en) * 1986-09-08 1988-02-23 Harris Corporation On-line, limited mode, built-in fault detection/isolation system for state machines and combinational logic
US4745593A (en) * 1986-11-17 1988-05-17 American Telephone And Telegraph Company, At&T Bell Laboratories Arrangement for testing packet switching networks
US4979174A (en) * 1988-12-29 1990-12-18 At&T Bell Laboratories Error correction and detection apparatus and method
US5157667A (en) * 1990-04-30 1992-10-20 International Business Machines Corporation Methods and apparatus for performing fault isolation and failure analysis in link-connected systems
US5226150A (en) * 1990-10-01 1993-07-06 Digital Equipment Corporation Apparatus for suppressing an error report from an address for which an error has already been reported
US5383201A (en) * 1991-12-23 1995-01-17 Amdahl Corporation Method and apparatus for locating source of error in high-speed synchronous systems
US5394407A (en) * 1993-07-01 1995-02-28 Motorola, Inc. Method of transferring error correcting code and circuit therefor
US5448725A (en) * 1991-07-25 1995-09-05 International Business Machines Corporation Apparatus and method for error detection and fault isolation
US5740357A (en) * 1990-04-26 1998-04-14 Digital Equipment Corporation Generic fault management of a computer system
US5790779A (en) * 1995-03-10 1998-08-04 Microsoft Corporation Method and system for consolidating related error reports in a computer system
US5864686A (en) * 1996-11-19 1999-01-26 International Business Machines Corporation Method for dynamic address coding for memory mapped commands directed to a system bus and/or secondary bused
US5881069A (en) * 1997-12-12 1999-03-09 Motorola, Inc. Method and apparatus for error correction processing in a radio communication device
US5968189A (en) * 1997-04-08 1999-10-19 International Business Machines Corporation System of reporting errors by a hardware element of a distributed computer system
US5978938A (en) * 1996-11-19 1999-11-02 International Business Machines Corporation Fault isolation feature for an I/O or system bus
US6070072A (en) * 1997-07-16 2000-05-30 Motorola, Inc. Method and apparatus for intelligently generating an error report in a radio communication system

Cited By (50)

Publication number Priority date Publication date Assignee Title
US6560738B1 (en) * 1999-07-06 2003-05-06 Nec Electronics Corporation Fault propagation path estimating method, fault propagation path estimating apparatus and recording media
US6877105B1 (en) * 1999-09-29 2005-04-05 Hitachi, Ltd. Method for sending notice of failure detection
US20030163764A1 (en) * 1999-10-06 2003-08-28 Sun Microsystems, Inc. Mechanism to improve fault isolation and diagnosis in computers
US6823476B2 (en) * 1999-10-06 2004-11-23 Sun Microsystems, Inc. Mechanism to improve fault isolation and diagnosis in computers
US6738696B2 (en) * 2000-12-13 2004-05-18 Denso Corporation Controller for vehicle with information providing function and recording medium
US7447157B2 (en) * 2001-01-16 2008-11-04 Ericsson Ab Alarm signal suppression in telecommunications networks
US20040114526A1 (en) * 2001-01-16 2004-06-17 Barker Andrew James Alarm signal suppression in telecommunications networks
US7096387B2 (en) * 2001-03-23 2006-08-22 Sun Microsystems, Inc. Method and apparatus for locating a faulty device in a computer system
US20020138791A1 (en) * 2001-03-23 2002-09-26 Paul Durrant Computer system
US7137039B2 (en) * 2001-03-23 2006-11-14 Sun Microsystems, Inc. Device drivers configured to monitor device status
US20020138782A1 (en) * 2001-03-23 2002-09-26 Paul Durrant Computer system
US6983401B2 (en) * 2002-05-01 2006-01-03 Bellsouth Intellectual Property Corporation System and method for generating a chronic circuit report for use in proactive maintenance of a communication network
US20060085693A1 (en) * 2002-05-01 2006-04-20 Taylor William Scott System and method for generating a chronic circuit report for use in proactive maintenance of a communication network
US20030208705A1 (en) * 2002-05-01 2003-11-06 Taylor William Scott System and method for generating a chronic circuit report for use in proactive maintenance of a communication network
US7441157B2 (en) * 2002-05-01 2008-10-21 At&T Intellectual Property I, L.P. System and method for generating a chronic circuit report for use in proactive maintenance of a communication network
US8145954B2 (en) 2002-05-01 2012-03-27 At&T Intellectual Property I, Lp System and method for generating a chronic circuit report for use in proactive maintenance of a communication network
US20090049027A1 (en) * 2002-05-01 2009-02-19 William Scott Taylor System and method for generating a chronic circuit report for use in proactive maintenance of a communication network
US7096391B2 (en) * 2003-04-29 2006-08-22 Hewlett-Packard Development Company, L.P. Error message suppression system and method
US20040221204A1 (en) * 2003-04-29 2004-11-04 Johnson Ted C. Error message suppression system and method
US20060271354A1 (en) * 2005-05-31 2006-11-30 Microsoft Corporation Audio codec post-filter
US7607043B2 (en) 2006-01-04 2009-10-20 International Business Machines Corporation Analysis of mutually exclusive conflicts among redundant devices
US20070174663A1 (en) * 2006-01-04 2007-07-26 International Business Machines Corporation Analysis of mutually exclusive conflicts among redundant devices
US7895471B2 (en) * 2006-06-30 2011-02-22 Unisys Corporation Fault isolation system and method
US7613949B1 (en) * 2006-06-30 2009-11-03 Boone Lewis A Fault isolation system and method
US20090210747A1 (en) * 2006-06-30 2009-08-20 Boone Lewis A Fault isolation system and method
US7676695B2 (en) * 2007-06-05 2010-03-09 Compuware Corporation Resolution of computer operations problems using fault trend analysis
US7934126B1 (en) 2007-06-05 2011-04-26 Compuware Corporation Resolution of computer operations problems using fault trend analysis
US20080307269A1 (en) * 2007-06-05 2008-12-11 Compuware Corporation Resolution of Computer Operations Problems Using Fault Trend Analysis
US8151141B1 (en) * 2007-06-05 2012-04-03 Compuware Corporation Resolution of computer operations problems using fault trend analysis
US20100313074A1 (en) * 2007-10-04 2010-12-09 Robert Bosch Gmbh Method for describing a behavior of a technical apparatus
US20100205486A1 (en) * 2009-02-06 2010-08-12 Inventec Corporation System and method of error reporting
US8539285B2 (en) * 2010-06-22 2013-09-17 International Business Machines Corporation Systems for agile error determination and reporting and methods thereof
US20110314339A1 (en) * 2010-06-22 2011-12-22 International Business Machines Corporation Systems for agile error determination and reporting and methods thereof
US20130073715A1 (en) * 2011-09-16 2013-03-21 Tripwire, Inc. Methods and apparatus for remediating policy test failures, including correlating changes to remediation processes
US10291471B1 (en) 2011-09-16 2019-05-14 Tripwire, Inc. Methods and apparatus for remediation execution
US10235236B1 (en) 2011-09-16 2019-03-19 Tripwire, Inc. Methods and apparatus for remediation workflow
US9026646B2 (en) * 2011-09-16 2015-05-05 Tripwire, Inc. Methods and apparatus for remediating policy test failures, including correlating changes to remediation processes
US9304850B1 (en) 2011-09-16 2016-04-05 Tripwire, Inc. Methods and apparatus for remediation workflow
US9509554B1 (en) 2011-09-16 2016-11-29 Tripwire, Inc. Methods and apparatus for remediation execution
US20130083669A1 (en) * 2011-09-30 2013-04-04 Nokia Siemens Networks Corporation Oy Fault management traffic reduction in heterogeneous networks
US9538402B2 (en) * 2011-09-30 2017-01-03 Nokia Solutions And Networks Oy Fault management traffic reduction in heterogeneous networks
US9218257B2 (en) * 2012-05-24 2015-12-22 Stec, Inc. Methods for managing failure of a solid state device in a caching storage
US20130318391A1 (en) * 2012-05-24 2013-11-28 Stec, Inc. Methods for managing failure of a solid state device in a caching storage
US10452473B2 (en) 2012-05-24 2019-10-22 Western Digital Technologies, Inc. Methods for managing failure of a solid state device in a caching storage
CN104378246B (en) * 2014-12-09 2018-04-06 福建星网锐捷网络有限公司 A kind of network equipment failure alignment system, method and device
CN104378246A (en) * 2014-12-09 2015-02-25 福建星网锐捷网络有限公司 Network equipment fault positioning system, method and device
US9940235B2 (en) 2016-06-29 2018-04-10 Oracle International Corporation Method and system for valid memory module configuration and verification
CN106293984A (en) * 2016-08-11 2017-01-04 浪潮(北京)电子信息产业有限公司 A kind of computer glitch automatically processes mode and device
US20200092160A1 (en) * 2018-09-18 2020-03-19 Electronics And Telecommunications Research Institute Fault event management method for controller-based restoration
CN112383027A (en) * 2020-11-03 2021-02-19 中国航空工业集团公司西安航空计算技术研究所 Motor operation safety protection control method based on state machine strategy

Similar Documents

Publication Publication Date Title
US6385665B1 (en) System and method for managing faults in a data transmission system
US7289436B2 (en) System and method for providing management of fabric links for a network element
US5023873A (en) Method and apparatus for communication link management
US7233568B2 (en) System and method for selection of redundant control path links in a multi-shelf network element
US6678839B2 (en) Troubleshooting method of looped interface and system provided with troubleshooting function
US7397385B1 (en) Predicting cable failure through remote failure detection of error signatures
US6778491B1 (en) Method and system for providing redundancy for signaling link modules in a telecommunication system
US4228535A (en) Dual TDM switching apparatus
EP0455442A2 (en) Fault detection in link-connected systems
DK167333B1 (en) PROCEDURE FOR OPERATING A ERROR-PROTECTED HIGH AVAILABLE MULTIPROCESSOR CENTER CONTROL UNIT IN A DISTRIBUTION SYSTEM
EP0416943A2 (en) Method for controlling failover between redundant network interface modules
EP0570882A2 (en) A distributed control methodology and mechanism for implementing automatic protection switching
JPH04242463A (en) State-change informing mechanism and method in data processing input/output system
US5923840A (en) Method of reporting errors by a hardware element of a distributed computer system
US5513312A (en) Method for system-prompted fault clearance of equipment in communcation systems
RU2142159C1 (en) Methods for checking processor condition in electronic commutation systems
US6058120A (en) System and apparatus for controlling telecommunications components
FI105133B (en) Error management and recovery in a data communication system
JPH0795223A (en) Execution of route identification inside communication system
JP3892998B2 (en) Distributed processing device
GB2282935A (en) Data network switch which identifies and rectifies faults
US8111625B2 (en) Method for detecting a message interface fault in a communication device
CN111181764A (en) Main/standby switching method and system based on OVS
Miller et al. Common channel interoffice signaling: Signaling network
EP1331759B1 (en) System and method for providing management of communication links connecting components in a network element

Legal Events

Date Code Title Description
AS Assignment

Owner name: ALCATEL USA SOURCING, L.P., TEXAS

Free format text: CHANGE OF NAME;ASSIGNOR:DSC TELECOM L.P.;REEL/FRAME:009665/0101

Effective date: 19980908

Owner name: DSC TELECOM L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CANADY, KEVIN E.;BUTTERFIELD, BYRON T.;DOSS, DWIGHT W.;AND OTHERS;REEL/FRAME:009665/0135;SIGNING DATES FROM 19980325 TO 19981202

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20060507