US5949759A - Fault correlation system and method in packet switching networks


Info

Publication number
US5949759A
Authority
US
United States
Prior art keywords
failure
network
node
correlation
physical
Legal status
Expired - Lifetime
Application number
US08/752,404
Inventor
Andre Cretegny
Catherine Gallian
Laurent Nicolas
Yves Ouvry
Benoit Sirot
Gilles Wozelka
Current Assignee
Cisco Technology Inc
Original Assignee
International Business Machines Corp
Priority date
1995-12-20
Filing date
1996-11-19
Publication date
1999-09-07
Application filed by International Business Machines Corp
Assigned to IBM CORPORATION reassignment IBM CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GALLIAN, C., CRETEGNY, A., NICOLAS, L., QUVRY, Y., SIROT, B., WOZELKA, G.
Application granted
Publication of US5949759A
Assigned to CISCO TECHNOLOGY, INC., A CORPORATION OF CALIFORNIA reassignment CISCO TECHNOLOGY, INC., A CORPORATION OF CALIFORNIA ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CISCO SYSTEMS, INC., A CORPORATION OF CALIFORNIA
Assigned to CISCO SYSTEMS, INC. reassignment CISCO SYSTEMS, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: INTERNATIONAL BUSINESS MACHINES CORPORATION
Anticipated expiration
Expired - Lifetime (current legal status)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/40 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection

Abstract

In case of a failure in a high speed packet switching network, the failure information provided by the multiplicity of resources is registered in the access nodes of the network. The failure information can be retrieved by the network management on request for fault correlation. A plurality of alarms flooding the network management when a failure occurs is thus avoided.

Description

TECHNICAL FIELD
The present invention relates to Fault Management in large packet switching networks and more particularly to an apparatus and a method for correlating failures and identifying the resources affected by said failure.
BACKGROUND ART
The Fault Management discipline in network management systems comprises sets of functions enabling the detection, isolation and correction of abnormal operations in the communication network and its environment. Abnormal operations may relate to events such as physical resource failures (e.g. link outage), communication failures or security violations occurring in the interconnected nodes forming the network.
Among other functions, Fault Management provides the reporting of alarms, which requires, on the one hand, the detection of failures and the reporting of alarms by the nodes and, on the other hand, the presentation of the information related to these failures to network operators. Network operators are responsible for ensuring that the network provides the services that users expect. This responsibility depends on real-time advertisement of abnormal network operations so that appropriate recovery actions can be taken. To fulfill this duty, network operators rely on Fault Management, first to be informed of the failure occurrence, and second to obtain correlated fault information on that failure. Fault correlation requires that the resources functionally affected by the failure are registered together with the failure and that this correlated information is accessible to the network operators.
Current networks differ from earlier ones both in their characteristics and in the solutions they use. Essentially, two characteristics of prior networks have evolved, and this evolution has obstructed the traditional approach to error correlation:
the bandwidth available on a given network interface limited, de facto, the number of logical resources served by the physical media, and
the logical resources were tightly linked to physical resources, which made the network topology very static.
Error correlation functions in this environment can be based on:
notification of all the resources affected by a failure, through asynchronous notifications raised from the network to the network management system,
a posteriori (i.e. after failure occurrence) retrieval of information on a given resource (e.g. verification of the status of a resource), and
a posteriori verification of the valid connections.
Networking is evolving toward higher speeds, offering an appropriate infrastructure for emerging multimedia applications. High speed networking provides physical media transport at rates above 2,000 kilobits per second. When the network provides such high bandwidth, the number of granular or elementary accesses is also very high. Such speeds therefore lead to an increasing number of logical resources (e.g. protocol interfaces, connections) that the physical media can serve. The additional complexity introduced by the large number of supported resources in these new generation networks requires evolving the classical network architecture into a distributed network structure. The classical network architecture associates physical media support with the physical protocol layer, failure detection and some corrections with the link protocol layer and higher layers, and network management with applications. Emerging networks, by contrast, tend to distribute some network management functions into protocol layers, for example physical media backup decision and operation in the physical protocol layer, and connectivity backup decision and operation in the link protocol layers.
Two consequences for Fault Management derive from current network evolution:
one failure will have disruptive effects on a larger number of applications and users, and
one failure will trigger many alarms in the network, related to the affected logical resources.
The following new requirements derive from these consequences:
the need to correlate a physical resource failure with the logical resources which were previously served by this failed entity, and
the need to restrict the overall fault management flow to avoid excessive network bandwidth utilization for network management purposes.
Applying the current solutions to high speed, dynamic networks would lead to a network flooded by network management traffic (mainly due to asynchronous events) and to the retrieval of outdated information, since the network may have redistributed logical resources to new, healthy physical resources. For example, when a physical resource failure occurs, the associated alarm would be triggered; each affected logical resource would then trigger its own alarm, and the network operator would be flooded by hundreds of alarms caused by a single failure, with no analysis tool to use.
This demonstrates that usual correlation algorithms are no longer appropriate for current high speed networks, the main inhibitors being the number of logical resources and the dynamics of the topology.
SUMMARY OF THE INVENTION
The invention addresses the large number of asynchronous notifications related to the failed resources by:
informing the operator of the failure of the physical resources, and
keeping the information inside the node for on-request retrieval.
One aspect of the method according to the present invention involves the following steps:
the network operator configures each node in the network to enable logging of required information in a memory of the Network Element,
the physical resource triggers an alarm when it is affected by a failure,
each affected logical resource logs failure information on reception of the physical alarm, and
the operator requests log retrieval for analysis.
The efficiency of this solution can be further increased by building a "correlation key" which is representative of the location of the failure, and by using the flooding mechanism which is used in the distributed networks to transport information which is useful to associate the root cause of the failure with the effects of the failure.
Therefore, another aspect of the method according to the present invention involves the steps of:
on occurrence of the physical resource failure, the correlation key is built, comprising information on the affected link as seen from the respective neighbouring node;
the physical alarm is triggered; it contains the correlation key with location information, which forms part of the alarm data;
the failure information and the correlation key are transmitted to the access nodes of the network;
in the access nodes, the affected logical resources are associated with the physical failure;
the alarm related to the logical resource is built with the correlation key.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be better understood in the following description in combination with the Figures, wherein:
FIG. 1 represents the operation of a prior art network;
FIG. 2 shows the steps of the method according to the present invention;
FIG. 3 is a representation of the first step of the method according to the present invention;
FIG. 4 is a representation of the second step of the method according to the present invention;
FIG. 5 is a representation of the third step of the method according to the present invention;
FIG. 6 is a representation of the fourth step of the method according to the present invention;
FIG. 7 is a representation of the fifth step of the method according to the present invention;
FIG. 8 is a representation of the seventh step of the method according to the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENT
The state of the art is illustrated in FIG. 1. As an example, FIG. 1 shows a network comprising 7 nodes (i.e. 10, 11, 12, 13, 14, 15 and 16) with 3 end-to-end connections:
one flowing from 10, via 12 and 13, to 14,
one flowing from 11, via 12 and 13, to 15, and
one flowing from 11, via 12 and 13, to 16.
When a failure occurs on a link (e.g. failure 17 on the link between nodes 12 and 13) affecting the 3 connections served by the link, 7 alarms are triggered: 2 physical alarms, one per node adjacent to the failure (i.e. alarms 203 and 204), and 5 alarms related to logical resources (i.e. alarms 201, 202, 205, 206 and 207).
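The alarm count in the FIG. 1 example can be reproduced with a small Python sketch. It assumes one physical alarm per node adjacent to the broken link and one logical alarm per access node terminating an affected connection, which is one reading consistent with the 2 + 5 alarms listed above; the helper names are hypothetical.

```python
# Hypothetical model of the FIG. 1 example: each connection is a node path.
CONNECTIONS = [
    [10, 12, 13, 14],
    [11, 12, 13, 15],
    [11, 12, 13, 16],
]
FAILED_LINK = frozenset({12, 13})  # failure 17, between nodes 12 and 13


def uses_link(path, link):
    """True if consecutive nodes of the path traverse the (undirected) link."""
    return any(frozenset(hop) == link for hop in zip(path, path[1:]))


def prior_art_alarms(connections, failed_link):
    """Return (physical, logical) alarm counts when every resource notifies.

    Assumption: one physical alarm per node adjacent to the failed link and
    one logical alarm per access node terminating an affected connection.
    """
    physical_sources = set(failed_link)
    logical_sources = set()
    for path in connections:
        if uses_link(path, failed_link):
            logical_sources.update((path[0], path[-1]))
    return len(physical_sources), len(logical_sources)


physical, logical = prior_art_alarms(CONNECTIONS, FAILED_LINK)
print(physical, logical, physical + logical)  # 2 5 7
```

Every one of these alarms travels to the operator in the prior-art scheme; the method described next keeps the logical ones in the access nodes.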
The solution according to the present invention for this situation is to provide the method which is shown in FIG. 2.
The network operator configures each node in the network to enable logging of the required information (step 301). On occurrence of the physical resource failure, the correlation keys are built (step 302). The physical alarms are triggered; they contain the correlation keys as part of the alarm data (step 303). The failure information and the correlation keys are transported to the access nodes of the network (step 304). In the access nodes, the affected logical resources are associated with the physical failure, and the alarm related to each logical resource is built with the correlation keys (step 305). The logical alarm and the correlation key are kept in the log of the node, according to the log configuration criteria (step 306). On reception of the physical alarm, the operator may request log retrieval for analysis (step 307).
Each step of this method is now explained in detail:
1. The logging characteristics are configured by the network operator as explained in combination with FIG. 3. The network operator will configure an Event Discriminator (i.e. a filter, the filtering criteria of which can be configured) and a Log for each node in the network with the purpose of keeping information needed for error correlation.
The characteristics which can be configured for the Event Discriminator are for example:
object class (i.e. class of resources having the same characteristics), and
event type (i.e. type of asynchronous notification) in order to allow alarms related to connection failures to be logged.
The characteristics which can be configured for the Log are for example:
action when the log is full, either set to `wrap` or to `halt` (i.e. to overwrite the former content of the log memory or to stop writing information into the log memory),
a threshold on the log capacity at which a capacity alarm is triggered before the log is full, and
the stop and start logging times in order to have a log ready to receive the notifications.
The network operator will apply these configuration operations, referred to as operations 401, 402, 403, 404 and 405 in FIG. 3, to each node (i.e. 10, 11, 12, 13, 14, 15, 16) in the network.
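A minimal Python sketch of the per-node logging configuration follows. The class names `EventDiscriminator` and `Log`, the `communicationsAlarm` event type string, the default capacity of 100 and the 0.8 threshold are illustrative assumptions, not values taken from the patent.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventDiscriminator:
    """Configurable filter: a notification passes only if both criteria match."""
    object_classes: frozenset  # classes of resources, e.g. "connection"
    event_types: frozenset     # types of asynchronous notification

    def accepts(self, notification: dict) -> bool:
        return (notification["object_class"] in self.object_classes
                and notification["event_type"] in self.event_types)


@dataclass
class Log:
    """Bounded log memory with the configurable characteristics listed above."""
    capacity: int
    full_action: str = "wrap"        # "wrap" overwrites, "halt" stops writing
    threshold: float = 0.8           # fraction of capacity raising a capacity alarm
    start_time: float = 0.0          # logging is active in [start_time, stop_time)
    stop_time: float = float("inf")
    records: deque = field(default_factory=deque)
    capacity_alarm: bool = False

    def __post_init__(self):
        if self.full_action not in ("wrap", "halt"):
            raise ValueError("full_action must be 'wrap' or 'halt'")

    def write(self, record, now: float = 0.0) -> bool:
        """Store a record; return False when it is not stored."""
        if not (self.start_time <= now < self.stop_time):
            return False
        if len(self.records) >= self.capacity:
            if self.full_action == "halt":
                return False
            self.records.popleft()   # wrap: drop the oldest record
        self.records.append(record)
        if len(self.records) >= self.threshold * self.capacity:
            self.capacity_alarm = True
        return True


def configure_network(node_ids, capacity=100):
    """Apply the same (hypothetical) discriminator and log settings to every node."""
    discriminator = EventDiscriminator(frozenset({"connection"}),
                                       frozenset({"communicationsAlarm"}))
    return {n: (discriminator, Log(capacity=capacity)) for n in node_ids}


NETWORK = configure_network([10, 11, 12, 13, 14, 15, 16])
```

With `full_action="halt"` a full log rejects new records, while the default `"wrap"` discards the oldest record to make room.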
2. On occurrence of a physical failure, the correlation keys must be built. This process illustrated in FIG. 4 involves each Network Element (i.e. 12 and 13) which detects the physical failure (i.e. failure 17). The node 12 identifies the link to the node 13 as link t1, and the node 13 identifies the link to the Network Element 12 as link t2.
The correlation keys identify the broken link as it is seen by each neighbouring node. For example, node 12 builds its correlation key by identifying the link t1 to node 13 as seen from node 12, and node 13 builds its correlation key by specifying the link t2 that node 13 uses to connect to node 12. Thus the correlation key built in node 12 contains the t1 information, and the correlation key built in node 13 contains the t2 information.
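The key-building step can be sketched as a lookup in each node's own link table. The table `LOCAL_LINK_NAMES` and the function name are hypothetical; the point is that node 12 and node 13 name the same physical link independently, so each key describes the link from one side only.

```python
# Hypothetical local link tables: each node names its links independently.
LOCAL_LINK_NAMES = {
    12: {13: "t1"},  # node 12 calls its link to node 13 "t1"
    13: {12: "t2"},  # node 13 calls its link to node 12 "t2"
}


def build_correlation_key(node: int, neighbour: int) -> str:
    """Correlation key: the broken link as identified by the detecting node."""
    return LOCAL_LINK_NAMES[node][neighbour]


print(build_correlation_key(12, 13), build_correlation_key(13, 12))  # t1 t2
```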
3. Physical alarms are then triggered with the correlation key in them, as described in FIG. 5.
Each node (i.e. 12 and 13) discovering the physical resource failure triggers a physical alarm containing a correlation vector made up of, first, the node's own correlation key and, second, the correlation key of the node on the other side of the broken link. The physical alarm 601 of node 12 therefore contains the correlation vector (t1, t2), with the correlation key of node 12 (t1) and that of node 13 (t2), whereas the physical alarm 602 of node 13 contains the correlation vector (t2, t1), with the correlation key of node 13 (t2) and that of node 12 (t1).
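The correlation vector can be modeled as an ordered pair. This sketch assumes that each detecting node already knows the key of the node on the other side of the broken link; the text does not say how that key is obtained. `PhysicalAlarm` and `build_physical_alarm` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class PhysicalAlarm(NamedTuple):
    alarm_id: int
    node: int
    correlation_vector: Tuple[str, str]  # (own key, remote key)


def build_physical_alarm(alarm_id: int, node: int,
                         own_key: str, remote_key: str) -> PhysicalAlarm:
    """Own correlation key first, then the key of the node across the link."""
    return PhysicalAlarm(alarm_id, node, (own_key, remote_key))


# Assumption: each detecting node already knows both keys.
alarm_601 = build_physical_alarm(601, 12, "t1", "t2")
alarm_602 = build_physical_alarm(602, 13, "t2", "t1")
print(alarm_601.correlation_vector, alarm_602.correlation_vector)
# ('t1', 't2') ('t2', 't1')
```

The two vectors contain the same keys in opposite order, which is what later lets both physical alarms be recognized as one failure.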
4. The failure information and the correlation keys 703, 707 are transported to the access nodes, as illustrated in FIG. 6. From the node 12, the correlation key 703 (made of t1) will be transported to the nodes 10 and 11. From the node 13, the correlation key 707 (made of t2) will be transported to the nodes 14, 15 and 16.
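The destinations of each key in step 4 follow from the connection paths: for every connection crossing the broken link, the end node on node 12's side receives key t1 and the end node on node 13's side receives key t2. The sketch below computes only these destinations under that assumption; it does not model the flooding mechanism used to carry the information. Function and variable names are hypothetical.

```python
CONNECTIONS = [[10, 12, 13, 14], [11, 12, 13, 15], [11, 12, 13, 16]]
KEYS = {12: "t1", 13: "t2"}  # correlation key built by each adjacent node


def distribute_keys(connections, a, b, keys):
    """Map each access node to the correlation keys it receives.

    For every connection crossing the broken link (a, b), the end node on
    a's side receives a's key and the end node on b's side receives b's key.
    """
    received = {}
    for path in connections:
        for i in range(len(path) - 1):
            if {path[i], path[i + 1]} == {a, b}:
                near, far = path[i], path[i + 1]  # near is on path[0]'s side
                received.setdefault(path[0], set()).add(keys[near])
                received.setdefault(path[-1], set()).add(keys[far])
                break
    return received


print(distribute_keys(CONNECTIONS, 12, 13, KEYS))
# {10: {'t1'}, 14: {'t2'}, 11: {'t1'}, 15: {'t2'}, 16: {'t2'}}
```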
5. In the access nodes, the information of the correlation key is combined with information of the affected logical resources. FIG. 7 represents a method to perform this combination:
the incoming correlation key 801 enters the access node; the component 802, which is in charge of maintaining knowledge of the network topology, receives the correlation key 801 and sends it to the component 803,
the component 803, which is in charge of maintaining knowledge of the logical resources, associates the physical failure with the affected logical resources it handles,
an alarm related to the network topology and the logical resources, which also includes the correlation key 801 delivered to the access node, is then built in step 804.
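The division of work between components 802 and 803 and step 804 might look like the following Python sketch. The class names, the connection names c2 and c3, and the assumption that the failure information carries the endpoints of the failed link are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalAlarm:
    node: int
    connection: str
    correlation_key: str
    object_class: str = "connection"
    event_type: str = "communicationsAlarm"


class LogicalResources:
    """Role of component 803: the connections handled by this access node."""

    def __init__(self, node, connections):
        self.node = node
        self.connections = connections  # connection name -> node path

    def affected_by(self, link):
        """Names of the handled connections whose path crosses the link."""
        return [name for name, path in self.connections.items()
                if any({u, v} == set(link) for u, v in zip(path, path[1:]))]


class Topology:
    """Role of component 802: records failed links and forwards to 803."""

    def __init__(self, logical):
        self.failed_links = set()
        self.logical = logical

    def on_failure(self, link, correlation_key):
        self.failed_links.add(frozenset(link))
        affected = self.logical.affected_by(link)
        # Step 804: one logical alarm per affected connection, carrying the key.
        return [LogicalAlarm(self.logical.node, name, correlation_key)
                for name in affected]


# Access node 11 handles the two connections towards nodes 15 and 16.
node_11 = Topology(LogicalResources(11, {"c2": [11, 12, 13, 15],
                                         "c3": [11, 12, 13, 16]}))
alarms = node_11.on_failure((12, 13), "t1")
print([(a.connection, a.correlation_key) for a in alarms])
# [('c2', 't1'), ('c3', 't1')]
```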
6. The logical alarm and the correlation key are logged in the access node, according to the log configuration characteristics. If the Event Discriminator has been configured to process notifications related to connections and to Communication alarms, then the notification will be sent to the log storage of the Network Management.
Thus, the Network Management log will contain: identification of logical resources affected by the failure, and correlation vector associated with the broken physical resource.
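Step 6 can be illustrated by filtering an access node's notifications through a discriminator before logging them. The notification fields, the `communicationsAlarm` and `stateChange` event type strings, the resource names and the bounded `deque` used as a wrapping log are assumptions made for this sketch.

```python
from collections import deque


def make_discriminator(object_classes, event_types):
    """Return a predicate accepting only notifications matching both criteria."""
    def accepts(notification):
        return (notification["object_class"] in object_classes
                and notification["event_type"] in event_types)
    return accepts


def log_notifications(notifications, accepts, capacity=100):
    """Log the notifications the discriminator passes; wrap when full."""
    log = deque(maxlen=capacity)  # a full deque discards its oldest entry
    for notification in notifications:
        if accepts(notification):
            log.append(notification)
    return log


# Hypothetical notifications raised in access node 11.
NOTIFICATIONS = [
    {"object_class": "connection", "event_type": "communicationsAlarm",
     "resource": "c2", "correlation_key": "t1"},
    {"object_class": "port", "event_type": "stateChange",
     "resource": "p7", "correlation_key": None},
    {"object_class": "connection", "event_type": "communicationsAlarm",
     "resource": "c3", "correlation_key": "t1"},
]

accepts = make_discriminator({"connection"}, {"communicationsAlarm"})
log = log_notifications(NOTIFICATIONS, accepts)
print([(n["resource"], n["correlation_key"]) for n in log])
# [('c2', 't1'), ('c3', 't1')]
```

Each retained entry pairs an affected logical resource with its correlation key, which is the content the text expects the log to hold.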
7. On reception of the physical alarm, the operator can ask to retrieve the log information from the access nodes for analysis as illustrated in FIG. 8.
The operator will see 2 physical alarms (i.e. 914 and 915), each containing a correlation vector with the information on the broken resource as identified by the local nodes, e.g. (t1, t2) for node 12 and (t2, t1) for node 13. Either when the content of the log storage has reached its configured threshold or on explicit request from the operator, the log information is retrieved from nodes 10, 11, 14, 15 and 16 for the correlation process, as illustrated by 901, 902, 903, 904 and 905. The correlation process can then correlate the affected logical resources based on the correlation keys (t1, t2) found in the log storage.
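The final correlation can be sketched as grouping logged entries under the unordered pair of keys carried by the physical alarms, so that (t1, t2) and (t2, t1) identify the same broken link. This is a hypothetical implementation: the data layout and the connection names c1 to c3 are assumed, and matching a single logged key against the pair presumes that no other concurrent failure reuses the same local link name.

```python
from collections import defaultdict


def correlate(physical_alarms, retrieved_logs):
    """Group logged logical resources under the physical failure they match.

    physical_alarms: alarm id -> correlation vector, e.g. ("t1", "t2").
    retrieved_logs: access node -> list of (resource, correlation key).
    A failure is identified by the unordered pair of keys.
    """
    failures = {}
    for alarm_id, vector in physical_alarms.items():
        failures.setdefault(frozenset(vector), []).append(alarm_id)
    affected = defaultdict(list)
    for node, entries in retrieved_logs.items():
        for resource, key in entries:
            for failure in failures:
                if key in failure:
                    affected[failure].append((node, resource))
    return failures, dict(affected)


PHYSICAL = {914: ("t1", "t2"), 915: ("t2", "t1")}
LOGS = {
    10: [("c1", "t1")],
    11: [("c2", "t1"), ("c3", "t1")],
    14: [("c1", "t2")],
    15: [("c2", "t2")],
    16: [("c3", "t2")],
}

failures, affected = correlate(PHYSICAL, LOGS)
for failure, alarm_ids in failures.items():
    print(sorted(failure), alarm_ids, affected.get(failure, []))
```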

Claims (3)

We claim:
1. A method of identifying a connection failure in a network having a plurality of network nodes interconnected by a plurality of links over which connections between source and destination nodes are established, the method being implemented in each of the network nodes adjacent the connection failure and comprising the steps of:
upon detection of the failure, storing failure-related information in local memory;
generating a correlation key based on the stored failure-related information to uniquely identify the connection failure; and
transmitting the correlation key to each node in the network still accessible through a connection affected by the failure.
2. A method as set forth in claim 1 including the additional step of sending a failure alarm to a network management system.
3. A method as set forth in claim 2 wherein the correlation key includes a two part link identifier, the first part identifying the link affected by the failure from the perspective of the local node and the second part identifying the link affected by the failure from the perspective of the remote node also adjacent the connection failure, the two parts providing in combination a unique identifier of the connection failure.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP95480193 1995-12-20
EP95480193 1995-12-20

Publications (1)

Publication Number Publication Date
US5949759A 1999-09-07

Family

ID=8221632

Family Applications (1)

Application Number Title Priority Date Filing Date
US08/752,404 Expired - Lifetime US5949759A (en) 1995-12-20 1996-11-19 Fault correlation system and method in packet switching networks

Country Status (1)

Country Link
US (1) US5949759A (en)

US20180088809A1 (en) * 2016-09-23 2018-03-29 EMC IP Holding Company LLC Multipath storage device based on multi-dimensional health diagnosis
CN112887108A (en) * 2019-11-29 2021-06-01 中兴通讯股份有限公司 Fault positioning method, device, equipment and storage medium
TWI746512B (en) * 2016-03-10 2021-11-21 香港商阿里巴巴集團服務有限公司 Physical machine fault classification processing method and device, and virtual machine recovery method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4142069A (en) * 1977-06-20 1979-02-27 The United States Of America As Represented By The Secretary Of The Army Time reference distribution technique
US5276440A (en) * 1989-02-16 1994-01-04 International Business Machines Corporation Network device information exchange
US5500853A (en) * 1992-04-02 1996-03-19 Applied Digital Access, Inc. Relative synchronization system for a telephone network
WO1994019887A1 (en) * 1993-02-23 1994-09-01 British Telecommunications Public Limited Company Event correlation in telecommunications networks
US5768256A (en) * 1995-12-29 1998-06-16 Mci Corporation Communications system and method providing optimal restoration of failed-paths

Non-Patent Citations (5)

Title
Data Communications, vol. 19, No. 4, Mar. 21, 1990, pp. 45-48. *
Data Communications, vol. 19, No. 4, Mar. 21, 1990, pp. 45-48.
European Search Report. *
IEEE Trans. on Comm., vol. 42, No. 2/3/4, pp. 523-533, 1994. *
IEEE Trans. on Comm., vol. 42, No. 2/3/4, pp. 523-533, 1994.

Cited By (59)

Publication number Priority date Publication date Assignee Title
WO2000025527A2 (en) * 1998-10-28 2000-05-04 Telefonaktiebolaget Lm Ericsson (Publ) Alarm correlation in a large communications network
WO2000025527A3 (en) * 1998-10-28 2000-08-17 Ericsson Telefon Ab L M Alarm correlation in a large communications network
US6253339B1 (en) 1998-10-28 2001-06-26 Telefonaktiebolaget Lm Ericsson (Publ) Alarm correlation in a large communications network
US6707795B1 (en) * 1999-04-26 2004-03-16 Nortel Networks Limited Alarm correlation method and system
GB2363488B (en) * 1999-10-28 2004-07-14 Ibm Technique for referencing failure information representative of multiple related failures in a distributed computing environment
GB2363488A (en) * 1999-10-28 2001-12-19 Ibm Referencing failure information representative of multiple related failures in a distributed computing environment
US6651183B1 (en) 1999-10-28 2003-11-18 International Business Machines Corporation Technique for referencing failure information representative of multiple related failures in a distributed computing environment
US7383191B1 (en) 2000-11-28 2008-06-03 International Business Machines Corporation Method and system for predicting causes of network service outages using time domain correlation
US20020069199A1 (en) * 2000-12-01 2002-06-06 Young-Hyun Kang Method for managing alarm information in a network management system
US6966015B2 (en) 2001-03-22 2005-11-15 Micromuse, Ltd. Method and system for reducing false alarms in network fault management systems
EP1257087A1 (en) * 2001-05-10 2002-11-13 Hewlett-Packard Company Method and system for network monitoring
US20020169870A1 (en) * 2001-05-10 2002-11-14 Frank Vosseler Method, system and computer program product for monitoring objects in an it network
US6941367B2 (en) 2001-05-10 2005-09-06 Hewlett-Packard Development Company, L.P. System for monitoring relevant events by comparing message relation key
US20030014693A1 (en) * 2001-07-13 2003-01-16 International Business Machines Corporation Failure isolation in a distributed processing system employing relative location information
US6931564B2 (en) 2001-07-13 2005-08-16 International Business Machines Corporation Failure isolation in a distributed processing system employing relative location information
US20040193705A1 (en) * 2001-09-17 2004-09-30 To Hing Wing Method and apparatus for determining and resolving missing topology features of a network for improved topology accuracy
US7076564B2 (en) 2001-09-17 2006-07-11 Micromuse Ltd. Method and apparatus for determining and resolving missing topology features of a network for improved topology accuracy
US6766482B1 (en) 2001-10-31 2004-07-20 Extreme Networks Ethernet automatic protection switching
EP1328085A1 (en) * 2002-01-15 2003-07-16 Evolium S.A.S. Method and apparatus for healing of failures for chained boards with SDH interfaces
US20030133405A1 (en) * 2002-01-15 2003-07-17 Evolium S.A.S. Method and apparatus for healing of failures for chained boards with SDH interfaces
EP1330070A1 (en) * 2002-01-15 2003-07-23 Evolium S.A.S. Method and apparatus for healing of failures for chained boards with SDH interfaces
US8065409B2 (en) 2002-01-22 2011-11-22 Cisco Technology, Inc. Method of labeling alarms to facilitate correlating alarms in a telecommunications network
US20050166099A1 (en) * 2002-01-22 2005-07-28 Jackson Shyu Method of labeling alarms to facilitate correlating alarms in a telecommunications network
US6862698B1 (en) 2002-01-22 2005-03-01 Cisco Technology, Inc. Method of labeling alarms to facilitate correlating alarms in a telecommunications network
US8745435B2 (en) * 2002-09-26 2014-06-03 Ca, Inc. Network fault manager
US7434109B1 (en) * 2002-09-26 2008-10-07 Computer Associates Think, Inc. Network fault manager for maintaining alarm conditions
US20090070640A1 (en) * 2002-09-26 2009-03-12 Stabile Lawrence A Network fault manager for maintaining alarm conditions
US8015456B2 (en) 2002-09-26 2011-09-06 Computer Associates Think, Inc. Network fault manager for maintaining alarm conditions
US8448012B2 (en) 2002-09-26 2013-05-21 Ca, Inc. Network fault manager
US7406260B2 (en) 2003-11-07 2008-07-29 Alcatel-Lucent Canada Inc. Method and system for network wide fault isolation in an optical network
US20050099953A1 (en) * 2003-11-07 2005-05-12 Tropic Networks Inc. Method and system for network wide fault isolation in an optical network
WO2005060158A1 (en) * 2003-12-17 2005-06-30 Siemens Aktiengesellschaft Method used to report a malfunction in a communication network
US7792036B2 (en) 2007-01-23 2010-09-07 Cisco Technology, Inc. Event processing in rate limited network devices
CN101360013B (en) * 2008-09-25 2011-05-04 烽火通信科技股份有限公司 General fast fault locating method for transmission network based on correlativity analysis
CN101431448B (en) * 2008-10-22 2011-12-28 华为技术有限公司 Method, equipment and system for positioning fault of IP bearing network
US20100156622A1 (en) * 2008-12-23 2010-06-24 Andras Veres Poll-based alarm handling system and method
US8284044B2 (en) 2008-12-23 2012-10-09 Telefonaktiebolaget Lm Ericsson (Publ) Poll-based alarm handling system and method
US20110035628A1 (en) * 2009-08-05 2011-02-10 Martin Daniel J System And Method For Correlating Carrier Ethernet Connectivity Fault Management Events
US8572435B2 (en) * 2009-08-05 2013-10-29 International Business Machines Corporation System and method for correlating carrier ethernet connectivity fault management events
US9225587B2 (en) * 2009-12-10 2015-12-29 Nokia Solutions And Networks Oy Mechanism for alarm management of Femto related systems to avoid alarm floods
US20120287773A1 (en) * 2009-12-10 2012-11-15 Nokia Siemens Networks Oy Mechanism for alarm management of femto related systems to avoid alarm floods
US20110161741A1 (en) * 2009-12-28 2011-06-30 International Business Machines Corporation Topology based correlation of threshold crossing alarms
US8423827B2 (en) * 2009-12-28 2013-04-16 International Business Machines Corporation Topology based correlation of threshold crossing alarms
US20120224846A1 (en) * 2011-03-03 2012-09-06 Acacia Communications Inc. Fault Localization and Fiber Security in Optical Transponders
US10425154B2 (en) 2011-03-03 2019-09-24 Acacia Communications, Inc. Fault localization and fiber security in optical transponders
US11171728B2 (en) 2011-03-03 2021-11-09 Acacia Communications, Inc. Fault localization and fiber security in optical transponders
US9680567B2 (en) * 2011-03-03 2017-06-13 Acacia Communications, Inc. Fault localization and fiber security in optical transponders
CN102571422A (en) * 2011-12-23 2012-07-11 广东电网公司电力科学研究院 SP Gura-based new business roll-out simulation previewing method for power scheduling data network
CN102546243A (en) * 2011-12-23 2012-07-04 广东电网公司电力科学研究院 Fault simulation analysis method for SP Guru-based electric power dispatching data network
US20150065122A1 (en) * 2012-03-15 2015-03-05 Nec Corporation Radio communication system, radio station, network operation management apparatus, and network healing method
US9930551B2 (en) 2012-03-15 2018-03-27 Nec Corporation Radio communication system, radio station, network operation management apparatus, and network healing method
US9516523B2 (en) * 2012-03-15 2016-12-06 Nec Corporation Radio communication system, radio station, network operation management apparatus, and network healing method
TWI746512B (en) * 2016-03-10 2021-11-21 香港商阿里巴巴集團服務有限公司 Physical machine fault classification processing method and device, and virtual machine recovery method and system
US20180088809A1 (en) * 2016-09-23 2018-03-29 EMC IP Holding Company LLC Multipath storage device based on multi-dimensional health diagnosis
CN107870832A (en) * 2016-09-23 2018-04-03 伊姆西Ip控股有限责任公司 Multipath storage device based on various dimensions Gernral Check-up method
US10698605B2 (en) * 2016-09-23 2020-06-30 EMC IP Holding Company LLC Multipath storage device based on multi-dimensional health diagnosis
CN107870832B (en) * 2016-09-23 2021-06-18 伊姆西Ip控股有限责任公司 Multi-path storage device based on multi-dimensional health diagnosis method
CN112887108A (en) * 2019-11-29 2021-06-01 中兴通讯股份有限公司 Fault positioning method, device, equipment and storage medium
WO2021104269A1 (en) * 2019-11-29 2021-06-03 中兴通讯股份有限公司 Fault locating method, apparatus and device, and storage medium

Similar Documents

Publication Publication Date Title
US5949759A (en) Fault correlation system and method in packet switching networks
US5101348A (en) Method of reducing the amount of information included in topology database update messages in a data communications network
EP0753952B1 (en) Management of path routing in packet communication networks
CN101330370B (en) Node and communication method
US5862125A (en) Automated restoration of unrestored link and nodal failures
US6324161B1 (en) Multiple network configuration with local and remote network redundancy by dual media redirect
US5920257A (en) System and method for isolating an outage within a communications network
US6941362B2 (en) Root cause analysis in a distributed network management architecture
US20030133417A1 (en) Method and message therefor of monitoring the spare capacity of a dra network
US5864608A (en) System and method for formatting performance data in a telecommunications system
US20020141334A1 (en) Dynamic protection bandwidth allocation in BLSR networks
EP2213048A2 (en) Failure recovery method in non revertive mode of ethernet ring network
Jia et al. Rapid detection and localization of gray failures in data centers via in-band network telemetry
US7337209B1 (en) Large-scale network management using distributed autonomous agents
US7342893B2 (en) Path discovery in a distributed network management architecture
US8675667B1 (en) Systems and methods for forming and operating robust communication networks for an enterprise
JP4673532B2 (en) Comprehensive alignment process in a multi-manager environment
US20040213215A1 (en) IP telephony service system and accounting method
CN102340511A (en) Safety control method and device
GB2362230A (en) Delegated fault detection in a network by mutual node status checking
KR0173380B1 (en) Performance Management Method in Distributed Access Node System
Cisco CiscoMgmt Variables

Legal Events

Date Code Title Description
AS Assignment

Owner name: IBM CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CRETEGNY, A.;GALLIAN, C.;NICOLAS, L.;AND OTHERS;REEL/FRAME:008330/0839;SIGNING DATES FROM 19961014 TO 19961112

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: CISCO TECHNOLOGY, INC., A CORPORATION OF CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CISCO SYSTEMS, INC., A CORPORATION OF CALIFORNIA;REEL/FRAME:010756/0122

Effective date: 20000405

AS Assignment

Owner name: CISCO SYSTEMS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:010832/0878

Effective date: 20000126

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12