METHOD AND APPARATUS FOR TESTING THE RESPONSIVENESS OF A NETWORK DEVICE
FIELD OF THE INVENTION
This invention relates to fault management of computer networks and, more particularly, to a method and apparatus wherein a first network device employs a proxy or recruit network device to test the responsiveness of another network device.
BACKGROUND OF THE INVENTION
Networks provide increased computing power, sharing of resources and communications between users. A network may include a number of computer devices within a room, building, or site that are interconnected by a high speed local data link to form a local area network (LAN), such as a token ring network, ethernet network, or the like. LANs in the same or different locations may be interconnected by different media and protocols such as packet switching, microwave links and satellite links to form a wide area network. There may be several hundred or more interconnected devices in a network.
As a network becomes larger and more complex, issues arise as to the amount of traffic on the network, utilization of resources, security and the isolation of network faults. In U.S. Pat. No. 5,436,909, which issued to Roger Dev et al. on Jul. 25, 1995, and which is herein incorporated by reference in its entirety, a system for isolating network faults is disclosed. In the '909 patent, a network management system models network devices and relations between network devices. A contact status of each device is contained in a corresponding model. Each model receives status updates from and/or regularly polls the corresponding network device.
The '909 patent uses a technique known as "status suppression" in order to isolate network faults. When a first network device has lost contact with its corresponding model, the models which correspond to network devices adjacent to the first network device are polled to see if they have also lost contact with their corresponding network devices. If the adjacent models cannot contact their corresponding network devices, then presumably the first network device is not the cause of the fault and a fault status in the first model is suppressed or overridden. If it is determined that all adjacent network devices are not communicating, then the network fault can be more easily determined as something common to all of these devices.
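By way of illustration only, the status suppression rule described above might be sketched as follows. This is a minimal sketch and not the implementation of the '909 patent; the names DeviceModel, contact_lost, neighbors, fault_suppressed and apply_status_suppression are hypothetical and are assumed solely for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceModel:
    """Hypothetical model of one network device (illustrative names only)."""
    name: str
    contact_lost: bool = False          # model has lost contact with its device
    neighbors: list = field(default_factory=list)  # models of adjacent devices
    fault_suppressed: bool = False


def apply_status_suppression(model):
    """Suppress the fault status of `model` when it has lost contact and
    every adjacent model has lost contact as well (assumed rule)."""
    model.fault_suppressed = (
        model.contact_lost
        and bool(model.neighbors)
        and all(n.contact_lost for n in model.neighbors)
    )
    return model.fault_suppressed
```

In this sketch every adjacent model is examined, which is the polling cost that grows with the number of adjacent devices.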
It may be advantageous to focus the failure analysis on the first network device without polling all of the adjacent network devices. In some large networks, such polling could involve hundreds, possibly thousands, of network devices thereby increasing the amount of traffic on the network and degrading network performance. In addition, there may be network devices that, although they have lost contact with the network management system, are still in contact with some other network device.
It is an object of the present invention to provide a method to facilitate fault management in a network which can be used alone or together with other fault management services to deduce the location and/or cause of a network failure.
SUMMARY OF THE INVENTION
The present invention relates to a method and apparatus for determining the responsiveness of a network device through the use of proxy or recruit network devices. More specifically, when a first network device has lost contact with a second network device, a proxy device is recruited to attempt to contact the second network device. Typically, this recruit utilizes a different physical path to the second network device and/or a different communication protocol for contacting the second device. The recruit then reports on whether the contact was successful. If it was successful, then the first network device can infer that the cause of its contact loss may lie with its path to the second network device or with the protocol the first device uses to contact the second device.
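The inference drawn from a single recruit's report might be sketched as follows. This is an illustrative sketch under stated assumptions: the function check_via_recruit, the recruit_probe callable and the returned strings are hypothetical, and the handling of an unsuccessful report (treated here as inconclusive) is an assumption of the example rather than something stated above.

```python
def check_via_recruit(recruit_probe, target):
    """Ask one recruit to contact `target` and interpret its report.

    `recruit_probe` is a hypothetical callable that stands in for the
    recruit's own contact attempt (over its own path and/or protocol)
    and returns True when the recruit reached the target.
    """
    reached = recruit_probe(target)
    if reached:
        # The target answers the recruit, so the first device's loss of
        # contact may lie with its own path or protocol.
        return "suspect path or protocol of first device"
    # Assumption of this sketch: a failed report alone settles nothing.
    return "inconclusive"
```

For example, a recruit that reaches the second device yields "suspect path or protocol of first device", which directs the analysis toward the first device's own path or protocol.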
In one embodiment, a list of potential recruits is maintained at one or more locations in the network. Then, when a first network device loses contact with a second network device, one or more recruits from the list can be selected to attempt to contact the second network device. Where a plurality of recruits are selected, the recruits may attempt to contact the second device either in series or in parallel. The recruits then report back the results of their attempts, from which a better understanding of the location and/or cause of the network failure may be determined. This method may be used alone or in combination with other fault management services. It may advantageously be used in conjunction with a network management platform, such as the SPECTRUM® management system, available from Cabletron Systems, Inc., Rochester, N.H., which models the various devices (i.e., physical devices and applications) on the network, and maintains a contact status for each such device.
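The serial and parallel use of a plurality of recruits might be sketched as follows. This is a minimal sketch only; the function poll_recruits, the mapping of recruit names to probe callables, and the use of a thread pool for the parallel case are assumptions of the example and are not taken from the SPECTRUM® system or from any particular embodiment.

```python
from concurrent.futures import ThreadPoolExecutor


def poll_recruits(recruits, target, parallel=False):
    """Have each selected recruit attempt to contact `target`.

    `recruits` is a hypothetical mapping of recruit name to a callable
    that performs that recruit's contact attempt and returns True on
    success. Returns a mapping of recruit name to reported result.
    """
    if parallel:
        # All recruits attempt contact concurrently.
        with ThreadPoolExecutor(max_workers=len(recruits) or 1) as pool:
            futures = {name: pool.submit(probe, target)
                       for name, probe in recruits.items()}
            return {name: fut.result() for name, fut in futures.items()}
    # Recruits attempt contact one after another.
    return {name: probe(target) for name, probe in recruits.items()}
```

Both modes return the same set of reports; the parallel mode trades additional simultaneous traffic for a shorter total waiting time.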
These and other advantages of the present invention will be understood from the following drawings and detailed description of an exemplary embodiment.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a network management system overseeing a network, which management system may incorporate the present invention;
FIG. 2 is a flow chart illustrating an example of the operation of a network management system which utilizes the fault services of the present invention in accordance with one embodiment;
FIG. 3 is a flow chart of the fault management service according to another embodiment;
FIG. 4 is a schematic representation of a network illustrating the use of a recruit or proxy network device to contact a second network device which has lost contact with a first network device (the network management system);
FIG. 5 is a schematic representation of a network for illustrating an exemplary use of the present invention; and
FIG. 6 shows a general purpose computer as one example of implementing the present invention.
DETAILED DESCRIPTION
A block diagram of an overall system according to the present invention is shown in FIG. 1. A network 106 includes a plurality of interconnected network devices (not shown). A network management system 100 communicates with the network 106 to maintain the network in operating condition and to monitor the operations of the network. The network management system 100 is coupled to a database manager 104 which manages the storage and retrieval of disk-based data relative to the network 106 and the network management system 100. A user interface 102 is coupled to the network management system 100 which allows a user, usually a network manager, to interface with the network