US20070005761A1 - Predictive monitoring and problem identification in an information technology (it) infrastructure - Google Patents
- Publication number
- US20070005761A1 (application US11/530,477, published as US 2007/0005761 A1)
- Authority
- United States (US)
- Prior art keywords
- historical
- infrastructure
- component
- gross
- indicator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/147—Network analysis or design for predicting network behaviour
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0817—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0876—Network utilisation, e.g. volume of load or congestion level
- H04L43/0888—Throughput
Definitions
- the present invention applies to the field of monitoring and maintaining an enterprise's information technology infrastructure and, in particular, to predicting performance irregularities in an enterprise's information technology infrastructure, and inferring possible problems causing the irregularities.
- An enterprise's IT infrastructure needs to consistently perform to specification to ensure the success of the business.
- the IT infrastructure may be used for an enterprise's communication, database management, inventory tracking, shipment records, website management, business-to-business (B2B) ecommerce, business-to-consumer (B2C) ecommerce, accounting, billing, order tracking, customer support tracking, document management, and a possibly infinite number of other tasks.
- the IT infrastructure is made up of numerous components such as servers, hubs, switches, computers, the links connecting these devices, the software applications running these devices, and so on. Each of these components generally has numerous subcomponents such as a central processing unit (CPU), bus, memory, and similar high-level electrical components.
- a monitor may alter states depending on whether a computer is online or offline. If the monitor indicates the computer is offline, technicians may need to take action to put it back online, or switch to a backup computer. For relatively simple and small-scale IT infrastructures such monitoring may be adequate. However, some enterprises have IT infrastructures so large and complex that merely having information about individual components or subcomponents may not be very helpful.
- if a server failure combination does cause a problem such that the enterprise's website stops functioning properly, it may not be quickly determined which of the failed servers are the critical servers causing the problem.
- the IT technicians receive no assistance from prior art monitors in how to prioritize technical involvement or in identifying what specific combination of components or subcomponents may be the cause of the particular problem.
- prior art monitors are incapable of predicting when problems or out-of-compliance conditions, i.e., performance irregularities, may occur. Thus, IT technicians must wait until the IT infrastructure fails to function according to specification, and then they must find the cause of the failure without assistance from the monitors, other than the raw data the monitors provide.
- the present invention may include receiving a plurality of component metrics, each component metric related to a corresponding component of an information technology (IT) infrastructure of an enterprise, each component being associated with one or more gross-level rules, and generating an indicator set by comparing each received component metric to relevant historical values of the component metric.
- the present invention may also include determining that a gross-level rule is out of compliance, comparing the indicator set to one or more historical indicator sets to determine whether the indicator set resembles any of the one or more historical indicator sets, and performing an appropriate action based on the result of the comparison.
- FIG. 1 is a block diagram of an exemplary IT infrastructure that may be monitored using an embodiment of the invention.
- FIG. 2 is an exemplary computer system on which embodiments of the present invention may be practiced.
- FIG. 3 is a flow chart illustrating fingerprint creation according to one embodiment of the invention.
- FIG. 4 is a flow chart illustrating problem inference according to one embodiment of the invention.
- FIG. 5A is a pictorial representation of a fingerprint according to one embodiment of the present invention.
- FIG. 5B is a pictorial representation of a fingerprint according to one embodiment of the present invention.
- FIG. 6 is a flow chart illustrating out-of-compliance prediction according to one embodiment of the invention.
- Embodiments of the present invention combine measurements from numerous monitors. These monitors collect data from a number of components and subcomponents of an enterprise's IT infrastructure, and provide the data regarding each of the components. Using the combined measurements, historical values of the monitors, and statistical analysis, embodiments of the present invention can predict that the IT infrastructure will not perform up to specification in the future. If the IT infrastructure is already not performing up to specification, embodiments of the present invention can suggest possible problems causing the out-of-compliance condition, and may even suggest solutions to those problems. This may be accomplished by using combined measurements and statistical analysis in a process herein referred to as “fingerprinting.” As used in this description, a fingerprint is synonymous with the term “indicator set,” a more general if somewhat less colorful term. The generation of fingerprints—indicator sets—is described in detail further below.
- FIG. 1 shows an example IT infrastructure 100 .
- This infrastructure 100 includes various networks, such as local area network (LAN) 102 , servers, such as servers 104 , end-user workstations or clients, such as workstations 106 , and databases, such as databases 108 and 110 .
- the servers 104 may come in many varieties for different functions, including message queuing servers, application servers, web servers, departmental servers, and enterprise software servers. This is a highly simplified IT infrastructure, used here to ease understanding of the embodiments described.
- An actual infrastructure may contain thousands of components, and may contain numerous components not shown in FIG. 1 .
- Examples of some other components may include load balancing equipment, switching equipment, various communications devices such as wireless devices and antennas, and a multitude of other IT equipment. Some example subcomponents are also shown in FIG. 1 , such as CPU 112 on a workstation from the workstations 106 , and the level of memory use 114 on the database 108 . These and other IT components may be monitored in a number of different ways. Examples of component monitoring may include measuring disk utilization, network volume and throughput, and error types and error volume. Furthermore, monitoring may be software application-specific, measuring such application performance metrics as cache performance, queue performance, and other application-specific metrics.
- Monitors may gather data related to many levels of the IT infrastructure.
- LAN 102 may have a monitor to measure performance of the LAN 102 as a whole, one or more monitors to measure the performance of various nodes 116 of the LAN 102 , and one or more monitors to measure the performance of the switching apparatus 118 used to connect the LAN 102 .
- monitors may be associated with each component of each node 116 and the switching apparatus 118 . The data collected by these monitors may then be communicated to an Indicator Engine 120 via communications link 122 , represented in FIG. 1 as a dotted line. All other dotted lines in FIG. 1 similarly represent a way for the monitors of the components, subcomponents, and sub-subcomponents to report the collected raw data and/or processed data to the Indicator Engine 120 .
- the Indicator Engine may be located or implemented anywhere inside or outside of the IT infrastructure 100 .
- the Indicator Engine 120 may be resident on one of the servers 104 , on one of the end-user workstations 106 , or distributed among various servers 104 and workstations 106 .
- the Indicator Engine 120 is shown as being a separate entity in FIG. 1 to emphasize that it is used to monitor the IT infrastructure.
- the Indicator Engine 120 resides on a machine described in detail with reference to FIG. 2 .
- computer system 200 may be a personal computer.
- certain aspects of the embodiments may be carried out on a specialized device while other aspects may be carried out on a general-purpose computer coupled to the device.
- Computer system 200 comprises a bus or other communication means 201 for communicating information, and a processing means such as processor 202 coupled with bus 201 for processing information.
- the tasks performed to practice embodiments of the invention are performed by the processor 202 either directly or indirectly.
- Computer system 200 further comprises a random access memory (RAM) or other dynamic storage device 204 (referred to as main memory), coupled to bus 201 for storing information and instructions to be executed by processor 202 .
- Main memory 204 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 202 .
- Computer system 200 also comprises a read only memory (ROM) and/or other static storage device 206 coupled to bus 201 for storing static information and instructions for processor 202 .
- a data storage device 207 such as a magnetic disk or optical disc and its corresponding drive may also be coupled to bus 201 for storing information and instructions.
- Computer system 200 can also be coupled via bus 201 to a display device 221 , such as a cathode ray tube (CRT) or Liquid Crystal Display (LCD), for displaying information to a computer user.
- Computer system 200 can also be coupled via bus 201 to a printing device 224 , such as a laser printer, or any other printer.
- an alphanumeric input device 222 may be coupled to bus 201 for communicating information and/or command selections to processor 202 .
- Another type of user input device is cursor control 223 , such as a mouse.
- a communication device 225 is also coupled to bus 201 for accessing remote servers or other servers via the Internet, for example.
- the communication device 225 may include a modem, a network interface card, or other well-known interface devices, such as those used for coupling to an Ethernet, token ring, or other types of networks.
- the computer system 200 may be coupled to a number of clients and/or servers via a conventional network infrastructure, such as a company's Intranet and/or the Internet, for example.
- the information contained in the aggregation of monitors may be more than the sum of the information reported by the individual monitors.
- information may be deduced about unmonitored components, future irregularities, present problems, and other aspects of the IT infrastructure.
- through collective monitoring it may be possible to observe conditions conducive to IT infrastructure breakdowns.
- through collective monitoring it may be possible to self-heal the IT infrastructure either before a problem even occurs, or after it has occurred.
- Artificial intelligence (AI) may also be used to allow the monitoring system to learn to better heal the IT infrastructure over time.
- collective monitoring may be achieved by using fingerprints or indicator sets.
- the specifications for an IT infrastructure may be provided in terms of gross-level rules.
- a gross-level rule, such as a business rule, indicates the expected or desired functioning of the IT infrastructure.
- a gross-level rule may specify a maximum or average response time for a web server request.
- Another gross-level rule may similarly specify the number of order lines that should be processed per second.
- Other gross-level rules may give constraints on transaction throughput.
- the IT infrastructure may be said to be functioning according to specification if all, or at least a certain minimum number, of the gross-level rules are in compliance. If a gross-level rule is not in compliance, corrective action may be desired.
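As a sketch of the compliance check described above, gross-level rules can be modeled as simple constraints over measured metrics, with the infrastructure deemed to be functioning to specification when all rules, or at least a minimum number of them, are in compliance. The rule names, metric names, and limit values below are illustrative assumptions, not taken from the patent text.

```python
# Gross-level rules as (metric name, comparator, limit). All names and
# values here are hypothetical examples of rules like "maximum web
# server response time" and "order lines processed per second".
gross_level_rules = {
    "web_response_time_max_s": ("web_response_time_s", "<=", 2.0),
    "order_lines_per_second": ("order_line_rate", ">=", 100.0),
}

def rule_in_compliance(rule_name, measurements):
    """Check a single gross-level rule against current measurements."""
    metric, op, limit = gross_level_rules[rule_name]
    value = measurements[metric]
    return value <= limit if op == "<=" else value >= limit

def infrastructure_to_spec(measurements, min_compliant=None):
    """The infrastructure is to spec if all rules, or at least a
    certain minimum number of them, are in compliance."""
    compliant = sum(rule_in_compliance(r, measurements)
                    for r in gross_level_rules)
    needed = (min_compliant if min_compliant is not None
              else len(gross_level_rules))
    return compliant >= needed
```

A rule found out of compliance is what triggers the corrective, inference, or prediction processing described in the sections that follow.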
- An actual monitoring system may consider all gross-level rules. However, for simplicity, only compliance with one gross-level rule is considered in the following discussion. Also, since not all components are related to all gross-level rules, the following discussion assumes that a gross-level rule has been selected for analysis, and that the components and subcomponents that may affect the gross-level rule have been identified.
- an indicator set for the gross-level rule can be created.
- the generation of the indicator set, or fingerprint, is now described with reference to FIG. 3 .
- all or a critical number of monitors report their measurements, referred to herein as component metrics, and these component metrics are received 302 at the Indicator Engine. These measurements may be timestamped or synchronized, if temporal measurement synchronization is desired. The measurements may be collected periodically. Numerous triggering events may be used to commence the fingerprint creation process.
- the measurements received from the monitors may be in various forms and formats.
- a measurement may be raw data such as “memory utilization 70 percent.”
- the measurement may be statistically or otherwise preprocessed.
- the measurement may also be a monitor state.
- Such a state may be binary, e.g., “computer on/off-line,” or graduated, e.g. “CPU usage high/medium/low.”
- Other states are also possible, such as trend indicator states, for example, “CPU usage increasing/decreasing.”
- Statistical states may also be used, such as “CPU usage outside normal range.”
- each component metric may be compared 304 with historical values or statistics for the component metrics. For example, if a monitor that measures the usage of a memory unit reports the component metric as 70 percent utilization, the Indicator Engine may compare that metric to an average utilization for that memory unit of 50 percent. The comparison 304 may thus result in the observation that the particular memory element is 20 percent more utilized than average. This intermediate processed measurement is herein referred to as a component statistic.
- the comparison 304 may be done in a time or date sensitive manner. That is, the comparison may only be of relevant historical measurements. For example, the average utilization of 50 percent given above may only be true for weekdays. On weekends, the memory utilization may be 20 percent on average. If the component metrics were collected on a weekday, they may be compared to historical weekday component performance only. Similarly, some components may be sensitive to the time of day, day of the week, week of the month, or month of the year. Other less conventional patterns may also be observed and used to change the relevant historical values and averages the component metrics are compared to.
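The time- or date-sensitive comparison above can be sketched as follows: only historical samples of the same kind of day (here, weekday versus weekend) are used as the baseline for a component statistic. The function name, the (timestamp, value) history shape, and the weekday/weekend split are illustrative assumptions; a real system might also condition on time of day, week of month, or month of year, as the text notes.

```python
from datetime import datetime
from statistics import mean

def component_statistic(metric_value, history, now=None):
    """Compare a component metric to relevant historical values only.

    `history` is assumed to be a list of (timestamp, value) pairs for
    the same monitor. Returns how far the current metric deviates from
    the relevant historical average (e.g. +20 percentage points), or
    None if no relevant history exists.
    """
    now = now or datetime.now()
    weekend = now.weekday() >= 5
    # Keep only samples from the same kind of day, per the
    # time-sensitive comparison described above.
    relevant = [v for ts, v in history if (ts.weekday() >= 5) == weekend]
    if not relevant:
        return None
    baseline = mean(relevant)
    return metric_value - baseline
```

With a weekday average of 50 percent and a weekend average of 20 percent, a 70 percent reading yields a component statistic of +20 on a weekday but +50 on a weekend, matching the example in the text.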
- the results of the comparisons may be expressed 306 as a measurement relating the current state of the component metric to usual performance of the component in a way comparable with other components.
- This measurement is herein referred to as a component indicator.
- a component indicator may be some measure of normal variation. Such a measurement may be a standard deviation, difference in percentages, a distance from a mean or median, or various trend indicators, such as “increasing/decreasing.”
- the component indicators may also be a variance, a process control metric, some other statistical measurement, or a normalized version of any of the above measurements.
- the component indicators are numbers normalized to lie between zero and one, with zero indicating completely normal component performance, and one indicating completely irregular performance.
- the normalization may also be accomplished using a Fermi-Dirac probability distribution curve. This curve maps the number of standard deviations a metric is away from the mean to a number between 0 and 1.
- the Fermi-Dirac curve is adjusted to map one standard deviation to the value 0.5.
- a component metric of 60 percent may result in a component indicator around 0.4
- a component metric of 70 percent may result in a component indicator around 0.6, since 70 percent is more irregular than 60 percent for this particular component.
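The Fermi-Dirac normalization above can be sketched as a logistic curve over the number of standard deviations a metric sits from its mean, oriented so that larger deviations map toward 1 and calibrated so that exactly one standard deviation maps to 0.5, as stated in the text. The temperature (steepness) parameter is an assumed tuning value, not specified in the patent.

```python
import math

def fermi_dirac_indicator(deviations, temperature=0.35):
    """Map |standard deviations from the mean| to a component
    indicator in (0, 1).

    Uses the Fermi-Dirac functional form 1 / (1 + exp((mu - d) / T))
    with mu = 1, so d = 1 sigma gives exactly 0.5; values near the
    mean map near 0 (normal), large deviations map near 1 (irregular).
    """
    d = abs(deviations)
    return 1.0 / (1.0 + math.exp((1.0 - d) / temperature))
```

The curve is monotonic in the deviation, so a metric that is more irregular for a given component always yields a larger indicator, consistent with the 60 percent vs. 70 percent example above.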
- the collection of component indicators associated with a gross-level rule is herein referred to either as an indicator set, or a fingerprint. If all components of the IT infrastructure being monitored are used to generate the indicator set, it is referred to as a global indicator set, or global fingerprint. Fingerprints for each individual gross-level rule may then be generated by taking only the component indicators from the global fingerprint that are associated with, i.e., contribute to, each gross-level rule.
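The relationship between a global fingerprint and per-rule fingerprints can be sketched as a simple projection: the global indicator set maps every monitored component to its indicator, and each gross-level rule's fingerprint keeps only the components that contribute to that rule. The component names, indicator values, and rule-to-component mapping below are hypothetical.

```python
# Global fingerprint: every monitored component's indicator in [0, 1].
# All names and values here are illustrative assumptions.
global_fingerprint = {
    "erp_cpu": 0.62,
    "erp_memory": 0.71,
    "db_cpu": 0.08,
    "db_cache_hit": 0.55,
    "lan_throughput": 0.12,
}

# Which components are associated with, i.e. contribute to, each
# gross-level rule (an assumed mapping).
rule_components = {
    "pricing_response_time": ["erp_cpu", "erp_memory", "db_cache_hit"],
    "order_lines_per_second": ["db_cpu", "lan_throughput"],
}

def rule_fingerprint(rule, global_fp=global_fingerprint):
    """Project the global fingerprint onto one gross-level rule."""
    return {c: global_fp[c] for c in rule_components[rule]}
```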
- Embodiments of the present invention may be used to predict gross-level rule out-of-compliance, to infer a problem causing an already existing out-of-compliance condition, or a combination of both. How the embodiments may be used depends on whether the particular gross-level rule of interest is out of compliance at the time the fingerprint is generated. Exemplary processing triggered after a gross-level rule goes out of compliance will now be described with reference to FIG. 4 . Since the gross-level rule is already out of compliance, these embodiments may be referred to as non-predictive, even though they do infer problems that may be directly or indirectly causing or contributing to the out-of-compliance condition.
- an indicator set may be generated 404 . This can be done according to the process described above with reference to FIG. 3 . Once the indicator set is generated, it is compared to historical indicator sets 406 , to determine if the indicator set sufficiently resembles one or more of these historical indicator sets.
- the comparison 406 may be carried out in a variety of ways. In one embodiment of the present invention, weights, thresholds, or a combination of weights and thresholds may be used to decide resemblance, as explained below with reference to FIGS. 5A and 5B .
- the historical indicator sets may have been generated at some time in the past, and may be stored in some memory element or database somewhere in the IT infrastructure or the Indicator Engine.
- the historical indicator set may be associated with one or more problems that existed at the time the historical indicator set was generated.
- a historical indicator set may be associated with one or more fixes or solutions that were used at the time the historical indicator set was generated. For example, if the historical fingerprint was generated when the gross-level rule was out of compliance because a particular memory unit was overloaded, the historical fingerprint may now be associated with the memory unit being overloaded. Furthermore, if in the past reconfiguring a router eased the memory unit's loading, this fix may also be associated with the historical fingerprint.
- the historical fingerprint may be stored with some context.
- the historical fingerprint may be associated with a temporal context, i.e. a time and date of the occurrence of the fingerprint, to aid the comparison.
- Temporal context may be especially helpful if the component indicators of the historical indicator set are seasonal or time-sensitive, as discussed above.
- a rule identifier, a fingerprint name, a fingerprint description, and various problem resolution notes may also be stored with the historical indicator set.
- Another context, such as a relative context, may be used to associate the historical fingerprint with above/below-normal monitor conditions.
- the indicator set may resemble one or more of the historical indicator sets.
- one embodiment of the present invention may infer 408 one or more problems causing the gross-level rule to be out of compliance. This may consist of inferring that a problem that existed in the past, when the gross-level rule was out of compliance at the time the historical indicator set was generated, has reoccurred. This inference is based on the statistical likelihood of the indicator set's resembling the historical indicator set.
- it may have been determined by a technician that a CPU had crashed somewhere in the IT infrastructure, causing the out-of-compliance condition.
- the technician using an embodiment of the present invention that included a user interface, may have associated the problem of the crashed CPU with this fingerprint. That is, the technician may have associated a solution with the historical fingerprint, the solution being to repair the particular crashed CPU.
- the CPU monitor alone would not indicate the critical nature of the CPU to the gross-level rule.
- the CPU may not even be monitored.
- the crashed CPU may create recurring monitor patterns even without being directly monitored.
- the problems may be ranked according to some statistical likelihood of reoccurrence.
- the problems associated with those fingerprints may also be ranked according to some statistical likelihood.
- the weights may be adjusted by increasing or decreasing the existing weight for each monitor depending on the historical standard deviation average for that monitor. For example, if the monitor is outside its historical standard deviation average, a counter c_outside may be increased by one. Similarly, if the monitor is inside its historical standard deviation average, a counter c_inside may be increased by one.
- the weights used to compare component indicators are adjusted 414 to reflect the possibly decreased statistical significance of some component indicators in the set and the problem. If the real cause of the problem is determined, it may be associated with the indicator set 416 for future reference, at a time when the indicator set has been stored as a historical indicator set. At first, the correlation between the problem and the indicator set may not be strong, but over time as the problem reoccurs and the weights get adjusted repeatedly, this correlation may increase significantly. For example, it may increase to a level where a new indicator set resembling the now-historical indicator set means the same problem has occurred with a 90 percent probability.
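The counter-based weight adjustment above can be sketched as follows. Each time the associated problem recurs, the monitor's c_outside or c_inside counter is incremented; deriving the weight as the fraction of recurrences in which the monitor was outside its historical range is one plausible scheme, assumed here rather than mandated by the text.

```python
def update_weight(counters, outside):
    """Update per-monitor counters when the associated problem recurs,
    and derive a correlation weight from them.

    `counters` is a dict with keys "c_outside" and "c_inside" (shape
    assumed for illustration). A monitor that is outside its normal
    range nearly every recurrence gets a weight near 1; one outside
    only half the time gets ~0.5, indicating weak correlation.
    """
    if outside:
        counters["c_outside"] += 1
    else:
        counters["c_inside"] += 1
    total = counters["c_outside"] + counters["c_inside"]
    counters["weight"] = counters["c_outside"] / total
    return counters["weight"]
```

Repeated over many recurrences, this drives the weights toward the strong correlations the text describes, where a resembling indicator set implies the same problem with high probability.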
- the set may be stored 418 , and associated with a problem if one is found 420 , as described above.
- one task is to determine the correct pricing for an order line item.
- This generally may involve a synchronous method call through an Enterprise Application Integration (EAI) server that, in turn, invokes a transaction server method call in an Enterprise Resource Planning (ERP) system.
- the ERP system accesses a database to retrieve the pricing information, perform appropriate calculations, and return the result.
- There may be two different historical fingerprints associated with poor response time in getting pricing information.
- the component indicators of one historical fingerprint may show correlation to CPU performance and memory utilization on the ERP server.
- the problem associated with this fingerprint may be: “Concurrent batch processes impacting real-time request performance.”
- the second historical fingerprint may show a correlation to the performance of a database using component indicators showing that the ERP CPU Input/Output wait time is abnormally high, the database server CPU utilization is abnormally low, the database server memory utilization is abnormally high, and the database cache hit ratio is lower than normal.
- This second fingerprint may be associated with a problem described by a user as: “Database is not performing well, necessary data not found in buffer cache.” If the new indicator set sufficiently resembles, i.e., matches, one of the historical fingerprints, the problem described by the user may be inferred.
- FIG. 5A shows the indicator set 502 being compared with historical indicator set 552 .
- Each dot 504 in the indicator sets represents a component indicator.
- the component indicators may be processed measurements.
- the component indicators may be binary indicators based on whether the components are outside a historical normal range.
- the top-left component indicator is associated with the same monitor in both of the indicator sets.
- an X 506 indicates that a component indicator is outside its normal range. This may be the result of the component indicator being above a certain threshold.
- a box 508 around a component indicator dot 504 represents that the weight of the component indicator is above a certain threshold.
- These weights, used for comparing fingerprints, may be kept for each component indicator.
- the weights may be used to show the relevance of a component indicator in the indicator set. For example, the lower-left component indicator in FIG. 5A does not have a box around it. This may mean that this component indicator is not strongly correlated with a problem associated with fingerprint 552 . For example, if this component indicator indicates CPU speed, and about half the time a particular problem associated with fingerprint 552 arises, the CPU speed is normal, and the other half of the time it is not normal, then the CPU speed appears to have little effect on whether the problem has occurred yet again.
- the weight of that component indicator in that historical fingerprint may be low, indicating a low correlation.
- one condition may be that if two or more component indicators that have weights above a threshold, i.e., have a box around them, are out of normal range, i.e., have X's, on both the indicator set and the historical indicator set, then the two indicator sets sufficiently resemble each other.
- indicator set 502 resembles historical indicator set 552 in FIG. 5A .
- FIG. 5B shows an indicator set 514 that does not sufficiently resemble historical indicator set 552 according to the above resemblance condition.
- the above specification for resemblance is given by way of example only. There are many alternative ways in which such conditions may be defined or formulated. Any statistical comparison may be used to determine whether the indicator set resembles any of the historical indicator sets.
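The example resemblance condition from the discussion of FIGS. 5A and 5B can be sketched as: the sets resemble each other if at least two component indicators are both weighted above a threshold in the historical fingerprint (the "box") and out of normal range in both fingerprints (the "X" on each). The threshold values and data shapes below are assumptions for illustration; the text emphasizes that any statistical comparison may be used instead.

```python
def resembles(current, historical, weights,
              weight_threshold=0.5, range_threshold=0.5, min_matches=2):
    """Decide whether a current indicator set sufficiently resembles a
    historical one, per the example condition above.

    `current` and `historical` map component names to indicators in
    [0, 1]; `weights` holds each component's weight in the historical
    fingerprint. All thresholds are illustrative.
    """
    matches = 0
    for component, weight in weights.items():
        if weight <= weight_threshold:
            continue  # unboxed indicator: weak correlation, ignore
        if (current.get(component, 0) > range_threshold and
                historical.get(component, 0) > range_threshold):
            matches += 1  # an "X" on both fingerprints
    return matches >= min_matches
```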
- Some embodiments of the invention that may predict a future out-of-compliance condition before it occurs are now described with reference to FIG. 6 . Since the gross-level rule is not yet out of compliance at the time the monitor data is collected, these embodiments may be used to predict whether the gross-level rule is approaching an out-of-compliance condition.
- the prediction process is periodically performed for each gross-level rule. Thus, the prediction process according to one embodiment of the invention may begin with the selection 602 of a particular gross-level rule.
- an indicator set for the selected gross-level rule may be generated 604 in any of the ways described above with reference to FIG. 3 .
- the indicator set is compared 606 to historical indicator sets in the manner described with reference to FIG. 4 .
- the comparison 606 determines whether there is sufficient resemblance between the indicator set and at least one historical indicator set. This resemblance condition may be different from the resemblance condition used above with reference to FIG. 4 , or it may be the same resemblance condition.
- the historical indicator sets may have been generated at a time when the gross-level rule was out of compliance. If the indicator set does not sufficiently resemble any of the historical indicator sets, then no prediction is made about the gross-level rule. The process may begin again by a selection 602 of another gross-level rule.
- the monitoring system may then infer 610 potential problems, fixes, or solutions associated with the particular historical fingerprint or fingerprints, similar to the process described with reference to FIG. 4 .
- the indicator set generated at that time is stored and associated with the out-of-compliance condition.
- Indicator sets that are very similar may not need to be stored independently, but may be represented by weighting the historical indicator sets. For example, a historical indicator set whose statistical equivalents have been generated 90 percent of the time the gross-level rule went out of compliance may have a high weight. Statistical equivalents may be defined as those fingerprints judged sufficiently similar by the comparison process. Many alternative weighting schemes are possible that emphasize a historical indicator set. Stated another way, these weights may indicate the reliability or quality of the predictive indicator set. Such a high-quality historical indicator set may be referred to as a predictive fingerprint. If fingerprints generated while the gross-level rule is in compliance begin to resemble the predictive fingerprint, the out-of-compliance condition associated with the fingerprint is predicted to be approaching.
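The predictive-fingerprint weighting above can be sketched as two small steps: derive a reliability weight as the fraction of out-of-compliance occurrences whose fingerprints the historical set matched, then predict an approaching out-of-compliance condition when an in-compliance fingerprint's similarity to a high-weight predictive fingerprint crosses a threshold. The similarity score and both thresholds are assumed quantities, since the text leaves the weighting scheme open.

```python
def predictive_weight(times_matched, times_out_of_compliance):
    """Reliability of a historical indicator set as a predictor: the
    fraction of out-of-compliance occurrences it matched. A set
    matching 90 percent of such occurrences is a strong predictive
    fingerprint, per the example above."""
    if times_out_of_compliance == 0:
        return 0.0
    return times_matched / times_out_of_compliance

def approaching_out_of_compliance(similarity, weight,
                                  similarity_threshold=0.8,
                                  weight_threshold=0.8):
    """Predict an approaching out-of-compliance condition when an
    in-compliance fingerprint begins to resemble (similarity above
    threshold) a high-weight predictive fingerprint. Both thresholds
    are illustrative assumptions."""
    return weight >= weight_threshold and similarity >= similarity_threshold
```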
- Embodiments of the present invention include various processes.
- the processes of the embodiments of the present invention may be performed by hardware components, or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor or logic circuits programmed with the instructions to perform the processes.
- the processes may be performed by a combination of hardware and software.
- Embodiments of the present invention may be provided as a computer program product, which may include a machine-readable medium having stored thereon instructions, which may be used to program a computer or other electronic device to perform a process according to an embodiment of the present invention.
- the machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, flash memory, or any other type of media/machine-readable medium suitable for storing electronic instructions.
- embodiments of the present invention may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
- the Indicator Engine has been described as receiving raw monitor data that it converts and processes into fingerprints.
- the monitor data may not be raw data, but may be pre-processed data.
- the monitors may perform some or all of the statistical analysis needed to create the component indicators.
- the Indicator Engine need not reside on a single machine, but may be distributed throughout the IT infrastructure.
- a historical fingerprint associated with an out-of-compliance condition is described as having been generated when the corresponding gross-level rule was out of compliance.
- the historical fingerprint may have been generated at any time before or after the beginning of the out-of-compliance condition, so long as the temporal relation implies a sufficient statistical relation.
- historical fingerprint chains corresponding to the progression of fingerprints that led to the out-of-compliance condition may also be stored, and used in the comparisons to determine whether a fingerprint resembles historical fingerprints, or whether a currently occurring fingerprint chain resembles a historical fingerprint chain.
Abstract
An information technology (IT) infrastructure may be monitored, and the data thus collected may be used to infer problems and predict future conditions. In one embodiment, the present invention may include receiving a plurality of component metrics, each component metric related to a corresponding component of an (IT) infrastructure of an enterprise, each component being associated with one or more gross-level rules, and generating an indicator set by comparing each received component metric to relevant historical values of the component metric. In one embodiment, the present invention may also include determining that a gross-level rule is out of compliance, comparing the indicator set to one or more historical indicator sets to determine whether the indicator set resembles any of the one or more historical indicator sets, and performing an appropriate action based on the result of the comparison.
Description
- This application is a Continuation of U.S. application Ser. No. 10/112,015, filed on Mar. 29, 2002, which claims the benefit of U.S. Provisional Application Nos. 60/281,991 and 60/282,363, both filed on Apr. 17, 2001, all of which are hereby incorporated by reference for all purposes.
- 1. Field of the Invention
- The present invention applies to the field of monitoring and maintaining an enterprise's information technology infrastructure and, in particular, to predicting performance irregularities in an enterprise's information technology infrastructure, and inferring possible problems causing the irregularities.
- 2. Description of the Related Art
- Modern businesses rely on information technology (IT) to assist in carrying out business tasks. An enterprise's IT infrastructure needs to consistently perform to specification to ensure the success of the business. The IT infrastructure may be used for an enterprise's communication, database management, inventory tracking, shipment records, website management, business-to-business (B2B) ecommerce, business-to-consumer (B2C) ecommerce, accounting, billing, order tracking, customer support tracking, document management, and countless other tasks. The IT infrastructure is made up of numerous components such as servers, hubs, switches, computers, the links connecting these devices, the software applications running these devices, and so on. Each of these components generally has numerous subcomponents such as a central processing unit (CPU), bus, memory, and similar high-level electrical components. The proper functioning of the IT infrastructure in large part depends on the proper functioning of these components and subcomponents.
- Recognizing the importance of proper functioning, various monitoring tools have been developed to measure the performance of components and subcomponents. For example, a monitor may change state depending on whether a computer is online or offline. If the monitor indicates the computer is offline, technicians may need to take action to put it back online, or switch to a backup computer. For relatively simple and small-scale IT infrastructures, such monitoring may be adequate. However, some enterprises have IT infrastructures so large and complex that merely having information about individual components or subcomponents may not be very helpful.
- For example, in a large-scale IT infrastructure that may include many servers, a particular combination of server failures may not cause a problem. However, if a server failure combination does cause a problem so that the enterprise's website stops functioning properly, it may not be quickly determined which of the failed servers are the critical servers causing the problem. The IT technicians receive no assistance from prior art monitors in how to prioritize technical involvement or in identifying what specific combination of components or subcomponents may be the cause of the particular problem. Furthermore, prior art monitors are incapable of predicting when problems or out-of-compliance conditions, i.e., performance irregularities, may occur. Thus, IT technicians must wait until the IT infrastructure fails to function according to specification, and then they must find the cause of the failure without assistance from the monitors, other than the raw data they provide.
- An information technology (IT) infrastructure may be monitored, and the data thus collected may be used to infer problems and predict future conditions. In one embodiment, the present invention may include receiving a plurality of component metrics, each component metric related to a corresponding component of an (IT) infrastructure of an enterprise, each component being associated with one or more gross-level rules, and generating an indicator set by comparing each received component metric to relevant historical values of the component metric. In one embodiment, the present invention may also include determining that a gross-level rule is out of compliance, comparing the indicator set to one or more historical indicator sets to determine whether the indicator set resembles any of the one or more historical indicator sets, and performing an appropriate action based on the result of the comparison.
- The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements and in which:
FIG. 1 is a block diagram of an exemplary IT infrastructure that may be monitored using an embodiment of the invention; -
FIG. 2 is an exemplary computer system on which embodiments of the present invention may be practiced; -
FIG. 3 is a flow chart illustrating fingerprint creation according to one embodiment of the invention; -
FIG. 4 is a flow chart illustrating problem inference according to one embodiment of the invention; -
FIG. 5A is a pictorial representation of a fingerprint according to one embodiment of the present invention; -
FIG. 5B is a pictorial representation of a fingerprint according to one embodiment of the present invention; and -
FIG. 6 is a flow chart illustrating out-of-compliance prediction according to one embodiment of the invention. - IT Infrastructure Monitoring
- Embodiments of the present invention combine measurements from numerous monitors. These monitors collect data from a number of components and subcomponents of an enterprise's IT infrastructure, and provide the data regarding each of the components. Using the combined measurements, historical values of the monitors, and statistical analysis, embodiments of the present invention can predict that the IT infrastructure will not perform up to specification in the future. If the IT infrastructure is already not performing up to specification, embodiments of the present invention can suggest possible problems causing the out-of-compliance condition, and may even suggest solutions to those problems. This may be accomplished by using combined measurements and statistical analysis in a process herein referred to as “fingerprinting.” As used in this description, a fingerprint is synonymous with the term “indicator set,” a more general if somewhat less colorful term. The generation of fingerprints—indicator sets—is described in detail further below.
- Embodiments of the present invention may be used to monitor an enterprise's IT infrastructure. The collection of monitor data is now described with reference to
FIG. 1. FIG. 1 shows an example IT infrastructure 100. This infrastructure 100 includes various networks, such as local area network (LAN) 102, servers, such as servers 104, end-user workstations or clients, such as workstations 106, and databases, such as database 108. Servers 104 may come in many varieties for different functions, including message queuing servers, application servers, web servers, departmental servers, and enterprise software servers. This is a highly simplified IT infrastructure used here to ease understanding of the embodiments described. An actual infrastructure may contain thousands of components, and may contain numerous components not shown in FIG. 1. Examples of some other components may include load balancing equipment, switching equipment, various communications devices such as wireless devices and antennas, and a multitude of other IT equipment. Some example subcomponents are also shown in FIG. 1, such as CPU 112 on a workstation from the workstations 106, and the level of memory use 114 on the database 108. These and other IT components may be monitored in a number of different ways. Examples of component monitoring may include measuring disk utilization, network volume and throughput, and error types and error volume. Furthermore, monitoring may be software application-specific, measuring such application performance metrics as cache performance, queue performance, and other application-specific metrics. - Monitors may gather data related to many levels of the IT infrastructure. For example,
LAN 102 may have a monitor to measure performance of the LAN 102 as a whole, one or more monitors to measure the performance of various nodes 116 of the LAN 102, and one or more monitors to measure the performance of the switching apparatus 118 used to connect the LAN 102. Furthermore, monitors may be associated with each component of each node 116 and the switching apparatus 118. The data collected by these monitors may then be communicated to an Indicator Engine 120 via communications link 122, represented in FIG. 1 as a dotted line. All other dotted lines in FIG. 1 represent a way for the monitors of the components, subcomponents, and sub-subcomponents to report the collected raw data and/or processed data to the Indicator Engine 120. The Indicator Engine may be located or implemented anywhere inside or outside of the IT infrastructure 100. For example, the Indicator Engine 120 may be resident on one of the servers 104, on one of the end-user workstations 106, or distributed among various servers 104 and workstations 106. In this example, the Indicator Engine 120 is shown as being a separate entity in FIG. 1 to emphasize that it is used to monitor the IT infrastructure. - In one embodiment of the present invention, the
Indicator Engine 120 resides on a machine described in detail with reference to FIG. 2. In one embodiment, computer system 200 may be a personal computer. In some embodiments of the invention, certain aspects of the embodiments may be carried out on a specialized device while other aspects may be carried out on a general-purpose computer coupled to the device. -
Computer system 200 comprises a bus or other communication means 201 for communicating information, and a processing means such as processor 202 coupled with bus 201 for processing information. In one embodiment of the invention, the tasks performed to practice embodiments of the invention are performed by the processor 202 either directly or indirectly. -
Computer system 200 further comprises a random access memory (RAM) or other dynamic storage device 204 (referred to as main memory), coupled to bus 201 for storing information and instructions to be executed by processor 202. Main memory 204 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 202. Computer system 200 also comprises a read only memory (ROM) and/or other static storage device 206 coupled to bus 201 for storing static information and instructions for processor 202. - A
data storage device 207 such as a magnetic disk or optical disc and its corresponding drive may also be coupled to bus 201 for storing information and instructions. Computer system 200 can also be coupled via bus 201 to a display device 221, such as a cathode ray tube (CRT) or Liquid Crystal Display (LCD), for displaying information to a computer user. Computer system 200 can also be coupled via bus 201 to a printing device 224, such as a laser printer, or any other printer. - Typically, an
alphanumeric input device 222, including alphanumeric and other keys, may be coupled to bus 201 for communicating information and/or command selections to processor 202. Another type of user input device is cursor control 223, such as a mouse. A communication device 225 is also coupled to bus 201 for accessing remote servers or other servers via the Internet, for example. The communication device 225 may include a modem, a network interface card, or other well-known interface devices, such as those used for coupling to an Ethernet, token ring, or other types of networks. In any event, in this manner, the computer system 200 may be coupled to a number of clients and/or servers via a conventional network infrastructure, such as a company's Intranet and/or the Internet, for example. - Problem Inference and Out-of-Compliance Prediction
- When the data collected by the monitors deployed throughout the IT infrastructure is considered together, the information contained in the aggregation of monitors may be more than the sum of the information reported by the individual monitors. By learning about various collective monitor patterns exhibited by groups of monitors, information may be deduced about unmonitored components, future irregularities, present problems, and other aspects of the IT infrastructure. Using collective monitoring, it may be possible to observe conditions conducive to IT infrastructure breakdowns. Using artificial intelligence (AI), it may also be possible for a monitoring solution to improve these predictive powers based on the past performance of individual IT infrastructures. Furthermore, using collective monitoring, it may be possible to self-heal the IT infrastructure either before a problem even occurs, or after it has occurred. Artificial intelligence (AI) may also be used to allow the monitoring system to learn to better heal the IT infrastructure over time. In one embodiment, collective monitoring is achieved by using fingerprints or indicator sets.
- Generating the Indicator Set
- The specifications for an IT infrastructure may be provided in terms of gross-level rules. A gross-level rule, such as a business rule, indicates the expected or desired functioning of the IT infrastructure. For example, a gross-level rule may specify a maximum or average response time for a web server request. Another gross-level rule may similarly specify the number of order lines that should be processed per second. Other gross-level rules may give constraints on transaction throughput. There are many other gross-level rules, some of which may be unique to the enterprise whose IT infrastructure is being monitored. The IT infrastructure may be said to be functioning according to specification if all, or at least a certain minimum number, of the gross-level rules are in compliance. If a gross-level rule is not in compliance, corrective action may be desired.
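As a concrete illustration, a gross-level rule can be as simple as a named metric and a limit. The representation below is an assumption for illustration only, since the specification leaves the rule format open:

```python
def in_compliance(measured_value, rule):
    """Check a single gross-level rule, e.g. 'average web server
    response time must stay under 2 seconds'. Real rules might
    instead constrain order lines per second or transaction
    throughput; only the limit check is shown here."""
    return measured_value <= rule["limit"]

# Hypothetical response-time rule for a web server request.
response_time_rule = {"metric": "avg_response_seconds", "limit": 2.0}
in_compliance(1.4, response_time_rule)  # rule in compliance
in_compliance(3.2, response_time_rule)  # out of compliance
```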
- An actual monitoring system according to embodiments of the present invention may consider all gross-level rules. However, for simplicity, only compliance with one gross-level rule is considered in the following discussion. Also, since not all components are related to all gross-level rules, the following discussion assumes that a gross-level rule has been selected for analysis, and that the components and subcomponents that may affect the gross-level rule have been identified.
- Once the components that are relevant to a gross-level rule are identified, an indicator set for the gross-level rule can be created. The generation of the indicator set, or fingerprint, is now described with reference to
FIG. 3. First, all or a critical number of monitors report their measurements, referred to herein as component metrics, and these component metrics are received 302 at the Indicator Engine. These measurements may be timestamped or synchronized, if temporal measurement synchronization is desired. The measurements may be collected periodically. Numerous triggering events may be used to commence the fingerprint creation process. - The measurements received from the monitors may be in various forms and formats. A measurement may be raw data such as "memory utilization 70 percent." The measurement may be statistically or otherwise preprocessed. The measurement may also be a monitor state. Such a state may be binary, e.g., "computer on/off-line," or graduated, e.g., "CPU usage high/medium/low." Other states are also possible, such as trend indicator states, for example, "CPU usage increasing/decreasing." Statistical states may also be used, such as "CPU usage outside normal range."
- When the raw monitor measurements, i.e., component metrics, are collected, each component metric may be compared 304 with historical values or statistics for the component metrics. For example, if a monitor that measures the usage of a memory unit reports the component metric as 70 percent utilization, the Indicator Engine may compare that metric to an average utilization for that memory unit of 50 percent. The
comparison 304 may thus result in the observation that the particular memory element is 20 percent more utilized than average. This intermediate processed measurement is herein referred to as a component statistic. - The comparison 304 may be done in a time- or date-sensitive manner. That is, the comparison may only be of relevant historical measurements. For example, the average utilization of 50 percent given above may only be true for weekdays. On weekends, the memory utilization may be 20 percent on average. If the component metrics were collected on a weekday, they may be compared to historical weekday component performance only. Similarly, some components may be sensitive to the time of day, day of the week, week of the month, or month of the year. Other less conventional patterns may also be observed and used to change the relevant historical values and averages the component metrics are compared to. - Using further statistical and normalization methods, the results of the comparisons may be expressed 306 as a measurement relating the current state of the component metric to the usual performance of the component in a way comparable with other components. This measurement is herein referred to as a component indicator. A component indicator may be some measure of normal variation. Such a measurement may be a standard deviation, a difference in percentages, a distance from a mean or median, or various trend indicators, such as "increasing/decreasing." A component indicator may also be a variance, a process control metric, some other statistical measurement, or a normalized version of any of the above measurements. In one embodiment of the invention, the component indicators are numbers normalized to lie between zero and one, with zero indicating completely normal component performance, and one indicating completely irregular performance. The normalization may also be accomplished using a Fermi-Dirac probability distribution curve. This curve maps the number of standard deviations a metric is away from the mean to a number between 0 and 1. In one embodiment, the Fermi-Dirac curve is adjusted to map one standard deviation to the value 0.5.
- For example, if the average utilization of the memory element monitored above is 50 percent, a component metric of 60 percent may result in a component indicator around 0.4, while a component metric of 70 percent may result in a component indicator around 0.6, since 70 percent is more irregular than 60 percent for this particular component. These numbers are given only for demonstration. The precise numbers may vary widely depending on the exact statistical and normalization calculations utilized. For example, the component indicators may be normalized to lie between zero and 100, or any other range, without any substantial alteration to the embodiments here discussed.
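A minimal sketch of the Fermi-Dirac normalization described above, shifted so that exactly one standard deviation maps to 0.5. The steepness parameter is an assumption, so the exact indicator values (such as the 0.4 and 0.6 in the example) depend on how the curve is tuned:

```python
import math

def fermi_dirac_indicator(metric, mean, std_dev, steepness=0.25):
    """Map the number of standard deviations a component metric lies
    from its historical mean onto a component indicator in [0, 1],
    using a Fermi-Dirac (logistic) curve shifted so that one
    standard deviation maps to 0.5. Zero indicates completely
    normal performance, one completely irregular performance.
    steepness is a hypothetical tuning parameter."""
    deviations = abs(metric - mean) / std_dev
    return 1.0 / (1.0 + math.exp(-(deviations - 1.0) / steepness))

# Memory unit with an assumed historical mean of 50 percent
# utilization and a standard deviation of 10 percentage points:
fermi_dirac_indicator(60, 50, 10)  # one std dev out -> 0.5
fermi_dirac_indicator(50, 50, 10)  # at the mean -> near 0
```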
- The collection of component indicators associated with a gross-level rule is herein referred to as either an indicator set or a fingerprint. If all components of the IT infrastructure being monitored are used to generate the indicator set, it is referred to as a global indicator set, or global fingerprint. Fingerprints for each individual gross-level rule may then be generated by taking only the component indicators from the global fingerprint that are associated with, i.e., contribute to, each gross-level rule.
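Deriving a per-rule fingerprint from the global fingerprint is then a simple projection; the dictionary layout and names below are illustrative assumptions:

```python
def rule_fingerprint(global_fingerprint, rule_components):
    """Keep only the component indicators associated with one
    gross-level rule, producing that rule's fingerprint."""
    return {name: global_fingerprint[name]
            for name in rule_components if name in global_fingerprint}

# Hypothetical global fingerprint over three monitored components.
global_fp = {"web_cpu": 0.2, "db_mem": 0.8, "lan_throughput": 0.1}
rule_fp = rule_fingerprint(global_fp, ["db_mem", "lan_throughput"])
# rule_fp contains only the two indicators tied to this rule
```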
- Embodiments of the present invention may be used to predict gross-level rule out-of-compliance, to infer a problem causing an already existing out-of-compliance condition, or a combination of both. How the embodiments may be used depends on whether the particular gross-level rule of interest is out of compliance at the time the fingerprint is generated. Exemplary processing triggered after a gross-level rule goes out of compliance will now be described with reference to FIG. 4. Since the gross-level rule is already out of compliance, these embodiments may be referred to as non-predictive, even though they do infer problems that may be directly or indirectly causing or contributing to the out-of-compliance condition. - Some time after the gross-level rule goes out of compliance, an indicator set may be generated 404. This can be done according to the process described above with reference to
FIG. 3. Once the indicator set is generated, it is compared to historical indicator sets 406 to determine if the indicator set sufficiently resembles one or more of these historical indicator sets. The comparison 406 may be carried out in a variety of ways. In one embodiment of the present invention, weights, thresholds, or a combination of weights and thresholds may be used to decide resemblance, as explained below with reference to FIGS. 5A and 5B. - The historical indicator sets may have been generated at some time in the past, and may be stored in some memory element or database somewhere in the IT infrastructure or the Indicator Engine. The historical indicator set may be associated with one or more problems that existed at the time the historical indicator set was generated. Also, a historical indicator set may be associated with one or more fixes or solutions that were used at the time the historical indicator set was generated. For example, if the historical fingerprint was generated when the gross-level rule was out of compliance because a particular memory unit was overloaded, the historical fingerprint may now be associated with the memory unit being overloaded. Furthermore, if in the past reconfiguring a router eased the memory unit's loading, this fix may also be associated with the historical fingerprint.
- Also, the historical fingerprint may be stored with some context. For example, the historical fingerprint may be associated with a temporal context, i.e., a time and date of the occurrence of the fingerprint, to aid the comparison. Temporal context may be especially helpful if the component indicators of the historical indicator set are seasonal or time-sensitive, as discussed above. In addition to temporal context, a rule identifier, a fingerprint name, a fingerprint description, and various problem resolution notes may also be stored with the historical indicator set. Another context, such as a relative context, may be used to associate the historical fingerprint with above/below normal monitor conditions.
- The indicator set may resemble one or more of the historical indicator sets. In this case, one embodiment of the present invention may infer 408 one or more problems causing the gross-level rule to be out of compliance. This may consist of inferring that a problem which existed in the past, when the gross-level rule was out of compliance and the historical indicator set was generated, has reoccurred. This inference is based on the statistical likelihood suggested by the indicator set's resemblance to the historical indicator set.
- For example, at some time in the past when the gross-level rule was out of compliance and the historical fingerprint was generated, it may have been determined by a technician that a CPU had crashed somewhere in the IT infrastructure, causing the out-of-compliance condition. The technician, using an embodiment of the present invention that included a user interface, may have associated the problem of the crashed CPU with this fingerprint. That is, the technician may have associated a solution with the historical fingerprint, the solution being to repair the particular crashed CPU.
- In a complex IT infrastructure, the CPU monitor alone would not indicate the critical nature of the CPU to the gross-level rule. Alternatively, the CPU may not even be monitored. However, since embodiments of the present invention consider numerous monitors, the crashed CPU may create recurring monitor patterns even without being directly monitored. Thus, in this example, if the two fingerprints match, i.e., meet the resemblance condition, then the problem that the CPU has again crashed will be inferred. If more than one problem is associated with a given fingerprint, the problems may be ranked according to some statistical likelihood of reoccurrence. For example, if a certain problem has occurred ten times, and each time a fingerprint resembling a historical fingerprint was generated, then that problem is more likely to have reoccurred than a problem that only occurred three times in a similar situation. Furthermore, if more than one historical fingerprint matches the fingerprint, the problems associated with those fingerprints may also be ranked according to some statistical likelihood.
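The ranking step might look like the following sketch, which simply totals how often each problem accompanied a matching historical fingerprint; the data layout is an assumption, not from the specification:

```python
def rank_problems(matching_fingerprints):
    """Rank candidate problems by how many past occurrences each one
    has across the historical fingerprints that matched, a simple
    stand-in for the 'statistical likelihood of reoccurrence'."""
    totals = {}
    for fingerprint in matching_fingerprints:
        for problem, occurrences in fingerprint["problems"]:
            totals[problem] = totals.get(problem, 0) + occurrences
    return sorted(totals, key=totals.get, reverse=True)

# Two matching historical fingerprints with hypothetical problem
# histories: the CPU problem has 12 total occurrences, the router
# problem only 3, so the CPU problem is ranked first.
matches = [
    {"problems": [("CPU crashed", 10)]},
    {"problems": [("CPU crashed", 2), ("router misconfigured", 3)]},
]
ranked = rank_problems(matches)
```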
- In the illustrated embodiment, a determination is made 410 as to whether the problem inferred was actually correct. This determination may be performed by a technician or in some automated manner. If the inferred problem was the actual problem, then the weights used to compare component indicators are adjusted 412 to reflect the statistical significance of the relationship between some component indicators in the set and the problem. Thus, the monitoring system is able to learn and better predict problems in the future.
- In one embodiment of the invention, the weights may be adjusted by increasing or decreasing the existing weight for each monitor depending on the historical standard deviation average for that monitor. For example, if the monitor is outside its historical standard deviation average, a counter c_outside may be increased by one. Similarly, if the monitor is inside its historical standard deviation average, a counter c_inside may be increased by one. Then, if c_outside exceeds c_inside, the weight for that monitor may be increased; otherwise, if c_inside exceeds c_outside, the weight may be decreased.
Numerous other calculations may be performed for adjusting the weights. - If the inferred problem was not the actual problem, then the weights used to compare component indicators are adjusted 414 to reflect the possibly decreased statistical significance of the relationship between some component indicators in the set and the problem. If the real cause of the problem is determined, it may be associated with the indicator set 416 for future reference at a time when the indicator set has been stored as a historical indicator set. At first, the correlation between the problem and the indicator set may not be strong, but over time, as the problem reoccurs and the weights get adjusted repeatedly, this correlation may increase significantly. For example, it may increase to a level where a new indicator set resembling the now historical indicator set means the same problem occurred with a 90 percent probability.
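One possible counter-based update, sketched from the description above; the step size and the clamping to [0, 1] are assumptions, not part of the specification:

```python
def adjust_weight(weight, c_outside, c_inside, step=0.05):
    """Raise a monitor's weight if it was outside its historical
    standard deviation range more often than inside when the problem
    recurred, and lower it otherwise, keeping the result in [0, 1].
    step is a hypothetical learning-rate parameter."""
    if c_outside > c_inside:
        weight += step
    elif c_inside > c_outside:
        weight -= step
    return max(0.0, min(1.0, weight))

adjust_weight(0.5, c_outside=8, c_inside=2)  # weight strengthened
adjust_weight(0.5, c_outside=1, c_inside=9)  # weight weakened
```

Repeated over many recurrences of a problem, updates like this are what let the correlation between an indicator set and its problem grow over time.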
- If the indicator set did not resemble any of the historical indicator sets to a sufficient degree, then the set may be stored 418, and associated with a problem if one is found 420, as described above.
- The following example further demonstrates the problem inference process. In an on-line B2B ordering process, one task is to determine the correct pricing for an order line item. This generally may involve a synchronous method call through an Enterprise Application Integration (EAI) server that, in turn, invokes a transaction server method call in an Enterprise Resource Planning (ERP) system. The ERP system then accesses a database to retrieve the pricing information, perform appropriate calculations, and return the result. There may be two different historical fingerprints associated with poor response time in getting pricing information. The component indicators of one historical fingerprint may show correlation to CPU performance and memory utilization on the ERP server. The problem associated with this fingerprint may be: "Concurrent batch processes impacting real-time request performance." The second historical fingerprint may show a correlation to the performance of a database using component indicators showing that the ERP CPU Input/Output wait time is abnormally high, the database server CPU utilization is abnormally low, the database server memory utilization is abnormally high, and the database cache hit ratio is lower than normal. This second fingerprint may be associated with a problem described by a user as: "Database is not performing well, necessary data not found in buffer cache." If the new indicator set sufficiently resembles, i.e., matches, one of the historical fingerprints, the problem described by the user may be inferred.
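A weighted-threshold resemblance check of the kind used in these comparisons might be sketched as follows; the flag-based fingerprint layout and the default thresholds are assumptions for illustration:

```python
def resembles(current, historical, weights,
              weight_threshold=0.5, min_shared=2):
    """Decide resemblance: the fingerprints match when at least
    min_shared component indicators carrying a weight above
    weight_threshold are out of their normal range in BOTH
    fingerprints. Fingerprints are dicts mapping a component name
    to an out-of-range flag; all names are illustrative."""
    shared = sum(
        1 for name, weight in weights.items()
        if weight > weight_threshold
        and current.get(name) and historical.get(name)
    )
    return shared >= min_shared

weights = {"erp_cpu": 0.9, "db_mem": 0.8, "disk": 0.1}
historical = {"erp_cpu": True, "db_mem": True, "disk": False}
resembles({"erp_cpu": True, "db_mem": True, "disk": True},
          historical, weights)   # matches: two high-weight indicators
resembles({"erp_cpu": True, "db_mem": False, "disk": True},
          historical, weights)   # does not match
```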
- An exemplary method of determining resemblance between indicator sets is now described with reference to
FIGS. 5A and 5B .FIG. 5A shows the indicator set 502 being compared with historical indicator set 552. Each dot 504 in the indicator sets represents a component indicator. For simplicity, in this figure the fingerprints only include nine components. The component indicators may be processed measurements. Alternatively, the component indicators may be binary indicators based on whether the components are outside a historical normal range. The corresponding locations within the fingerprint—dots—represent corresponding component indicators. Thus, the top-left component indicator is associated with the same monitor in both of the indicator sets. InFIG. 5A , anX 506 indicates that a component indicator is outside its normal range. This may be the result of the component indicator being above a certain threshold. - In
FIG. 5A, a box 508 around a component indicator dot 504 represents that the weight of the component indicator is above a certain threshold. These weights, used for comparing fingerprints, may be kept for each component indicator. The weights may be used to show the relevance of a component indicator in the indicator set. For example, the lower-left component indicator in FIG. 5A does not have a box around it. This may mean that this component indicator is not strongly correlated with a problem associated with fingerprint 552. For example, if this component indicator indicates CPU speed, and the CPU speed is normal about half the time a particular problem associated with fingerprint 552 arises and abnormal the other half, then the CPU speed appears to have little effect on whether the problem has occurred yet again. Thus, in one embodiment, the weight of that component indicator in that historical fingerprint may be low, indicating a low correlation. - Using the weights and the normal-range thresholds described above, various conditions for resemblance may be formulated. As an example, one condition may be that if two or more component indicators that have weights above a threshold, i.e., have a box around them, are out of normal range, i.e., have X's, in both the indicator set and the historical indicator set, then the two indicator sets sufficiently resemble each other. According to this condition, indicator set 502 resembles historical indicator set 552 in
FIG. 5A. FIG. 5B shows an indicator set 514 that does not sufficiently resemble historical indicator set 552 according to the above resemblance condition. The above specification for resemblance is given by way of example only; there are many alternative ways in which such conditions may be defined or formulated. Any statistical comparison may be used to determine whether the indicator set resembles any of the historical indicator sets. - Some embodiments of the invention that may predict a future out-of-compliance condition before it occurs are now described with reference to
FIG. 6. Since the gross-level rule is not yet out of compliance at the time the monitor data is collected, these embodiments may be used to predict whether the gross-level rule is approaching an out-of-compliance condition. In one embodiment of the invention, the prediction process is performed periodically for each gross-level rule. Thus, the prediction process according to one embodiment of the invention may begin with the selection 602 of a particular gross-level rule. - Then, an indicator set for the selected gross-level rule may be generated 604 in any of the ways described above with reference to
FIG. 3. Next, the indicator set is compared 606 to historical indicator sets in the manner described with reference to FIG. 4. As discussed with reference to FIG. 4, the comparison 606 determines whether there is sufficient resemblance between the indicator set and at least one historical indicator set. This resemblance condition may differ from, or be the same as, the resemblance condition used above with reference to FIG. 4. The historical indicator sets may have been generated at a time when the gross-level rule was out of compliance. If the indicator set does not sufficiently resemble any of the historical indicator sets, then no prediction is made about the gross-level rule. The process may begin again with a selection 602 of another gross-level rule. - However, if the indicator set does match, i.e., sufficiently resemble, a historical fingerprint, then it is predicted that the IT infrastructure may soon experience an out-of-compliance condition regarding the gross-level rule. That is, it is predicted 608 that the gross-level rule may soon be out of compliance. In one embodiment of the invention, the monitoring system may then infer 610 potential problems, fixes, or solutions associated with the particular historical fingerprint or fingerprints, similar to the process described with reference to
FIG. 4. - In one embodiment of the invention, whenever a gross-level rule goes out of compliance, the indicator set generated at that time is stored and associated with the out-of-compliance condition. Indicator sets that are very similar may not need to be stored independently, but may be represented by weighting the historical indicator sets. For example, a historical indicator set whose statistical equivalents have been generated 90 percent of the time the gross-level rule went out of compliance may have a high weight. Statistical equivalents may be defined as those fingerprints judged sufficiently similar by the comparison process. Many alternative weighting schemes are possible that emphasize a historical indicator set. Stated another way, these weights may indicate the reliability or quality of the predictive indicator set. Such a high-quality historical indicator set may be referred to as a predictive fingerprint. If fingerprints generated while the gross-level rule is in compliance begin to resemble the predictive fingerprint, the out-of-compliance condition associated with the fingerprint is predicted to be approaching.
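The example resemblance condition of FIGS. 5A and 5B can be sketched as a weighted comparison. In this sketch, the boolean lists mirror the X's (out of normal range) and the weights mirror the boxes; the 0.5 weight threshold, the two-indicator minimum, and the function name are assumptions, not values from the description.

```python
def resembles(current_oob, hist_oob, hist_weights,
              weight_threshold=0.5, min_shared=2):
    """Resemblance per the example condition: at least min_shared component
    indicators that carry a weight above weight_threshold (a box in
    FIG. 5A) and are out of normal range (an X) in both indicator sets.

    current_oob / hist_oob: booleans, True where the component indicator
    is outside its historical normal range, one entry per monitor.
    hist_weights: per-indicator weights kept for the historical fingerprint.
    """
    shared = sum(1 for cur, hist, w in zip(current_oob, hist_oob, hist_weights)
                 if cur and hist and w > weight_threshold)
    return shared >= min_shared
```

With nine components, as in the figures, an indicator set sharing two heavily weighted out-of-range indicators with the historical set matches, while one sharing only a single such indicator does not.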
- General Matters
- In the description above, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present invention. It will be apparent, however, to one skilled in the art that embodiments of the present invention may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form.
- Embodiments of the present invention include various processes. The processes of the embodiments of the present invention may be performed by hardware components, or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor or logic circuits programmed with the instructions to perform the processes. Alternatively, the processes may be performed by a combination of hardware and software.
- Embodiments of the present invention may be provided as a computer program product, which may include a machine-readable medium having stored thereon instructions, which may be used to program a computer or other electronic device to perform a process according to an embodiment of the present invention. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing electronic instructions. Moreover, embodiments of the present invention may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
- In the description above, embodiments of the Indicator Engine have been described as receiving raw monitor data that it converts and processes into fingerprints. However, the monitor data may not be raw data, but may be pre-processed data. For example, the monitors may perform some or all of the statistical analysis needed to create the component indicators. Furthermore, the Indicator Engine need not reside on a single machine, but may be distributed throughout the IT infrastructure.
- In the description above, in some embodiments, a historical fingerprint associated with an out-of-compliance condition is described as having been generated when the corresponding gross-level rule was out of compliance. However, the historical fingerprint may have been generated at any time before or after the beginning of the out-of-compliance condition, so long as the temporal relation implies a sufficient statistical relation. Furthermore, historical fingerprint chains corresponding to the procession of fingerprints that led to the out-of-compliance condition may also be stored, and used in the comparisons to determine whether a fingerprint resembles historical fingerprints, or whether a fingerprint chain currently occurring resembles a historical fingerprint chain.
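A fingerprint-chain comparison of the kind just mentioned might look like the following sketch, where `resembles` is any pairwise resemblance test and all names are illustrative assumptions rather than elements of the description.

```python
def chain_resembles(recent_chain, historical_chain, resembles):
    """A currently occurring chain matches a stored historical chain when
    each fingerprint in the recent procession sufficiently resembles the
    fingerprint at the same position in the historical chain."""
    if len(recent_chain) != len(historical_chain):
        return False
    return all(resembles(cur, hist)
               for cur, hist in zip(recent_chain, historical_chain))
```

A matching chain suggests the infrastructure is retracing the procession of states that previously led to the out-of-compliance condition.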
- While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.
Claims (12)
1. A method comprising:
(A) aggregating data collected from a plurality of monitors deployed throughout an information technology (IT) infrastructure to form collective monitor patterns by
(1) receiving a plurality of component metrics, each component metric corresponding to a component of a plurality of components of the IT infrastructure and being associated with one or more gross-level rules of a plurality of gross-level rules indicative of expected or desired functioning of the IT infrastructure, and
(2) generating an indicator set representative of a current state of the IT infrastructure based on the received component metrics and relevant historical values of component metrics corresponding to the plurality of components; and
(B) predicting a future condition within the IT infrastructure, the predicted future condition representing (i) inference of a future irregularity or problem within the IT infrastructure or (ii) inference of a future out-of-compliance condition with respect to a gross-level rule of the plurality of gross-level rules, which has occurred in the past and based on the collective monitor patterns has a statistical likelihood of reoccurring relatively soon by
(1) comparing the indicator set to one or more historical indicator sets representative of past states of the IT infrastructure, and
(2) determining whether the indicator set has sufficient resemblance to a historical indicator set of the one or more historical indicator sets that is associated with a previously observed irregularity or problem within the IT infrastructure or a previously observed out-of-compliance condition of the gross-level rule.
2. The method of claim 1 , further comprising:
receiving information confirming whether the predicted future condition was in fact imminent; and
based on the received information, adjusting a plurality of weights used for comparing the indicator set to the one or more historical indicator sets, each weight being associated with one component indicator of the indicator set.
3. The method of claim 1 , further comprising causing information regarding the predicted future condition to be communicated to a technician having responsibility for maintaining the IT infrastructure.
4. The method of claim 1 , further comprising programmatically taking corrective action to remedy the predicted future condition.
5. The method of claim 2 , wherein said adjusting a plurality of weights comprises increasing the statistical significance of one or more component indicators of the historical indicator set responsive to receiving confirmation regarding correctness of the predicted future condition.
6. The method of claim 2 , wherein said adjusting a plurality of weights comprises decreasing the statistical significance of one or more component indicators of the historical indicator set responsive to receiving confirmation regarding incorrectness of the predicted future condition.
7. A method comprising:
maintaining a database of a plurality of historical fingerprints, each of the plurality of historical fingerprints corresponding to a previous observation of an out-of-compliance condition of a gross-level rule of a plurality of gross-level rules, each of the plurality of gross-level rules representing an expected or desired functioning of an information technology (IT) infrastructure, each of the plurality of historical fingerprints associated with an indication of one or more problems that existed in the IT infrastructure when the historical fingerprint was generated and that were believed to have been responsible for or contributed to the corresponding out-of-compliance condition of the gross-level rule;
generating a current fingerprint representative of a current state of the IT infrastructure by gathering data from a plurality of component monitors regarding a plurality of components of the IT infrastructure and processing the gathered data with reference to relevant historical values of the plurality of components, each of the plurality of components being associated with a particular gross-level rule of the plurality of gross-level rules; and
inferring a future out-of-compliance condition with respect to the particular gross-level rule by comparing the current fingerprint to the plurality of historical fingerprints and identifying a historical fingerprint of the plurality of historical fingerprints that sufficiently resembles the current fingerprint by performing a weighted comparison of a plurality of component indicators of the current fingerprint and a plurality of component indicators of one or more of the plurality of historical fingerprints.
8. The method of claim 7 , further comprising:
receiving information confirming whether the inferred future out-of-compliance condition was in fact imminent; and
based on the received information, adjusting a plurality of weights used for comparing the identified historical fingerprint to current fingerprints, each weight being associated with one component indicator of the plurality of component indicators of the identified historical fingerprint.
9. The method of claim 7 , further comprising causing information regarding the inferred future out-of-compliance condition to be communicated to a technician having responsibility for maintaining the IT infrastructure.
10. The method of claim 7 , further comprising programmatically taking corrective action to remedy the inferred future out-of-compliance condition.
11. The method of claim 8 , wherein said adjusting a plurality of weights comprises increasing the statistical significance of one or more component indicators of the identified historical fingerprint responsive to receiving confirmation regarding correctness of the inferred future out-of-compliance condition.
12. The method of claim 8 , wherein said adjusting a plurality of weights comprises decreasing the statistical significance of one or more component indicators of the identified historical fingerprint responsive to receiving confirmation regarding incorrectness of the inferred future out-of-compliance condition.
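The weight-adjustment feedback recited in claims 2, 5, and 6 might be sketched as follows. The additive step size, the clamping to [0, 1], and the function name are illustrative assumptions; the claims leave the adjustment scheme open.

```python
def adjust_weights(weights, indicator_oob, prediction_correct, step=0.05):
    """After confirmation of whether the predicted condition was imminent,
    nudge the weight of each component indicator that was out of range:
    up when the prediction proved correct (claim 5), down when it proved
    incorrect (claim 6). Weights of in-range indicators are unchanged."""
    delta = step if prediction_correct else -step
    return [min(1.0, max(0.0, w + delta)) if oob else w
            for w, oob in zip(weights, indicator_oob)]
```

Repeated confirmations thus raise the statistical significance of the indicators that reliably accompany the condition, while false alarms lower it.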
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/530,477 US20070005761A1 (en) | 2001-04-07 | 2006-09-10 | Predictive monitoring and problem identification in an information technology (it) infrastructure |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US28199101P | 2001-04-07 | 2001-04-07 | |
US28236301P | 2001-04-07 | 2001-04-07 | |
US10/112,015 US7107339B1 (en) | 2001-04-07 | 2002-03-29 | Predictive monitoring and problem identification in an information technology (IT) infrastructure |
US11/530,477 US20070005761A1 (en) | 2001-04-07 | 2006-09-10 | Predictive monitoring and problem identification in an information technology (it) infrastructure |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/112,015 Continuation US7107339B1 (en) | 2001-04-07 | 2002-03-29 | Predictive monitoring and problem identification in an information technology (IT) infrastructure |
Publications (1)
Publication Number | Publication Date |
---|---|
US20070005761A1 true US20070005761A1 (en) | 2007-01-04 |
Family
ID=36951937
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/112,015 Expired - Lifetime US7107339B1 (en) | 2001-04-07 | 2002-03-29 | Predictive monitoring and problem identification in an information technology (IT) infrastructure |
US11/530,477 Abandoned US20070005761A1 (en) | 2001-04-07 | 2006-09-10 | Predictive monitoring and problem identification in an information technology (it) infrastructure |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/112,015 Expired - Lifetime US7107339B1 (en) | 2001-04-07 | 2002-03-29 | Predictive monitoring and problem identification in an information technology (IT) infrastructure |
Country Status (1)
Country | Link |
---|---|
US (2) | US7107339B1 (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030145083A1 (en) * | 2001-11-16 | 2003-07-31 | Cush Michael C. | System and method for improving support for information technology through collecting, diagnosing and reporting configuration, metric, and event information |
US20080077358A1 (en) * | 2006-09-27 | 2008-03-27 | Marvasti Mazda A | Self-Learning Integrity Management System and Related Methods |
US20080077687A1 (en) * | 2006-09-27 | 2008-03-27 | Marvasti Mazda A | System and Method for Generating and Using Fingerprints for Integrity Management |
EP2139024A1 (en) | 2008-04-14 | 2009-12-30 | General Electric Company | Methods for preventing or reducing Helium leakage through metal halide lamp envelopes |
US20100046809A1 (en) * | 2008-08-19 | 2010-02-25 | Marvasti Mazda A | System and Method For Correlating Fingerprints For Automated Intelligence |
US20120151277A1 (en) * | 2010-12-14 | 2012-06-14 | Electronics And Telecommunications Research Institute | Web service information processing method and web service compositing method and appartus using the same |
US8392558B1 (en) * | 2011-03-22 | 2013-03-05 | Amazon Technologies, Inc. | System and method for determining overload state for service requests |
US8812586B1 (en) * | 2011-02-15 | 2014-08-19 | Google Inc. | Correlating status information generated in a computer network |
US8874888B1 (en) | 2011-01-13 | 2014-10-28 | Google Inc. | Managed boot in a cloud system |
US8966198B1 (en) | 2011-09-01 | 2015-02-24 | Google Inc. | Providing snapshots of virtual storage devices |
WO2019161461A1 (en) * | 2018-02-26 | 2019-08-29 | OverIP | A method and system for monitoring the status of an it infrastructure |
US11200103B2 (en) | 2018-10-26 | 2021-12-14 | International Business Machines Corporation | Using a machine learning module to perform preemptive identification and reduction of risk of failure in computational systems |
US11200142B2 (en) | 2018-10-26 | 2021-12-14 | International Business Machines Corporation | Perform preemptive identification and reduction of risk of failure in computational systems by training a machine learning module |
Families Citing this family (50)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7107339B1 (en) * | 2001-04-07 | 2006-09-12 | Webmethods, Inc. | Predictive monitoring and problem identification in an information technology (IT) infrastructure |
US20020178035A1 (en) * | 2001-05-22 | 2002-11-28 | Lajouanie Yves Patrick | Performance management system and method |
US20040064731A1 (en) * | 2002-09-26 | 2004-04-01 | Nguyen Timothy Thien-Kiem | Integrated security administrator |
JP4089427B2 (en) * | 2002-12-26 | 2008-05-28 | 株式会社日立製作所 | Management system, management computer, management method and program |
US7293042B2 (en) * | 2003-05-12 | 2007-11-06 | Sun Microsystems, Inc. | Managing and predicting component failure based on pattern recognition of subcomponent exposure to failure |
US7933814B2 (en) * | 2003-09-26 | 2011-04-26 | Hewlett-Packard Development Company, L.P. | Method and system to determine if a composite service level agreement (SLA) can be met |
US7761874B2 (en) * | 2004-08-13 | 2010-07-20 | Intel Corporation | Managing processing system power and performance based on utilization trends |
US7457722B1 (en) | 2004-11-17 | 2008-11-25 | Symantec Operating Corporation | Correlation of application instance life cycle events in performance monitoring |
US20060282525A1 (en) * | 2005-06-10 | 2006-12-14 | Giles James R | Method and apparatus for delegating responses to conditions in computing systems |
US20070016824A1 (en) * | 2005-07-14 | 2007-01-18 | International Business Machines Corporation | Methods and apparatus for global systems management |
US8782201B2 (en) * | 2005-10-28 | 2014-07-15 | Bank Of America Corporation | System and method for managing the configuration of resources in an enterprise |
US20070106784A1 (en) * | 2005-11-08 | 2007-05-10 | Dickman David T | Systems, methods and apparatus to identify network maintenance zones |
US20070112951A1 (en) * | 2005-11-14 | 2007-05-17 | Fung Joseph B K | Automatic website workload management |
US8516104B1 (en) * | 2005-12-22 | 2013-08-20 | At&T Intellectual Property Ii, L.P. | Method and apparatus for detecting anomalies in aggregated traffic volume data |
US8230505B1 (en) | 2006-08-11 | 2012-07-24 | Avaya Inc. | Method for cooperative intrusion prevention through collaborative inference |
US8347268B2 (en) * | 2006-10-13 | 2013-01-01 | Infosys Limited | Automated performance monitoring |
US20080208647A1 (en) * | 2007-02-28 | 2008-08-28 | Dale Hawley | Information Technologies Operations Performance Benchmarking |
EP1983477A1 (en) * | 2007-04-16 | 2008-10-22 | Hewlett-Packard Development Company, L.P. | Apparatus and method for processing management information |
US8140454B2 (en) * | 2007-12-28 | 2012-03-20 | Software Ag | Systems and/or methods for prediction and/or root cause analysis of events based on business activity monitoring related data |
US20100235306A1 (en) * | 2008-08-11 | 2010-09-16 | Seth Wagoner | Adaptive timelog system |
US9858551B2 (en) * | 2011-09-02 | 2018-01-02 | Bbs Technologies, Inc. | Ranking analysis results based on user perceived problems in a database system |
US9646054B2 (en) | 2011-09-21 | 2017-05-09 | Hewlett Packard Enterprise Development Lp | Matching of cases based on attributes including an attribute relating to flow of activities |
US10404615B2 (en) | 2012-02-14 | 2019-09-03 | Airwatch, Llc | Controlling distribution of resources on a network |
US9680763B2 (en) | 2012-02-14 | 2017-06-13 | Airwatch, Llc | Controlling distribution of resources in a network |
US10102097B2 (en) * | 2012-07-12 | 2018-10-16 | International Business Machines Corporation | Transaction server performance monitoring using component performance data |
US20140280955A1 (en) | 2013-03-14 | 2014-09-18 | Sky Socket, Llc | Controlling Electronically Communicated Resources |
US10754966B2 (en) * | 2013-04-13 | 2020-08-25 | Airwatch Llc | Time-based functionality restrictions |
US9219741B2 (en) | 2013-05-02 | 2015-12-22 | Airwatch, Llc | Time-based configuration policy toggling |
CN105164647A (en) | 2013-06-20 | 2015-12-16 | 惠普发展公司,有限责任合伙企业 | Generating a fingerprint representing a response of an application to a simulation of a fault of an external service |
US9547834B2 (en) | 2014-01-08 | 2017-01-17 | Bank Of America Corporation | Transaction performance monitoring |
US9082282B1 (en) * | 2014-01-28 | 2015-07-14 | Domo, Inc. | Determining usefulness of a data alert |
US20150271026A1 (en) * | 2014-03-24 | 2015-09-24 | Microsoft Technology Licensing, Llc | End user performance analysis |
CN106302595B (en) | 2015-06-02 | 2020-03-17 | 阿里巴巴集团控股有限公司 | Method and equipment for carrying out health check on server |
EP3128466A1 (en) * | 2015-08-05 | 2017-02-08 | Wipro Limited | System and method for predicting an event in an information technology infrastructure |
US10621602B2 (en) * | 2015-09-22 | 2020-04-14 | Adobe Inc. | Reinforcement machine learning for personalized intelligent alerting |
US10366367B2 (en) | 2016-02-24 | 2019-07-30 | Bank Of America Corporation | Computerized system for evaluating and modifying technology change events |
US10366337B2 (en) | 2016-02-24 | 2019-07-30 | Bank Of America Corporation | Computerized system for evaluating the likelihood of technology change incidents |
US10387230B2 (en) * | 2016-02-24 | 2019-08-20 | Bank Of America Corporation | Technical language processor administration |
US10275182B2 (en) | 2016-02-24 | 2019-04-30 | Bank Of America Corporation | System for categorical data encoding |
US10430743B2 (en) | 2016-02-24 | 2019-10-01 | Bank Of America Corporation | Computerized system for simulating the likelihood of technology change incidents |
US10275183B2 (en) | 2016-02-24 | 2019-04-30 | Bank Of America Corporation | System for categorical data dynamic decoding |
US10366338B2 (en) | 2016-02-24 | 2019-07-30 | Bank Of America Corporation | Computerized system for evaluating the impact of technology change incidents |
US10067984B2 (en) * | 2016-02-24 | 2018-09-04 | Bank Of America Corporation | Computerized system for evaluating technology stability |
US10223425B2 (en) * | 2016-02-24 | 2019-03-05 | Bank Of America Corporation | Operational data processor |
US10216798B2 (en) * | 2016-02-24 | 2019-02-26 | Bank Of America Corporation | Technical language processor |
US10019486B2 (en) * | 2016-02-24 | 2018-07-10 | Bank Of America Corporation | Computerized system for analyzing operational event data |
US10489225B2 (en) | 2017-08-10 | 2019-11-26 | Bank Of America Corporation | Automatic resource dependency tracking and structure for maintenance of resource fault propagation |
CN109471783B (en) * | 2017-09-08 | 2022-07-05 | 北京京东尚科信息技术有限公司 | Method and device for predicting task operation parameters |
US11089084B2 (en) * | 2018-07-24 | 2021-08-10 | Machine Cover, Inc. | Website failure analysis |
US11196614B2 (en) | 2019-07-26 | 2021-12-07 | Cisco Technology, Inc. | Network issue tracking and resolution system |
Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5974457A (en) * | 1993-12-23 | 1999-10-26 | International Business Machines Corporation | Intelligent realtime monitoring of data traffic |
US5987442A (en) * | 1995-02-02 | 1999-11-16 | Cabeltron Systems, Inc. | Method and apparatus for learning network behavior trends and predicting future behavior of communications networks |
US6269401B1 (en) * | 1998-08-28 | 2001-07-31 | 3Com Corporation | Integrated computer system and network performance monitoring |
US6272110B1 (en) * | 1997-10-10 | 2001-08-07 | Nortel Networks Limited | Method and apparatus for managing at least part of a communications network |
US6308174B1 (en) * | 1998-05-05 | 2001-10-23 | Nortel Networks Limited | Method and apparatus for managing a communications network by storing management information about two or more configuration states of the network |
US6317786B1 (en) * | 1998-05-29 | 2001-11-13 | Webspective Software, Inc. | Web service |
US20010052087A1 (en) * | 1998-04-27 | 2001-12-13 | Atul R. Garg | Method and apparatus for monitoring a network environment |
US6393387B1 (en) * | 1998-03-06 | 2002-05-21 | Perot Systems Corporation | System and method for model mining complex information technology systems |
US6446123B1 (en) * | 1999-03-31 | 2002-09-03 | Nortel Networks Limited | Tool for monitoring health of networks |
US20020173997A1 (en) * | 2001-03-30 | 2002-11-21 | Cody Menard | System and method for business systems transactions and infrastructure management |
US6553403B1 (en) * | 1998-06-03 | 2003-04-22 | International Business Machines Corporation | System, method and computer program product for monitoring in a distributed computing environment |
US20030088663A1 (en) * | 1996-07-18 | 2003-05-08 | Reuven Battat | Method and apparatus for predictively and graphically administering a network system in a time dimension |
US6578077B1 (en) * | 1997-05-27 | 2003-06-10 | Novell, Inc. | Traffic monitoring tool for bandwidth management |
US20030149754A1 (en) * | 2002-02-06 | 2003-08-07 | Adtran, Inc. | System and method for managing elements of a communication network |
US20030202112A1 (en) * | 2002-04-30 | 2003-10-30 | Kevin Bowman | System and method for active call monitoring |
US6684120B1 (en) * | 1998-12-03 | 2004-01-27 | Bridgestone Corporation | Method of and device for collecting and combining FA information |
US6738811B1 (en) * | 2000-03-31 | 2004-05-18 | Supermicro Computer, Inc. | Method and architecture for monitoring the health of servers across data networks |
US20040128370A1 (en) * | 2002-12-31 | 2004-07-01 | Kris Kortright | System and method for synchronizing the configuration of distributed network management applications |
US6804714B1 (en) * | 1999-04-16 | 2004-10-12 | Oracle International Corporation | Multidimensional repositories for problem discovery and capacity planning of database applications |
US7107339B1 (en) * | 2001-04-07 | 2006-09-12 | Webmethods, Inc. | Predictive monitoring and problem identification in an information technology (IT) infrastructure |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7003781B1 (en) | 2000-05-05 | 2006-02-21 | Bristol Technology Inc. | Method and apparatus for correlation of events in a distributed multi-system computing environment |
-
2002
- 2002-03-29 US US10/112,015 patent/US7107339B1/en not_active Expired - Lifetime
-
2006
- 2006-09-10 US US11/530,477 patent/US20070005761A1/en not_active Abandoned
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030145083A1 (en) * | 2001-11-16 | 2003-07-31 | Cush Michael C. | System and method for improving support for information technology through collecting, diagnosing and reporting configuration, metric, and event information |
US7801703B2 (en) | 2006-09-27 | 2010-09-21 | Integrien Corporation | Self-learning integrity management system and related methods |
US20080077687A1 (en) * | 2006-09-27 | 2008-03-27 | Marvasti Mazda A | System and Method for Generating and Using Fingerprints for Integrity Management |
US7467067B2 (en) | 2006-09-27 | 2008-12-16 | Integrien Corporation | Self-learning integrity management system and related methods |
US20090063390A1 (en) * | 2006-09-27 | 2009-03-05 | Marvasti Mazda A | Self-learning integrity management system and related methods |
US7707285B2 (en) | 2006-09-27 | 2010-04-27 | Integrien Corporation | System and method for generating and using fingerprints for integrity management |
US20100131645A1 (en) * | 2006-09-27 | 2010-05-27 | Marvasti Mazda A | System and method for generating and using fingerprints for integrity management |
US20100318487A1 (en) * | 2006-09-27 | 2010-12-16 | Marvasti Mazda A | Self-learning integrity management system and related methods |
US8060342B2 (en) | 2006-09-27 | 2011-11-15 | Integrien Corporation | Self-learning integrity management system and related methods |
US20080077358A1 (en) * | 2006-09-27 | 2008-03-27 | Marvasti Mazda A | Self-Learning Integrity Management System and Related Methods |
US8266279B2 (en) | 2006-09-27 | 2012-09-11 | Vmware, Inc. | System and method for generating and using fingerprints for integrity management |
EP2139024A1 (en) | 2008-04-14 | 2009-12-30 | General Electric Company | Methods for preventing or reducing Helium leakage through metal halide lamp envelopes |
US20100046809A1 (en) * | 2008-08-19 | 2010-02-25 | Marvasti Mazda A | System and Method For Correlating Fingerprints For Automated Intelligence |
US8631117B2 (en) | 2008-08-19 | 2014-01-14 | Vmware, Inc. | System and method for correlating fingerprints for automated intelligence |
US20120151277A1 (en) * | 2010-12-14 | 2012-06-14 | Electronics And Telecommunications Research Institute | Web service information processing method and web service compositing method and apparatus using the same |
US8874888B1 (en) | 2011-01-13 | 2014-10-28 | Google Inc. | Managed boot in a cloud system |
US8812586B1 (en) * | 2011-02-15 | 2014-08-19 | Google Inc. | Correlating status information generated in a computer network |
US9794144B1 (en) * | 2011-02-15 | 2017-10-17 | Google Inc. | Correlating status information generated in a computer network |
US8392558B1 (en) * | 2011-03-22 | 2013-03-05 | Amazon Technologies, Inc. | System and method for determining overload state for service requests |
US8966198B1 (en) | 2011-09-01 | 2015-02-24 | Google Inc. | Providing snapshots of virtual storage devices |
US9251234B1 (en) | 2011-09-01 | 2016-02-02 | Google Inc. | Providing snapshots of virtual storage devices |
US9501233B2 (en) | 2011-09-01 | 2016-11-22 | Google Inc. | Providing snapshots of virtual storage devices |
WO2019161461A1 (en) * | 2018-02-26 | 2019-08-29 | OverIP | A method and system for monitoring the status of an it infrastructure |
US11200103B2 (en) | 2018-10-26 | 2021-12-14 | International Business Machines Corporation | Using a machine learning module to perform preemptive identification and reduction of risk of failure in computational systems |
US11200142B2 (en) | 2018-10-26 | 2021-12-14 | International Business Machines Corporation | Perform preemptive identification and reduction of risk of failure in computational systems by training a machine learning module |
Also Published As
Publication number | Publication date |
---|---|
US7107339B1 (en) | 2006-09-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7107339B1 (en) | Predictive monitoring and problem identification in an information technology (IT) infrastructure | |
US7761730B2 (en) | Determination of impact of a failure of a component for one or more services | |
US7761745B2 (en) | Clustering process for software server failure prediction | |
US6792456B1 (en) | Systems and methods for authoring and executing operational policies that use event rates | |
US7076397B2 (en) | System and method for statistical performance monitoring | |
US20190378073A1 (en) | Business-Aware Intelligent Incident and Change Management | |
US8725844B2 (en) | Method and system for adjusting the relative value of system configuration recommendations | |
CN112162878B (en) | Database fault discovery method and device, electronic equipment and storage medium | |
Cohen et al. | Capturing, indexing, clustering, and retrieving system history | |
Jin et al. | Nevermind, the problem is already fixed: proactively detecting and troubleshooting customer dsl problems | |
KR101021411B1 (en) | Self-learning method and system for detecting abnormalities | |
US6397359B1 (en) | Methods, systems and computer program products for scheduled network performance testing | |
US7051244B2 (en) | Method and apparatus for managing incident reports | |
US8380838B2 (en) | Reduction of alerts in information technology systems | |
CN1604040B (en) | Dynamic transaction control method and system within a host transaction processing system | |
US20200366583A1 (en) | Method and apparatus for monitoring bandwidth condition | |
US7467145B1 (en) | System and method for analyzing processes | |
US7783605B2 (en) | Calculating cluster availability | |
US20060026467A1 (en) | Method and apparatus for automatically discovering of application errors as a predictive metric for the functional health of enterprise applications | |
US20020173997A1 (en) | System and method for business systems transactions and infrastructure management | |
CN101632093A | System and method for performance fault management using statistical analysis |
CN112162907A (en) | Health degree evaluation method based on monitoring index data | |
Tang et al. | An integrated framework for optimizing automatic monitoring systems in large IT infrastructures | |
US7469287B1 (en) | Apparatus and method for monitoring objects in a network and automatically validating events relating to the objects | |
KR101365170B1 (en) | Management server, method for calculating service continuity score and management system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION |