US20070034206A1 - Method and apparatus for generating a telemetric impulsional response fingerprint for a computer system

Publication number
US20070034206A1
Authority
US
United States
Prior art keywords
electronic system
response
steady
multiparametric
representation
Prior art date
Legal status
Abandoned
Application number
US11/203,361
Inventor
Aleksey Urmanov
Anton Bougaev
Kenny Gross
Current Assignee
Sun Microsystems Inc
Original Assignee
Sun Microsystems Inc
Priority date
Filing date
Publication date
Application filed by Sun Microsystems Inc
Priority to US11/203,361
Assigned to SUN MICROSYSTEMS, INC.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BOUGAEV, ANTON A., GROSS, KENNY C., URMANOV, ALEKSY M.
Priority to PCT/US2006/025936 (WO2007021389A2)
Publication of US20070034206A1
Abandoned (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/24 Marginal checking or other specified testing methods not covered by G06F11/26, e.g. race tests

Abstract

One embodiment of the present invention provides a system for generating telemetric impulsional response fingerprints for an electronic system. The system operates by first determining a steady-state response of the electronic system under specified initial conditions. Next, the system introduces a sudden impulse step change to a parameter of the electronic system and then measures the dynamic response of the electronic system to the sudden impulse step change. The system then generates a multiparametric representation from the steady-state response and the dynamic response wherein the multiparametric representation simultaneously displays the steady-state response and the dynamic response.

Description

    RELATED APPLICATION
  • The subject matter of this application is related to the subject matter in a co-pending non-provisional application by Kenny C. Gross and Lawrence G. Votta Jr. entitled, “Method and Apparatus for Monitoring and Recording Computer System Performance Parameters,” having Ser. No. 10/272,680 and filing date 17 Oct. 2002, which is incorporated herein by reference; and to the subject matter in a co-pending non-provisional application by Kenny C. Gross, Lawrence G. Votta Jr., and Adam Porter entitled, “Detecting and Correcting a Failure Sequence in a Computer System Before a Failure Occurs,” having Ser. No. 10/777,532 and filing date 11 Feb. 2004, which is incorporated herein by reference.
  • BACKGROUND
  • 1. Field of the Invention
  • The present invention relates to techniques for enhancing reliability within computer systems. More specifically, the present invention relates to a method and an apparatus for proactively monitoring computer system components for faults by using telemetric impulsional response fingerprints.
  • 2. Related Art
  • As electronic commerce becomes more prevalent, businesses increasingly rely on enterprise computing systems to process ever-larger volumes of electronic transactions. A failure in one of these enterprise computing systems can be disastrous, potentially resulting in millions of dollars of lost business. More importantly, a failure can seriously undermine consumer confidence in a business, making customers less likely to purchase goods and services from the business. Hence, it is critically important to ensure high availability in such enterprise computing systems.
  • To achieve high availability in enterprise computing systems, it is necessary to capture unambiguous diagnostic information that can quickly pinpoint the source of defects in hardware or software. If a system has too little event monitoring, service engineers may be unable to quickly identify the source of a problem when it crops up at a customer site. This can lead to increased downtime, which can adversely impact customer satisfaction and loyalty.
  • One approach to address this problem is to monitor all aspects of a customer's data center and to send the monitored signals to a central monitoring center. This enables system administrators at the monitoring center to identify problematic discrepancies in system performance parameters and, if necessary, to direct service personnel to handle discrepancies more efficiently.
  • Existing continuous telemetry systems perform proactive fault monitoring of computer systems through passive surveillance, which does not impact the monitored system in any way. This approach can catch many types of faults. However, other latent faults may appear only under dynamic stimulation. By analogy, a car may have a problem with acceleration that does not reveal itself while idling or while cruising at a constant speed.
  • Hence, what is needed is a method and an apparatus for proactively monitoring a computer system for faults without the shortcomings described above.
  • SUMMARY
  • One embodiment of the present invention provides a system for generating telemetric impulsional response fingerprints for an electronic system. The system operates by first determining a steady-state response of the electronic system under specified initial conditions. Next, the system introduces a sudden impulse step change to a parameter of the electronic system and then measures the dynamic response of the electronic system to the sudden impulse step change. The system then generates a multiparametric representation from the steady-state response and the dynamic response wherein the multiparametric representation simultaneously displays the steady-state response and the dynamic response.
  • In a variation of this embodiment, determining the steady-state response of the electronic system involves making measurements using a continuous system telemetry harness.
  • In a further variation, determining the steady-state response of the electronic system involves monitoring temperature, voltage, current, and/or vibration at multiple points within the electronic system.
  • In a further variation, introducing the sudden impulse step change involves changing a load, a temperature, a voltage, and/or a vibration within the electronic system.
  • In a further variation, measuring the dynamic response of the electronic system involves normalizing the dynamic response for measured system parameters.
  • In a further variation, generating the multiparametric representation involves creating a Kiviat diagram, which displays both the steady state response and the dynamic response.
  • In a further variation, the system detects incipient problems in the electronic system by comparing the multiparametric response representation with a standard multiparametric representation derived from a known good electronic system.
  • BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 illustrates an electronic system under test in accordance with an embodiment of the present invention.
  • FIG. 2 presents a flowchart illustrating the process of generating a multiparametric response representation in accordance with an embodiment of the present invention.
  • FIG. 3 illustrates a normalized temperature response in time to a step voltage change in accordance with an embodiment of the present invention.
  • FIG. 4 illustrates a multiparametric response representation for a voltage step change in accordance with an embodiment of the present invention.
  • DETAILED DESCRIPTION
  • The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
  • The data structures and code described in this detailed description are typically stored on a computer readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. This includes, but is not limited to, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs) and DVDs (digital versatile discs or digital video discs), and computer instruction signals embodied in a transmission medium (with or without a carrier wave upon which the signals are modulated). For example, the transmission medium may include a communications network, such as the Internet.
  • Overview
  • Research has shown that a computer field replaceable unit (FRU) can exhibit a wide range of subtle, incipient problems which can be amplified and easily spotted if one examines the dynamic response of the FRU just before and just after a well defined dynamic-stimulus perturbation. The present invention provides a method and apparatus for creating telemetric impulsional response fingerprints (TIRF) which can be used to detect such incipient problems, and to thereby enhance reliability, availability, and serviceability of enterprise computer systems.
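  • To illustrate why a perturbation can expose what passive steady-state monitoring misses, here is a minimal Python sketch. It assumes, purely for illustration, that a sensor reading responds to a step like a first-order system, and that a degradation mechanism (for example, a degraded thermal path) changes only the time constant `tau`, not the steady-state gain. The function names and numbers are hypothetical and are not taken from this application.

```python
import math


def first_order_step(gain: float, tau: float, t: float) -> float:
    """Response at time t (seconds) of a first-order system, initially at rest,
    to a unit step applied at t = 0: y(t) = gain * (1 - exp(-t / tau))."""
    return gain * (1.0 - math.exp(-t / tau))


# Hypothetical parameters: both units settle to the same value (gain = 10),
# but the "degraded" unit responds more slowly (larger tau). Illustrative only.
HEALTHY = {"gain": 10.0, "tau": 5.0}
DEGRADED = {"gain": 10.0, "tau": 9.0}


def sample(params: dict, times: list) -> list:
    """Sample the step response of one unit at the given times."""
    return [first_order_step(params["gain"], params["tau"], t) for t in times]


if __name__ == "__main__":
    times = [0.0, 2.0, 5.0, 10.0, 200.0]
    for t, h, d in zip(times, sample(HEALTHY, times), sample(DEGRADED, times)):
        # During the transient the two units differ; at steady state they agree.
        print(f"t={t:6.1f}s  healthy={h:6.3f}  degraded={d:6.3f}  diff={h - d:+.3f}")
```

  • A steady-state snapshot alone (the t = 200 s row) cannot tell the two units apart, whereas the transient rows can.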
  • The TIRF provides a new and unique “active probe” machine-learning technique that leverages continuous system telemetry to provide dynamic, multivariate “fingerprints” for FRUs that can be (1) compared with previous TIRFs for the same FRU, or (2) compared with TIRFs for “Golden FRUs” generated from FRUs that are certified to be operating nearly perfectly. These fingerprints can be used to recognize very subtle failure precursors, such as aging processes, degrading sensors, delamination of bonded components, solder-joint cracking, deterioration of socket connectors, and other mechanisms that may not show up during conventional ongoing reliability testing (ORT) or reliability quality testing (RQT) test sequences.
  • A continuous system telemetry harness (CSTH) has been developed (see U.S. patent application Ser. No. 10/272,680 entitled “Method and Apparatus for Monitoring and Recording Computer System Performance Parameters,” filed 17 Oct. 2002). The CSTH monitors temperatures, voltages, and currents throughout a system, as well as some discrete performance metrics extracted from the operating system. The CSTH provides signals that can be used to enhance root cause analysis (RCA) following system failures. The signals can also be monitored in real-time for early warning of the onset of problems.
  • For both the reactive (post-failure RCA) and proactive (real-time early warning) surveillance uses described above, the telemetry is passive and does not disturb the monitored system in any way.
  • The present invention leverages the CSTH while significantly extending the range of its diagnostic coverage with a new dynamic probe technique. This technique, described below, provides a wealth of diagnostic information relating to the health of components, FRUs, and integrated systems.
  • The system described herein generates a TIRF for an FRU by:
      • (1) introducing a sudden impulse step change in one or more operational parameters (e.g., load, temperature, voltage) associated with the FRU;
      • (2) measuring the dynamic response of all monitorable parameters following the impulse; and
      • (3) creating a multiparametric Kiviat diagram (also known as a spider plot) that contrasts the post-impulse behavior with the “reference” behavior. The Kiviat diagram provides a “dynamic fingerprint” for the FRU.
  • The TIRF provides a unique and concise representation of the dynamic response of the FRU to a controlled perturbation under specified initial conditions. The TIRF can be represented as a vector of signal values collected from sensors, arranged in a specific order, and normalized to represent the post-perturbation vs. pre-perturbation behavior as a multivariate "fingerprint" for that FRU. Furthermore, the TIRF can be plotted in Kiviat diagram format as a human visualization aid that readily highlights exactly where any problems appear. As such, the TIRF provides a dynamic perturbation response signature of a given FRU under a standardized set of initial conditions. Note that each FRU can have several TIRFs corresponding to different types of perturbations.
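  • The vector form of a TIRF described above can be sketched in a few lines of Python. The application does not specify the normalization; this sketch assumes, as one plausible choice, the ratio of each sensor's post-perturbation value to its pre-perturbation value. The function name and sensor names are hypothetical.

```python
from typing import List, Mapping, Sequence


def tirf_vector(pre: Mapping[str, float],
                post: Mapping[str, float],
                order: Sequence[str]) -> List[float]:
    """Return a TIRF-style vector: one post/pre ratio per sensor, in `order`.

    Fixing the sensor order makes vectors from different runs (or from a
    reference unit) directly comparable element by element.
    """
    vector = []
    for name in order:
        baseline = pre[name]
        if baseline == 0:
            # A ratio is undefined here; a real tool might use a difference instead.
            raise ValueError(f"zero pre-perturbation value for sensor {name!r}")
        vector.append(post[name] / baseline)
    return vector


if __name__ == "__main__":
    # Hypothetical snapshots taken before and after a supply-voltage step.
    pre = {"V_core": 1.20, "T_cpu": 60.0, "T_dimm": 40.0}
    post = {"V_core": 1.26, "T_cpu": 66.0, "T_dimm": 42.0}
    print(tirf_vector(pre, post, ["V_core", "T_cpu", "T_dimm"]))  # approx. [1.05, 1.1, 1.05]
```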
  • An FRU TIRF is a very concise multivariate descriptor of a given FRU under specified conditions. Moreover, a collection of TIRFs for FRUs has great potential to increase the availability of complex enterprise servers. In addition to the standard long-duration online tests of FRUs in ORT and RQT, TIRFs can be generated very quickly to capture important diagnostic information about an FRU's dynamic operability and, in most cases, can be obtained without taking the FRU out of service.
  • Electronic System Under Test
  • FIG. 1 illustrates an electronic system under test 102 in accordance with an embodiment of the present invention. During operation, electronic system under test 102 receives soft variables 103, physical variables 104, and canary variables 105. In response to these input variables, electronic system under test 102 produces monitored output 106. Post-measurement operations 108 are performed on monitored output 106 to generate multiparametric response representation 110.
  • Soft variables 103 can include metrics such as load, throughput, and transaction latencies. These variables are typically derived from the operating system of electronic system under test 102. Physical variables 104 include temperature, voltage, current, and vibration within electronic system under test 102. Canary variables 105 include synthetic user transactions and quality-of-performance values for these synthetic transactions.
  • The testing methodology involves first establishing a steady-state for soft variables 103, physical variables 104, and canary variables 105. After the steady state has been established, the system takes a pre-perturbation snapshot of the system parameters. Next, the system applies a sudden impulse change to one or more of the variables. For example, one or more voltages applied to the system can be changed, or the load applied from the canary variables might be stepped to a maximum value to stress the system.
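  • The methodology above begins by establishing a steady state, but the text does not say how steady state is recognized. A minimal Python sketch, assuming a simple rule (a signal is treated as steady once its most recent `window` samples stay within a peak-to-peak band of width `tolerance`), might look like this; the rule, the function name, and its parameters are illustrative assumptions.

```python
from collections import deque
from typing import Iterable, Optional


def first_steady_index(samples: Iterable[float],
                       window: int,
                       tolerance: float) -> Optional[int]:
    """Return the index of the first sample at which the most recent `window`
    samples all lie within a band of width `tolerance`, or None if never."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # sliding window of the latest samples
    for i, x in enumerate(samples):
        recent.append(x)
        if len(recent) == window and max(recent) - min(recent) <= tolerance:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical temperature trace (deg C) warming up and then settling.
    trace = [20.0, 30.0, 36.0, 39.0, 40.0, 40.1, 40.0, 40.05, 40.1]
    print(first_steady_index(trace, window=4, tolerance=0.2))  # 7
```

  • In a multisensor system one would presumably require every monitored soft, physical, and canary signal to pass such a test before taking the pre-perturbation snapshot.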
  • After the sudden impulse has been applied, the system measures the dynamic response of electronic system under test 102 and takes a post-perturbation snapshot of the system parameters. The system then uses the pre-perturbation and post-perturbation parameters to generate multiparametric response representation 110. This representation can take the form of a Kiviat diagram.
  • Generating a Multiparametric Response Representation
  • FIG. 2 presents a flowchart illustrating the process of generating a multiparametric response representation in accordance with an embodiment of the present invention. The system starts by taking a pre-perturbation snapshot of system parameters after the system has been allowed to reach a steady-state (step 202). Next, the system introduces a sudden impulse step change to one or more of the variables (step 204).
  • After the sudden impulse step change, the system measures the dynamic response of electronic system under test 102 (step 206). The system then takes a post-perturbation snapshot of the system parameters (step 208). Finally, the system generates a multiparametric response representation (a Kiviat diagram) from the pre-perturbation snapshot and the post-perturbation snapshot (step 210). This Kiviat diagram can then be compared either with a previous Kiviat diagram taken from electronic system under test 102 or with a Kiviat diagram generated from a known good electronic system to determine whether electronic system under test 102 has any incipient failures.
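  • The flowchart steps (202 to 210) and the subsequent comparison can be sketched as a small Python driver. The hooks `read_sensors`, `apply_step`, and `settle` are hypothetical stand-ins for whatever telemetry and control interfaces a real harness provides; the post/pre ratio and the relative-tolerance comparison are illustrative assumptions, not the method stated in this application.

```python
from typing import Callable, Dict, List, Sequence

Snapshot = Dict[str, float]


def generate_tirf(read_sensors: Callable[[], Snapshot],
                  apply_step: Callable[[], None],
                  settle: Callable[[], None],
                  order: Sequence[str]) -> List[float]:
    """Run the FIG. 2 sequence and return one normalized value per sensor."""
    settle()                 # step 202: wait for steady state ...
    pre = read_sensors()     # ... then take the pre-perturbation snapshot
    apply_step()             # step 204: introduce the sudden impulse step change
    settle()                 # step 206: let the dynamic response play out
                             # (a real harness would also record the transient)
    post = read_sensors()    # step 208: post-perturbation snapshot
    # step 210: multiparametric representation as post/pre ratios per axis
    return [post[name] / pre[name] for name in order]


def deviating_axes(tirf: Sequence[float],
                   reference: Sequence[float],
                   rel_tol: float) -> List[int]:
    """Indices where the TIRF differs from a reference (a previous TIRF of the
    same unit, or a known good unit's TIRF) by more than rel_tol, relatively."""
    return [i for i, (x, r) in enumerate(zip(tirf, reference))
            if abs(x - r) > rel_tol * abs(r)]


if __name__ == "__main__":
    # Hypothetical toy unit: temperature depends linearly on supply voltage.
    state = {"V": 1.0}

    def read() -> Snapshot:
        return {"V": state["V"], "T": 40.0 + 20.0 * state["V"]}

    def step() -> None:
        state["V"] = 1.1

    tirf = generate_tirf(read, step, lambda: None, ["V", "T"])
    print(deviating_axes(tirf, reference=[1.1, 1.0], rel_tol=0.02))  # [1]
```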
  • Normalized Temperature Response
  • FIG. 3 illustrates a normalized temperature response in time to a step voltage change in accordance with an embodiment of the present invention. The upper chart in FIG. 3 illustrates a normalized step voltage change in several voltages applied to, for example, a system board within a computer system. The lower chart in FIG. 3 illustrates the normalized temperature change at various points related to the system board in response to these step voltage changes. Note that while useful, these charts, which plot normalized parameter changes with respect to time, can be difficult to interpret.
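  • The normalized traces of FIG. 3 put signals with different units and magnitudes on a common scale. The application does not state the normalization used; one simple choice, assumed in this Python sketch, maps each signal's pre-step baseline (the mean of its first `n_baseline` samples) to 0 and its final value to 1. The function name and data are hypothetical.

```python
from typing import List, Sequence


def normalize_trace(trace: Sequence[float], n_baseline: int) -> List[float]:
    """Rescale a time trace so the pre-step baseline maps to 0 and the final
    (settled) value maps to 1. Overshoot shows up as values above 1."""
    if not 1 <= n_baseline <= len(trace):
        raise ValueError("n_baseline must be between 1 and len(trace)")
    baseline = sum(trace[:n_baseline]) / n_baseline
    span = trace[-1] - baseline
    if span == 0:
        # The signal did not respond to the step; report a flat trace.
        return [0.0] * len(trace)
    return [(x - baseline) / span for x in trace]


if __name__ == "__main__":
    # Hypothetical board temperature (deg C) before and after a voltage step.
    temps = [50.0, 50.0, 52.0, 55.0, 56.0]
    print(normalize_trace(temps, n_baseline=2))
```

  • With this scaling, a sluggish or overshooting channel stands out as a curve that reaches 1 late or rises above it, regardless of its physical units.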
  • Multiparametric Response Representation
  • FIG. 4 illustrates a multiparametric response representation (Kiviat diagram) of a response to a voltage step change in accordance with an embodiment of the present invention. The Kiviat diagrams in FIG. 4 represent the same inputs and responses illustrated in FIG. 3 above. The upper diagram in FIG. 4 represents the step voltage change, while the lower diagram represents the temperature response to the step voltage change.
  • The inner and outer polygons in the Kiviat diagrams represent the minimum and maximum values for the monitored parameters. These polygons can be compared with polygons from a previous test of the same electronic system, or with polygons from a test on a known good system, to determine whether any incipient failures exist within the electronic system.
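  • Building a Kiviat diagram amounts to placing each normalized parameter on its own spoke and joining the points into a polygon. The Python sketch below is an illustration rather than the procedure of this application: it assumes the inner and outer polygons are the per-parameter minimum and maximum over each parameter's recorded (normalized) time trace, and it flags axes where either polygon of a new test departs from a reference test by more than an absolute tolerance. Function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Polygons = Tuple[List[float], List[float]]  # (inner, outer) radii per axis


def kiviat_vertices(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Place value k on spoke k at angle pi/2 - 2*pi*k/n (spoke 0 points up,
    spokes proceed clockwise) and return the polygon's (x, y) vertices."""
    n = len(values)
    vertices = []
    for k, r in enumerate(values):
        theta = math.pi / 2 - 2 * math.pi * k / n
        vertices.append((r * math.cos(theta), r * math.sin(theta)))
    return vertices


def min_max_polygons(traces: Sequence[Sequence[float]]) -> Polygons:
    """traces[k] is the time series recorded for parameter k. Returns the
    per-parameter minimum (inner polygon) and maximum (outer polygon)."""
    return [min(t) for t in traces], [max(t) for t in traces]


def mismatched_axes(test: Polygons, reference: Polygons, tol: float) -> List[int]:
    """Axes where the inner or outer polygon of `test` differs from
    `reference` by more than `tol` (absolute, in normalized units)."""
    bad = []
    for k in range(len(test[0])):
        if (abs(test[0][k] - reference[0][k]) > tol
                or abs(test[1][k] - reference[1][k]) > tol):
            bad.append(k)
    return bad


if __name__ == "__main__":
    # Hypothetical normalized traces for three monitored parameters.
    reference = min_max_polygons([[0.0, 0.6, 1.0], [0.0, 0.7, 1.0], [0.0, 0.5, 1.0]])
    test = min_max_polygons([[0.0, 0.6, 1.0], [0.0, 0.9, 1.3], [0.0, 0.5, 1.0]])
    print(mismatched_axes(test, reference, tol=0.1))  # [1]
    print(kiviat_vertices(test[1]))  # outer-polygon vertices for plotting
```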
  • The foregoing descriptions of embodiments of the present invention have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the present invention to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the present invention. The scope of the present invention is defined by the appended claims.

Claims (20)

1. A method for generating telemetric impulsional response fingerprints for an electronic system, comprising:
determining a steady-state response of the electronic system under specified initial conditions;
introducing a sudden impulse step change to a parameter of the electronic system;
measuring a dynamic response of the electronic system to the sudden impulse step change; and
generating a multiparametric representation, which simultaneously displays the steady state response and the dynamic response.
2. The method of claim 1, wherein determining the steady-state response of the electronic system involves making measurements through a continuous system telemetry harness.
3. The method of claim 1, wherein determining the steady-state response of the electronic system involves monitoring at least one of temperature, voltage, current, and vibration at multiple points within the electronic system.
4. The method of claim 1, wherein introducing the sudden impulse step change involves changing at least one of a load, a temperature, a voltage, and a vibration within the electronic system.
5. The method of claim 1, wherein measuring the dynamic response of the electronic system involves normalizing the dynamic response for measured system parameters.
6. The method of claim 1, wherein generating a multiparametric response representation involves creating a Kiviat diagram, which displays both the steady state response and the dynamic response.
7. The method of claim 1, further comprising detecting incipient problems in the electronic system by comparing the multiparametric representation with a standard multiparametric representation derived from a known good electronic system.
8. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for generating telemetric impulsional response fingerprints for an electronic system, the method comprising:
determining a steady-state response of the electronic system under specified initial conditions;
introducing a sudden impulse step change to a parameter of the electronic system;
measuring a dynamic response of the electronic system to the sudden impulse step change; and
generating a multiparametric representation, which simultaneously displays the steady-state response and the dynamic response.
9. The computer-readable storage medium of claim 8, wherein determining the steady-state response of the electronic system involves making measurements through a continuous system telemetry harness.
10. The computer-readable storage medium of claim 8, wherein determining the steady-state response of the electronic system involves monitoring at least one of temperature, voltage, current, and vibration at multiple points within the electronic system.
11. The computer-readable storage medium of claim 8, wherein introducing the sudden impulse step change involves changing at least one of a load, a temperature, a voltage, and a vibration within the electronic system.
12. The computer-readable storage medium of claim 8, wherein measuring the dynamic response of the electronic system involves normalizing the dynamic response for measured system parameters.
13. The computer-readable storage medium of claim 8, wherein generating a multiparametric response representation involves creating a Kiviat diagram, which displays both the steady-state response and the dynamic response.
14. The computer-readable storage medium of claim 8, the method further comprising detecting incipient problems in the electronic system by comparing the multiparametric representation with a standard multiparametric representation derived from a known good electronic system.
15. An apparatus for generating telemetric impulsional response fingerprints for an electronic system, comprising:
a determining mechanism configured to determine a steady-state response of the electronic system under specified initial conditions;
a step-change mechanism configured to introduce a sudden impulse step change to a parameter of the electronic system;
a measuring mechanism configured to measure a dynamic response of the electronic system to the sudden impulse step change; and
a generating mechanism configured to generate a multiparametric representation, which simultaneously displays the steady-state response and the dynamic response.
16. The apparatus of claim 15, wherein determining the steady-state response of the electronic system involves making measurements through a continuous system telemetry harness.
17. The apparatus of claim 15, wherein determining the steady-state response of the electronic system involves monitoring at least one of temperature, voltage, current, and vibration at multiple points within the electronic system.
18. The apparatus of claim 15, wherein introducing the sudden impulse step change involves changing at least one of a load, a temperature, a voltage, and a vibration within the electronic system.
19. The apparatus of claim 15, wherein measuring the dynamic response of the electronic system involves normalizing the dynamic response for measured system parameters.
20. The apparatus of claim 15, wherein generating a multiparametric response representation involves creating a Kiviat diagram, which displays both the steady-state response and the dynamic response.
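The core steps recited in claims 1, 8 and 15 (determine a steady-state response, apply a sudden step change, measure the dynamic response) can be sketched in Python for a set of uniformly sampled telemetry signals. This is a minimal illustration under stated assumptions, not the applicants' implementation: the step is assumed to occur at a known sample index, the steady-state level is taken as the mean of the pre-step window, and the stationarity test `is_steady` (with its `rel_tol` parameter) is a hypothetical criterion chosen for the example. All names are illustrative.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Sequence


@dataclass
class ParameterFingerprint:
    """Steady-state level and post-step dynamic response for one monitored parameter."""
    steady_state: float
    dynamic_response: List[float]


def is_steady(samples: Sequence[float], rel_tol: float = 0.01) -> bool:
    """Hypothetical criterion: spread is at most rel_tol times the mean magnitude."""
    return pstdev(samples) <= rel_tol * abs(mean(samples))


def fingerprint_signal(
    samples: Sequence[float], step_index: int, rel_tol: float = 0.01
) -> ParameterFingerprint:
    """Split one uniformly sampled signal at the sample where the step was applied."""
    if not 0 < step_index < len(samples):
        raise ValueError("step_index must leave samples on both sides of the step")
    before = samples[:step_index]
    if not is_steady(before, rel_tol):
        raise ValueError("pre-step window is not at steady state")
    baseline = mean(before)
    # Dynamic response expressed as the deviation from the steady-state level.
    return ParameterFingerprint(baseline, [s - baseline for s in samples[step_index:]])


def fingerprint_system(
    signals: Dict[str, Sequence[float]], step_index: int
) -> Dict[str, ParameterFingerprint]:
    """One entry per sensor (e.g. temperatures, voltages, currents, vibration)."""
    return {name: fingerprint_signal(s, step_index) for name, s in signals.items()}
```

For example, `fingerprint_signal([40.0, 40.0, 40.0, 45.0, 47.0], 3)` yields a steady-state level of 40.0 and a dynamic response of `[5.0, 7.0]`.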
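Claims 5, 12 and 19 recite normalizing the dynamic response but do not say how. One plausible reading, assumed here purely for illustration, is to divide each post-step deviation by the parameter's steady-state magnitude and by the size of the applied step, so that sensors with different units and steps of different sizes become comparable. The helper `step_features`, which reduces a response series to a peak deviation and a settling index (each suitable for one diagram axis), is likewise a hypothetical choice; the `band` parameter is an assumed settling tolerance.

```python
from typing import List, Sequence, Tuple


def normalize_response(
    steady_state: float, response: Sequence[float], step_size: float = 1.0
) -> List[float]:
    """Scale deviations by |steady-state level| times step size (assumed normalization)."""
    if steady_state == 0 or step_size == 0:
        raise ValueError("steady_state and step_size must be non-zero")
    scale = abs(steady_state) * step_size
    return [d / scale for d in response]


def step_features(response: Sequence[float], band: float) -> Tuple[float, int]:
    """Reduce a deviation series to (peak deviation, settling index).

    The settling index is the earliest sample from which every remaining
    sample stays within +/- band of the final value.
    """
    if not response:
        raise ValueError("empty response")
    peak = max(response, key=abs)  # signed value with the largest magnitude
    final = response[-1]
    settle = len(response) - 1
    # Walk backwards until the first sample that leaves the band.
    for i in range(len(response) - 1, -1, -1):
        if abs(response[i] - final) > band:
            break
        settle = i
    return peak, settle
```

For instance, `step_features([0.0, 5.0, 8.0, 6.4, 6.1, 6.0], band=0.5)` returns `(8.0, 3)`: the peak deviation is 8.0, and from index 3 onward every sample stays within 0.5 of the final value.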
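A Kiviat (radar) diagram, as in claims 6, 13 and 20, places each monitored parameter on its own equally spaced spoke. The sketch below only computes polygon vertices. Overlaying two polygons on the same spokes, one for the steady-state levels and one for a per-parameter dynamic-response feature, is an assumed way of displaying both responses simultaneously; values are also assumed to be normalized beforehand so that a single radial scale fits every spoke. Function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def kiviat_vertices(values: Sequence[float]) -> List[Point]:
    """Axis k points at angle 2*pi*k/N (counter-clockwise from +x); radius is the value."""
    n = len(values)
    if n < 3:
        raise ValueError("a Kiviat diagram needs at least three axes")
    return [
        (v * math.cos(2 * math.pi * k / n), v * math.sin(2 * math.pi * k / n))
        for k, v in enumerate(values)
    ]


def kiviat_overlay(
    steady: Sequence[float], dynamic: Sequence[float]
) -> Dict[str, List[Point]]:
    """Two polygons on the same spokes, so both responses appear in one diagram."""
    if len(steady) != len(dynamic):
        raise ValueError("both polygons must use the same axes")
    return {"steady_state": kiviat_vertices(steady), "dynamic": kiviat_vertices(dynamic)}
```

The resulting vertex lists can be passed to any plotting library's filled-polygon routine to draw the fingerprint.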
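Claims 7 and 14 detect incipient problems by comparing the fingerprint with one derived from a known good system, without specifying the comparison. A minimal sketch, assuming both fingerprints have been flattened into the same named, normalized axes (for example one peak-deviation value per sensor) and that a single absolute tolerance per axis is an adequate alarm criterion; `compare_fingerprints` and the axis names are hypothetical.

```python
from typing import Dict, List


def compare_fingerprints(
    observed: Dict[str, float], reference: Dict[str, float], tolerance: float
) -> List[str]:
    """Return, sorted, the axes where observed differs from the known-good reference."""
    missing = reference.keys() - observed.keys()
    if missing:
        raise KeyError(f"observed fingerprint lacks axes: {sorted(missing)}")
    # Flag every axis whose deviation from the reference exceeds the tolerance.
    return sorted(
        axis
        for axis, ref_value in reference.items()
        if abs(observed[axis] - ref_value) > tolerance
    )
```

With a reference of `{"cpu_temp_peak": 0.10, "fan_settle": 0.05}` and an observation of `{"cpu_temp_peak": 0.30, "fan_settle": 0.06}`, a tolerance of 0.05 flags only `cpu_temp_peak`. A per-axis test keeps the result interpretable, since each flagged axis names the parameter that drifted.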
US11/203,361 2005-08-11 2005-08-11 Method and apparatus for generating a telemetric impulsional response fingerprint for a computer system Abandoned US20070034206A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US11/203,361 US20070034206A1 (en) 2005-08-11 2005-08-11 Method and apparatus for generating a telemetric impulsional response fingerprint for a computer system
PCT/US2006/025936 WO2007021389A2 (en) 2005-08-11 2006-06-30 Generating a telemetric impulsional response fingerprint for a computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/203,361 US20070034206A1 (en) 2005-08-11 2005-08-11 Method and apparatus for generating a telemetric impulsional response fingerprint for a computer system

Publications (1)

Publication Number Publication Date
US20070034206A1 (en) 2007-02-15

Family

ID=37654871

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/203,361 Abandoned US20070034206A1 (en) 2005-08-11 2005-08-11 Method and apparatus for generating a telemetric impulsional response fingerprint for a computer system

Country Status (2)

Country Link
US (1) US20070034206A1 (en)
WO (1) WO2007021389A2 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4937763A (en) * 1988-09-06 1990-06-26 E I International, Inc. Method of system state analysis
US5223207A (en) * 1992-01-29 1993-06-29 The United States Of America As Represented By The United States Department Of Energy Expert system for online surveillance of nuclear reactor coolant pumps
US5731998A (en) * 1995-07-14 1998-03-24 Hewlett-Packard Company Method and apparatus for comparing a sample with a reference using a spider diagram
US20040078723A1 (en) * 2002-10-17 2004-04-22 Gross Kenny C. Method and apparatus for monitoring and recording computer system performance parameters
US20040074311A1 (en) * 2002-07-19 2004-04-22 Celerity Group, Inc. Methods and apparatus for pressure compensation in a mass flow controller

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080046559A1 (en) * 2006-08-16 2008-02-21 Sun Microsystems, Inc. Method and system for identification of decisive action state of server components via telemetric condition tracking
US8341260B2 (en) 2006-08-16 2012-12-25 Oracle America, Inc. Method and system for identification of decisive action state of server components via telemetric condition tracking
US20080120064A1 (en) * 2006-10-26 2008-05-22 Urmanov Aleksey M Detecting a failure condition in a system using three-dimensional telemetric impulsional response surfaces
US7548820B2 (en) * 2006-10-26 2009-06-16 Sun Microsystems, Inc. Detecting a failure condition in a system using three-dimensional telemetric impulsional response surfaces
US20110110401A1 (en) * 2008-04-18 2011-05-12 Astrium Limited Modular digital processing system for telecommunications satellite payloads

Also Published As

Publication number Publication date
WO2007021389A2 (en) 2007-02-22
WO2007021389A3 (en) 2007-06-14

Similar Documents

Publication Publication Date Title
US7890813B2 (en) Method and apparatus for identifying a failure mechanism for a component in a computer system
US9038030B2 (en) Methods for predicting one or more defects in a computer program and devices thereof
US8326680B2 (en) Business activity monitoring anomaly detection
US7680624B2 (en) Method and apparatus for performing a real-time root-cause analysis by analyzing degrading telemetry signals
US8046637B2 (en) Telemetry data filtering through sequential analysis
US9292473B2 (en) Predicting a time of failure of a device
US7162393B1 (en) Detecting degradation of components during reliability-evaluation studies
US20130138419A1 (en) Method and system for the assessment of computer system reliability using quantitative cumulative stress metrics
KR100803889B1 (en) Method and system for analyzing performance of providing services to client terminal
JP2002342128A (en) Method to extract health of service from host machine
US7076389B1 (en) Method and apparatus for validating sensor operability in a computer system
FR2933789A1 (en) METHODS OF IDENTIFYING FLIGHT PROFILES IN AIRCRAFT MAINTENANCE OPERATIONS
US10996861B2 (en) Method, device and computer product for predicting disk failure
US7751910B2 (en) High-accuracy virtual sensors for computer systems
CN111209153B (en) Abnormity detection processing method and device and electronic equipment
US7171586B1 (en) Method and apparatus for identifying mechanisms responsible for “no-trouble-found” (NTF) events in computer systems
BR102013000145A2 (en) system comprising a monitoring device, computer readable medium and method of updating a decision algorithm
US7668696B2 (en) Method and apparatus for monitoring the health of a computer system
US20070034206A1 (en) Method and apparatus for generating a telemetric impulsional response fingerprint for a computer system
US8140277B2 (en) Enhanced characterization of electrical connection degradation
US20090307668A1 (en) Software problem identification tool
US7085681B1 (en) Symbiotic interrupt/polling approach for monitoring physical sensors
EP3999983B1 (en) Time-series data condensation and graphical signature analysis
US8253588B2 (en) Facilitating power supply unit management using telemetry data analysis
US9164822B2 (en) Method and system for key performance indicators elicitation with incremental data decycling for database management system

Legal Events

Date Code Title Description
AS Assignment

Owner name: SUN MICROSYSTEMS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:URMANOV, ALEKSY M.;BOUGAEV, ANTON A.;GROSS, KENNY C.;REEL/FRAME:016896/0498

Effective date: 20050725

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION