US20120246520A1 - Monitoring method, information processing apparatus, and computer-readable medium storing monitoring program - Google Patents

Monitoring method, information processing apparatus, and computer-readable medium storing monitoring program

Info

Publication number
US20120246520A1
Authority
US
United States
Prior art keywords
information
item
rule
examination
failure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/348,831
Other versions
US8904234B2 (en)
Inventor
Masazumi Matsubara
Atsuji Sekiguchi
Kuniaki Shimada
Yuji Wada
Yasuhide Matsumoto
Shinji Kikuchi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Assigned to FUJITSU LIMITED (assignment of assignors' interest; see document for details). Assignors: KIKUCHI, SHINJI; MATSUBARA, MASAZUMI; MATSUMOTO, YASUHIDE; SEKIGUCHI, ATSUJI; SHIMADA, KUNIAKI; WADA, YUJI
Publication of US20120246520A1
Application granted
Publication of US8904234B2
Expired - Fee Related
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0748 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a remote unit communicating with a single-box computer node experiencing an error/fault
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available

Definitions

  • the embodiments discussed herein are related to a monitoring method and an information processing apparatus for monitoring one or more apparatuses, and also related to a computer-readable medium which stores a monitoring program for monitoring one or more apparatuses.
  • a system administrator of the information processing system determines whether there is a failure in apparatuses, such as servers, storage devices, and communication apparatuses, and takes necessary measures when there is a failure. For example, if a hardware failure is found in an apparatus, the system administrator may stop the apparatus and change the hardware. In addition, if a failure is found in the execution state of software, the system administrator may stop processes of the software and investigate the cause of the failure. Further, if an overload on an apparatus is found, the system administrator may add more resources for information processing.
  • an information processing apparatus for operations management to collect information from monitored target apparatuses and examine the collected information to thereby automatically detect a failure (or a sign of a failure) in an apparatus.
  • the information processing apparatus may issue a warning to the system administrator, or may take necessary measures (for example, transmit a stop instruction to an apparatus in a failure state) according to a predetermined processing procedure.
  • a monitoring method used by an information processing system which monitors one or more apparatuses based on information on a plurality of items acquired from the one or more apparatuses, the monitoring method including: among a first item, a second item, and a third item whose information is associated with the information on the first item and the information on the second item, examining the information on the third item; omitting examination of the information on the first item and the information on the second item in a case where no failure is detected in the examination of the information on the third item; and examining the information on the first item and the information on the second item in a case where a failure is detected in the examination of the information on the third item.
  • FIG. 1 illustrates an information processing apparatus according to a first embodiment
  • FIG. 2 illustrates an information processing system according to a second embodiment
  • FIG. 3 is a block diagram of exemplary hardware of a terminal
  • FIG. 4 is a block diagram of exemplary software of the information processing system
  • FIG. 5 illustrates examples of configuration items
  • FIG. 6 illustrates an example of a description of configuration information
  • FIG. 7 illustrates an example of a failure propagation relationship
  • FIG. 8 illustrates an example of a propagation relationship table
  • FIG. 9 illustrates an example of a rule definition table
  • FIG. 10 illustrates an example of a reaction definition table
  • FIG. 11 is a flowchart illustrating a rule registration process
  • FIG. 12 is a flowchart illustrating the rule registration process, continued from FIG. 11
  • FIG. 13 illustrates an example of a rule editing screen
  • FIG. 14 illustrates an example of a rule conversion
  • FIG. 15 illustrates an example of a workflow
  • FIG. 16 illustrates an example of a description of flow information
  • FIG. 17 is a flowchart illustrating a flow registration process
  • FIG. 18 is a flowchart illustrating a rule examination process
  • FIG. 19 is a first sequence diagram illustrating an example of an execution procedure of a workflow
  • FIG. 20 is a second sequence diagram illustrating an example of the execution procedure of a workflow.
  • FIG. 21 is a third sequence diagram illustrating an example of the execution procedure of a workflow.
  • FIG. 1 illustrates an information processing apparatus according to a first embodiment.
  • An information processing apparatus 10 according to the first embodiment is used in an information processing system for monitoring apparatuses 21 to 23 based on information on multiple items acquired from the apparatuses 21 to 23 .
  • the information processing system may monitor whether there is a failure during automatic execution of a process, such as a software update, according to a workflow definition, and then stop the process if there is a failure.
  • the apparatuses 21 to 23 are electronic devices, such as a server, a communication apparatus, and a storage apparatus.
  • the information processing apparatus 10 includes an examining unit 12 .
  • the examining unit 12 may be implemented as a program to be executed using a central processing unit (CPU) and a random access memory (RAM).
  • the examining unit 12 examines information on examination target items indicated by examination information 11 a stored in a storage unit 11 .
  • the examination information 11 a may include information indicating a criterion (determination rule) for determining the normal state (or the presence of a failure) with respect to each of the examination target items.
  • the storage unit 11 may be included in the information processing apparatus 10 , or may be a storage device included in another information processing apparatus.
  • information acquired for the item #3 is associated with both information on the item #1 and information on the item #2.
  • the information on the item #3 indicates a matter affected by both an apparatus status indicated by the item #1 and an apparatus status indicated by the item #2.
  • the information on the item #3 indicates a failure if at least one of the information on the item #1 and the information on the item #2 indicates a failure.
  • the examining unit 12 examines the information on the item #3 acquired from the apparatuses 21 to 23 . In the case where no failure is detected in the examination of the information on the item #3, the examining unit 12 omits examination of the information on the items #1 and #2. On the other hand, if a failure is detected in the examination of the information on the item #3, the examining unit 12 further examines the information on the items #1 and #2 acquired from the apparatuses 21 to 23 . Note that the examining unit 12 may examine the information on the item #3 when the information on the item #3 has been updated. Whether the information on the item #3 has been updated may be monitored by using a database for collecting information from the apparatuses 21 to 23 . In addition, the information on the items #1 and #2 may be acquired from the apparatuses 21 to 23 after a failure is detected in the examination of the information on the item #3.
  • the information processing apparatus 10 or another information processing apparatus may automatically add the item #3 as an examination target item when the items #1 and #2 are specified as examination target items.
  • the item #3 associated with both the items #1 and #2 is retrieved with reference to a storage device that stores relationship information.
  • the relationship information here indicates a relationship among multiple items (for example, a relationship in which information on one item has an effect on information on another item).
  • the information processing apparatus 10 or the other information processing apparatus specifies the retrieved item #3 as an item for prioritized examination.
  • the examination information 11 a may include information indicating the priority for the examination, in association with the items #1 to #3.
  • among the items #1 and #2 and the item #3, whose information is associated with the information on the items #1 and #2, the information processing apparatus 10 examines the information on the item #3. In the case where no failure is detected in the examination of the information on the item #3, examination of the information on the items #1 and #2 is omitted. On the other hand, in the case where a failure is detected in the examination of the information on the item #3, the information on both the items #1 and #2 is examined. This makes it possible to omit the examination of the information on the items #1 and #2 while there is no failure in the apparatuses 21 to 23, thereby reducing the number of items subject to continuous examination. As a result, it is possible to reduce the load of monitoring which uses information on multiple items.
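  • The examination order of the first embodiment can be illustrated with a minimal Python sketch. The names `examine`, `fetch`, `rules`, and the item keys are hypothetical and do not appear in the embodiment; the sketch only assumes that each item has a rule returning True when its information is normal, and that information on the items #1 and #2 is fetched only after a failure is detected for the item #3.

```python
from typing import Callable, Dict, List

# Hypothetical helper: fetch(item) returns the latest information on an item,
# and rules[item](info) returns True when that information is normal.
def examine(fetch: Callable[[str], dict],
            rules: Dict[str, Callable[[dict], bool]],
            summary_item: str,
            detail_items: List[str]) -> List[str]:
    """Return the items whose information indicates a failure."""
    if rules[summary_item](fetch(summary_item)):
        return []  # item #3 is normal, so items #1 and #2 are not examined
    failed = [summary_item]
    for item in detail_items:
        # Items #1 and #2 are fetched and examined only after item #3 fails.
        if not rules[item](fetch(item)):
            failed.append(item)
    return failed


# Illustrative data (assumed, not taken from the embodiment).
info = {
    "item1": {"status": "OK"},
    "item2": {"status": "ERROR"},
    "item3": {"status": "ERROR"},
}
fetched: List[str] = []

def fetch(item: str) -> dict:
    fetched.append(item)  # record which items were actually acquired
    return info[item]

def is_ok(data: dict) -> bool:
    return data["status"] == "OK"

rules = {"item1": is_ok, "item2": is_ok, "item3": is_ok}
failed = examine(fetch, rules, "item3", ["item1", "item2"])
```

  • In this sketch, when the status of "item3" is normal, only "item3" is fetched, which corresponds to the reduction in continuously examined items described above.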
  • FIG. 2 illustrates an information processing system according to a second embodiment.
  • the information processing system according to the second embodiment carries out operations management, such as application of an updated program, with respect to system resources 40 according to a workflow definition. Automation of operations management is sometimes called runbook automation (RBA).
  • the information processing system includes a terminal 100 , a flow controller 200 , a flow engine 300 , a rule engine 400 and a configuration management database (CMDB) server 500 .
  • Each apparatus of the information processing system is connected to a network 50 .
  • the information processing system is installed, for example, in a data center.
  • the system resources 40 include various electronic devices used for information processing.
  • the system resources 40 include a server 41 , a communication apparatus 42 , such as a switch, and a storage device 43 .
  • the server 41 executes application software using resources, such as a CPU, a RAM, a hard disk drive (HDD) and the like.
  • the communication apparatus 42 transfers data between apparatuses (for example, between the server 41 and the storage device 43 ).
  • the storage device 43 stores data to be used for information processing in a nonvolatile storage device, such as an HDD.
  • the terminal 100 is a computer operated by a user (for example, an administrator of the information processing system). Based on operations of the user, the terminal 100 generates flow information indicating an operations management workflow of the system resources 40 , and transmits the flow information to the flow controller 200 . In addition, the terminal 100 generates rule information indicating a rule for determining whether there is a failure in the system resources 40 during execution of the workflow, and transmits the rule information to the flow controller 200 . In addition, the terminal 100 generates reaction information indicating a correcting process (reaction) taken on the occurrence of a rule violation and registers the reaction information in the flow controller 200 .
  • the flow controller 200 is a computer for controlling the execution of the workflow.
  • the flow controller 200 registers the flow information in the flow engine 300 , and causes the flow engine 300 to execute a process defined in the flow information.
  • the flow controller 200 registers the rule information in the rule engine 400 , and causes the rule engine 400 to examine whether a rule violation has occurred. In the case where a rule violation is detected, the flow controller 200 causes the flow engine 300 to execute a process defined in the reaction information, and stops the workflow.
  • the flow controller 200 reports a result of the workflow execution to the terminal 100 .
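  • The coordination performed by the flow controller 200 can be sketched as follows. This is a simplified single-process illustration, not the implementation of the embodiment: the function `run_workflow` and the task names are hypothetical, and checking for a rule violation after every task is an assumption made for the sketch.

```python
from typing import Callable, List

def run_workflow(tasks: List[Callable[[], None]],
                 rule_violated: Callable[[], bool],
                 reaction: Callable[[], None]) -> str:
    """Run tasks in order; on a rule violation, run the reaction and stop."""
    for task in tasks:
        task()                 # stands in for the flow engine executing a step
        if rule_violated():    # stands in for the rule engine's examination
            reaction()         # process defined in the reaction information
            return "stopped"
    return "completed"


# Illustrative workflow (assumed): a program update that breaks the server.
log: List[str] = []
state = {"server": "OK"}

def stop_server() -> None:
    log.append("stop")

def update_program() -> None:
    log.append("update")
    state["server"] = "ERROR"  # simulate a failure caused by the update

def restart_server() -> None:
    log.append("restart")

def roll_back() -> None:
    log.append("rollback")

result = run_workflow([stop_server, update_program, restart_server],
                      lambda: state["server"] != "OK",
                      roll_back)
```

  • The returned string plays the role of the execution result reported to the terminal 100.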
  • the flow engine 300 is a computer for, in response to an instruction from the flow controller 200 , executing a process defined in the flow information with respect to the system resources 40 .
  • the flow engine 300 transmits a command, such as a stop command, a command for a program update, or a restart command, to an apparatus of the system resources 40 .
  • the rule engine 400 is a computer for examining whether a rule violation has occurred (whether there is a failure in the system resources 40 ) during the time the flow engine 300 executes a workflow.
  • the rule engine 400 acquires configuration information of the system resources 40 from the CMDB server 500 , and performs rule examination by cross-checking the configuration information and the rule information. In the case of detecting a rule violation, the rule engine 400 reports the rule violation to the flow controller 200 .
  • the CMDB server 500 is a computer functioning as a database server for collecting the configuration information from the system resources 40 .
  • the configuration information includes information indicating hardware used by each apparatus of the system resources 40 , software being executed by each of the apparatuses, the status of the hardware and software and the like.
  • the configuration information may be collected by the CMDB server 500 periodically making access to each of the apparatuses, or by each of the apparatuses periodically or irregularly (for example, at the time when the configuration information is updated) transmitting the configuration information to the CMDB server 500 .
  • the CMDB server 500 provides the rule engine 400 with configuration information to be used for the rule examination.
  • the CMDB server 500 need not collect configuration information that is not used for the rule examination.
  • two or more of the terminal 100 , the flow controller 200 , and the flow engine 300 may be integrated into a single computer.
  • FIG. 3 is a block diagram of exemplary hardware of the terminal 100 .
  • the terminal 100 includes a CPU 101 , a RAM 102 , an HDD 103 , an image signal processor 104 , an input signal processor 105 , a disk drive 106 , and a communication unit 107 . These units are connected to a bus inside the terminal 100 .
  • each of the server 41 and other apparatuses of the system resources 40 , the flow controller 200 , the flow engine 300 , the rule engine 400 and the CMDB server 500 may be realized by similar hardware to that of the terminal 100 .
  • the CPU 101 is a processing device for controlling information processing in the terminal 100 .
  • the CPU 101 runs a program by reading at least part of the programs and data stored in the HDD 103 and loading the read part into the RAM 102 .
  • the terminal 100 may include multiple processing devices and distribute the information processing across the processing devices.
  • the RAM 102 is a volatile memory for temporarily storing programs and data to be used by the CPU 101 .
  • the terminal 100 may include a type of memory other than the RAM, or may include multiple memories.
  • the HDD 103 is a nonvolatile storage device for storing programs, such as an operating system (OS) program and application programs, and data to be used for information processing.
  • the HDD 103 reads from and writes to a built-in magnetic disk according to instructions from the CPU 101 .
  • the terminal 100 may include a type of nonvolatile storage device other than the HDD (for example, a solid state drive (SSD)), or may include multiple storage devices.
  • the image signal processor 104 outputs an image to a display connected to the terminal 100 .
  • as the display 31 , a cathode ray tube (CRT) display or a liquid crystal display, for example, may be used.
  • the input signal processor 105 acquires an input signal from an input device 32 connected to the terminal 100 and outputs the signal to the CPU 101 .
  • as the input device 32 , a pointing device, such as a mouse or a touch panel, or a keyboard, for example, may be used.
  • the disk drive 106 is a drive apparatus for reading programs and data recorded in a recording medium 33 .
  • the following may be used as the recording medium 33 : a magnetic disk, such as a flexible disk (FD); an optical disk, such as a compact disc (CD) and a digital versatile disc (DVD); or a magneto-optical disk (MO).
  • the disk drive 106 stores the programs and data read from the recording medium 33 in the RAM 102 or the HDD 103 according to, for example, instructions from the CPU 101 .
  • the communication unit 107 is a communication interface connected to the network 50 to thereby perform communications.
  • the connection to the network 50 is established using either a wired or wireless connection. That is, the communication unit 107 may be either a wired communication interface or a wireless communication interface.
  • FIG. 4 is a block diagram of exemplary software of the information processing system. Each block may be implemented, for example, as a program module to be executed using a CPU and a RAM.
  • the terminal 100 includes a configuration information acquirer 110 , a rule editor 120 , and a flow editor 130 .
  • the flow controller 200 includes a reaction information storage unit 210 and a flow control unit 220 .
  • the flow engine 300 includes a flow information storage unit 310 and a flow executor 320 .
  • the rule engine 400 includes a rule information storage unit 410 , a rule converter 420 , and a rule examining unit 430 .
  • the CMDB server 500 includes a configuration information storage unit 510 , a relationship information storage unit 520 , a configuration information collector 530 , and an update monitor 540 .
  • the configuration information acquirer 110 acquires configuration information from the CMDB server 500 . Based on the configuration information acquired by the configuration information acquirer 110 , the rule editor 120 displays a screen for editing rules on the display. Then, the rule editor 120 generates rule information based on a user's input on the screen and transmits the rule information to the flow controller 200 . In addition, the rule editor 120 displays a screen for editing reaction on the display, generates reaction information based on a user's input, and transmits the reaction information to the flow controller 200 .
  • the flow editor 130 displays a screen for editing a workflow on the display, generates flow information based on a user's input, and transmits the flow information to the flow controller 200 .
  • the reaction information storage unit 210 stores the reaction information.
  • the flow control unit 220 receives the rule information from the terminal 100 , and transfers the rule information to the rule engine 400 .
  • the flow control unit 220 receives the reaction information from the terminal 100 , and stores the reaction information in the reaction information storage unit 210 .
  • the flow control unit 220 receives the flow information from the terminal 100 , and corrects the flow information so that the reaction indicated by the reaction information is executed when a rule violation is detected during execution of the workflow. Subsequently, the flow controller 200 transmits the corrected flow information to the flow engine 300 .
  • the flow control unit 220 instructs the rule engine 400 to perform a rule examination, and instructs the flow engine 300 to continue or stop the workflow based on an examination result. Further, the flow control unit 220 reports a result of the workflow execution to the terminal 100 .
  • the flow information storage unit 310 stores the flow information.
  • the flow executor 320 receives the flow information from the flow controller 200 , and stores the flow information in the flow information storage unit 310 . In addition, based on an instruction from the flow controller 200 , the flow executor 320 executes processing (task) of one or more steps indicated by the flow information stored in the flow information storage unit 310 .
  • the flow executor 320 transmits a command, such as a stop command, a command for a program update, or a restart command, to the system resources 40 .
  • the flow executor 320 may refer to the configuration information held by the CMDB server 500 in order to execute a task, and update the configuration information based on a result of the task execution.
  • the rule information storage unit 410 stores the rule information.
  • the rule converter 420 receives the rule information from the flow controller 200 , corrects the rule information by referring to the configuration information and propagation relationship information held by the CMDB server 500 , and stores the corrected rule information in the rule information storage unit 410 .
  • configuration item or “CI”
  • the rule converter 420 acquires at least part of the configuration information from the CMDB server 500 , and develops the item classifications into actually existing items.
  • the rule converter 420 acquires the propagation relationship information from the CMDB server 500 , and converts the rules so as to reduce the number of items to be continuously examined (monitoring items). Details of the propagation relationship and the rule conversion are described later.
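  • The development of item classifications into actually existing items can be sketched as below. The rule syntax (`Server.status == 'OK'`), the function `expand_rule`, and the use of plain string replacement are assumptions made for illustration; the embodiment does not specify how the rule converter 420 rewrites a rule.

```python
from typing import Dict, List

def expand_rule(rule: str, classification: str,
                items_by_class: Dict[str, List[str]]) -> List[str]:
    """Rewrite a rule written against a classification into per-item rules."""
    prefix = classification + "."
    # Naive textual substitution; a real converter would parse the expression.
    return [rule.replace(prefix, item + ".")
            for item in items_by_class.get(classification, [])]


# Items as they might be read from the configuration information (assumed).
items_by_class = {
    "Server": ["svr1", "svr2"],
    "Application": ["app1", "app2"],
}
expanded = expand_rule("Server.status == 'OK'", "Server", items_by_class)
```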
  • the rule examining unit 430 acquires at least part of the configuration information from the CMDB server 500 , and examines whether the configuration information violates a rule of the rule information stored in the rule information storage unit 410 . Subsequently, the rule examining unit 430 reports an examination result to the flow controller 200 . In addition, if receiving an instruction for an automatic examination from the flow controller 200 , the rule examining unit 430 registers, in the CMDB server 500 , monitoring items selected from the configuration items. Then, when reported by the CMDB server 500 that information on the registered items has been updated, the rule examining unit 430 acquires the information on the registered items from the CMDB server 500 and performs an examination with the information.
  • the configuration information storage unit 510 stores the configuration information collected from the system resources 40 .
  • the relationship information storage unit 520 stores propagation relationship information which indicates a propagation relationship among configuration items.
  • the propagation relationship includes a relationship of failure propagation among configuration items.
  • One example of such a relationship of failure propagation is that, if a failure is detected in an item of “HDD”, a failure is also detected in an item of “server” including the HDD.
  • the configuration information collector 530 collects the configuration information from the system resources 40 , and stores the configuration information in the configuration information storage unit 510 . In addition, upon request of the terminal 100 , the flow engine 300 , or the rule engine 400 , the configuration information collector 530 transmits at least part of the configuration information stored in the configuration information storage unit 510 to the requestor. Note that the configuration information collector 530 need not continuously collect information on items other than the monitoring items of the rule engine 400 . In this case, when a request is made for information on an uncollected item, the configuration information collector 530 collects information on the item from the system resources 40 and transmits the collected information to the requestor.
  • upon request of the rule engine 400 , the update monitor 540 transmits, to the rule engine 400 , the propagation relationship information stored in the relationship information storage unit 520 . In addition, when the monitoring items are reported by the rule engine 400 , the update monitor 540 instructs the configuration information collector 530 to collect information on at least the reported target items. Then, the update monitor 540 monitors information on the target items stored in the configuration information storage unit 510 . When detecting an update of information, the update monitor 540 reports the detection of the update of configuration information to the rule engine 400 .
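  • The update notification between the update monitor 540 and the rule engine 400 can be sketched as a small observer-style class. The class `UpdateMonitor` and its methods `register` and `store` are hypothetical names; the sketch assumes that a notification is sent only when stored information on a registered monitoring item actually changes.

```python
from typing import Callable, Dict, Iterable, List, Set

class UpdateMonitor:
    """Reports updates of registered monitoring items to a callback."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self._notify = notify              # stands in for reporting to the rule engine
        self._targets: Set[str] = set()    # monitoring items reported by the rule engine
        self._stored: Dict[str, dict] = {} # stands in for the configuration information storage

    def register(self, items: Iterable[str]) -> None:
        self._targets.update(items)

    def store(self, item: str, info: dict) -> None:
        changed = self._stored.get(item) != info
        self._stored[item] = dict(info)
        if changed and item in self._targets:
            self._notify(item)


notified: List[str] = []
monitor = UpdateMonitor(notified.append)
monitor.register(["serviceA"])
monitor.store("serviceA", {"status": "OK"})     # first value: reported
monitor.store("serviceA", {"status": "OK"})     # unchanged: not reported
monitor.store("svr1", {"status": "ERROR"})      # not a monitoring item: not reported
monitor.store("serviceA", {"status": "ERROR"})  # changed: reported
```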
  • FIG. 5 illustrates examples of configuration items.
  • the examples of the configuration information depicted in FIG. 5 include one item “serviceA” whose classification is “Service”; two items “svr1” and “svr2” whose classification is “Server”; and two items “app1” and “app2” whose classification is “Application”. Further, the examples of FIG.
  • “serviceA” is provided by two servers “svr1” and “svr2”.
  • “svr1” includes two CPUs “svr1_c1” and “svr1_c2”, one memory “svr1_m1”, and one HDD “svr1_h1”.
  • “svr2” includes two CPUs “svr2_c1” and “svr2_c2”, one memory “svr2_m1”, and one HDD “svr2_h1”.
  • “app1” is being implemented on “svr1”
  • “app2” is being implemented on “svr2”.
  • “app1” is a Web application
  • “app2” is a database management system (DBMS).
  • The information of each of “Service”, “Server”, “Application”, “Cpu”, “Memory”, and “Hdd” includes a status.
  • the information of “Application” may additionally include a cache size, a path to a configuration file, and the number of transactions.
  • the configuration information may include information other than the above.
  • FIG. 6 illustrates an example of a description of the configuration information.
  • Configuration information 511 illustrated in FIG. 6 describes the items of FIG. 5 in an eXtensible Markup Language (XML) format.
  • the configuration information 511 is stored in the configuration information storage unit 510 .
  • the configuration information 511 includes item tags, <item>, and relationship tags, <relationship>.
  • An item tag indicates a configuration item (i.e., an item of configuration information), and includes a server tag, <Server>, or an application tag, <Application>.
  • the server tag includes a CPU tag, <Cpu>, a memory tag, <Memory>, and an HDD tag, <Hdd>.
  • Each of the server tag, application tag, CPU tag, memory tag, and HDD tag corresponds to one of the items illustrated in FIG. 5 , and includes, as an attribute, a value indicating a status.
  • each application tag includes a parameter tag, <param>, indicating a parameter, such as a cache size and a path to a configuration file.
  • Each parameter tag includes a value of a corresponding parameter as an attribute.
  • a relationship tag indicates a relationship among items indicated by item tags, and includes, as an attribute, a value indicating a type of the relationship.
  • each relationship tag includes a source item tag, <sourceItem>, and a target item tag, <targetItem>.
  • a relationship tag whose source item is “Service”, target item is “Server”, and type is “consistOf” indicates a relationship in which “Service” is realized using “Server”.
  • a relationship tag whose source item is “Application”, target item is “Server”, and type is “installedOn” indicates a relationship in which “Application” is implemented on “Server”.
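  • The tag structure described above can be illustrated by parsing a small sample with Python's standard `xml.etree.ElementTree` module. The sample document is an assumption made for this sketch: the root element `<configuration>` and the attribute names `id`, `status`, `name`, `value`, and `type` are hypothetical, since FIG. 6 is not reproduced here; only the tag names follow the description.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; attribute names and the root element are assumed.
SAMPLE = """
<configuration>
  <item>
    <Server id="svr1" status="OK">
      <Cpu id="svr1_c1" status="OK"/>
      <Memory id="svr1_m1" status="OK"/>
      <Hdd id="svr1_h1" status="ERROR"/>
    </Server>
  </item>
  <item>
    <Application id="app1" status="OK">
      <param name="cacheSize" value="512"/>
    </Application>
  </item>
  <relationship type="installedOn">
    <sourceItem id="app1"/>
    <targetItem id="svr1"/>
  </relationship>
</configuration>
"""

root = ET.fromstring(SAMPLE.strip())

# Status of every element that carries a status attribute, keyed by item id.
statuses = {el.get("id"): el.get("status")
            for el in root.iter() if el.get("status") is not None}

# Each relationship as (type, source item id, target item id).
relationships = [(rel.get("type"),
                  rel.find("sourceItem").get("id"),
                  rel.find("targetItem").get("id"))
                 for rel in root.iter("relationship")]
```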
  • FIG. 7 illustrates an example of a failure propagation relationship.
  • the propagation relationship is a relationship of failure propagation among items of the configuration information, and has a propagation direction. For example, “HDD failure” and “memory error” lead to “server failure”. “Setting error” and “high load” lead to “application failure”. “Server failure” and “application failure” lead to “service failure”. Accordingly, in the configuration information, when the status of an “Hdd” item indicates an error, it is considered that the status of a corresponding “Server” item also indicates an error. In addition, when the status of a “Server” item indicates an error, the status of a corresponding “Service” item also indicates an error. Thus, the status of a higher-level item is affected by the status of a lower-level item.
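  • The direction of propagation, in which the status of a higher-level item is affected by the statuses of lower-level items, can be sketched as a recursive check. The function `has_failure` and the dictionaries below are hypothetical; the item names follow FIG. 5, and the mapping from each item to its lower-level items is assumed for illustration.

```python
from typing import Dict, List

def has_failure(item: str,
                own_status: Dict[str, str],
                lower_items: Dict[str, List[str]]) -> bool:
    """True if the item itself, or any item below it, indicates an error."""
    if own_status.get(item, "OK") != "OK":
        return True
    return any(has_failure(lower, own_status, lower_items)
               for lower in lower_items.get(item, []))


# Assumed hierarchy: a service depends on a server and an application,
# and the server depends on its HDD and memory.
lower_items = {
    "serviceA": ["svr1", "app1"],
    "svr1": ["svr1_h1", "svr1_m1"],
}
own_status = {"svr1_h1": "ERROR"}  # only the HDD reports an error

service_failed = has_failure("serviceA", own_status, lower_items)
app_failed = has_failure("app1", own_status, lower_items)
```

  • In this example the HDD error propagates to "svr1" and then to "serviceA", while "app1" is unaffected.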
  • FIG. 8 illustrates an example of a propagation relationship table.
  • The example of the propagation relationship table of FIG. 8 corresponds to the propagation relationship depicted in FIG. 7 .
  • A propagation relationship table 521 is stored in the relationship information storage unit 520 .
  • The propagation relationship table 521 includes items of “ID”, “failure”, “parent failure”, and “condition”.
  • The item “ID” is identification information used for identifying each failure.
  • The item “failure” indicates a failure factor, such as a service failure.
  • The item “parent failure” indicates another failure directly affected by a corresponding failure.
  • For example, the parent failure of “HDD failure” is “server failure”, and the parent failure of “server failure” is “service failure”.
  • The item “condition” takes the form of a formula for determining, from the configuration information, whether there is a status failure, and is described using an item classification name (such as “Service”). In the example of FIG. 8 , each condition is described in the form of a logical expression which results in TRUE when no status failure is present (normal).
  • In a condition, [ATTR] indicates an arbitrary parameter name, [OP] indicates an arbitrary operator, and [VAL] indicates an arbitrary fixed value.
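  • A minimal sketch of how such a table might be held in memory, and how the “parent failure” column can be followed upward, is given below. The IDs S1 and S2 follow the description; the remaining IDs, every condition string, and the function name affected_failures are assumptions made for illustration and do not reproduce FIG. 8 .

```python
# Propagation relationship table sketched as a dict keyed by failure ID.
# S1 and S2 follow the text; other IDs and all condition strings are assumed.
PROPAGATION_TABLE = {
    "S1": {"failure": "service failure", "parent": None,
           "condition": "Service.status = normal"},
    "S2": {"failure": "server failure", "parent": "S1",
           "condition": "Server.status = normal"},
    "S3": {"failure": "application failure", "parent": "S1",
           "condition": "Application.status = normal"},
    "S4": {"failure": "HDD failure", "parent": "S2",
           "condition": "Hdd.status = normal"},
    "S5": {"failure": "memory error", "parent": "S2",
           "condition": "Memory.status = normal"},
    "S6": {"failure": "setting error", "parent": "S3",
           "condition": "Application.[ATTR] [OP] [VAL]"},
    "S7": {"failure": "high load", "parent": "S3",
           "condition": "Application.[ATTR] [OP] [VAL]"},
}


def affected_failures(failure_id, table=PROPAGATION_TABLE):
    """Return the failures reached by following "parent failure" links upward."""
    chain = []
    parent = table[failure_id]["parent"]
    while parent is not None:
        chain.append(table[parent]["failure"])
        parent = table[parent]["parent"]
    return chain
```

  • For instance, affected_failures("S4") walks from “HDD failure” to “server failure” and then to “service failure”, mirroring the propagation direction of FIG. 7 .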
  • FIG. 9 illustrates an example of a rule definition table.
  • A rule definition table 411 is generated by the rule converter 420 , and then stored in the rule information storage unit 410 .
  • The rule definition table 411 includes items of “ID”, “rule”, and “parent rule”.
  • The item “ID” is identification information used for identifying each rule.
  • The item “rule” is described as a formula for determining, from the configuration information, whether there is a failure, and uses an item name (such as “serviceA”) in the description.
  • Each rule is described in the form of a logical expression which results in TRUE when no failure is present (normal). However, the rule may be described in the form of a logical expression which results in TRUE when a failure is present.
  • The item “parent rule” indicates another rule assumed to similarly detect a failure when a failure is detected based on a corresponding rule (i.e., there is a rule violation).
  • For example, the parent rule of the rules R 1 and R 2 is the rule R 4 .
  • That is, when a violation of the rule R 1 or R 2 is detected, it is assumed that there would also be a violation of the rule R 4 .
  • Conversely, when there is no violation of the rule R 4 , it is assumed that there would also be no violation of the rules R 1 and R 2 . Accordingly, as long as the rule R 4 is satisfied, examination using the rules R 1 and R 2 can be omitted.
  • The user of the terminal 100 needs to define only the rules R 1 , R 2 , and R 3 , as described later.
  • The rule R 4 is automatically added by the rule converter 420 based on the rules R 1 and R 2 .
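  • The parent-rule column can be pictured with the following sketch, which separates monitoring rules (those without a parent rule) from lower-level rules. The rule expressions are placeholders invented for this example; only the IDs R1 to R4, the items they refer to, and the parent relationship (R1 and R2 under R4) follow the description.

```python
# Rule definition table sketched as a dict. The "rule" strings are assumed
# placeholders; the IDs and parent links follow the description.
RULE_TABLE = {
    "R1": {"rule": "svr1.status = normal", "parent": "R4"},
    "R2": {"rule": "app2.load < 80", "parent": "R4"},
    "R3": {"rule": "serviceA.responseTime < 3", "parent": None},
    "R4": {"rule": "serviceA.status = normal", "parent": None},
}


def monitoring_rules(table):
    """Rules with no parent rule are the ones examined continuously."""
    return sorted(rid for rid, row in table.items() if row["parent"] is None)


def lower_level_rules(table, rule_id):
    """Rules whose parent rule is rule_id; examined only after a violation."""
    return sorted(rid for rid, row in table.items() if row["parent"] == rule_id)
```

  • With this table, only R3 and R4 are examined continuously, and R1 and R2 are looked up through lower_level_rules(RULE_TABLE, "R4") when R4 is violated.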
  • FIG. 10 illustrates an example of a reaction definition table.
  • A reaction definition table 211 is stored in the reaction information storage unit 210 .
  • The reaction definition table 211 includes items of “ID”, “condition”, and “reaction”.
  • The item “ID” is identification information used for identifying each reaction.
  • The item “condition” indicates a condition for a corresponding reaction to be carried out.
  • For example, “R 1 OR R 3” indicates a condition in which a violation of at least one of the above-mentioned rules R 1 and R 3 is detected.
  • Likewise, “R 2 AND R 3” indicates a condition in which a violation of both the above-mentioned rules R 2 and R 3 is detected.
  • The item “reaction” indicates the specific action to be taken as the corresponding reaction. Examples of reactions include stopping a service and adding a server to be used for providing a service.
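  • One way to evaluate such conditions against a set of violated rules is sketched below. The reaction IDs A1 and A2, the pairing of conditions with reactions, and the restriction to flat “OR of ANDs” expressions without parentheses are assumptions of this example and are not taken from FIG. 10 .

```python
# Reaction definition table sketched as (ID, condition, reaction) rows.
# IDs and the condition/reaction pairing are assumed for illustration.
REACTIONS = [
    ("A1", "R1 OR R3", "stop service"),
    ("A2", "R2 AND R3", "add server"),
]


def condition_holds(condition, violated):
    """Evaluate a flat condition: OR-separated clauses of AND-separated rule IDs."""
    return any(
        all(term.strip() in violated for term in clause.split(" AND "))
        for clause in condition.split(" OR ")
    )


def select_reactions(violated, reactions=REACTIONS):
    """Return the reactions whose condition is met by the violated rule IDs."""
    return [action for _id, cond, action in reactions
            if condition_holds(cond, violated)]
```

  • For example, select_reactions({"R3"}) yields only the first reaction, while select_reactions({"R2", "R3"}) yields both.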
  • FIG. 11 is a flowchart illustrating a rule registration process. Hereinbelow, the process of FIG. 11 is described according to the step numbers.
  • Step S 11 The configuration information acquirer 110 of the terminal 100 accesses the CMDB server 500 to acquire the configuration information (denoted as “C-INFO” in FIG. 11 ) 511 .
  • Step S 12 Based on the configuration information 511 acquired in Step S 11 , the rule editor 120 of the terminal 100 generates a rule editing screen which allows selection of an actually existing item or an item classification to thereby enable a user to input a rule, and displays the rule editing screen on the display. Then, the rule editor 120 generates rule information which indicates the rule input by the user, and transmits the rule information to the flow controller 200 .
  • The flow control unit 220 of the flow controller 200 transfers the rule information received from the terminal 100 to the rule engine 400 .
  • Step S 13 The rule converter 420 of the rule engine 400 determines whether the rule information received from the flow controller 200 includes a rule described using an item classification. Whether an item classification is included in a rule may be determined with reference to the configuration information 511 held by the CMDB server 500 . In the case where an item classification is included, the process proceeds to Step S 14 . In the case where no item classification is included, the process proceeds to Step S 15 .
  • Step S 14 The rule converter 420 accesses the CMDB server 500 to acquire the configuration information 511 . Then, based on the configuration information 511 , the rule converter 420 develops the item classification included in the rule information into an actually existing item.
  • Step S 15 The rule converter 420 accesses the CMDB server 500 to acquire the propagation relationship table 521 .
  • Step S 16 The rule converter 420 selects one rule included in the rule information received from the flow controller 200 in Step S 12 .
  • Step S 17 The rule converter 420 determines whether the rule selected in Step S 16 matches any of the conditions described in the propagation relationship table 521 acquired in Step S 15 . At the time of the determination, the rule converter 420 compares the rule with the condition by replacing an item included in the rule with a corresponding classification. The replacement of the item with the classification may be performed by referring to the configuration information 511 . When there is a matched condition, the process proceeds to Step S 18 . When there is no matched condition, the process proceeds to Step S 20 .
  • Step S 18 The rule converter 420 generates a subtree of a tree structure (tree structure as illustrated in FIG. 7 ) defined by the propagation relationship table 521 .
  • The generated subtree includes a path between a node corresponding to the condition matched in Step S 17 and a root node.
  • For example, a subtree is generated in such a manner as to include a node corresponding to “service failure” (S 1 ) and a node corresponding to “server failure” (S 2 ).
  • Step S 19 With reference to the configuration information 511 , the rule converter 420 associates an item with each node of the subtree generated in Step S 18 . For example, in the case where the rule selected in Step S 16 includes the server item “svr 1 ”, the rule converter 420 associates the server item “svr 1 ” with the node of “server failure” (S 2 ) and the service item “serviceA” with the node of “service failure” (S 1 ). Then, the process proceeds to Step S 21 .
  • Step S 20 The rule converter 420 specifies the rule selected in Step S 16 as a monitoring rule (i.e., a rule having no parent rule).
  • Step S 21 The rule converter 420 determines whether all rules included in the rule information have been selected in Step S 16 . In the case where all rules have been selected, the process proceeds to Step S 22 . In the case where there is a rule which has not been selected, the process moves to Step S 16 .
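  • Steps S 17 to S 20 can be sketched roughly as follows. The rule syntax ("item.attribute operator value"), the two-row table, and the SUPPORTS mapping that stands in for the relationship information are assumptions of this example; the sketch also assumes that each step toward the root moves to the related higher-level item, which does not cover the case where two nodes on a path share one item.

```python
# Assumed lookup data. ITEM_CLASS maps an item to its classification;
# SUPPORTS maps an item to the higher-level item that depends on it.
ITEM_CLASS = {"svr1": "Server", "serviceA": "Service"}
SUPPORTS = {"svr1": "serviceA"}

# failure ID -> (condition written with a classification name, parent failure ID)
TABLE = {
    "S1": ("Service.status = normal", None),
    "S2": ("Server.status = normal", "S1"),
}


def build_subtree(rule):
    """Return the leaf-to-root path of (failure ID, item) nodes, or None.

    None means no condition matched, so the rule itself would become a
    monitoring rule (Step S20).
    """
    item, rest = rule.split(".", 1)
    # Step S17: replace the item with its classification before comparing.
    generalized = ITEM_CLASS[item] + "." + rest
    matched = None
    for failure_id, (condition, _parent) in TABLE.items():
        if condition == generalized:
            matched = failure_id
            break
    if matched is None:
        return None
    # Steps S18 and S19: walk to the root, associating an item with each node.
    path = []
    node, current_item = matched, item
    while node is not None:
        path.append((node, current_item))
        node = TABLE[node][1]
        current_item = SUPPORTS.get(current_item)
    return path
```

  • Calling build_subtree("svr1.status = normal") associates “svr1” with S2 and “serviceA” with S1, as in the example given for Step S 19 .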
  • FIG. 12 is a flowchart illustrating the rule registration process, continued from FIG. 11 .
  • Step S 22 The rule converter 420 searches the generated subtrees to see whether there are two or more subtrees including a node having the same failure factor and the same item. If such two or more subtrees are found, the rule converter 420 merges the two or more subtrees into one subtree.
  • Step S 23 The rule converter 420 selects one subtree from among the subtrees obtained after the merger in Step S 22 .
  • Step S 24 The rule converter 420 determines whether there is a branch (i.e., whether multiple leaf nodes are included) in the subtree selected in Step S 23 . In the case where there is a branch (i.e., multiple leaf nodes are included), the process proceeds to Step S 25 . In the case where there is no branch (i.e., only one leaf node is included), the process proceeds to Step S 27 .
  • Step S 25 Within the subtree selected in Step S 23 , the rule converter 420 selects, among nodes of branch sources, one located at the highest level (i.e., among nodes that cover all leaf nodes, one located at the lowest level).
  • Step S 26 From the propagation relationship table 521 acquired in Step S 15 , the rule converter 420 acquires a condition corresponding to the node selected in Step S 25 . Then, the rule converter 420 replaces an item classification included in the condition with an item associated with the selected node to thereby generate a higher-level rule. The rule converter 420 specifies the generated rule as a monitoring rule. Subsequently, the process proceeds to Step S 28 .
  • Step S 27 The rule converter 420 specifies, as a monitoring rule, a rule corresponding to a leaf node of the subtree selected in Step S 23 (i.e., an original rule included in the rule information received from the flow controller 200 ).
  • Step S 28 The rule converter 420 determines whether all subtrees have been selected in Step S 23 . In the case where all the subtrees have been selected, the rule converter 420 stores, in the rule information storage unit 410 , the specified monitoring rule and the rule definition table 411 including the original rule, and then ends the process. In the case where there is a subtree which has not been selected, the process moves to Step S 23 .
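  • The merging and selection of Steps S 22 to S 27 might look like the sketch below. Each subtree is represented as a leaf-to-root list of (failure factor, item) nodes; this representation, the function name, and the item names hdd9, svr9, and serviceB are assumptions of the example. The node chosen for a branching subtree is the first node, walking up from a leaf, that lies on every path, i.e., the lowest node covering all leaf nodes.

```python
def merge_and_select(paths):
    """Merge paths that share a node, then pick one monitoring node per subtree."""
    groups = []  # each group is a list of paths forming one merged subtree
    for path in paths:
        merged, rest = [path], []
        for group in groups:
            # Step S22: same failure factor and same item means same node.
            if any(set(path) & set(other) for other in group):
                merged.extend(group)
            else:
                rest.append(group)
        groups = rest + [merged]

    selected = []
    for group in groups:
        if len(group) == 1:
            # Step S27: no branch, so the original (leaf) rule is monitored.
            selected.append(group[0][0])
        else:
            # Steps S25 and S26: lowest node shared by every path in the subtree.
            common = set(group[0]).intersection(*map(set, group[1:]))
            selected.append(next(node for node in group[0] if node in common))
    return selected


paths = [
    [("server failure", "svr1"), ("service failure", "serviceA")],
    [("high load", "app2"), ("application failure", "app2"),
     ("service failure", "serviceA")],
    [("HDD failure", "hdd9"), ("server failure", "svr9"),
     ("service failure", "serviceB")],
]
monitoring_nodes = merge_and_select(paths)
```

  • Here the first two paths merge at (“service failure”, “serviceA”), which becomes the node for a generated higher-level rule, while the third path stays separate and keeps its leaf rule.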
  • FIG. 13 illustrates an example of the rule editing screen.
  • Based on the configuration information held by the CMDB server 500 , the terminal 100 generates a rule editing screen 121 for assisting rule creation.
  • The rule editing screen 121 is displayed on, for example, the display 31 .
  • The rule editing screen 121 includes columns for “classification”, “item”, “attribute”, and “rule”.
  • In the classification column, classifications such as “server” and “HDD” are described.
  • In the item column, actually existing items included in the configuration information are described for each classification. The user is able to select a classification or an item as an examination target. Selecting a classification is treated as selecting all of the actually existing items corresponding to the selected classification. For example, if the classification “server” is selected, it is treated as selecting both the items “svr 1 ” and “svr 2 ” corresponding to the classification “server”.
  • In the attribute column, attributes included in the configuration information are described.
  • In the rule column, a formula with respect to a corresponding attribute may be input. The user specifies one or more attributes corresponding to the selected classification or item and inputs a formula for each of the specified attributes.
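  • The treatment of a selected classification as a selection of all its items can be sketched as follows. The function name expand_rule, the rule string format, and the item “hdd1” are assumptions for illustration; the items “svr1” and “svr2” under the classification “server” follow the example above.

```python
# Classification -> actually existing items, as would be read from the
# configuration information. "hdd1" is an assumed item name.
CONFIG_ITEMS = {"server": ["svr1", "svr2"], "HDD": ["hdd1"]}


def expand_rule(target, attribute, formula, config=CONFIG_ITEMS):
    """Expand a rule written for a classification or an item into per-item rules."""
    if target in config:
        items = config[target]          # a classification: all of its items
    elif any(target in members for members in config.values()):
        items = [target]                # a single, actually existing item
    else:
        # Non-existent items cannot be selected on the editing screen.
        raise ValueError(f"unknown classification or item: {target}")
    return [f"{item}.{attribute} {formula}" for item in items]
```

  • Rejecting unknown targets corresponds to the screen offering only actually existing items for selection.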
  • FIG. 14 illustrates an example of a rule conversion.
  • In this example, the rules R 1 and R 2 of FIG. 9 are included in the rule information generated at the terminal 100 .
  • Based on the rule R 1 , the rule engine 400 generates a subtree including a “server failure” node and a “service failure” node.
  • The item “svr 1 ” is associated with the “server failure” node, and the item “serviceA” related to the item “svr 1 ” is associated with the “service failure” node.
  • Based on the rule R 2 , the rule engine 400 generates a subtree including a “high load” node, an “application failure” node, and a “service failure” node.
  • The item “app 2 ” is associated with both the “high load” node and the “application failure” node, and the item “serviceA” is associated with the “service failure” node.
  • The root nodes of the two subtrees have the same failure factor (“service failure”) and item (“serviceA”). Accordingly, the rule engine 400 merges the two subtrees into one. Within the merged subtree, the rule engine 400 selects a branch node located at the highest level, namely, in this example, the “service failure” node. Then, the rule engine 400 generates the rule R 4 , which corresponds to the “service failure” node, and specifies the rule R 4 as a rule to be used for a continuous examination (i.e., monitoring rule). In this case, the original rules R 1 and R 2 are not specified as monitoring rules.
  • FIG. 15 illustrates an example of a workflow.
  • The flow controller 200 corrects the flow information received from the terminal 100 in such a manner that a rule examination is performed in the middle of the workflow. Then, the flow controller 200 registers the corrected flow information in the flow engine 300 .
  • The flow controller 200 inserts an examination task of a preliminary examination before the first normal task (task 1 ), and inserts an examination task of a post examination after the last normal task (task 2 ). In addition, the flow controller 200 inserts an examination task of an in-execution examination (in-execution examination 1 ) between consecutive normal tasks (in this case, between the tasks 1 and 2 ). Further, the flow controller 200 inserts, after each examination task, a branch corresponding to a result of the examination, and corrects inter-task transitions in such a manner that a transition is made to a normal task of stopping the workflow (cancel) in the case where a rule violation is detected.
  • FIG. 16 illustrates an example of a description of the flow information.
  • Flow information 311 illustrated in FIG. 16 describes the corrected workflow of FIG. 15 in an XML format.
  • The flow information 311 is stored in the flow information storage unit 310 of the flow engine 300 .
  • The flow information 311 includes, with respect to each workflow, a tag <process> with an attribute of a workflow name.
  • The flow information 311 includes, with respect to each examination task, a tag <receiveTask> with an attribute of an examination task name, and also includes, with respect to each normal task, a tag <scriptTask> with an attribute of a normal task name.
  • The flow information 311 includes a tag <exclusiveGateway> corresponding to a branch and a tag <sequenceFlow> indicating an inter-task transition or a transition between a task and a branch.
  • FIG. 17 is a flowchart illustrating a flow registration process. Hereinbelow, the process of FIG. 17 is described according to the step numbers.
  • Step S 31 In response to an input from the user, the flow editor 130 of the terminal 100 generates flow information indicating a workflow and including no examination tasks. Then, the flow editor 130 transmits the flow information to the flow controller 200 .
  • Step S 32 The flow control unit 220 of the flow controller 200 adds examination tasks of a preliminary examination and a post examination to the workflow indicated by the flow information received from the terminal 100 .
  • Step S 33 The flow control unit 220 inserts, in the workflow indicated by the flow information, an examination task of an in-execution examination between consecutive normal tasks.
  • Step S 34 The flow control unit 220 inserts, in the workflow indicated by the flow information, a branch after each examination task.
  • Step S 35 The flow control unit 220 adds, to the workflow indicated by the flow information, a normal task to be executed in the case where a rule violation is detected in each examination task. In addition, the flow control unit 220 adds a transition from the branch inserted in Step S 34 to the normal task.
  • Step S 36 The flow control unit 220 transmits the corrected flow information to the flow engine 300 .
  • The flow executor 320 of the flow engine 300 stores the flow information received from the flow controller 200 in the flow information storage unit 310 .
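  • Steps S 32 to S 35 can be pictured with the following sketch, which works on a plain list of normal task names rather than on the XML flow information. The data layout (a sequence list plus a transitions dict), the node names "cancel" and "end", and the function name correct_flow are assumptions of this example; the labels NEXT and CANCEL follow the instructions the flow controller 200 is described as sending.

```python
def correct_flow(normal_tasks):
    """Insert examination tasks and violation branches around normal tasks."""
    # Steps S32 and S33: preliminary, in-execution, and post examinations.
    sequence = ["preliminary examination"]
    for index, task in enumerate(normal_tasks):
        sequence.append(task)
        if index < len(normal_tasks) - 1:
            sequence.append(f"in-execution examination {index + 1}")
    sequence.append("post examination")
    examinations = set(sequence) - set(normal_tasks)

    # Steps S34 and S35: a branch after each examination task, with a
    # transition to the workflow-stopping task when a violation is detected.
    transitions = {}
    for index, node in enumerate(sequence):
        following = sequence[index + 1] if index + 1 < len(sequence) else "end"
        transitions[node] = {"NEXT": following}
        if node in examinations:
            transitions[node]["CANCEL"] = "cancel"
    transitions["cancel"] = {"NEXT": "end"}
    return sequence, transitions


sequence, transitions = correct_flow(["task1", "task2"])
```

  • For the two-task workflow of FIG. 15 , this yields one in-execution examination between the tasks, matching the corrected workflow described there.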
  • FIG. 18 is a flowchart illustrating the rule examination process.
  • Step S 41 The rule examining unit 430 of the rule engine 400 selects monitoring rules (i.e., rules having no parent rule) from rules which have been registered in the rule definition table 411 stored in the rule information storage unit 410 .
  • Step S 42 The rule examining unit 430 acquires, from the CMDB server 500 , configuration information of items included in the rules selected in Step S 41 .
  • Step S 43 The rule examining unit 430 evaluates the selected rules using the configuration information acquired in Step S 42 and determines whether there is a nonconforming rule (for example, whether a corresponding logical expression results in FALSE). In the case where there is at least one nonconforming rule, the process proceeds to Step S 44 . In the case where there is no nonconforming rule, the process proceeds to Step S 48 .
  • Step S 44 The rule examining unit 430 refers to the rule definition table 411 to determine whether the nonconforming rule has a lower-level rule (i.e., whether there is a rule having the nonconforming rule as its parent rule). In the case where at least one nonconforming rule has a lower-level rule, the process proceeds to Step S 45 . In the case where the nonconforming rule has no lower-level rule, the process proceeds to Step S 47 .
  • Step S 45 The rule examining unit 430 selects the lower-level rule of the nonconforming rule from rules which have been registered in the rule definition table 411 .
  • Step S 46 The rule examining unit 430 requests, from the CMDB server 500 , configuration information of items included in the rule selected in Step S 45 .
  • The configuration information collector 530 of the CMDB server 500 collects the requested configuration information from the system resources 40 and transmits the configuration information to the rule engine 400 . Note that in the case where the requested configuration information has already been collected, the configuration information collector 530 transmits the configuration information stored in the configuration information storage unit 510 .
  • The rule examining unit 430 evaluates the rule selected in Step S 45 using the acquired configuration information.
  • Step S 47 The rule examining unit 430 determines that there is a rule violation and identifies a rule against which a violation is detected. Then, the rule examining unit 430 reports the violated rule to the flow controller 200 . Subsequently, the process ends.
  • Step S 48 The rule examining unit 430 determines that there is no rule violation. In the case of having started the rule examination based on an instruction from the flow controller 200 , the rule examining unit 430 reports to the flow controller 200 that there is no rule violation.
  • For example, the rule examining unit 430 examines the item “serviceA” based on the rules R 3 and R 4 . In the case where a violation of the rule R 3 is detected, the rule examining unit 430 determines that a violation takes place only against the rule R 3 because the rule R 3 has no lower-level rules. On the other hand, in the case where a violation of the rule R 4 is detected, the rule examining unit 430 examines the items “svr 1 ” and “app 2 ” based on the lower-level rules R 1 and R 2 . Then, the rule examining unit 430 identifies all rules, including lower-level rules, against which violations are detected.
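  • The examination order described above can be sketched as follows. The attribute names and thresholds in the check functions, and the flat dict used in place of configuration information from the CMDB server 500 , are assumptions of this example; the parent links (R1 and R2 under R4) follow the description.

```python
# Each rule has a parent rule ID (or None) and a check returning True when
# the rule is satisfied. Attribute names and thresholds are assumed.
RULES = {
    "R1": {"parent": "R4", "check": lambda c: c["svr1.status"] == "normal"},
    "R2": {"parent": "R4", "check": lambda c: c["app2.load"] < 0.8},
    "R3": {"parent": None, "check": lambda c: c["serviceA.response_sec"] < 3},
    "R4": {"parent": None, "check": lambda c: c["serviceA.status"] == "normal"},
}


def examine(rules, config):
    """Return (violated, evaluated) rule IDs, descending only below violations."""
    violated, evaluated = [], []
    # Step S41: start from monitoring rules (rules having no parent rule).
    pending = [rid for rid, rule in rules.items() if rule["parent"] is None]
    while pending:
        rule_id = pending.pop(0)
        evaluated.append(rule_id)
        if rules[rule_id]["check"](config):
            continue  # conforming: its lower-level rules are not examined
        violated.append(rule_id)
        # Steps S44 to S46: examine the lower-level rules of a violated rule.
        pending.extend(rid for rid, rule in rules.items()
                       if rule["parent"] == rule_id)
    return violated, evaluated


healthy = {"svr1.status": "normal", "app2.load": 0.2,
           "serviceA.response_sec": 1, "serviceA.status": "normal"}
broken = dict(healthy, **{"svr1.status": "error", "serviceA.status": "error"})
```

  • With the healthy configuration only R3 and R4 are evaluated; with the broken one, the violation of R4 leads to R1 and R2 being evaluated as well, and R1 is identified as violated.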
  • FIG. 19 is a first sequence diagram illustrating an example of the execution procedure of a workflow.
  • Steps S 51 to S 56 are performed at the start of the workflow, and Steps S 57 to S 61 are performed at the time of executing an examination task defined in the workflow.
  • Step S 51 The terminal 100 generates rule information and flow information, which are transmitted to the flow controller 200 .
  • Step S 52 The flow controller 200 corrects the flow information received from the terminal 100 and converts the workflow so that a rule examination is performed. At the time of the workflow conversion, the flow controller 200 may refer to the reaction information which has been registered.
  • Step S 53 The flow controller 200 transmits the flow information corrected in Step S 52 to the flow engine 300 .
  • The flow engine 300 stores the flow information received from the flow controller 200 .
  • Step S 54 The flow controller 200 transfers the rule information received from the terminal 100 to the rule engine 400 .
  • Step S 55 The rule engine 400 develops, into items, item classifications described in the rule information received from the flow controller 200 .
  • In addition, the rule engine 400 corrects the rule information by adding rules in such a manner as to reduce the number of monitoring rules (rules used for a continuous examination).
  • In doing so, the rule engine 400 refers to the configuration information and the propagation relationship information held by the CMDB server 500 , and then stores the corrected rule information.
  • Step S 56 After confirming completion of registration of the flow information and the rule information, the flow controller 200 instructs the flow engine 300 to start the workflow.
  • The flow engine 300 sequentially executes tasks described in the flow information.
  • Step S 57 In the case where a task to be executed next is an examination task (a preliminary examination, an in-execution examination, or a post examination), the flow engine 300 interrupts the workflow and reports the interruption to the flow controller 200 .
  • Step S 58 The flow controller 200 instructs the rule engine 400 to perform an examination of the configuration information based on the rule information.
  • Step S 59 The rule engine 400 acquires the configuration information from the CMDB server 500 and evaluates monitoring rules using the configuration information. In the case where a violation of a monitoring rule is detected, the rule engine 400 also evaluates lower-level rules of the monitoring rule.
  • Step S 60 The rule engine 400 reports a result of the examination acquired in Step S 59 to the flow controller 200 . In the case where a rule violation is detected, the rule engine 400 also reports identification information of the rule against which a violation has been found to the flow controller 200 .
  • Step S 61 Based on the examination result reported by the rule engine 400 , the flow controller 200 instructs the flow engine 300 on the next operation.
  • In doing so, the flow controller 200 may refer to the reaction information which has been registered. For example, when a rule violation is not detected, the flow controller 200 transmits “NEXT” (flow continuation). On the other hand, when a rule violation is detected, the flow controller 200 transmits “CANCEL” (flow termination).
  • The flow engine 300 resumes the interrupted workflow, and determines a branch direction described in the flow information in accordance with the instruction of the flow controller 200 .
  • FIG. 20 is a second sequence diagram illustrating an example of the execution procedure of a workflow.
  • Steps S 62 to S 65 are dedicated to a process enabling a rule examination to be performed in an appropriate manner at a timing other than a timing at which an examination task is performed. Those steps are carried out immediately after the preliminary examination. Steps S 66 to S 69 are performed at the time of executing a normal task.
  • Step S 62 After the examination task of the preliminary examination is completed, the flow engine 300 interrupts the workflow and reports the interruption to the flow controller 200 .
  • Step S 63 The flow controller 200 instructs the rule engine 400 to configure a setting for enabling an automatic rule examination.
  • Step S 64 The rule engine 400 extracts items included in monitoring rules from the rule information, and reports the extracted items to the CMDB server 500 .
  • The CMDB server 500 registers the items reported by the rule engine 400 as items to be monitored.
  • Step S 65 After confirming completion of the registration of the monitoring items, the flow controller 200 instructs the flow engine 300 on the next operation to continue the workflow. The flow engine 300 resumes the interrupted workflow.
  • Step S 66 In the case where a task to be executed next is a normal task, the flow controller 200 performs a process defined in the flow information. Processes defined in the flow information include, for example, transmission of a stop command to an apparatus of the system resources 40 , transmission of a command to install an updated program, and transmission of a restart command. At the time of executing a normal task, the flow controller 200 may refer to the configuration information held by the CMDB server 500 , and may then update the configuration information.
  • Step S 67 The CMDB server 500 monitors whether configuration information of the monitoring items registered in Step S 64 has been changed.
  • The configuration information held by the CMDB server 500 may be changed by the flow controller 200 , and may be changed based on information collected from the system resources 40 .
  • When a change is detected, the CMDB server 500 reports the change to the rule engine 400 .
  • Step S 68 The rule engine 400 acquires the configuration information from the CMDB server 500 and evaluates the monitoring rules using the configuration information. In the case where a violation of a monitoring rule is detected, the rule engine 400 also evaluates lower-level rules of the monitoring rule.
  • Step S 69 When detecting a rule violation in Step S 68 , the rule engine 400 reports to the flow controller 200 that a rule violation has been detected, along with identification information of the rule against which a violation has been found. When not detecting a rule violation, the rule engine 400 need not make a report to the flow controller 200 .
  • On receiving a report of a rule violation, the flow controller 200 instructs the flow engine 300 to terminate the workflow, for example, at the timing when the workflow is next interrupted.
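  • The change-triggered examination of Steps S 64 to S 68 can be sketched with a small in-memory stand-in for the CMDB server 500 . The class name Cmdb, its method names, and the use of a callback as the report to the rule engine 400 are assumptions of this example, not an interface defined in the description.

```python
class Cmdb:
    """In-memory stand-in that reports changes to monitored items."""

    def __init__(self):
        self.values = {}
        self.monitored = set()
        self.listener = None

    def register(self, items, listener):
        # Step S64: items of the monitoring rules become monitored items.
        self.monitored |= set(items)
        self.listener = listener

    def unregister(self, items):
        # Registration is deleted again before the post examination.
        self.monitored -= set(items)

    def update(self, item, value):
        changed = self.values.get(item) != value
        self.values[item] = value
        # Step S67: only a change to a monitored item is reported.
        if changed and item in self.monitored and self.listener is not None:
            self.listener(item, self.values)


triggered = []  # items whose change would trigger a rule examination
cmdb = Cmdb()
cmdb.register(["serviceA.status"], lambda item, values: triggered.append(item))
cmdb.update("svr1.status", "error")       # not monitored: no examination
cmdb.update("serviceA.status", "error")   # monitored and changed: reported
cmdb.update("serviceA.status", "error")   # unchanged: not reported
cmdb.unregister(["serviceA.status"])
cmdb.update("serviceA.status", "normal")  # no longer monitored
```

  • Only the second update reaches the listener, which illustrates why registering just the items of monitoring rules keeps the number of triggered examinations small.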
  • FIG. 21 is a third sequence diagram illustrating an example of the execution procedure of a workflow.
  • Steps S 70 to S 73 are dedicated to a process of canceling the setting of an automatic examination. Those steps are carried out immediately before the post examination. Steps S 74 to S 77 are performed at the end of the workflow.
  • Step S 70 Before the examination task of the post examination is executed, the flow engine 300 interrupts the workflow and reports the interruption to the flow controller 200 .
  • Step S 71 The flow controller 200 instructs the rule engine 400 to cancel the setting for enabling an automatic rule examination.
  • Step S 72 The rule engine 400 extracts items included in monitoring rules from the rule information, and reports the extracted items to the CMDB server 500 .
  • The CMDB server 500 deletes the registration of the items reported by the rule engine 400 .
  • Step S 73 After confirming completion of the deletion of the monitoring items, the flow controller 200 instructs the flow engine 300 on the next operation to continue the workflow. The flow engine 300 resumes the interrupted workflow.
  • Step S 74 Once the workflow is completed (for example, the post examination task is completed), the flow engine 300 reports the completion to the flow controller 200 .
  • The workflow completion may be a normal termination or an abnormal termination.
  • Step S 75 The flow controller 200 instructs the rule engine 400 to delete the rule information. In response to the instruction, the rule engine 400 deletes the rule information.
  • Step S 76 The flow controller 200 instructs the flow engine 300 to delete the flow information. In response to the instruction, the flow engine 300 deletes the flow information.
  • Step S 77 The flow controller 200 reports to the terminal 100 either a normal or an abnormal termination as a result of the workflow execution.
  • According to the second embodiment, multiple rules are integrated in light of the propagation relationship among items, and continuous examination is performed for the integrated rules.
  • This makes it possible to reduce the number of rule examinations, thereby reducing the examination load.
  • When a violation of a higher-level rule is detected, it is possible to determine the cause of the failure by examining the multiple lower-level rules under the higher-level rule.
  • In addition, the workload of collecting the configuration information can be reduced by avoiding continuously collecting configuration information corresponding to the lower-level rules.
  • Further, the efficiency of rule examination is increased by registering monitoring items in the CMDB server 500 , and then using a change in information of the registered items as a trigger for performing the rule examination.
  • The rule editing screen is generated with reference to the configuration information held by the CMDB server 500 so that rules are described by specifying actually existing items. Providing such a rule editing screen to the user prevents description of incorrect rules as a result of specifying non-existent items. In addition, allowing the user to specify item classifications to thereby describe rules helps prevent omissions in rule description.
  • The workflow control and the rule examination according to the second embodiment are achieved by causing the terminal 100 , the flow controller 200 , the flow engine 300 , the rule engine 400 , and the CMDB server 500 , each of which is a computer, to execute a program individually.
  • The program may be recorded in a computer-readable recording medium (for example, the recording medium 33 ).
  • Examples of the recording medium are a magnetic disk, an optical disk, a magneto-optical disk, and a semiconductor memory.
  • The magnetic disk may be an FD or an HDD.
  • The optical disk may be a CD, a compact disc-recordable (CD-R), a compact disc-rewritable (CD-RW), a DVD, a digital versatile disc-recordable (DVD-R), or a digital versatile disc-rewritable (DVD-RW).
  • To distribute the program, a portable recording medium storing the program thereon, for example, is provided.
  • In addition, the program may be stored in a storage device of another computer and then distributed via the network 50 .
  • Each of the above-mentioned computers stores, in a storage device (for example, the HDD 103 ), the program recorded in the portable recording medium or received from another computer, and reads the program from the storage device and executes the program. Note, however, that the program read from the portable recording medium or received from another computer via the network 50 may be executed directly.

Abstract

An information processing apparatus monitors one or more apparatuses based on information on multiple items acquired from the apparatuses. Information on an item #3 is associated with information on items #1 and #2. The information processing apparatus examines the information on the item #3. In the case where no failure is detected in the examination of the information on the item #3, the information processing apparatus omits examination of the information on the items #1 and #2. On the other hand, in the case where a failure is detected in the examination of the information on the item #3, the information processing apparatus examines the information on each of the items #1 and #2.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2011-068133, filed on Mar. 25, 2011, the entire contents of which are incorporated herein by reference.
  • FIELD
  • The embodiments discussed herein are related to a monitoring method and an information processing apparatus for monitoring one or more apparatuses, and also related to a computer-readable medium which stores a monitoring program for monitoring one or more apparatuses.
  • BACKGROUND
  • In the operation of an information processing system, it is sometimes the case that a system administrator of the information processing system determines whether there is a failure in apparatuses, such as servers, storage devices, and communication apparatuses, and takes necessary measures when there is a failure. For example, if a hardware failure is found in an apparatus, the system administrator may stop the apparatus and change the hardware. In addition, if a failure is found in the execution state of software, the system administrator may stop processes of the software and investigate the cause of the failure. Further, if an overload on an apparatus is found, the system administrator may add more resources for information processing.
  • On the other hand, when the number of apparatuses in the information processing system becomes large, the burden on the system administrator for the monitoring operation is increased. One conceivable way to deal with the burden is for an information processing apparatus for operations management to collect information from monitored target apparatuses and examine the collected information to thereby automatically detect a failure (or a sign of a failure) in an apparatus. When detecting a failure, the information processing apparatus may issue a warning to the system administrator, or may take necessary measures (for example, transmit a stop instruction to an apparatus in a failure state) according to a predetermined processing procedure.
  • Note that a method has been proposed for determining whether to continue or stop autonomous control by collecting information from management target computers and cross-checking the collected information with stop determination rules in an operations management system which carries out autonomous operation and management of the computers according to a predefined workflow (see Japanese Laid-open Patent Publication No. 2007-4337, paragraphs [0028] and [0030]).
  • However, an increase in the number of items of information to be collected and examined leads to an increase in the monitoring load. Assume that continuous examination is carried out, with respect to each server, for information on specific items, for example, the status of a hard disk drive (HDD), the status of a memory, and the number of transactions being executed by the server. This causes an increase in the workload of an information processing apparatus for carrying out the examination.
  • SUMMARY
  • According to one aspect, there is provided a monitoring method used by an information processing system which monitors one or more apparatuses based on information on a plurality of items acquired from the one or more apparatuses, the monitoring method including: among a first item, a second item, and a third item whose information is associated with the information on the first item and the information on the second item, examining the information on the third item; omitting examination of the information on the first item and the information on the second item in a case where no failure is detected in the examination of the information on the third item; and examining the information on the first item and the information on the second item in a case where a failure is detected in the examination of the information on the third item.
  • The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
  • It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
  • BRIEF DESCRIPTION OF DRAWINGS
  • FIG. 1 illustrates an information processing apparatus according to a first embodiment;
  • FIG. 2 illustrates an information processing system according to a second embodiment;
  • FIG. 3 is a block diagram of exemplary hardware of a terminal;
  • FIG. 4 is a block diagram of exemplary software of the information processing system;
  • FIG. 5 illustrates examples of configuration items;
  • FIG. 6 illustrates an example of a description of configuration information;
  • FIG. 7 illustrates an example of a failure propagation relationship;
  • FIG. 8 illustrates an example of a propagation relationship table;
  • FIG. 9 illustrates an example of a rule definition table;
  • FIG. 10 illustrates an example of a reaction definition table;
  • FIG. 11 is a flowchart illustrating a rule registration process;
  • FIG. 12 is a flowchart illustrating the rule registration process, continued from FIG. 11;
  • FIG. 13 illustrates an example of a rule editing screen;
  • FIG. 14 illustrates an example of a rule conversion;
  • FIG. 15 illustrates an example of a workflow;
  • FIG. 16 illustrates an example of a description of flow information;
  • FIG. 17 is a flowchart illustrating a flow registration process;
  • FIG. 18 is a flowchart illustrating a rule examination process;
  • FIG. 19 is a first sequence diagram illustrating an example of an execution procedure of a workflow;
  • FIG. 20 is a second sequence diagram illustrating an example of the execution procedure of a workflow; and
  • FIG. 21 is a third sequence diagram illustrating an example of the execution procedure of a workflow.
  • DESCRIPTION OF EMBODIMENTS
  • Several embodiments will be described below with reference to the accompanying drawings, wherein like reference numerals refer to like elements throughout.
  • [a] First Embodiment
  • FIG. 1 illustrates an information processing apparatus according to a first embodiment. An information processing apparatus 10 according to the first embodiment is used in an information processing system for monitoring apparatuses 21 to 23 based on information on multiple items acquired from the apparatuses 21 to 23. The information processing system may monitor whether there is a failure during automatic execution of a process, such as a software update, according to a workflow definition, and then stop the process if there is a failure. The apparatuses 21 to 23 are electronic devices, such as a server, a communication apparatus, and a storage apparatus.
  • The information processing apparatus 10 includes an examining unit 12. The examining unit 12 may be implemented as a program to be executed using a central processing unit (CPU) and a random access memory (RAM). The examining unit 12 examines information on examination target items indicated by examination information 11 a stored in a storage unit 11. The examination information 11 a may include information indicating a criterion (determination rule) for determining the normal state (or the presence of a failure) with respect to each of the examination target items. The storage unit 11 may be included in the information processing apparatus 10, or may be a storage device included in another information processing apparatus.
  • Assume here that, in the examination information 11 a, items #1, #2, and #3 among multiple items of information available from the apparatuses 21 to 23 are specified as examination target items. Information acquired for the item #3 is associated with both information on the item #1 and information on the item #2. For example, the information on the item #3 indicates a matter affected by both an apparatus status indicated by the item #1 and an apparatus status indicated by the item #2. In this case, it is considered that the information on the item #3 indicates a failure if at least one of the information on the item #1 and the information on the item #2 indicates a failure.
  • The examining unit 12 examines the information on the item #3 acquired from the apparatuses 21 to 23. In the case where no failure is detected in the examination of the information on the item #3, the examining unit 12 omits examination of the information on the items #1 and #2. On the other hand, if a failure is detected in the examination of the information on the item #3, the examining unit 12 further examines the information on the items #1 and #2 acquired from the apparatuses 21 to 23. Note that the examining unit 12 may examine the information on the item #3 when the information on the item #3 has been updated. Whether the information on the item #3 has been updated may be monitored by using a database for collecting information from the apparatuses 21 to 23. In addition, the information on the items #1 and #2 may be acquired from the apparatuses 21 to 23 after a failure is detected in the examination of the information on the item #3.
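  • The examination order described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the embodiment: the function `examine`, the `fetch` callback, the criteria, and the sample data are all hypothetical. Information on the items #1 and #2 is acquired only after a failure is detected for the item #3.

```python
from typing import Callable, Dict, List

def examine(
    fetch: Callable[[str], str],
    is_normal: Dict[str, Callable[[str], bool]],
    parent: str,
    children: List[str],
) -> List[str]:
    """Return the items whose information indicates a failure."""
    if is_normal[parent](fetch(parent)):
        return []  # no failure in the parent item: the children are not examined
    failures = [parent]
    for child in children:
        # Information on a child item is acquired only after the parent failed.
        if not is_normal[child](fetch(child)):
            failures.append(child)
    return failures

# Hypothetical data standing in for information acquired from the apparatuses.
data = {"item1": "ok", "item2": "error", "item3": "error"}
fetched: List[str] = []

def fetch(item: str) -> str:
    fetched.append(item)  # record which items were actually acquired
    return data[item]

criteria = {name: (lambda value: value == "ok") for name in data}
result = examine(fetch, criteria, "item3", ["item1", "item2"])
```

  • When the information on the item #3 is normal, only that one item is fetched and examined, which is the source of the reduced monitoring load.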
  • Further, the information processing apparatus 10 or another information processing apparatus may automatically add the item #3 as an examination target item when the items #1 and #2 are specified as examination target items. For example, the item #3 associated with both the items #1 and #2 is retrieved with reference to a storage device that stores relationship information. The relationship information here indicates a relationship among multiple items (for example, a relationship in which information on one item has an effect on information on another item). The information processing apparatus 10 or the other information processing apparatus then specifies the retrieved item #3 as an item for prioritized examination. In this case, the examination information 11 a may include information indicating the priority for the examination, in association with the items #1 to #3.
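  • One way to picture the automatic addition of the item #3 is the sketch below. It assumes, purely for illustration, that the relationship information can be represented as a mapping from each item to the single item it affects, and that a priority of 0 means "examine first". The function name `add_prioritized_items` and the rule that an item is added only when at least two specified items affect it are assumptions, not details taken from the embodiment.

```python
from typing import Dict, List

def add_prioritized_items(targets: List[str], affects: Dict[str, str]) -> Dict[str, int]:
    """Return an examination priority per item (0 = examined first)."""
    affected_by: Dict[str, List[str]] = {}
    for item in targets:
        parent = affects.get(item)
        if parent is not None:
            affected_by.setdefault(parent, []).append(item)

    priorities = {item: 1 for item in targets}
    for parent, children in affected_by.items():
        # Assumption: add the related item only if it covers two or more targets.
        if len(children) >= 2:
            priorities[parent] = 0
    return priorities

relationship_info = {"item1": "item3", "item2": "item3"}
priorities = add_prioritized_items(["item1", "item2"], relationship_info)
```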
  • Among the items #1 and #2 and the item #3 whose information is associated with the information on the items #1 and #2, the information processing apparatus 10 according to the first embodiment first examines the information on the item #3. In the case where no failure is detected in the examination of the information on the item #3, examination of the information on the items #1 and #2 is omitted. On the other hand, in the case where a failure is detected in the examination of the information on the item #3, the information on both the items #1 and #2 is examined. This enables omitting the examination of the information on the items #1 and #2 if there is no failure in the apparatuses 21 to 23, thereby reducing the number of items subject to continuous examination. As a result, it is possible to reduce the load of monitoring which uses information on multiple items.
  • [b] Second Embodiment
  • FIG. 2 illustrates an information processing system according to a second embodiment. The information processing system according to the second embodiment carries out operations management, such as application of an updated program, with respect to system resources 40 according to a workflow definition. Automation of operations management is sometimes called runbook automation (RBA). The information processing system includes a terminal 100, a flow controller 200, a flow engine 300, a rule engine 400 and a configuration management database (CMDB) server 500. Each apparatus of the information processing system is connected to a network 50. The information processing system is installed, for example, in a data center.
  • The system resources 40 include various electronic devices used for information processing. For example, the system resources 40 include a server 41, a communication apparatus 42, such as a switch, and a storage device 43. The server 41 executes application software using resources, such as a CPU, a RAM, a hard disk drive (HDD) and the like. The communication apparatus 42 transfers data between apparatuses (for example, between the server 41 and the storage device 43). The storage device 43 stores data to be used for information processing in a nonvolatile storage device, such as a HDD.
  • The terminal 100 is a computer operated by a user (for example, an administrator of the information processing system). Based on operations of the user, the terminal 100 generates flow information indicating an operations management workflow of the system resources 40, and transmits the flow information to the flow controller 200. In addition, the terminal 100 generates rule information indicating a rule for determining whether there is a failure in the system resources 40 during execution of the workflow, and transmits the rule information to the flow controller 200. In addition, the terminal 100 generates reaction information indicating a correcting process (reaction) taken on the occurrence of a rule violation and registers the reaction information in the flow controller 200.
  • The flow controller 200 is a computer for controlling the execution of the workflow. The flow controller 200 registers the flow information in the flow engine 300, and causes the flow engine 300 to execute a process defined in the flow information. In addition, the flow controller 200 registers the rule information in the rule engine 400, and causes the rule engine 400 to examine whether a rule violation has occurred. In the case where a rule violation is detected, the flow controller 200 causes the flow engine 300 to execute a process defined in the reaction information, and stops the workflow. The flow controller 200 reports a result of the workflow execution to the terminal 100.
  • The flow engine 300 is a computer for, in response to an instruction from the flow controller 200, executing a process defined in the flow information with respect to the system resources 40. For example, the flow engine 300 transmits a command, such as a stop command, a command for a program update, and a restart command, to an apparatus of the system resources 40.
  • The rule engine 400 is a computer for examining whether a rule violation has occurred (whether there is a failure in the system resources 40) during the time the flow engine 300 executes a workflow. The rule engine 400 acquires configuration information of the system resources 40 from the CMDB server 500, and performs rule examination by cross-checking the configuration information and the rule information. In the case of detecting a rule violation, the rule engine 400 reports the rule violation to the flow controller 200.
  • The CMDB server 500 is a computer functioning as a database server for collecting the configuration information from the system resources 40. The configuration information includes information indicating hardware used by each apparatus of the system resources 40, software being executed by each of the apparatuses, the status of the hardware and software and the like. The configuration information may be collected by the CMDB server 500 periodically making access to each of the apparatuses, or by each of the apparatuses periodically or irregularly (for example, at the time when the configuration information is updated) transmitting the configuration information to the CMDB server 500. The CMDB server 500 provides the rule engine 400 with configuration information to be used for the rule examination. The CMDB server 500 need not collect configuration information that is not used for the rule examination.
  • Note that multiple functions of the terminal 100, the flow controller 200, the flow engine 300, the rule engine 400, and the CMDB server 500 may be integrated into a single computer. For example, the flow controller 200, the flow engine 300, and the rule engine 400 may be integrated into a single computer.
  • FIG. 3 is a block diagram of exemplary hardware of the terminal 100. The terminal 100 includes a CPU 101, a RAM 102, a HDD 103, an image signal processor 104, an input signal processor 105, a disk drive 106, and a communication unit 107. These units are connected to a bus inside the terminal 100. Note that each of the server 41 and other apparatuses of the system resources 40, the flow controller 200, the flow engine 300, the rule engine 400 and the CMDB server 500 may be realized by similar hardware to that of the terminal 100.
  • The CPU 101 is a processing device for controlling information processing in the terminal 100. The CPU 101 runs a program by reading at least a part of programs and data stored in the HDD 103 and deploying the read part in the RAM 102. Note that the terminal 100 may include multiple processing devices and distribute the information processing across the processing devices.
  • The RAM 102 is a volatile memory for temporarily storing programs and data to be used by the CPU 101. Note that the terminal 100 may include a different type of memory other than the RAM, or may include multiple memories.
  • The HDD 103 is a nonvolatile storage device for storing programs, such as an operating system (OS) program and application programs, and data to be used for information processing. The HDD 103 reads from and writes to a built-in magnetic disk according to instructions from the CPU 101. Note that the terminal 100 may include a different type of nonvolatile storage device (for example, a solid state drive (SSD)) other than the HDD, or may include multiple storage devices.
  • According to an instruction from the CPU 101, the image signal processor 104 outputs an image to a display 31 connected to the terminal 100. As the display 31, a cathode ray tube (CRT) display or a liquid crystal display, for example, may be used.
  • The input signal processor 105 acquires an input signal from an input device 32 connected to the terminal 100 and outputs the signal to the CPU 101. As the input device 32, a pointing device, such as a mouse and a touch panel, or a keyboard, for example, may be used.
  • The disk drive 106 is a drive apparatus for reading programs and data recorded in a recording medium 33. The following may be used as the recording medium 33: a magnetic disk, such as a flexible disk (FD); an optical disk, such as a compact disc (CD) and a digital versatile disc (DVD); or a magneto-optical disk (MO). The disk drive 106 stores the programs and data read from the recording medium 33 in the RAM 102 or the HDD 103 according to, for example, instructions from the CPU 101.
  • The communication unit 107 is a communication interface connected to the network 50 to thereby perform communications. The connection to the network 50 is established using either a wired or wireless connection. That is, the communication unit 107 may be either a wired communication interface or a wireless communication interface.
  • FIG. 4 is a block diagram of exemplary software of the information processing system. Each block may be implemented, for example, as a program module to be executed using a CPU and a RAM.
  • The terminal 100 includes a configuration information acquirer 110, a rule editor 120, and a flow editor 130. The flow controller 200 includes a reaction information storage unit 210 and a flow control unit 220. The flow engine 300 includes a flow information storage unit 310 and a flow executor 320. The rule engine 400 includes a rule information storage unit 410, a rule converter 420, and a rule examining unit 430. The CMDB server 500 includes a configuration information storage unit 510, a relationship information storage unit 520, a configuration information collector 530, and an update monitor 540.
  • The configuration information acquirer 110 acquires configuration information from the CMDB server 500. Based on the configuration information acquired by the configuration information acquirer 110, the rule editor 120 displays a screen for editing rules on the display. Then, the rule editor 120 generates rule information based on a user's input on the screen and transmits the rule information to the flow controller 200. In addition, the rule editor 120 displays a screen for editing reactions on the display, generates reaction information based on a user's input, and transmits the reaction information to the flow controller 200. The flow editor 130 displays a screen for editing a workflow on the display, generates flow information based on a user's input, and transmits the flow information to the flow controller 200.
  • The reaction information storage unit 210 stores the reaction information. The flow control unit 220 receives the rule information from the terminal 100, and transfers the rule information to the rule engine 400. In addition, the flow control unit 220 receives the reaction information from the terminal 100, and stores the reaction information in the reaction information storage unit 210. Further, the flow control unit 220 receives the flow information from the terminal 100, and corrects the flow information so that the reaction indicated by the reaction information is executed when a rule violation is detected during execution of the workflow. Subsequently, the flow controller 200 transmits the corrected flow information to the flow engine 300. In addition, during execution of the workflow, the flow control unit 220 instructs the rule engine 400 to perform a rule examination, and instructs the flow engine 300 to continue or stop the workflow based on an examination result. Further, the flow control unit 220 reports a result of the workflow execution to the terminal 100.
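  • The control exercised by the flow control unit 220 can be pictured with the following Python sketch. It assumes, for illustration only, that a rule examination is requested after each task and that the workflow stops after the reaction has been executed; the function `run_workflow` and its callbacks are hypothetical stand-ins for the flow engine 300 and the rule engine 400.

```python
from typing import Callable, List

def run_workflow(
    tasks: List[Callable[[], str]],
    examine: Callable[[], List[str]],
    react: Callable[[List[str]], str],
) -> List[str]:
    """Run tasks in order; on a rule violation, run the reaction and stop."""
    log: List[str] = []
    for task in tasks:
        log.append(task())           # the flow engine executes one task
        violations = examine()       # the rule engine performs a rule examination
        if violations:
            log.append(react(violations))  # reaction defined for the violation
            log.append("stopped")
            return log
    log.append("completed")
    return log

# Hypothetical examination results: normal after the first task, R1 violated after the second.
results = iter([[], ["R1"]])
log = run_workflow(
    tasks=[lambda: "stop app", lambda: "update app", lambda: "restart app"],
    examine=lambda: next(results),
    react=lambda violations: "reaction for " + ",".join(violations),
)
```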
  • The flow information storage unit 310 stores the flow information. The flow executor 320 receives the flow information from the flow controller 200, and stores the flow information in the flow information storage unit 310. In addition, based on an instruction from the flow controller 200, the flow executor 320 executes processing (task) of one or more steps indicated by the flow information stored in the flow information storage unit 310. The flow executor 320 transmits a command, such as a stop command, a command for a program update, and a restart command, to the system resources 40. The flow executor 320 may refer to the configuration information held by the CMDB server 500 in order to execute a task, and update the configuration information based on a result of the task execution.
  • The rule information storage unit 410 stores the rule information. The rule converter 420 receives the rule information from the flow controller 200, corrects the rule information by referring to the configuration information and propagation relationship information held by the CMDB server 500, and stores the corrected rule information in the rule information storage unit 410. In the case where item classifications are described in the rule information in place of items of the configuration information (hereinafter referred to as “configuration item” or “CI”), the rule converter 420 acquires at least part of the configuration information from the CMDB server 500, and expands the item classifications into the items that actually exist. In addition, in the case where multiple rules are included in the rule information, the rule converter 420 acquires the propagation relationship information from the CMDB server 500, and converts the rules so as to reduce the number of items to be continuously examined (monitoring items). Details of the propagation relationship and the rule conversion are described later.
  • In response to an instruction from the flow controller 200, the rule examining unit 430 acquires at least part of the configuration information from the CMDB server 500, and examines whether the configuration information violates a rule of the rule information stored in the rule information storage unit 410. Subsequently, the rule examining unit 430 reports an examination result to the flow controller 200. In addition, on receiving an instruction for an automatic examination from the flow controller 200, the rule examining unit 430 registers, in the CMDB server 500, monitoring items selected from the configuration items. Then, when notified by the CMDB server 500 that information on the registered items has been updated, the rule examining unit 430 acquires the information on the registered items from the CMDB server 500 and performs an examination with the information.
  • The configuration information storage unit 510 stores the configuration information collected from the system resources 40. The relationship information storage unit 520 stores propagation relationship information which indicates a propagation relationship among configuration items. The propagation relationship includes a relationship of failure propagation among configuration items. One example of such a relationship of failure propagation is that, if a failure is detected in an item of “HDD”, a failure is also detected in an item of “server” including the HDD.
  • The configuration information collector 530 collects the configuration information from the system resources 40, and stores the configuration information in the configuration information storage unit 510. In addition, upon request of the terminal 100, the flow engine 300, or the rule engine 400, the configuration information collector 530 transmits at least part of the configuration information stored in the configuration information storage unit 510 to the requestor. Note that the configuration information collector 530 may not continuously collect information on items other than the monitoring items of the rule engine 400. In this case, when a request is made for information on an uncollected item, the configuration information collector 530 collects information on the item from the system resources 40 and transmits the collected information to the requestor.
  • Upon request of the rule engine 400, the update monitor 540 transmits, to the rule engine 400, the propagation relationship information stored in the relationship information storage unit 520. In addition, when the monitoring items are reported by the rule engine 400, the update monitor 540 instructs the configuration information collector 530 to collect information on at least the reported target items. Then, the update monitor 540 monitors information on the target items stored in the configuration information storage unit 510. When detecting an update of information, the update monitor 540 reports the detection of the update of configuration information to the rule engine 400.
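  • The interaction between the update monitor 540 and the rule engine 400 resembles a register-and-notify pattern, sketched below in Python. The class `UpdateMonitor`, its method names, and the in-memory dictionary standing in for the configuration information storage unit 510 are assumptions made for this illustration.

```python
from typing import Callable, Dict, Iterable, List, Set

class UpdateMonitor:
    """Notifies a listener when the stored value of a monitored item changes."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self._watched: Set[str] = set()
        self._on_update: Callable[[str], None] = lambda item: None

    def register(self, items: Iterable[str], on_update: Callable[[str], None]) -> None:
        # Called when the rule engine reports its monitoring items.
        self._watched = set(items)
        self._on_update = on_update

    def store(self, item: str, value: str) -> None:
        # Called when collected configuration information is written.
        changed = self._store.get(item) != value
        self._store[item] = value
        if changed and item in self._watched:
            self._on_update(item)

    def get(self, item: str) -> str:
        return self._store[item]

monitor = UpdateMonitor()
notified: List[str] = []
monitor.register(["serviceA"], notified.append)
monitor.store("svr1_h1", "error")   # not a monitoring item: no report
monitor.store("serviceA", "ok")     # first value for a monitoring item: reported
monitor.store("serviceA", "ok")     # unchanged: no report
monitor.store("serviceA", "error")  # changed: reported
```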
  • FIG. 5 illustrates examples of configuration items. The examples of the configuration information depicted in FIG. 5 include one item “serviceA” whose classification is “Service”; two items “svr1” and “svr2” whose classification is “Server”; and two items “app1” and “app2” whose classification is “Application”. Further, the examples of FIG. 5 also include four items “svr1_c1”, “svr1_c2”, “svr2_c1”, and “svr2_c2” whose classification is “Cpu”; two items “svr1_m1” and “svr2_m1” whose classification is “Memory”; and two items “svr1_h1” and “svr2_h1” whose classification is “Hdd”.
  • “serviceA” is provided by two servers “svr1” and “svr2”. “svr1” includes two CPUs “svr1_c1” and “svr1_c2”, one memory “svr1_m1”, and one HDD “svr1_h1”. Similarly, “svr2” includes two CPUs “svr2_c1” and “svr2_c2”, one memory “svr2_m1”, and one HDD “svr2_h1”. “app1” is being implemented on “svr1”, and “app2” is being implemented on “svr2”. For example, “app1” is a Web application, and “app2” is a database management system (DBMS).
  • Each of the information of “Service”, information of “Server”, information of “Application”, information of “Cpu”, information of “Memory”, and information of “Hdd”, includes information of a status. The information of “Application” may additionally include information of a cache size, information of a path to a configuration file, and information of the number of transactions. The configuration information may include information other than the above.
  • FIG. 6 illustrates an example of a description of the configuration information. Configuration information 511 illustrated in FIG. 6 describes the items of FIG. 5 in an eXtensible Markup Language (XML) format. The configuration information 511 is stored in the configuration information storage unit 510. The configuration information 511 includes item tags, <item>, and relationship tags, <relationship>.
  • An item tag indicates a configuration item (i.e., an item of configuration information), and includes a server tag, <Server>, or an application tag, <Application>. The server tag includes a CPU tag, <Cpu>, a memory tag, <Memory>, and a HDD tag, <Hdd>. Each of the server tag, application tag, CPU tag, memory tag, and HDD tag corresponds to one of the items illustrated in FIG. 5, and includes, as an attribute, a value indicating a status. In addition, each application tag includes a parameter tag, <param>, indicating a parameter, such as a cache size and a path to a configuration file. Each parameter tag includes a value of a corresponding parameter as an attribute.
  • A relationship tag indicates a relationship among items indicated by item tags, and includes, as an attribute, a value indicating a type of the relationship. In addition, each relationship tag includes a source item tag, <sourceItem>, and a target item tag, <targetItem>. For example, a relationship tag whose source item is “Service”, target item is “Server”, and type is “consistOf” indicates a relationship in which “Service” is realized using “Server”. In addition, a relationship tag whose source item is “Application”, target item is “Server”, and type is “installedOn” indicates a relationship in which “Application” is implemented on “Server”.
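  • To make the tag structure concrete, the following Python sketch parses a small XML fragment shaped like the description above. The fragment itself is hypothetical: the root element `<config>`, the attribute names `id`, `status`, `name`, and `value`, and the sample values are assumptions, since only the tag names and the relationship types are given in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; attribute names and values are illustrative only.
CONFIG_XML = """
<config>
  <item>
    <Server id="svr1" status="ok">
      <Cpu id="svr1_c1" status="ok"/>
      <Memory id="svr1_m1" status="ok"/>
      <Hdd id="svr1_h1" status="error"/>
    </Server>
  </item>
  <item>
    <Application id="app1" status="ok">
      <param name="cacheSize" value="64"/>
    </Application>
  </item>
  <relationship type="installedOn">
    <sourceItem id="app1"/>
    <targetItem id="svr1"/>
  </relationship>
</config>
"""

ITEM_TAGS = {"Server", "Application", "Cpu", "Memory", "Hdd"}

root = ET.fromstring(CONFIG_XML)

# Collect the status attribute of every configuration item.
statuses = {}
for item in root.findall("item"):
    for element in item.iter():
        if element.tag in ITEM_TAGS:
            statuses[element.get("id")] = element.get("status")

# Collect (type, source, target) for every relationship tag.
relationships = [
    (rel.get("type"), rel.find("sourceItem").get("id"), rel.find("targetItem").get("id"))
    for rel in root.findall("relationship")
]
```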
  • FIG. 7 illustrates an example of a failure propagation relationship. The propagation relationship is a relationship of failure propagation among items of the configuration information, and has a propagation direction. For example, “HDD failure” and “memory error” lead to “server failure”. “Setting error” and “high load” lead to “application failure”. “Server failure” and “application failure” lead to “service failure”. Accordingly, in the configuration information, when the status of an “Hdd” item indicates an error, it is considered that the status of a corresponding “Server” item also indicates an error. In addition, when the status of a “Server” item indicates an error, the status of a corresponding “Service” item also indicates an error. Thus, the status of a higher-level item is affected by the status of a lower-level item.
  • FIG. 8 illustrates an example of a propagation relationship table. The example of the propagation relationship table of FIG. 8 corresponds to the propagation relationship depicted in FIG. 7. A propagation relationship table 521 is stored in the relationship information storage unit 520. The propagation relationship table 521 includes items of “ID”, “failure”, “parent failure”, and “condition”.
  • The item “ID” is identification information used for identifying each failure. The item “failure” indicates a failure factor, such as a service failure. The item “parent failure” indicates another failure directly affected by a corresponding failure. For example, the parent failure of “HDD failure” is “server failure”, and the parent failure of “server failure” is “service failure”. The item “condition” takes the form of a formula for determining, from the configuration information, whether there is a status failure, and is described using an item classification name (such as “Service”). In the example of FIG. 8, each condition is described in the form of a logical expression which results in TRUE when no status failure is present (normal). However, the condition may be described in the form of a logical expression which results in TRUE when a status failure is present. Note that, in FIG. 8, [ATTR] indicates an arbitrary parameter name, [OP] indicates an arbitrary operator, and [VAL] indicates an arbitrary fixed value.
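  • The "parent failure" column makes the table a tree that can be walked upward. The Python sketch below encodes only the failure-to-parent pairs named in the description of FIG. 7; the dictionary representation and the helper names `ancestors` and `common_parent` are assumptions for illustration, and the "ID" and "condition" columns are omitted.

```python
from typing import Dict, List, Optional

# failure -> parent failure (None for the top-level failure)
PROPAGATION: Dict[str, Optional[str]] = {
    "HDD failure": "server failure",
    "memory error": "server failure",
    "setting error": "application failure",
    "high load": "application failure",
    "server failure": "service failure",
    "application failure": "service failure",
    "service failure": None,
}

def ancestors(failure: str) -> List[str]:
    """Failures affected by the given failure, nearest first."""
    chain: List[str] = []
    parent = PROPAGATION.get(failure)
    while parent is not None:
        chain.append(parent)
        parent = PROPAGATION.get(parent)
    return chain

def common_parent(a: str, b: str) -> Optional[str]:
    """Nearest failure affected by both a and b, or None if there is none."""
    seen = ancestors(a)
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    return None
```

  • A lookup such as `common_parent("HDD failure", "memory error")` identifies the higher-level failure whose examination could stand in for both lower-level examinations.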
  • FIG. 9 illustrates an example of a rule definition table. A rule definition table 411 is generated by the rule converter 420, and then stored in the rule information storage unit 410. The rule definition table 411 includes items of “ID”, “rule”, and “parent rule”.
  • The item “ID” is identification information used for identifying each rule. The item “rule” is a formula for determining, from the configuration information, whether there is a failure, and is described using an item name (such as “serviceA”). In the example of FIG. 9, each rule is described in the form of a logical expression which results in TRUE when no failure is present (normal). However, the rule may be described in the form of a logical expression which results in TRUE when a failure is present. The item “parent rule” indicates another rule that is assumed to also detect a failure when a failure is detected based on the corresponding rule (i.e., when there is a rule violation).
  • In the example of FIG. 9, the parent rule of rules R1 and R2 is a rule R4. When there is a violation of at least one of the rules R1 and R2, it is assumed that there would also be a violation of the rule R4. On the other hand, in the case where there is no violation of the rule R4, it is assumed that there would also be no violation of the rules R1 and R2. Accordingly, examination using the rules R1 and R2 can be omitted. Note that the user of the terminal 100 needs to define only the rules R1, R2, and a rule R3, as described later. The rule R4 is automatically added by the rule converter 420 based on the rules R1 and R2.
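  • The parent-rule column can be pictured with a small Python sketch. The rule IDs and parent links follow the description of FIG. 9, but the formulas themselves are not reproduced in the text, so the `rule` strings below are placeholders, and the function names are hypothetical.

```python
# Rule definition table in the spirit of FIG. 9.
# The "rule" strings are placeholders, not the actual formulas.
RULE_TABLE = {
    "R1": {"rule": "<formula on svr1>", "parent": "R4"},
    "R2": {"rule": "<formula on app2>", "parent": "R4"},
    "R3": {"rule": "<formula on serviceA>", "parent": None},
    "R4": {"rule": "<formula on serviceA>", "parent": None},
}


def monitoring_rules(table):
    """Rules having no parent rule, i.e., the ones examined continuously."""
    return sorted(rid for rid, row in table.items() if row["parent"] is None)


def lower_level_rules(table, rule_id):
    """Rules that name the given rule as their parent rule."""
    return sorted(rid for rid, row in table.items() if row["parent"] == rule_id)
```

Under these assumptions only R3 and R4 are examined continuously, and R1 and R2 are reached through R4.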
  • FIG. 10 illustrates an example of a reaction definition table. A reaction definition table 211 is stored in the reaction information storage unit 210. The reaction definition table 211 includes items of “ID”, “condition”, and “reaction”.
  • The item “ID” is identification information used for identifying each reaction. The item “condition” indicates a condition for a corresponding reaction to be carried out. For example, “R1 OR R3” indicates a condition in which a violation of at least one of the above-mentioned rules R1 and R3 is detected. In addition, “R2 AND R3” indicates a condition in which violations of both the above-mentioned rules R2 and R3 are detected. The item “reaction” indicates the specific action of a corresponding reaction. Examples of reactions include stopping a service and adding a server to be used for providing a service.
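  • One possible way to evaluate such conditions against a set of violated rules is sketched below in Python. This is an assumption-laden illustration: the reaction IDs “A1” and “A2” and the pairing of each condition with a reaction are invented, and the evaluator handles only flat AND/OR expressions without parentheses.

```python
# Hypothetical reaction definition table; IDs and pairings are illustrative.
REACTIONS = [
    {"id": "A1", "condition": "R1 OR R3", "reaction": "stop service"},
    {"id": "A2", "condition": "R2 AND R3", "reaction": "add server"},
]


def condition_holds(condition, violated):
    """Evaluate an OR of AND-groups, where a rule ID is true if it is violated."""
    return any(
        all(term.strip() in violated for term in group.split(" AND "))
        for group in condition.split(" OR ")
    )


def triggered_reactions(violated):
    """Reactions whose condition is satisfied by the violated rule IDs."""
    return [
        row["reaction"]
        for row in REACTIONS
        if condition_holds(row["condition"], violated)
    ]
```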
  • Next described is a process of registering rules in the information processing system. FIG. 11 is a flowchart illustrating a rule registration process. Hereinbelow, the process of FIG. 11 is described according to the step numbers.
  • (Step S11) The configuration information acquirer 110 of the terminal 100 accesses the CMDB server 500 to acquire the configuration information (denoted as “C-INFO” in FIG. 11) 511.
  • (Step S12) Based on the configuration information 511 acquired in Step S11, the rule editor 120 of the terminal 100 generates a rule editing screen which allows selection of an actually existing item or an item classification to thereby enable a user to input a rule, and displays the rule editing screen on the display. Then, the rule editor 120 generates rule information which indicates the rule input by the user, and transmits the rule information to the flow controller 200. The flow control unit 220 of the flow controller 200 transfers the rule information received from the terminal 100 to the rule engine 400.
  • (Step S13) The rule converter 420 of the rule engine 400 determines whether the rule information received from the flow controller 200 includes a rule described using an item classification. Whether an item classification is included in a rule may be determined with reference to the configuration information 511 held by the CMDB server 500. In the case where an item classification is included, the process proceeds to Step S14. In the case where no item classification is included, the process proceeds to Step S15.
  • (Step S14) The rule converter 420 accesses the CMDB server 500 to acquire the configuration information 511. Then, based on the configuration information 511, the rule converter 420 develops the item classification included in the rule information into an actually existing item.
  • (Step S15) The rule converter 420 accesses the CMDB server 500 to acquire the propagation relationship table 521.
  • (Step S16) The rule converter 420 selects one rule included in the rule information received from the flow controller 200 in Step S12.
  • (Step S17) The rule converter 420 determines whether the rule selected in Step S16 matches any of the conditions described in the propagation relationship table 521 acquired in Step S15. At the time of the determination, the rule converter 420 compares the rule with the condition by replacing an item included in the rule with a corresponding classification. The replacement of the item with the classification may be performed by referring to the configuration information 511. When there is a matched condition, the process proceeds to Step S18. When there is no matched condition, the process proceeds to Step S20.
  • (Step S18) The rule converter 420 generates a subtree of a tree structure (tree structure as illustrated in FIG. 7) defined by the propagation relationship table 521. The generated subtree includes a path between a node corresponding to the condition matched in Step S17 and a root node. For example, in the case where the rule selected in Step S16 matches the condition of “server failure” (S2), a subtree is generated in such a manner as to include a node corresponding to “service failure” (S1) and a node corresponding to “server failure” (S2).
  • (Step S19) With reference to the configuration information 511, the rule converter 420 associates an item with each node of the subtree generated in Step S18. For example, in the case where the rule selected in Step S16 includes the server item “svr1”, the rule converter 420 associates the server item “svr1” with the node of “server failure” (S2) and the service item “serviceA” with the node of “service failure” (S1). Then, the process proceeds to Step S21.
  • (Step S20) The rule converter 420 specifies the rule selected in Step S16 as a monitoring rule (i.e., a rule having no parent rule).
  • (Step S21) The rule converter 420 determines whether all rules included in the rule information have been selected in Step S16. In the case where all rules have been selected, the process proceeds to Step S22. In the case where there is a rule which has not been selected, the process moves to Step S16.
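  • Steps S17 to S20 can be sketched in Python as follows. Much of the data here is assumed for illustration: a rule is reduced to an (item, attribute) pair, the attribute names “status” and “load”, the classification name “App”, and the `HIGHER_LEVEL_ITEM` relation are all hypothetical stand-ins for what the configuration information 511 and the propagation relationship table 521 would provide.

```python
# Failure -> parent failure (subset of the relationships described for FIG. 7).
PARENT_FAILURE = {
    "service failure": None,
    "server failure": "service failure",
    "application failure": "service failure",
    "high load": "application failure",
}
# Assumed classification examined by each failure ("App" is an invented name).
FAILURE_CLASSIFICATION = {
    "service failure": "Service",
    "server failure": "Server",
    "application failure": "App",
    "high load": "App",
}
# Assumed excerpt of the configuration information.
ITEM_CLASSIFICATION = {"serviceA": "Service", "svr1": "Server", "app2": "App"}
HIGHER_LEVEL_ITEM = {"svr1": "serviceA", "app2": "serviceA"}
# Assumed conditions keyed by (classification, attribute).
CONDITIONS = {("Server", "status"): "server failure", ("App", "load"): "high load"}


def match_failure(rule):
    """Step S17: replace the item with its classification and look up a condition."""
    item, attribute = rule
    return CONDITIONS.get((ITEM_CLASSIFICATION[item], attribute))


def build_path(rule):
    """Steps S18 and S19: path of (failure, item) nodes from the match to the root.

    Returns None when no condition matches (the Step S20 case).
    """
    failure = match_failure(rule)
    if failure is None:
        return None
    item = rule[0]
    path = []
    while failure is not None:
        # Move to the related higher-level item when the classification changes.
        if ITEM_CLASSIFICATION[item] != FAILURE_CLASSIFICATION[failure]:
            item = HIGHER_LEVEL_ITEM[item]
        path.append((failure, item))
        failure = PARENT_FAILURE[failure]
    return path
```

With these assumptions, a rule on “svr1” yields the two-node path described for Step S19, and a rule that matches no condition is returned as None so that it can be kept as a monitoring rule on its own.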
  • FIG. 12 is a flowchart illustrating the rule registration process, continued from FIG. 11.
  • (Step S22) The rule converter 420 searches the generated subtrees to see whether there are two or more subtrees including a node having the same failure factor and the same item. If such two or more subtrees are found, the rule converter 420 merges the two or more subtrees into one subtree.
  • (Step S23) The rule converter 420 selects one subtree from the subtrees obtained after the merger in Step S22.
  • (Step S24) The rule converter 420 determines whether there is a branch (i.e., whether multiple leaf nodes are included) in the subtree selected in Step S23. In the case where there is a branch (i.e., multiple leaf nodes are included), the process proceeds to Step S25. In the case where there is no branch (i.e., only one leaf node is included), the process proceeds to Step S27.
  • (Step S25) Within the subtree selected in Step S23, the rule converter 420 selects, from among the branching nodes, the one located at the highest level (i.e., the lowest-level node among the nodes that cover all leaf nodes).
  • (Step S26) From the propagation relationship table 521 acquired in Step S15, the rule converter 420 acquires a condition corresponding to the node selected in Step S25. Then, the rule converter 420 replaces an item classification included in the condition with an item associated with the selected node to thereby generate a higher-level rule. The rule converter 420 specifies the generated rule as a monitoring rule. Subsequently, the process proceeds to Step S28.
  • (Step S27) The rule converter 420 specifies, as a monitoring rule, a rule corresponding to a leaf node of the subtree selected in Step S23 (i.e., an original rule included in the rule information received from the flow controller 200).
  • (Step S28) The rule converter 420 determines whether all subtrees have been selected in Step S23. In the case where all the subtrees have been selected, the rule converter 420 stores, in the rule information storage unit 410, the specified monitoring rule and the rule definition table 411 including the original rule, and then ends the process. In the case where there is a subtree which has not been selected, the process moves to Step S23.
  • FIG. 13 illustrates an example of the rule editing screen. Based on the configuration information held by the CMDB server 500, the terminal 100 generates a rule editing screen 121 for assisting rule creation. The rule editing screen 121 is displayed on, for example, the display 31. The rule editing screen 121 includes columns for “classification”, “item”, “attribute”, and “rule”.
  • In the classification column, classifications, such as “server” and “HDD”, are described. In the item column, actually existing items included in the configuration information are described for each classification. The user is able to select a classification or an item of an examination target. Selecting a classification is treated as selecting all of actually existing items corresponding to the selected classification. For example, if the classification “server” is selected, it is treated as selecting both the items “svr1” and “svr2” corresponding to the classification “server”. In the attribute column, attributes included in the configuration information are described. In the rule column, a formula with respect to a corresponding attribute may be input. The user specifies one or more attributes corresponding to the selected classification or item and inputs a formula for each of the specified attributes.
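  • The treatment of a selected classification as a selection of all of its items can be sketched in Python as follows. The items “svr1” and “svr2” come from the example above; the item “hdd1” and the function name `expand_targets` are hypothetical.

```python
# Assumed excerpt of the configuration information: classification -> items.
ITEMS_BY_CLASSIFICATION = {
    "server": ["svr1", "svr2"],
    "HDD": ["hdd1"],  # hypothetical item name
}


def expand_targets(selection):
    """Return the actually existing items denoted by a classification or an item."""
    if selection in ITEMS_BY_CLASSIFICATION:
        return list(ITEMS_BY_CLASSIFICATION[selection])
    for items in ITEMS_BY_CLASSIFICATION.values():
        if selection in items:
            return [selection]
    # Reject names that do not exist in the configuration information.
    raise ValueError(f"unknown classification or item: {selection}")
```

Rejecting unknown names mirrors the intent of offering only actually existing items on the rule editing screen.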
  • FIG. 14 illustrates an example of a rule conversion. Assume here that the rules R1 and R2 of FIG. 9 are included in the rule information generated at the terminal 100. Based on the rule R1, the rule engine 400 generates a subtree including a “server failure” node and a “service failure” node. The item “svr1” is associated with the “server failure” node, and the item “serviceA” related to the item “svr1” is associated with the “service failure” node. In addition, based on the rule R2, the rule engine 400 generates a subtree including a “high load” node, an “application failure” node, and a “service failure” node. The item “app2” is associated with both the “high load” node and the “application failure” node, and the item “serviceA” is associated with the “service failure” node.
  • In the above-mentioned two subtrees, the root nodes have the same failure factor (“service failure”) and item (“serviceA”). Accordingly, the rule engine 400 merges the two subtrees into one. Within the merged subtree, the rule engine 400 selects a branch node located at the highest level, namely, in this example, the “service failure” node. Then, the rule engine 400 generates the rule R4, which corresponds to the “service failure” node, and specifies the rule R4 as a rule to be used for a continuous examination (i.e., monitoring rule). In this case, the original rules R1 and R2 are not specified as monitoring rules.
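  • The merger and the selection of the highest-level branch node can be sketched in Python as follows. Each subtree is represented as a leaf-to-root list of (failure, item) nodes, which is an assumed representation; the function names are hypothetical. Nodes with the same failure factor and the same item compare equal, so merging reduces to building one child-to-parent map.

```python
def merge_paths(paths):
    """Step S22: merge leaf-to-root paths into one child -> parent map."""
    parent_of = {}
    for path in paths:
        for child, parent in zip(path, path[1:]):
            parent_of[child] = parent
        parent_of.setdefault(path[-1], None)  # the root has no parent
    return parent_of


def chain_to_root(node, parent_of):
    """The node followed by all of its ancestors."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent_of[node]
    return chain


def covering_node(leaves, parent_of):
    """Steps S25/S27: lowest node covering all leaves (the leaf itself if only one)."""
    chains = [chain_to_root(leaf, parent_of) for leaf in leaves]
    common = set(chains[0]).intersection(*chains[1:])
    return next(node for node in chains[0] if node in common)


def select_monitoring_nodes(paths):
    """Return one monitoring node per merged subtree."""
    parent_of = merge_paths(paths)
    parents = set(parent_of.values())
    leaves_by_root = {}
    for node in parent_of:
        if node not in parents:  # a leaf is never anyone's parent
            root = chain_to_root(node, parent_of)[-1]
            leaves_by_root.setdefault(root, []).append(node)
    return [covering_node(leaves, parent_of) for leaves in leaves_by_root.values()]


# The two subtrees of the FIG. 14 example.
PATH_R1 = [("server failure", "svr1"), ("service failure", "serviceA")]
PATH_R2 = [
    ("high load", "app2"),
    ("application failure", "app2"),
    ("service failure", "serviceA"),
]
```

For the two example paths the selected node is the “service failure” node of “serviceA”, which is where the rule R4 would be generated.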
  • Next described is a process of registering a workflow in the information processing system. FIG. 15 illustrates an example of a workflow. The flow controller 200 corrects the flow information received from the terminal 100 in such a manner that a rule examination is performed in the middle of the workflow. Then, the flow controller 200 registers the corrected flow information in the flow engine 300.
  • Assume here that flow information indicating a workflow for sequentially executing tasks 1 and 2 is generated at the terminal 100. The flow controller 200 inserts an examination task of a preliminary examination before the first normal task (task 1), and inserts an examination task of a post examination after the last normal task (task 2). In addition, the flow controller 200 inserts an examination task of an in-execution examination (in-execution examination 1) between consecutive normal tasks (in this case, between the tasks 1 and 2). Further, the flow controller 200 inserts, after each examination task, a branch corresponding to a result of the examination, and corrects inter-task transitions in such a manner that a transition is made to a normal task of stopping the workflow (cancel) in the case where a rule violation is detected.
  • FIG. 16 illustrates an example of a description of the flow information. Flow information 311 illustrated in FIG. 16 describes the corrected workflow of FIG. 15 in an XML format. The flow information 311 is stored in the flow information storage unit 310 of the flow engine 300. The flow information 311 includes, with respect to each workflow, a tag <process> with an attribute of a workflow name. In addition, the flow information 311 includes, with respect to each examination task, a tag <receiveTask> with an attribute of an examination task name, and also includes, with respect to each normal task, a tag <scriptTask> with an attribute of a normal task name. Further, the flow information 311 includes a tag <exclusiveGateway> corresponding to a branch and a tag <sequenceFlow> indicating an inter-task transition or a transition between a task and a branch.
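  • A description of this general shape could be produced with Python's standard `xml.etree.ElementTree` module, as sketched below. The tag names are those listed above, while the attribute names `name`, `sourceRef`, and `targetRef`, the gateway naming, and the function `build_flow_xml` are assumptions; FIG. 16 itself is not reproduced here, and the transitions to the cancel task are omitted.

```python
import xml.etree.ElementTree as ET


def build_flow_xml(workflow_name, steps):
    """Serialize (kind, name) steps, where kind is "examination" or "normal"."""
    process = ET.Element("process", {"name": workflow_name})
    previous = None
    for kind, name in steps:
        tag = "receiveTask" if kind == "examination" else "scriptTask"
        ET.SubElement(process, tag, {"name": name})
        if previous is not None:
            ET.SubElement(
                process, "sequenceFlow", {"sourceRef": previous, "targetRef": name}
            )
        previous = name
        if kind == "examination":
            # Each examination task is followed by a branch.
            gateway = f"branch after {name}"
            ET.SubElement(process, "exclusiveGateway", {"name": gateway})
            ET.SubElement(
                process, "sequenceFlow", {"sourceRef": name, "targetRef": gateway}
            )
            previous = gateway
    return ET.tostring(process, encoding="unicode")
```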
  • FIG. 17 is a flowchart illustrating a flow registration process. Hereinbelow, the process of FIG. 17 is described according to the step numbers.
  • (Step S31) In response to an input from the user, the flow editor 130 of the terminal 100 generates flow information indicating a workflow and including no examination tasks. Then, the flow editor 130 transmits the flow information to the flow controller 200.
  • (Step S32) The flow control unit 220 of the flow controller 200 adds examination tasks of a preliminary examination and a post examination to the workflow indicated by the flow information received from the terminal 100.
  • (Step S33) The flow control unit 220 inserts, in the workflow indicated by the flow information, an examination task of an in-execution examination between consecutive normal tasks.
  • (Step S34) The flow control unit 220 inserts, in the workflow indicated by the flow information, a branch after each examination task.
  • (Step S35) The flow control unit 220 adds, to the workflow indicated by the flow information, a normal task to be executed in the case where a rule violation is detected in each examination task. In addition, the flow control unit 220 adds a transition from the branch inserted in Step S34 to the normal task.
  • (Step S36) The flow control unit 220 transmits the corrected flow information to the flow engine 300. The flow executor 320 of the flow engine 300 stores the flow information received from the flow control unit 220 in the flow information storage unit 310.
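  • The correction performed in Steps S32 to S35 can be sketched in Python as follows. This is a simplified model rather than the embodiment's implementation: a workflow is a flat list of task names, the function names are hypothetical, and the terminal label “end” is an assumption.

```python
def insert_examinations(tasks):
    """Steps S32 and S33: return the corrected sequence as (kind, name) pairs."""
    steps = [("examination", "preliminary examination")]
    for index, task in enumerate(tasks):
        steps.append(("normal", task))
        if index < len(tasks) - 1:
            # One in-execution examination between consecutive normal tasks.
            steps.append(("examination", f"in-execution examination {index + 1}"))
    steps.append(("examination", "post examination"))
    return steps


def add_branches(steps):
    """Steps S34 and S35: map each step name to its outgoing transitions."""
    names = [name for _, name in steps]
    transitions = {}
    for (kind, name), following in zip(steps, names[1:] + ["end"]):
        transitions[name] = {"NEXT": following}
        if kind == "examination":
            # A detected rule violation leads to the task that stops the workflow.
            transitions[name]["CANCEL"] = "cancel"
    return transitions
```

Applied to the two-task workflow of FIG. 15, `insert_examinations(["task 1", "task 2"])` yields five steps: the preliminary examination, task 1, in-execution examination 1, task 2, and the post examination.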
  • Next described is a rule examination process performed during execution of a workflow. FIG. 18 is a flowchart illustrating the rule examination process.
  • Hereinbelow, the process of FIG. 18 is described according to the step numbers.
  • (Step S41) The rule examining unit 430 of the rule engine 400 selects monitoring rules (i.e., rules having no parent rule) from rules which have been registered in the rule definition table 411 stored in the rule information storage unit 410.
  • (Step S42) The rule examining unit 430 acquires, from the CMDB server 500, configuration information of items included in the rules selected in Step S41.
  • (Step S43) The rule examining unit 430 evaluates the selected rules using the configuration information acquired in Step S42 and determines whether there is a nonconforming rule (for example, whether a corresponding logical expression results in FALSE). In the case where there is at least one nonconforming rule, the process proceeds to Step S44. In the case where there is no nonconforming rule, the process proceeds to Step S48.
  • (Step S44) The rule examining unit 430 refers to the rule definition table 411 to determine whether the nonconforming rule has a lower-level rule (i.e., whether there is a rule having the nonconforming rule as its parent rule). In the case where at least one nonconforming rule has a lower-level rule, the process proceeds to Step S45. In the case where the nonconforming rule has no lower-level rule, the process proceeds to Step S47.
  • (Step S45) The rule examining unit 430 selects the lower-level rule of the nonconforming rule from rules which have been registered in the rule definition table 411.
  • (Step S46) The rule examining unit 430 requests, from the CMDB server 500, configuration information of items included in the rule selected in Step S45. The configuration information collector 530 of the CMDB server 500 collects the requested configuration information from the system resources 40 and transmits the configuration information to the rule engine 400. Note that in the case where the requested configuration information has already been collected, the configuration information collector 530 transmits the configuration information stored in the configuration information storage unit 510. The rule examining unit 430 evaluates the rule selected in Step S45 using the acquired configuration information.
  • (Step S47) The rule examining unit 430 determines that there is a rule violation and identifies a rule against which a violation is detected. Then, the rule examining unit 430 reports the violated rule to the flow controller 200. Subsequently, the process ends.
  • (Step S48) The rule examining unit 430 determines that there is no rule violation. In the case of having started the rule examination based on an instruction from the flow controller 200, the rule examining unit 430 reports to the flow controller 200 that there is no rule violation.
  • For example, in the case where the rule definition table 411 of FIG. 9 is stored in the rule information storage unit 410, the rule examining unit 430 examines the item “serviceA” based on the rules R3 and R4. In the case where a violation of the rule R3 is detected, the rule examining unit 430 determines that a violation takes place only against the rule R3 because the rule R3 has no lower-level rules. On the other hand, in the case where a violation of the rule R4 is detected, the rule examining unit 430 examines the items “svr1” and “app2” based on the lower-level rules R1 and R2. Then, the rule examining unit 430 identifies all rules, including lower-level rules, against which violations are detected.
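  • The examination order of Steps S41 to S48 can be sketched in Python as follows. The parent links follow the description of FIG. 9; the `evaluate` callback, which stands in for acquiring configuration information and evaluating a formula, and the function name `examine` are hypothetical.

```python
# Rule -> parent rule, following the description of FIG. 9.
PARENT_RULE = {"R1": "R4", "R2": "R4", "R3": None, "R4": None}


def examine(parent_rule, evaluate):
    """Return the sorted IDs of violated rules.

    evaluate(rule_id) returns True when the rule is conforming. Lower-level
    rules are evaluated only after their parent rule has been violated.
    """
    violated = []
    pending = [rid for rid, parent in parent_rule.items() if parent is None]
    while pending:
        rule_id = pending.pop()
        if evaluate(rule_id):
            continue  # conforming: the lower-level rules are skipped
        violated.append(rule_id)
        pending.extend(
            rid for rid, parent in parent_rule.items() if parent == rule_id
        )
    return sorted(violated)
```

When every monitoring rule is conforming, `evaluate` is called only for R3 and R4, which is the source of the reduced examination load.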
  • Next, execution control of workflows is described with reference to three sequence diagrams of FIGS. 19 to 21.
  • FIG. 19 is a first sequence diagram illustrating an example of the execution procedure of a workflow. In FIG. 19, Steps S51 to S56 are performed at the start of the workflow, and Steps S57 to S61 are performed at the time of executing an examination task defined in the workflow.
  • (Step S51) The terminal 100 generates rule information and flow information, which are transmitted to the flow controller 200.
  • (Step S52) The flow controller 200 corrects the flow information received from the terminal 100 and converts the workflow so that a rule examination is performed. At the time of the workflow conversion, the flow controller 200 may refer to the reaction information which has been registered.
  • (Step S53) The flow controller 200 transmits the flow information corrected in Step S52 to the flow engine 300. The flow engine 300 stores the flow information received from the flow controller 200.
  • (Step S54) The flow controller 200 transfers the rule information received from the terminal 100 to the rule engine 400.
  • (Step S55) The rule engine 400 develops, into items, item classifications described in the rule information received from the flow controller 200. In addition, the rule engine 400 corrects the rule information by adding rules in such a manner as to reduce the number of monitoring rules (rules used for a continuous examination). At this time, the rule engine 400 refers to the configuration information and the propagation relationship information held by the CMDB server 500, and then stores the corrected rule information.
  • (Step S56) After confirming completion of registration of the flow information and the rule information, the flow controller 200 instructs the flow engine 300 to start the workflow. The flow controller 200 sequentially executes tasks described in the flow information.
  • (Step S57) In the case where a task to be executed next is an examination task (a preliminary examination, an in-execution examination, or a post examination), the flow engine 300 interrupts the workflow and reports the interruption to the flow controller 200.
  • (Step S58) The flow controller 200 instructs the rule engine 400 to perform an examination of the configuration information based on the rule information.
  • (Step S59) The rule engine 400 acquires the configuration information from the CMDB server 500 and evaluates monitoring rules using the configuration information. In the case where a violation of a monitoring rule is detected, the rule engine 400 also evaluates lower-level rules of the monitoring rule.
  • (Step S60) The rule engine 400 reports a result of the examination acquired in Step S59 to the flow controller 200. In the case where a rule violation is detected, the rule engine 400 also reports identification information of the rule against which a violation has been found to the flow controller 200.
  • (Step S61) Based on the examination result reported by the rule engine 400, the flow controller 200 instructs the flow engine 300 on the next operation. When instructing the flow engine 300 on the next operation, the flow controller 200 may refer to the reaction information which has been registered. For example, when a rule violation is not detected, the flow controller 200 transmits “NEXT” (flow continuation). On the other hand, when a rule violation is detected, the flow controller 200 transmits “CANCEL” (flow termination). The flow engine 300 resumes the interrupted workflow, and determines a branch direction described in the flow information in accordance with the instruction of the flow controller 200.
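  • The interrupt, examine, and resume cycle of Steps S57 to S61 can be sketched in Python as a single loop. This collapses the flow engine 300, the flow controller 200, and the rule engine 400 into one function for illustration; the `examine` callback and the function name `run_workflow` are hypothetical, while the “NEXT” and “CANCEL” instructions are those named above.

```python
def run_workflow(steps, examine):
    """Run (kind, name) steps and return the names of the executed normal tasks.

    examine(name) returns the list of violated rule IDs found at the
    examination task with the given name.
    """
    executed = []
    for kind, name in steps:
        if kind == "examination":
            # The workflow is interrupted here until an instruction arrives.
            instruction = "CANCEL" if examine(name) else "NEXT"
            if instruction == "CANCEL":
                executed.append("cancel")  # task that stops the workflow
                return executed
        else:
            executed.append(name)
    return executed
```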
  • FIG. 20 is a second sequence diagram illustrating an example of the execution procedure of a workflow. In FIG. 20, Steps S62 to S65 are dedicated to a process enabling a rule examination to be performed in an appropriate manner at a timing other than a timing at which an examination task is performed. Those steps are carried out immediately after the preliminary examination. Steps S66 to S69 are performed at the time of executing a normal task.
  • (Step S62) After the examination task of the preliminary examination is completed, the flow engine 300 interrupts the workflow and reports the interruption to the flow controller 200.
  • (Step S63) The flow controller 200 instructs the rule engine 400 to configure a setting for enabling an automatic rule examination.
  • (Step S64) The rule engine 400 extracts items included in monitoring rules from the rule information, and reports the extracted items to the CMDB server 500. The CMDB server 500 registers the items reported by the rule engine 400 as items to be monitored.
  • (Step S65) After confirming completion of the registration of the monitoring items, the flow controller 200 instructs the flow engine 300 on the next operation to continue the workflow. The flow engine 300 resumes the interrupted workflow.
  • (Step S66) In the case where a task to be executed next is a normal task, the flow controller 200 performs a process defined in the flow information. Processes defined in the flow information include, for example, transmission of a stop command to an apparatus of the system resources 40, transmission of a command to install an updated program, and transmission of a restart command. At the time of executing a normal task, the flow controller 200 may refer to the configuration information held by the CMDB server 500, and may then update the configuration information.
  • (Step S67) The CMDB server 500 monitors whether configuration information of the monitoring items registered in Step S64 has been changed. The configuration information held by the CMDB server 500 may be changed by the flow controller 200, and may be changed based on information collected from the system resources 40. When detecting a change in the configuration information, the CMDB server 500 reports the change to the rule engine 400.
  • (Step S68) The rule engine 400 acquires the configuration information from the CMDB server 500 and evaluates the monitoring rules using the configuration information. In the case where a violation of a monitoring rule is detected, the rule engine 400 also evaluates lower-level rules of the monitoring rule.
  • (Step S69) When detecting a rule violation in Step S68, the rule engine 400 reports to the flow controller 200 that a rule violation has been detected, along with identification information of the rule against which a violation has been found. When no rule violation is detected, the rule engine 400 need not make a report to the flow controller 200. The flow controller 200 instructs the flow engine 300 to terminate the workflow, for example, at the timing when the workflow is next interrupted.
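  • The change-triggered examination of Steps S64 and S67 can be sketched in Python with a small observer-style class. The class `Cmdb` and its methods are hypothetical simplifications of the CMDB server 500: a single listener stands in for the report to the rule engine 400, and `unregister` corresponds to the cancellation of the setting that is carried out before the post examination.

```python
class Cmdb:
    """Toy configuration store that reports changes to monitored items."""

    def __init__(self):
        self._values = {}
        self._monitored = set()
        self._listener = None

    def register(self, items, listener):
        """Register monitoring items and the callback to notify on changes."""
        self._monitored |= set(items)
        self._listener = listener

    def unregister(self, items):
        """Delete the registration of the given monitoring items."""
        self._monitored -= set(items)

    def update(self, item, attribute, value):
        """Store a value and notify the listener if a monitored item changed."""
        changed = self._values.get((item, attribute)) != value
        self._values[(item, attribute)] = value
        if changed and item in self._monitored and self._listener is not None:
            self._listener(item)
```

Only changes to registered items trigger the callback, so updates to items that appear solely in lower-level rules cause no examination.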
  • FIG. 21 is a third sequence diagram illustrating an example of the execution procedure of a workflow. In FIG. 21, Steps S70 to S73 are dedicated to a process of canceling the setting of an automatic examination. Those steps are carried out immediately before the post examination. Steps S74 to S77 are performed at the end of the workflow.
  • (Step S70) Before the examination task of the post examination is executed, the flow engine 300 interrupts the workflow and reports the interruption to the flow controller 200.
  • (Step S71) The flow controller 200 instructs the rule engine 400 to cancel the setting for enabling an automatic rule examination.
  • (Step S72) The rule engine 400 extracts items included in monitoring rules from the rule information, and reports the extracted items to the CMDB server 500. The CMDB server 500 deletes the registration of the items reported by the rule engine 400.
  • (Step S73) After confirming completion of the deletion of the monitoring items, the flow controller 200 instructs the flow engine 300 on the next operation to continue the workflow. The flow engine 300 resumes the interrupted workflow.
  • (Step S74) Once the workflow is completed (for example, the post examination task is completed), the flow engine 300 reports the completion to the flow controller 200. The workflow completion may be a normal termination or an abnormal termination.
  • (Step S75) The flow controller 200 instructs the rule engine 400 to delete the rule information. In response to the instruction, the rule engine 400 deletes the rule information.
  • (Step S76) The flow controller 200 instructs the flow engine 300 to delete the flow information. In response to the instruction, the flow engine 300 deletes the flow information.
  • (Step S77) The flow controller 200 reports to the terminal 100 either a normal or an abnormal termination as a result of the workflow execution.
  • According to the information processing system of the second embodiment, multiple rules are integrated in the light of the propagation relationship among items, and continuous examination is performed for the integrated rules. With this, it is possible to reduce the number of rule examinations, thereby reducing the examination load. Additionally, in the case where a violation is found against a higher-level rule, it is possible to determine the cause of a failure by examining multiple lower-level rules under the higher-level rule. Further, the workload of collecting the configuration information can be reduced by avoiding continuously collecting configuration information corresponding to the lower-level rules.
  • In addition, the efficiency of rule examination is increased by registering monitoring items in the CMDB server 500, and then using a change in information of the registered items as a trigger for performing the rule examination. Further, the rule editing screen is generated with reference to the configuration information held by the CMDB server 500 so that rules are described by specifying actually existing items. Providing such a rule editing screen to the user prevents incorrect rules from being described as a result of specifying non-existent items. Moreover, allowing the user to describe rules by specifying item classifications helps prevent omissions in the rule description.
  • Note that, as mentioned above, the workflow control and the rule examination according to the second embodiment are achieved by causing the terminal 100, the flow controller 200, the flow engine 300, the rule engine 400, and the CMDB server 500, each of which is a computer, to execute a program individually. The program may be recorded in a computer-readable recording medium (for example, the recording medium 33). Examples of the recording medium are a magnetic disk, an optical disk, a magneto-optical disk, and a semiconductor memory. The magnetic disk may be an FD or an HDD. The optical disk may be a CD, a compact disc-recordable (CD-R), a compact disc-rewritable (CD-RW), a DVD, a digital versatile disc-recordable (DVD-R), or a digital versatile disc-rewritable (DVD-RW).
  • In the case of distributing the program, a portable recording medium storing the program thereon, for example, is provided. In addition, the program may be stored in a storage device of another computer and then distributed via the network 50. Each of the above-mentioned computers stores, in a storage device (for example, the HDD 103), the program recorded in the portable recording medium or received from another computer, and reads the program from the storage device and executes the program. Note, however, that the program read from the portable recording medium or received from another computer via the network 50 may be executed directly.
  • According to one embodiment, it is possible to reduce the load of monitoring which uses information of multiple items.
  • All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.

Claims (6)

1. A monitoring method used by an information processing system which monitors one or more apparatuses based on information on a plurality of items acquired from the one or more apparatuses, the monitoring method comprising:
among a first item, a second item, and a third item whose information is associated with the information on the first item and the information on the second item, examining the information on the third item;
omitting examination of the information on the first item and the information on the second item in a case where no failure is detected in the examination of the information on the third item; and
examining the information on the first item and the information on the second item in a case where a failure is detected in the examination of the information on the third item.
2. The monitoring method according to claim 1, further comprising:
when the first item and the second item are specified as examination targets, retrieving the third item based on the first item and the second item by referring to a storage device configured to store relationship information which indicates a relationship among the plural items, in which relationship the information on one item has an effect on the information on another item; and
adding the retrieved third item to the examination targets.
3. The monitoring method according to claim 1, further comprising:
acquiring the information on the third item from the one or more apparatuses; and
acquiring the information on the first item and the information on the second item from the one or more apparatuses after the failure is detected in the examination of the information on the third item.
4. The monitoring method according to claim 1, further comprising:
causing a database apparatus, which collects at least part of the information on the plural items from the one or more apparatuses, to collect the information on the third item, and
when an update of the information on the third item in the database apparatus is detected, examining the updated information on the third item.
5. An information processing apparatus for monitoring one or more apparatuses based on information on a plurality of items acquired from the one or more apparatuses, the information processing apparatus comprising:
a storage device configured to store examination information which indicates, as examination targets, a first item, a second item, and a third item whose information is associated with the information on the first item and the information on the second item; and
a processor configured to examine the information on the first item, the second item, and the third item of the examination targets indicated by the examination information,
wherein the processor examines the information on the third item, and:
omits examination of the information on the first item and the information on the second item in a case where no failure is detected in the examination of the information on the third item; and
examines the information on the first item and the information on the second item in a case where a failure is detected in the examination of the information on the third item.
6. A computer-readable, non-transitory recording medium storing a monitoring program for monitoring one or more apparatuses based on information on a plurality of items acquired from the one or more apparatuses, the program causing a computer to perform a procedure comprising:
among a first item, a second item, and a third item whose information is associated with the information on the first item and the information on the second item, examining the information of the third item;
omitting examination of the information on the first item and the information on the second item in a case where no failure is detected in the examination of the information on the third item; and
examining the information on the first item and the information on the second item in a case where a failure is detected in the examination of the information on the third item.
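The procedure recited in claims 1 to 3 can be illustrated with a minimal Python sketch. It is only one possible reading of the claims, not the implementation described in the specification. The item names (`cpu_usage`, `memory_usage`, `response_time`), the thresholds, and the function names `retrieve_third_item` and `monitor` are all assumptions introduced here for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# All names below (item names, thresholds, function names) are hypothetical
# illustrations and do not come from the patent text.


def retrieve_third_item(affects: Dict[str, List[str]],
                        first: str, second: str) -> Optional[str]:
    """Return an item affected by both `first` and `second`, if any (cf. claim 2)."""
    second_targets = set(affects.get(second, []))
    for candidate in affects.get(first, []):
        if candidate in second_targets:
            return candidate
    return None


def monitor(acquire: Callable[[str], float],
            has_failure: Dict[str, Callable[[float], bool]],
            first: str, second: str, third: str) -> Tuple[List[str], List[str]]:
    """Examine `third` first; examine `first` and `second` only on failure.

    Returns (items examined, items in which a failure was detected).
    """
    examined = [third]
    if not has_failure[third](acquire(third)):
        # No failure in the third item: omit the other two examinations.
        return examined, []
    failed = [third]
    for item in (first, second):
        # Information is acquired only after the failure is detected (cf. claim 3).
        examined.append(item)
        if has_failure[item](acquire(item)):
            failed.append(item)
    return examined, failed


# Assumed relationship information: CPU and memory usage both affect response time.
affects = {"cpu_usage": ["response_time"], "memory_usage": ["response_time"]}

# Assumed examination rules: each returns True when the value indicates a failure.
rules = {
    "response_time": lambda ms: ms > 500,
    "cpu_usage": lambda pct: pct > 90,
    "memory_usage": lambda pct: pct > 90,
}
```

In this sketch, a call such as `monitor(values.__getitem__, rules, "cpu_usage", "memory_usage", "response_time")` on a dictionary `values` of current readings touches only the third item while it stays within its threshold, which is where the reduction in monitoring load would come from.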
US13/348,831 2011-03-25 2012-01-12 Determination of items to examine for monitoring Expired - Fee Related US8904234B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011068133A JP5691723B2 (en) 2011-03-25 2011-03-25 Monitoring method, information processing apparatus, and monitoring program
JP2011-068133 2011-03-25

Publications (2)

Publication Number Publication Date
US20120246520A1 (en) 2012-09-27
US8904234B2 (en) 2014-12-02

Family

ID=46878362

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/348,831 Expired - Fee Related US8904234B2 (en) 2011-03-25 2012-01-12 Determination of items to examine for monitoring

Country Status (2)

Country Link
US (1) US8904234B2 (en)
JP (1) JP5691723B2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6447054B2 (en) * 2014-11-27 2019-01-09 富士通株式会社 Information processing method and information processing program
JPWO2019168167A1 (en) * 2018-03-02 2020-04-16 学校法人立命館 Verification method, verification device, computer program, and verification system
US10831605B2 (en) * 2018-04-27 2020-11-10 Rovi Guides, Inc. System and method for detection of, prevention of, and recovery from software execution failure

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5157668A (en) * 1989-07-05 1992-10-20 Applied Diagnostics, Inc. Method and apparatus for locating faults in electronic units
US5655071A (en) * 1994-04-08 1997-08-05 Telefonaktiebolaget Lm Ericsson Method and a system for distributed supervision of hardware
US5761429A (en) * 1995-06-02 1998-06-02 Dsc Communications Corporation Network controller for monitoring the status of a network
US20020054169A1 (en) * 1998-05-29 2002-05-09 Richardson David E. Method and apparatus for dynamically drilling-down through a health monitoring map to determine the health status and cause of health problems associated with network objects of a managed network environment
US20020113816A1 (en) * 1998-12-09 2002-08-22 Frederick H. Mitchell Method and apparatus providing a graphical user interface for representing and navigating hierarchical networks
US20030005486A1 (en) * 2001-05-29 2003-01-02 Ridolfo Charles F. Health monitoring display system for a complex plant
US6690274B1 (en) * 1998-05-01 2004-02-10 Invensys Systems, Inc. Alarm analysis tools method and apparatus
US20040139371A1 (en) * 2003-01-09 2004-07-15 Wilson Craig Murray Mansell Path commissioning analysis and diagnostic tool
US20060029085A1 (en) * 2004-01-16 2006-02-09 Booman Gordon A Methods and apparatus for information processing and display for network
US20080117068A1 (en) * 2006-11-16 2008-05-22 Mark Henrik Sandstrom Intelligent Network Alarm Status Monitoring
US20090313508A1 (en) * 2008-06-17 2009-12-17 Microsoft Corporation Monitoring data categorization and module-based health correlations
US7693042B1 (en) * 1999-06-23 2010-04-06 At&T Mobility Ii Llc Intelligent presentation network management system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0844583A (en) * 1994-07-27 1996-02-16 Oki Electric Ind Co Ltd Diagnostic system for information processor
JP4945935B2 (en) * 2005-06-22 2012-06-06 日本電気株式会社 Autonomous operation management system, autonomous operation management method and program
JP5146750B2 (en) * 2008-05-29 2013-02-20 オムロン株式会社 FT diagram creation program, FT diagram creation device, recording medium, and FT diagram creation method

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10924483B2 (en) 2005-04-27 2021-02-16 Xilinx, Inc. Packet validation in virtual network interface architecture
US10742604B2 (en) 2013-04-08 2020-08-11 Xilinx, Inc. Locked down network interface
US10212135B2 (en) 2013-04-08 2019-02-19 Solarflare Communications, Inc. Locked down network interface
US10999246B2 (en) 2013-04-08 2021-05-04 Xilinx, Inc. Locked down network interface
US9807117B2 (en) * 2015-03-17 2017-10-31 Solarflare Communications, Inc. System and apparatus for providing network security
US20190020690A1 (en) * 2015-03-17 2019-01-17 Solarflare Communications, Inc. System and Apparatus for Providing Network Security
US11489876B2 (en) 2015-03-17 2022-11-01 Xilinx, Inc. System and apparatus for providing network security
US10601874B2 (en) * 2015-03-17 2020-03-24 Xilinx, Inc. System and apparatus for providing network security
US20160277447A1 (en) * 2015-03-17 2016-09-22 Solarflare Communications, Inc. System and apparatus for providing network security
US10601873B2 (en) * 2015-03-17 2020-03-24 Xilinx, Inc. System and apparatus for providing network security
US10798228B2 (en) 2016-05-27 2020-10-06 Xilinx, Inc. Method, apparatus and computer program product for processing data
US10827044B2 (en) 2016-05-27 2020-11-03 Xilinx, Inc. Method, apparatus and computer program product for processing data
US10079919B2 (en) 2016-05-27 2018-09-18 Solarflare Communications, Inc. Method, apparatus and computer program product for processing data
US11425231B2 (en) 2016-05-27 2022-08-23 Xilinx, Inc. Method, apparatus and computer program product for processing data
WO2020019405A1 (en) * 2018-07-26 2020-01-30 平安科技(深圳)有限公司 Database monitoring method, device and apparatus, and computer storage medium
CN116450465A (en) * 2023-06-14 2023-07-18 建信金融科技有限责任公司 Data processing method, device, equipment and medium

Also Published As

Publication number Publication date
JP5691723B2 (en) 2015-04-01
JP2012203681A (en) 2012-10-22
US8904234B2 (en) 2014-12-02

Similar Documents

Publication Publication Date Title
US8904234B2 (en) Determination of items to examine for monitoring
US7051243B2 (en) Rules-based configuration problem detection
CA2763547C (en) Fix delivery system
US20120117226A1 (en) Monitoring system of computer and monitoring method
US20180060414A1 (en) Language tag management on international data storage
US11327742B2 (en) Affinity recommendation in software lifecycle management
US10135693B2 (en) System and method for monitoring performance of applications for an entity
US20140380280A1 (en) Debugging tool with predictive fault location
US9910726B2 (en) System dump analysis
US20100077257A1 (en) Methods for disaster recoverability testing and validation
US20210342146A1 (en) Software defect prediction model
CN115989483A (en) Automated root cause analysis and prediction for large dynamic process execution systems
US20150121145A1 (en) Synchronized debug information generation
US20150370619A1 (en) Management system for managing computer system and management method thereof
US20140067886A1 (en) Information processing apparatus, method of outputting log, and recording medium
US10977108B2 (en) Influence range specifying method, influence range specifying apparatus, and storage medium
US9864964B2 (en) Job monitoring support method and information processing apparatus
US20170109331A1 (en) Managing changes to a document in a revision control system
US20170085460A1 (en) Benchmarking servers based on production data
JP6336919B2 (en) Source code review method and system
US20160291803A1 (en) Information processing apparatus and storage system
JP2009134535A (en) Device for supporting software development, method of supporting software development, and program for supporting software development
US9985833B2 (en) Method and apparatus for software detection
US11567800B2 (en) Early identification of problems in execution of background processes
US20220164219A1 (en) Processing system, processing method, higher-level system, lower-level system, higher-level program, and lower-level program

Legal Events

Date Code Title Description
AS Assignment

Owner name: FUJITSU LIMITED, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MATSUBARA, MASAZUMI;SEKIGUCHI, ATSUJI;SHIMADA, KUNIAKI;AND OTHERS;SIGNING DATES FROM 20111207 TO 20111208;REEL/FRAME:027552/0814

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.)

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Expired due to failure to pay maintenance fee

Effective date: 20181202