US20010020282A1 - External storage - Google Patents

External storage

Info

Publication number
US20010020282A1
Authority
US
United States
Prior art keywords
controller
host system
failed
port
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/835,494
Other versions
US6412078B2 (en)
Inventor
Akira Murotani
Toshio Nakano
Hidehiko Iwasaki
Kenji Muraoka
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US09/835,494
Publication of US20010020282A1
Application granted
Publication of US6412078B2
Anticipated expiration
Legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2089 Redundant storage control functionality
    • G06F11/2092 Techniques of failing over between control units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1456 Hardware arrangements for backup
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1466 Management of the backup or restore process to make the backup process non-disruptive
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2017 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where memory access, memory control or I/O control functionality is redundant
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2002 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant
    • G06F11/2005 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant using redundant communication controllers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2002 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant
    • G06F11/2007 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant using redundant communication media

Definitions

  • JP-A-4-364514 describes a system in which the controllers are arranged in a multiplex configuration such that I/O requests from a host apparatus to storages connected to the plural controllers are processed at a high speed.
  • the host system alters the specification of the controller to execute the I/O request
  • the I/O request is processed by a normal controller.
  • considerations have not been given to a procedure in which when failure occurs in a controller, the process is transferred to a normal controller for the execution thereof without intervention of the host system.
  • the host system regards the state as a permanent error and hence does not thereafter issue any I/O request to the failed controller.
  • Upon failure of a controller in the conventional system, when the host system recognizes the permanent error, its data processing is interrupted. Therefore, even when a plurality of controllers are provided, user intervention is required to continue the data processing of the host system when a failure occurs in the pertinent controller.
  • a normal controller has a function to receive control information of the failed controller and a function to reference the port address of the failed controller to add the contents thereof to its own port address. Furthermore, the normal controller possesses a function to reset the port address in the failed controller to thereby erase the port address.
  • the normal controller can receive the port address and control information of the failed controller and accept and execute the I/O request issued to the failed controller.
  • a method may be employed in which the port address is reset by the pertinent failed controller.
  • the normal controller monitors a bus such as an SCSI bus upon detection of the failure to thereby decide whether or not the failed controller has already received the I/O request from the host system.
  • the transfer of the port address and control information of the failed controller is terminated to prevent the host system from recognizing the permanent error so as to continue the process of the host system without any intervention by the user or host system.
  • a normal controller detects the state, completes reception of the port address and control information, and resets the failed controller within the I/O monitor time of the host system. This makes it possible for any subsequent I/O requests to the failed controller to be received and executed by the normal controller. As a result, the system can respond to the I/O request re-issued from the host system, and hence both the interruption of the process of the host system and the inhibition of issuance of I/O requests from the host system can be prevented.
  • the normal controller can suppress I/O requests from the host system to the failed controller. Therefore, when the failed controller has not yet received the I/O request, the host system need not recognize the error and any subsequent I/O requests can be received by the normal controller, thereby implementing the nonstop system operation.
  • FIG. 1 is a hardware configuration diagram showing an embodiment of the present invention
  • FIG. 2 is a diagram of processing sequence of host system at failure of a controller in the embodiment of FIG. 1;
  • FIG. 3 is a diagram briefly showing processing to be executed depending on states of the disk subsystem in the embodiment of FIG. 1;
  • FIG. 4 is a flowchart of processing executed upon detection of the controller failure, specifically, processing executed when the SCSI bus is in the bus free state in the embodiment of FIG. 1;
  • FIG. 5 is a flowchart of processing executed upon detection of the controller failure, specifically, processing executed when the bus is in use in the embodiment of FIG. 1;
  • FIG. 6 is a hardware configuration diagram of another embodiment according to the present invention.
  • FIG. 7 is a schematic diagram showing a method of implementing the SCSI-ID transfer in the configuration of the embodiment of FIG. 6.
  • reference numerals 10 and 20 indicate host systems as central processors to conduct data processing and a numeral 70 denotes a disk array subsystem as a peripheral unit in a dual controller structure.
  • a numeral 60 designates standalone disks for storing therein data of the host systems
  • numerals 30 and 40 are controllers to supervise data transfers between the host systems 10 , 20 and the standalone disks 60
  • numeral 50 stands for a shared memory to transmit information between the controllers 30 , 40 .
  • Reference numeral 71 indicates another peripheral unit including an input/output (I/O) device 72 and a controller 73 to control the I/O device 72 .
  • I/O input/output
  • the host systems 10 and 20 are connected via an SCSI bus to the controllers 30 , 40 , and 73 .
  • numeral 31 indicates an SCSI port to control an SCSI bus on the host system side
  • numeral 32 is a cache memory
  • numeral 33 denotes a device-side SCSI port to control the SCSI bus connecting the standalone disks to the controller 30
  • numeral 34 designates a microprocessor to control overall operations of the controller 30
  • numeral 35 is a port address resetting facility to reset the SCSI port of the controller 40
  • numeral 36 is a data transfer controller to execute a data transfer between the host system 10 and the cache memory 32
  • numeral 37 indicates an array data transfer controller to execute a data transfer between the cache memory 32 and the standalone disk 60 .
  • when transferring data from the host system 10 to the cache memory 32, the data transfer controller 36 also has a function to write the same data into the cache memory 42 of the controller 40.
  • the array data transfer controller 37 possesses a function to generate redundant data for data buffered in the cache memory 32 . This function can also be employed to restore data.
  • the controllers 30 and 40 mutually have the same configuration. Specifically, for each constituent element of the controller 30 , a reference number obtained by adding ten to the reference number of the constituent element indicates a partner or associated constituent element in the controller 40 .
  • the port address resetting facility 45 can reset the SCSI port 31 of the controller 30 .
  • the port address resetting facilities 35 and 45 reset port addresses, i.e., the SCSI-IDs held by the SCSI ports 41 and 31 in the controllers 40 and 30, respectively. According to the SCSI standards, the SCSI-IDs can be erased in the next arbitration phase.
  • the I/O process flow will be described according to an example in which the host system 10 achieves a data transfer via the controller 30 .
  • the host system 10 issues an I/O request with an SCSI-ID designating the controller 30 .
  • the SCSI port 31 keeping the SCSI-ID therein receives the I/O request and then passes the request to the microprocessor 34 .
  • the microprocessor 34 analyzes the I/O request and then instructs the data transfer controller 36 to execute a data transfer between the host system 10 and the disk 60 .
  • the transfer data is provisionally buffered in the cache memory 32 and is then written also in the cache memory 42 in contemplation of a possible failure in the controller 30 .
  • the SCSI-ID is set by the microprocessor 34 at initialization of the SCSI port 31 , for example, when the system is powered.
  • the SCSI-ID is saved in the shared memory 50 at the same time. Control information is also stored in the shared memory 50 so that the process can be continued by a normal controller when one of the controllers fails in the dual controller configuration.
  • numeral 81 is an application program for executing data processing to perform various requests from the user
  • numeral 82 denotes a file system for keeping therein data structure and controlling I/O requests
  • numeral 83 indicates a device driver for converting an I/O request into a request mode suitable for a peripheral unit
  • numeral 84 stands for an SCSI card for transmitting an I/O request to the SCSI bus
  • numeral 85 is a transfer I/O buffer
  • numeral 86 designates a system log in which failure information of the host systems is accumulated.
  • the device driver 83 issues a Request Sense command to receive Sense Data, which is detailed failure information. According to the Sense Data, the device driver 83 recognizes the state of the controller 30. As a result, the driver 83 re-issues (retries) the same I/O request. Since the failed controller 30 cannot execute the re-issued I/O request either, the device driver 83 instructs an operation to discard the process associated with the I/O request and repeats the operation, for example, by a Retry after an Abort message. After this operation, the driver 83 recognizes the state as a permanent error and notifies the file system 82 of the condition.
  • When the file system 82 receives the permanent error report, it does not thereafter issue any I/O request to the disk subsystem 70.
  • the file system 82 then erases non-reflection data of the I/O buffer 85, records a failure occurrence in the system log, and sends an error message via the application program 81 to the user. Consequently, depending on the case, the integrity of updated data cannot be preserved between the application program 81, the file system 82, and the disk subsystem. In any case to which the present invention is not applied, the user is therefore required to stop the application program and the like, restore the disk subsystem, and then execute again the sequence of processes that may have caused the data mismatch in the host system.
  • the controller 30 cannot report Check Condition to the device driver 83 even when failure occurs. Namely, the controller 30 does not notify the occurrence of the failure to the device driver 83 .
  • the device driver 83 checks the state of the disk subsystem by monitoring the state according to a fixed period of time indicated by a timer. When the response is not received within the fixed period of time, the device driver conducts, as in the example above, the process beginning at the re-issuance (retry) of the same I/O request.
  • controllers 30 and 40 update monitor information items of the respective controllers in the shared memory 50 at a fixed interval of time; moreover, the controllers mutually reference monitor information thereof.
  • the monitor information of the controller 30 in the shared memory 50 is updated by the controller 30 to information indicating the failure, or the information is not updated even when a fixed period of time lapses.
  • the controller 40 detects the failure of the controller 30 , reads the SCSI-ID of the SCSI port 31 and control information of the controller 30 from the shared memory 50 , and adds by the microprocessor 44 the SCSI-ID of the SCSI port 31 to the SCSI port 41 .
  • the controller 40 erases the SCSI-ID possessed by the SCSI port 31. This enables the SCSI port 41 to accept both an I/O request issued from the host system 20 and an I/O request issued from the host system 10, so that the retry of the host system 10 is received and executed by the controller 40.
  • control information includes transit information in relation to transfers of data from the cache memories 32 and 42 to standalone disks 60 . Consequently, upon receiving the control information, the controller 40 can transfer, in place of the controller 30 , the duplicated data written in the cache memory 42 , as alternative data of the Write data maintained as non-reflection data in the cache memory 32 .
  • the associated processing is required to be appropriately accomplished according to the state of the controller 30 . Otherwise, the transfers cannot be correctly carried out.
  • the status of the failed controller 30 more specifically, the state of reception by the failed controller 30 of the I/O request from the host system is determined on the basis of the usage state (signal state) of the SCSI bus.
  • the controller 40 monitors the utilization status (signal state) of the SCSI bus 80 to decide whether or not the controller 30 has already received the I/O request from the host system 10 , thereby executing a process associated with the decision.
  • the SCSI bus 80 is possibly in the bus free state when a failure is detected in the controller 30 .
  • the SCSI bus 80 is possibly in the bus free state. Since the controller 30 has not yet received the I/O request, the controller 40 executes a host operation (the initiator operation) such that the controller 40 selects the controller 30 to exclusively occupy the SCSI bus 80 . This makes it possible to suppress the issuance of an I/O request from the host system 10 such that the controller 40 conducts the transfer of the SCSI-ID during this period.
  • In one of the utilization statuses of the SCSI bus 80, the controller 40 may be executing an I/O process through the SCSI bus 80 when a failure is detected in the controller 30. On this occasion, the controller 30 has not received the I/O request; the SCSI bus 80 is set to the bus free state at termination of the I/O process, and an I/O request may then be issued from the host system 10. To overcome this difficulty, the controller 40 completes the SCSI-ID transfer during the execution of the pertinent I/O process. If the SCSI-ID transfer is not completed during the execution of the pertinent I/O process, the controller 40 does not report the I/O termination status until the ID transfer is finished.
  • the SCSI bus is possibly being used when a failure is detected in the controller 30 .
  • the system is in a state in which the arbitration or selection is being executed according to the SCSI standards, a state in which another SCSI device connected to the SCSI bus 80 is using the SCSI bus 80 , or a state in which the controller 30 has already received the I/O request from the host system 10 .
  • the controller 40 monitors the BSY signal of the SCSI bus 80 .
  • When the BSY signal continues for a period of time equal to or longer than the period in which the arbitration phase is changed via the selection phase to the message out phase according to the SCSI standards, it can be decided that the signal is the BSY signal indicating an I/O process in execution, not the BSY signal of the bus mastership arbitration.
  • the controller 40 executes the SCSI-ID transfer process at a high speed.
  • the controller 30 has not received the I/O request. Therefore, the controller 40 achieves the transfer process at a high speed while another SCSI device is using the SCSI bus 80 .
  • If the controller 30 has already received the I/O request from the host system 10, the failed controller 30 has stopped its operation while exclusively possessing the SCSI bus 80. Since the device driver 83 is monitoring the I/O operation by the internal timer, the controller 40 is required to execute the SCSI-ID transfer before the host system 10 conducts the Bus Reset and Retry so that the controller 40 can respond to the Retry.
  • the monitor period of the controller 40 to monitor the SCSI bus 80 is shorter than the I/O process monitor period of the host system 10 . Consequently, the controller 40 is required to completely achieve the SCSI-ID transfer prior to the bus resetting indication from the host system. This can be satisfactorily achieved due to the provision described above.
  • Since the SCSI bus 80 is in the bus free state (step 400), the controller 40 recognizes that the controller 30 has not yet received the I/O request from the host system 10. The controller 40 then instructs the SCSI port 41 to start the initiator operation to participate in the arbitration of the SCSI bus 80 (step 401).
  • When the controller 40 remains in the arbitration (Y in step 402), the controller 40 specifies in the selection phase the SCSI-ID of the SCSI port 31 of the failed controller 30. Even if a failure occurs in the controller 30, the SCSI port 31 functions normally in most cases. Consequently, a state is set in which the SCSI port 31 of the controller 30 exclusively occupies the SCSI bus 80 (step 404). In this state, the controller 40 adds the SCSI-ID possessed by the SCSI port 31 to the SCSI port 41 (step 405) and then resets the SCSI port 31 (step 406).
  • the SCSI bus 80 exclusively occupied by the controller 30 is released by resetting the SCSI port 31 and is returned to the bus free state. Thereafter, the controller 40 receives the I/O request from the host system 10 (step 413). The I/O process is continued in this way without any intervention by the user.
  • It is decided whether or not the controller 40 is selected by the host system 20 in the selection phase (step 403). If the controller 40 is selected by the host system (Y in step 403), a state is set in which the controller 40 exclusively occupies the SCSI bus 80. In this state, the controller 40 receives the I/O request from the host system (step 407) and then provisionally interrupts the processing. The controller 40 adds the SCSI-ID possessed by the SCSI port 31 to the SCSI port 41 (step 408) and then resets the SCSI port 31 (step 409).
  • the controller 40 executes the I/O request from the host system (step 410 ) and then restores the SCSI bus 80 to the bus free state. At this point, the controller 40 receives the I/O request from the host system 10 (step 413 ).
  • the controller 40 assumes a state in which the controller 30 having received the I/O request from the host system 10 or another SCSI device dedicatedly occupies the SCSI bus 80 . In this situation, while the state is kept unchanged, the controller 40 adds the SCSI-ID possessed by the SCSI port 31 (step 411 ) to the SCSI port 41 and then resets the SCSI port 31 (step 412 ). If the controller 30 exclusively occupies the SCSI bus 80 , the SCSI bus 80 is restored to the bus free state by resetting the SCSI port 31 .
  • the SCSI bus 80 is restored to the bus free state when the I/O process of the SCSI device is terminated. Thereafter, the controller 40 accepts the I/O request from the host system 10 (step 413 ).
  • the controller 40 first determines whether or not the controller 40 is executing an I/O request from the host system (step 501 ). If this is not the case (No in step 501 ), the controller 40 continuously monitors the state of the SCSI bus 80 for a period of time equivalent to the period in which the arbitration phase according to the SCSI standards is changed via the selection phase to the message out phase (step 502 ).
  • At detection of the failure, if the controller 40 is executing an I/O operation (Y in step 501) or the controller 40 is selected by the host system during the monitor operation of the SCSI bus 80 (left branch in step 502), a state is assumed in which the SCSI bus 80 is exclusively occupied by the controller 40 and the controller 30 has not received the I/O request. In this state, prior to reporting the termination status of the I/O execution (step 503), the controller 40 adds the SCSI-ID possessed by the SCSI port 31 to the SCSI port 41 (step 504) and then resets the SCSI port 31 (step 505). After resetting the port 31, the controller 40 notifies the I/O termination status and then terminates the I/O operation (step 506).
  • the SCSI bus 80 is set to the bus free state when the I/O execution process is terminated, and the controller 40 receives any subsequent I/O request from the host system 10 . In this fashion, it is possible to continuously execute the I/O process without user intervention.
  • If the controller 40 is not executing an I/O operation and the SCSI bus 80 is not released during the monitor operation (right branch in step 502), the controller 40 recognizes that the controller 30 or another SCSI device exclusively occupying the SCSI bus is executing an I/O operation. Continuing the SCSI bus monitoring operation (step 508), the controller 40 adds the SCSI-ID possessed by the SCSI port 31 to the SCSI port 41 (step 509) and then resets the SCSI port 31 (step 510).
  • the controller 30 exclusively occupies the SCSI bus 80
  • the bus 80 is returned to the bus free state by resetting the SCSI port 31 .
  • the bus 80 is returned to the bus free state when the I/O operation of the SCSI device is terminated. Thereafter, the controller 40 receives the I/O request from the host system 10 . If the bus is released before the SCSI port 31 is completely reset (broken line in step 508 ), there is executed the process at detection of the bus free state shown in FIG. 4.
  • the I/O request from the host system 10 can be executed by the controller 40 when a failure occurs in the controller 30 , thereby preventing the permanent error. Consequently, the data processing of the system 10 can be normally continued.
  • FIG. 6 is a diagram showing the configuration developed by removing the port address resetting facility from the controller of FIG. 1.
  • Numerals 90 and 100 indicate controllers respectively conducting functions of the controllers 30 and 40 of FIG. 1 and a numeral 50 indicates a shared memory to supply information between the controllers 90 and 100 .
  • a numeral 34 is a microprocessor controlling overall operation of the controllers
  • numeral 31 indicates an SCSI port which can be controlled only by the microprocessor 34
  • numeral 32 denotes a cache memory
  • numeral 33 stands for a device-side SCSI port
  • numeral 36 designates a data transfer controller
  • a numeral 37 is an array data transfer controller.
  • the controllers 90 and 100 are of the same configuration. In the following paragraphs, description will be given of an example in which the controller 90 receives an I/O request from the host system 10 of FIG. 1 and the controller 100 receives an I/O request from the host system of FIG. 1.
  • FIG. 7 is a diagram showing an SCSI-ID transfer processing procedure with its abscissa representing lapse of time.
  • the controller 100 detects the failure and then sets at a particular address in the shared memory 50 a failure flag indicating the occurrence of the failure in the controller 90 . Thereafter, the controller 100 reads the SCSI-ID of the SCSI port 31 and control information of the controller 90 from the shared memory 50 , and adds by the microprocessor 44 the SCSI-ID to the SCSI port 41 . In contrast thereto, the controller 90 recognizes its own failure according to the failure flag in the shared memory 50 and enters a wait state in which by use of an internal timer, the controller 90 does not execute its own operation for a period of time equivalent to the period of time in which the transfer processing of the controller 100 is completely executed.
  • the controller 90 determines through the wait operation the completion of the processing of the controller 100 and then erases by the microprocessor 34 the SCSI-ID possessed by the SCSI port 31 . As a result, the SCSI-ID transfer process is terminated and then the SCSI port 41 is enabled to receive the I/O request from the host system of FIG. 1.
  • the present invention is also effective in the configuration not including the port address resetting facility. It is also to be assumed that when a failure occurs in the controller 90 , the microprocessor 34 and SCSI port 31 function normally.
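
The bus-state cases described in the items above (FIG. 3 through FIG. 5) can be summarized as a small decision function. The following Python sketch is illustrative only and is not taken from the patent: the names `Action`, `choose_action`, and `arbitration_window` are hypothetical, time values are in arbitrary units, and the sketch leaves out the branch in which the controller 40 loses the arbitration after starting the initiator operation.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the takeover paths of the surviving controller."""
    # Bus free: arbitrate as initiator, select the failed port to hold the bus, then transfer the ID.
    HOLD_BUS_THEN_TRANSFER = "hold bus as initiator, then transfer ID"
    # Survivor is executing its own I/O: transfer the ID before reporting the I/O termination status.
    TRANSFER_BEFORE_STATUS = "transfer ID before reporting I/O status"
    # Failed controller or another device holds the bus: transfer the ID while the bus stays occupied.
    TRANSFER_WHILE_OCCUPIED = "transfer ID while bus is occupied"
    # BSY has been seen only briefly and may belong to arbitration or selection: keep sampling.
    KEEP_MONITORING = "keep monitoring BSY"


def choose_action(bsy_asserted: bool, survivor_in_io: bool,
                  bsy_duration: float, arbitration_window: float) -> Action:
    """Pick a takeover path from the bus state observed when the failure is detected.

    arbitration_window is an assumed name for the time the bus needs to move from
    the arbitration phase through the selection phase to the message out phase.
    """
    if survivor_in_io:
        # The survivor owns the bus, so the failed controller cannot have received a request.
        return Action.TRANSFER_BEFORE_STATUS
    if not bsy_asserted:
        # Bus free state: the failed controller has not yet received the I/O request.
        return Action.HOLD_BUS_THEN_TRANSFER
    if bsy_duration >= arbitration_window:
        # BSY outlasted arbitration and selection, so an I/O process is in execution.
        return Action.TRANSFER_WHILE_OCCUPIED
    return Action.KEEP_MONITORING


# Example: bus idle at detection time, survivor not in the middle of an I/O.
print(choose_action(bsy_asserted=False, survivor_in_io=False,
                    bsy_duration=0.0, arbitration_window=1.0).value)
```

In every path the end state is the same: the SCSI-ID of the failed port has been added to the surviving port and the failed port has been reset before the host system re-issues its request. Only the moment at which the transfer is performed differs.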

Abstract

In an external storage, an I/O process is continued without any intervention of a user or a host system at failure of a controller. When a failure occurs in a controller, a host system 10 recognizes the failure of the controller. Before the failure is notified to the user and application to stop the job, the substitutive controller reads the SCSI-ID possessed by an SCSI port of the failed controller from a shared memory, registers the SCSI-ID of the SCSI port to the SCSI port associated with the substitutive controller, and erases by a port address resetting facility 45 of the substitutive controller the SCSI-ID possessed by an SCSI port of the failed controller. Thanks to the provision, since the SCSI-ID specified at issuance of an I/O request is transferred between the controllers, the user or the host system need not alter the I/O request issuing route. Moreover, while the host system does not recognize the error, the transfer can be conducted.
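
The SCSI-ID takeover summarized in the abstract can be illustrated with a minimal Python model. This is a sketch under stated assumptions, not the patented implementation: `ScsiPort`, `take_over`, and `responders` are hypothetical names, the shared memory is modeled as a plain dictionary, and the ID values 0 and 1 are chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ScsiPort:
    """Hypothetical model of a host-side SCSI port: the set of SCSI-IDs it answers to."""
    ids: set = field(default_factory=set)

    def reset(self) -> None:
        # Models the port address reset: the port stops answering to any ID.
        self.ids.clear()


def take_over(shared_memory: dict, failed: str,
              own_port: ScsiPort, failed_port: ScsiPort) -> None:
    """Move the failed controller's SCSI-ID onto the substitutive controller's port."""
    failed_id = shared_memory[failed]  # ID saved in shared memory when the port was initialized
    own_port.ids.add(failed_id)        # register it alongside the substitute's own ID
    failed_port.reset()                # erase it from the failed controller's port


def responders(target_id: int, ports: dict) -> list:
    """Names of the controllers whose port answers a request addressed to target_id."""
    return [name for name, port in ports.items() if target_id in port.ids]


# Illustrative setup: controller 30 answers ID 0, controller 40 answers ID 1.
shared_memory = {"controller30": 0, "controller40": 1}
port31, port41 = ScsiPort({0}), ScsiPort({1})
ports = {"controller30": port31, "controller40": port41}

take_over(shared_memory, "controller30", port41, port31)
print(responders(0, ports))  # the host still addresses ID 0; controller 40 now answers
```

Because the host keeps addressing the same SCSI-ID, nothing on the host side has to change; only the port that answers to that ID is different after the takeover.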

Description

    BACKGROUND OF THE INVENTION
  • The present invention relates to a technology to guarantee high reliability in operation of a plurality of controllers for input/output (I/O) devices in a computer system, and in particular, to a method of redundantly arranging controllers capable of transferring a process therebetween without intervention of the user and host systems when failure occurs in one of the controllers in an external storage subsystem adopting a Small Computer Systems Interface (SCSI) in which the controllers are arranged at least in a duplicated configuration and the controllers can be accessed from the host systems. [0001]
  • In a system configuration employing the SCSI in which a plurality of controllers and a storage shared between at least two controllers are connected by an interface cable in a daisy chain to the host systems, the plural controllers respectively have different port addresses such as SCSI-IDs. Ordinarily, these controllers process I/O requests designated according to pertinent port addresses specified by the host systems. [0002]
  • JP-A-4-364514 describes a system in which the controllers are arranged in a multiplex configuration such that I/O requests from a host apparatus to storages connected to the plural controllers are processed at a high speed. In such a conventional system, when failure occurs in one of the controllers, and when the host system alters the specification of the controller to execute the I/O request, it is possible that the I/O request is processed by a normal controller. However, in a system in which the host system and the plural controllers are connected to each other in a daisy chain, considerations have not been given to a procedure in which when failure occurs in a controller, the process is transferred to a normal controller for the execution thereof without intervention of the host system. [0003]
  • After issuing an I/O request to a controller, the host system ordinarily monitors termination of the I/O request by a timer in the host system. When the I/O is not terminated even when the monitor time predetermined by the host system lapses after the issuance of the I/O request, the host system assumes the state temporarily as an error. Conducting processes such as bus recovery process of an SCSI bus, the host system tries to re-issue the same I/O request with specification of the port address of the failed controller. [0004]
  • When the controller does not respond to the reissued I/O request, the host system regards the state as a permanent error and hence does not thereafter issue any I/O request to the failed controller. Upon failure of a controller in the conventional system, when the host system recognizes the permanent error, the data process thereof is interrupted. Therefore, even when a plurality of controllers are disposed, user intervention is required to continue the data process of the host system when failure occurs in the pertinent controller. [0005]
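  • The conventional host-side behavior described above (timeout, bus recovery, retry, then permanent error) can be sketched as follows. This is only an illustrative model: the names `issue_with_retry`, `send_request`, and `bus_recovery` and the retry count are assumptions made for the example and do not appear in the specification.

```python
from enum import Enum


class IoOutcome(Enum):
    COMPLETED = "completed"
    PERMANENT_ERROR = "permanent_error"


def issue_with_retry(send_request, bus_recovery, max_retries=1):
    """Issue an I/O request; on timeout, recover the bus and retry.

    send_request: hypothetical callable returning True if the I/O finished
                  within the host's monitor time, False on timeout.
    bus_recovery: hypothetical callable standing in for the SCSI bus
                  recovery process performed before each retry.
    Returns (outcome, number_of_retries_performed).
    """
    retries = 0
    while True:
        if send_request():
            return IoOutcome.COMPLETED, retries
        # A timeout is first assumed to be a temporary error.
        if retries >= max_retries:
            # No response to the re-issued request: permanent error,
            # after which the host stops issuing I/O to this controller.
            return IoOutcome.PERMANENT_ERROR, retries
        bus_recovery()
        retries += 1
```

  • With a controller that never answers, the sketch ends in `PERMANENT_ERROR` after one retry; this is the outcome the invention aims to prevent by letting the normal controller answer the retry.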
  • Furthermore, when there are disposed a plurality of host systems, and when a controller fails and enters a hang-up situation with the bus occupied by the failed controller, another data process being executed between another host system and another controller is also interrupted. User intervention is also required to recover the interrupted data process. [0006]
  • SUMMARY OF THE INVENTION
  • It is therefore an object of the present invention to provide a failure recovery method and system in which when a failure occurs in a controller, the process thereof is transferred to a normal controller to continuously perform the data process without any intervention by the host system or user. [0007]
  • Additionally, when the failed controller has not yet received the I/O request from the host system and hence the error has not been assumed, it is necessary to possibly suppress I/O requests to the failed controller to prevent an abnormal operation. Consequently, in accordance with the present invention, the transfer of the port address and control information is executed after suppressing an event in which the host systems issue I/O requests thereto. [0008]
  • To achieve the object above according to the present invention, a normal controller has a function to receive control information of the failed controller and a function to reference the port address of the failed controller to add the contents thereof to its own port address. Furthermore, the normal controller possesses a function to reset the port address in the failed controller to thereby erase the port address. [0009]
  • Due to these functions, the normal controller can receive the port address and control information of the failed controller and accept and execute the I/O request issued to the failed controller. In the operation, a method may be employed in which the port address is reset by the pertinent failed controller. [0010]
  • Moreover, according to the present invention, there is disposed a function that the normal controller monitors a bus such as an SCSI bus upon detection of the failure to thereby decide whether or not the failed controller has already received the I/O request from the host system. When the failed controller has already received the I/O request from the host system, the transfer of the port address and control information of the failed controller is terminated to prevent the host system from recognizing the permanent error so as to continue the process of the host system without any intervention by the user or host system. [0011]
  • In addition, when the normal controller is executing an I/O process upon detection of a failure in a controller, it is assumed that the failed controller has not yet received the I/O request from the host system. According to the present invention, there is provided a function to detect the condition such that the transfer of the port address and control information of the failed controller is accomplished during the I/O process execution of the normal controller. [0012]
  • As a result, I/O requests from the host system to the failed controller can be suppressed until the port address transfer process is completed. In addition, when a bus such as an SCSI bus is not being used by any controller upon detection of the failure, it is considered that the failed controller has not yet received the I/O request from the host system. According to the present invention, there is provided a function in which the condition is detected and the normal controller selects the failed controller such that the transfer of the port address and control information is executed after the selection is accomplished. Due to this function, I/O requests from the host system to the failed controller can be suppressed until the port address transfer process is completed. Owing to adoption of the construction of this type, in a situation in which a failed controller has received an I/O request and the execution of the I/O process has not been terminated with a bus such as an SCSI bus kept exclusively reserved by the failed controller, a normal controller detects the state, completes reception of the port address and control information, and resets the failed controller within the I/O monitor time of the host system. This makes it possible for any subsequent I/O requests to the failed controller to be received and executed by the normal controller. As a result, the system can respond to the I/O request re-issued from the host system and hence the interruption of the process of the host system as well as the inhibition of issuance of I/O requests from the host system can be prevented. [0013]
  • Moreover, upon detection of a failure in a controller, the normal controller can suppress I/O requests from the host system to the failed controller. Therefore, when the failed controller has not yet received the I/O request, the host system need not recognize the error and any subsequent I/O requests can be received by the normal controller, thereby implementing the nonstop system operation. [0014]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • These and other objects and advantages of the present invention will become apparent by reference to the following description and accompanying drawings wherein: [0015]
  • FIG. 1 is a hardware configuration diagram showing an embodiment of the present invention; [0016]
  • FIG. 2 is a diagram of processing sequence of host system at failure of a controller in the embodiment of FIG. 1; [0017]
  • FIG. 3 is a diagram briefly showing processing to be executed depending on states of the disk subsystem in the embodiment of FIG. 1; [0018]
  • FIG. 4 is a flowchart of processing executed upon detection of the controller failure, specifically, processing executed when the SCSI bus is in the bus free state in the embodiment of FIG. 1; [0019]
  • FIG. 5 is a flowchart of processing executed upon detection of the controller failure, specifically, processing executed when the bus is in use in the embodiment of FIG. 1; [0020]
  • FIG. 6 is a hardware configuration diagram of another embodiment according to the present invention; and [0021]
  • FIG. 7 is a schematic diagram showing a method of implementing the SCSI-ID transfer in the configuration of the embodiment of FIG. 6. [0022]
  • DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • Description will now be given in detail of an embodiment according to the present invention. [0023]
  • In FIG. 1, reference numerals 10 and 20 indicate host systems as central processors to conduct data processing and a numeral 70 denotes a disk array subsystem as a peripheral unit in a dual controller structure. In the constitution of the disk array subsystem 70, a numeral 60 designates standalone disks for storing therein data of the host systems, numerals 30 and 40 are controllers to supervise data transfers between the host systems 10, 20 and the standalone disks 60, and numeral 50 stands for a shared memory to transmit information between the controllers 30, 40. Reference numeral 71 indicates another peripheral unit including an input/output (I/O) device 72 and a controller 73 to control the I/O device 72. [0024]
  • The host systems 10 and 20 are connected via an SCSI bus to the controllers 30, 40, and 73. In the constitution of the controller 30, numeral 31 indicates an SCSI port to control an SCSI bus on the host system side, numeral 32 is a cache memory, numeral 33 denotes a device-side SCSI port to control the SCSI bus connecting the standalone disks to the controller 30, numeral 34 designates a microprocessor to control overall operations of the controller 30, numeral 35 is a port address resetting facility to reset the SCSI port of the controller 40, numeral 36 is a data transfer controller to execute a data transfer between the host system 10 and the cache memory 32, and numeral 37 indicates an array data transfer controller to execute a data transfer between the cache memory 32 and the standalone disk 60. [0025]
  • The data transfer controller 36 has a function to write, when transferring data from the host system 10 to the cache memory 32, the contents of data in the cache memory 42 of the controller 40 as well. In addition, the array data transfer controller 37 possesses a function to generate redundant data for data buffered in the cache memory 32. This function can also be employed to restore data. [0026]
  • The controllers 30 and 40 mutually have the same configuration. Specifically, for each constituent element of the controller 30, a reference number obtained by adding ten to the reference number of the constituent element indicates a partner or associated constituent element in the controller 40. The port address resetting facility 45 can reset the SCSI port 31 of the controller 30. The port address resetting facilities 35 and 45 reset port addresses, i.e., SCSI-IDs preserved by the SCSI ports 41 and 31 in the controllers 40 and 30, respectively. According to the SCSI standards, the SCSI-IDs can be erased in the next arbitration phase. [0027]
  • In addition, since the data transfer controller 36 has a function to write data in the cache memory 32, any data items transferred from the host systems 10 and 20 are redundantly buffered in the respective cache memories 32 and 42. Accordingly, even when a failure occurs in one of the controllers, the remaining controller can receive the process of the failed controller to execute the process using the data in its own cache memory. [0028]
  • The I/O process flow will be described according to an example in which the host system 10 achieves a data transfer via the controller 30. The host system 10 issues an I/O request with an SCSI-ID designating the controller 30. In the controller 30, the SCSI port 31 keeping the SCSI-ID therein receives the I/O request and then passes the request to the microprocessor 34. The microprocessor 34 analyzes the I/O request and then instructs the data transfer controller 36 to execute a data transfer between the host system 10 and the disk 60. [0029]
  • The transfer data is provisionally buffered in the cache memory 32 and is then written also in the cache memory 42 in contemplation of a possible failure in the controller 30. In this connection, the SCSI-ID is set by the microprocessor 34 at initialization of the SCSI port 31, for example, when the system is powered. The SCSI-ID is saved in the shared memory 50 at the same time. Also stored in the shared memory 50 is control information so that the process can be continuously executed by a normal controller when one of the controllers fails in the dual controller configuration. [0030]
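  • The duplexed cache write and the saving of the SCSI-ID in the shared memory can be pictured with the following minimal sketch. The class and attribute names are hypothetical, and plain dictionaries stand in for the cache memories 32 and 42 and the shared memory 50; real controllers perform these steps in hardware and firmware.

```python
class Controller:
    """Toy model of one controller in the dual-controller subsystem."""

    def __init__(self, name, scsi_id, shared_memory):
        self.name = name
        self.cache = {}            # stands in for cache memory 32 or 42
        self.partner = None        # the other controller, set after creation
        self.port_ids = {scsi_id}  # SCSI-ID set at port initialization
        # The SCSI-ID is saved in the shared memory at the same time.
        shared_memory[name] = scsi_id

    def write(self, block, data):
        # Buffer the data locally, then mirror it into the partner's cache
        # so the partner can continue if this controller fails.
        self.cache[block] = data
        if self.partner is not None:
            self.partner.cache[block] = data
```

  • Because every write lands in both caches, the surviving controller in this model holds the same buffered data as the failed one and needs only the SCSI-ID and control information from the shared memory to continue.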
  • Referring now to the process sequence of the host system at failure of the controller shown in FIG. 2, description will be given of a method of continuing an I/O operation of the host system 10 according to the present invention. [0031]
  • First, the internal construction of the host system 10 will be described. In FIG. 2, numeral 81 is an application program for executing data processing to perform various requests from the user, numeral 82 denotes a file system for keeping therein data structure and controlling I/O requests, numeral 83 indicates a device driver for converting an I/O request into a request mode suitable for a peripheral unit, numeral 84 stands for an SCSI card for transmitting an I/O request to the SCSI bus, numeral 85 is a transfer I/O buffer, and numeral 86 designates a system log in which failure information of the host systems is accumulated. [0032]
  • Next, description will be generally given of the processing of the host system 10 when a failure occurs in the controller 30 of the disk subsystem. Receiving an I/O request occurring in the application 81, the file system 82 issues an I/O request to the SCSI bus 80 via the device driver 83 and SCSI card 84. On receiving the request, when the controller 30 detects a failure in the disk subsystem, the controller 30 reports Check Condition for the I/O request. [0033]
  • Next, the device driver 83 issues a Request Sense command to receive Sense Data which is detailed failure information. According to the Sense Data, the device driver 83 recognizes the state of the controller 30. As a result, the driver 83 issues again (retries) the same I/O request. Since the failed controller 30 cannot execute the re-issued I/O request either, the device driver 83 instructs an operation to discard the process associated with the I/O request and repeats the operation, for example, by Retry after an Abort message. After this operation, the driver 83 recognizes the state as a permanent error to notify the condition to the file system 82. [0034]
  • Receiving the permanent error report, the file system 82 does not thereafter issue any I/O request to the disk subsystem 70. The file system 82 then erases non-reflection data of the I/O buffer 85 and records a failure occurrence in the system log, and then sends an error message via the application program 81 to the user. Consequently, depending on the case, the integrity of updated data cannot be preserved between the application program 81, file system 82, and disk subsystem. Therefore, in any case to which the present invention is not applied, the user is required to stop the application program and the like to restore the disk subsystem so as to thereafter execute again a sequence of processes possibly having caused the mismatching of data in the host system. [0035]
  • As another example of general processing, there exists a case in which the controller 30 cannot report Check Condition to the device driver 83 even when failure occurs. Namely, the controller 30 does not notify the occurrence of the failure to the device driver 83. On this occasion, the device driver 83 checks the state of the disk subsystem by monitoring the state according to a fixed period of time indicated by a timer. When the response is not received within the fixed period of time, the device driver conducts, as in the example above, the process beginning at the re-issuance (retry) of the same I/O request. [0036]
  • Referring to FIG. 1, description will be given of an advantageous feature in which the I/O process can be continued without conducting the user operation in accordance with the present invention. The controllers 30 and 40 update monitor information items of the respective controllers in the shared memory 50 at a fixed interval of time; moreover, the controllers mutually reference monitor information thereof. [0037]
  • When the controllers 30 and 40 are respectively receiving I/O requests issued respectively from the host systems 10 and 20, and when a failure occurs in the controller 30, the monitor information of the controller 30 in the shared memory 50 is updated by the controller 30 to information indicating the failure, or the information is not updated even when a fixed period of time lapses. Referencing the monitor information in the shared memory 50, the controller 40 detects the failure of the controller 30, reads the SCSI-ID of the SCSI port 31 and control information of the controller 30 from the shared memory 50, and adds by the microprocessor 44 the SCSI-ID of the SCSI port 31 to the SCSI port 41. [0038]
  • Additionally, using the SCSI port resetting facility 45, the controller 40 erases the SCSI-ID possessed by the SCSI port 31. This enables the SCSI port 41 to accept an I/O request issued from the host system 20 and an I/O request issued from the host system 10 so that the retry of the host system 10 is received for execution thereof by the controller 40. [0039]
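  • The monitoring and takeover steps just described (stale or failure-marked monitor information, reading the SCSI-ID and control information from the shared memory, adding the ID to the surviving port, erasing it from the failed port) can be sketched as below. All names, the timestamp representation, and the timeout value are assumptions for illustration; the actual failure detection method is the subject of the separate application cited in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    # controller name -> (time of last monitor update, failure flag)
    monitor: dict = field(default_factory=dict)
    scsi_ids: dict = field(default_factory=dict)      # name -> SCSI-ID
    control_info: dict = field(default_factory=dict)  # name -> control data


@dataclass
class Port:
    ids: set = field(default_factory=set)  # SCSI-IDs the port responds to


def partner_failed(shm, partner, now, timeout):
    """True if the partner marked itself failed or stopped updating."""
    last_update, failed = shm.monitor[partner]
    return failed or (now - last_update) > timeout


def take_over(shm, failed_name, failed_port, own_port):
    """Move the failed controller's SCSI-ID to the surviving port."""
    scsi_id = shm.scsi_ids[failed_name]
    info = shm.control_info.get(failed_name)
    own_port.ids.add(scsi_id)         # add the ID to the normal port
    failed_port.ids.discard(scsi_id)  # reset (erase) the failed port's ID
    return info
```

  • After `take_over`, the surviving port in this model answers to both SCSI-IDs, so a retry addressed to the failed controller's ID reaches the normal controller without any change on the host side.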
  • When the retry is normally executed, a normal execution of the I/O request is reported to the file system 82 and the processing of the host system 10 is normally continued. The control information includes transit information in relation to transfers of data from the cache memories 32 and 42 to standalone disks 60. Consequently, upon receiving the control information, the controller 40 can transfer, in place of the controller 30, the duplicated data written in the cache memory 42, as alternative data of the Write data maintained as non-reflection data in the cache memory 32. [0040]
  • Since the method of failure detection and control information transfer of the controller 30 is not the inherent characteristic of the present invention and has already been described in detail in the Japanese Patent Application No. 7-139781 (filed on Jun. 7, 1995) by the applicant of the present invention, description thereof will be omitted. [0041]
  • For the transfer by the controller 40 of the SCSI-ID of the SCSI port 31 to the SCSI port 41 and the transfer of control information of the controller 30 to the controller 40 described above, the associated processing is required to be appropriately accomplished according to the state of the controller 30. Otherwise, the transfers cannot be correctly carried out. According to the present invention, the status of the failed controller 30, more specifically, the state of reception by the failed controller 30 of the I/O request from the host system is determined on the basis of the usage state (signal state) of the SCSI bus. [0042]
  • In the following examples, description will be given of a case in which a failure takes place in the controller 30 of FIG. 1 and the process is continued by the normal controller 40. [0043]
  • Referring next to FIG. 3, description will be given of processing to be executed according to the state of the disk subsystem. [0044]
  • In general, it is difficult to completely forecast operation to be achieved by the failed controller when an I/O request is received from the host system 10. Therefore, if the failed controller 30 has not yet received the I/O request from the host system 10 at the time the failure of the controller 30 is detected by the controller 40, the transfer process of the SCSI-ID including the addition of the SCSI-ID to the SCSI port 41 and the resetting of the SCSI port 31 is executed as early as possible so that the controller 40 receives the I/O request. [0045]
  • However, when an I/O request is issued from the host system 10 with specification of the SCSI-ID during the transfer process of the SCSI-ID, the controllers 30 and 40 possess the same SCSI-ID and hence the operation of the SCSI bus becomes unstable. In this situation, according to the present invention, there is provided a method in which the SCSI bus 80 is dedicatedly occupied by one controller during the SCSI-ID transfer process so as to suppress the I/O request issuance from the host system 10. [0046]
  • In accordance with the present invention, the controller 40 monitors the utilization status (signal state) of the SCSI bus 80 to decide whether or not the controller 30 has already received the I/O request from the host system 10, thereby executing a process associated with the decision. [0047]
  • In one of the utilization statuses of the SCSI bus 80, the SCSI bus 80 is possibly in the bus free state when a failure is detected in the controller 30. Since the controller 30 has not yet received the I/O request, the controller 40 executes a host operation (the initiator operation) such that the controller 40 selects the controller 30 to exclusively occupy the SCSI bus 80. This makes it possible to suppress the issuance of an I/O request from the host system 10 such that the controller 40 conducts the transfer of the SCSI-ID during this period. [0048]
  • In one of the utilization statuses of the SCSI bus 80, it may be possible that the controller 40 is executing an I/O process through the SCSI bus 80 when a failure is detected in the controller 30. On this occasion, the controller 30 has not received the I/O request and hence the SCSI bus 80 is set to the bus free state at termination of the I/O process and an I/O request may possibly be issued from the host system 10. To overcome this difficulty, the controller 40 also completely executes the SCSI-ID transfer during the execution of the pertinent I/O process. If the SCSI-ID transfer is not completed during the execution of the pertinent I/O, the controller 40 does not send the report of the I/O termination status until the ID transfer is completely finished. [0049]
  • In one of the utilization statuses of the SCSI bus 80, the SCSI bus is possibly being used when a failure is detected in the controller 30. In this case, the system is in a state in which the arbitration or selection is being executed according to the SCSI standards, a state in which another SCSI device connected to the SCSI bus 80 is using the SCSI bus 80, or a state in which the controller 30 has already received the I/O request from the host system 10. [0050]
  • In this situation, the controller 40 monitors the BSY signal of the SCSI bus 80. In association with the monitor period, when the BSY signal continues for a period of time equal to or more than the period of time in which the arbitration phase is changed via the selection phase to the message out phase according to the SCSI standards, it can be decided that the signal is the BSY signal indicating an I/O process in execution, not the BSY signal of the bus mastership arbitration. After the signal decision, the controller 40 executes the SCSI-ID transfer process at a high speed. [0051]
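  • The BSY decision rule above amounts to a duration threshold: if BSY stays asserted for at least the time needed to go from arbitration through selection to message out, it indicates an I/O in execution rather than arbitration. A minimal sketch follows, assuming the bus is observed as a list of timestamped samples; the function name, the sampling model, and the time units are illustrative assumptions.

```python
def bsy_indicates_io(samples, threshold):
    """Decide whether BSY reflects an I/O in execution.

    samples:   (time, bsy_asserted) pairs in increasing time order
               (a hypothetical sampling of the bus signal).
    threshold: time for the arbitration phase to change via the selection
               phase to the message out phase (value is an assumption).
    Returns True once BSY has been continuously asserted for at least
    the threshold, otherwise False.
    """
    start = None
    for t, asserted in samples:
        if asserted:
            if start is None:
                start = t  # beginning of a continuous BSY run
            if t - start >= threshold:
                return True
        else:
            start = None   # BSY dropped; the run is broken
    return False
```

  • A run of BSY that is interrupted before reaching the threshold is treated here as arbitration or selection activity, which matches the distinction drawn in the paragraph above.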
  • If another SCSI device is using the SCSI bus 80, the controller 30 has not received the I/O request. Therefore, the controller 40 achieves the transfer process at a high speed while another SCSI device is using the SCSI bus 80. [0052]
  • If the controller 30 has already received the I/O request from the host system 10, the failed controller 30 has already stopped its operation with the SCSI bus 80 exclusively possessed by the controller 30. Since the device driver 83 is monitoring the I/O operation by the internal timer, the controller 40 is required to execute the SCSI-ID transfer before the host system 10 conducts the Bus Reset and Retry so that the controller 40 responds to the Retry. The monitor period of the controller 40 to monitor the SCSI bus 80 is shorter than the I/O process monitor period of the host system 10. Consequently, the controller 40 is required to completely achieve the SCSI-ID transfer prior to the bus resetting indication from the host system. This can be satisfactorily achieved due to the provision described above. [0053]
  • Referring to FIGS. 4 and 5, description will be given of a procedure to acquire the state of the disk subsystem by monitoring the SCSI bus and an associated procedure of transferring the SCSI-ID. [0054]
  • Description will be given of a case in which the SCSI bus 80 is in the bus free state when a failure of the controller 30 is detected by the controller 40 in FIG. 4. [0055]
  • Since the SCSI bus 80 is in the bus free state (step 400), the controller 40 recognizes that the controller 30 has not yet received the I/O request from the host system 10. The controller 40 then instructs the SCSI port 41 to start the initiator operation to participate in the arbitration of the SCSI bus 80 (step 401). [0056]
  • As a result, when the controller 40 remains in the arbitration (Y in step 402), the controller 40 specifies in the selection phase the SCSI-ID of the SCSI port 31 of the failed controller 30. In this situation, even if a failure occurs in the controller 30, the SCSI port 31 normally functions in most cases. Consequently, there is set a state in which the SCSI port 31 of the controller 30 exclusively occupies the SCSI bus 80 (step 404). In this state, the controller 40 adds the SCSI-ID possessed by the SCSI port 31 to the SCSI port 41 (step 405) and then resets the SCSI port 31 (step 406). The SCSI bus 80 exclusively occupied by the controller 30 is released by resetting the SCSI port 31 and is returned to the bus free state. Thereafter, the controller 40 receives the I/O request from the host system 10 (step 413). The I/O process continues in this way without any intervention by the user. [0057]
  • When the controller 40 cannot remain in the arbitration (N in step 402), it is decided whether or not the controller 40 is selected by the host system 20 in the selection phase (step 403). If the controller 40 is selected by the host system (Y in step 403), there is set a state in which the controller 40 dedicatedly occupies the SCSI bus 80. In this state, the controller 40 receives the I/O request from the host system (step 407) and then provisionally interrupts the processing. The controller 40 adds the SCSI-ID possessed by the SCSI port 31 to the SCSI port 41 (step 408) and then resets the SCSI port 31 (step 409). After resetting the port 31, the controller 40 executes the I/O request from the host system (step 410) and then restores the SCSI bus 80 to the bus free state. At this point, the controller 40 receives the I/O request from the host system 10 (step 413). [0058]
  • If the controller does not remain in the arbitration (No in step 402) and is not selected by the host system (No in step 403), the controller 40 assumes a state in which the controller 30 having received the I/O request from the host system 10 or another SCSI device dedicatedly occupies the SCSI bus 80. In this situation, while the state is kept unchanged, the controller 40 adds the SCSI-ID possessed by the SCSI port 31 (step 411) to the SCSI port 41 and then resets the SCSI port 31 (step 412). If the controller 30 exclusively occupies the SCSI bus 80, the SCSI bus 80 is restored to the bus free state by resetting the SCSI port 31. If another SCSI device dedicatedly occupies the SCSI bus 80, the SCSI bus 80 is restored to the bus free state when the I/O process of the SCSI device is terminated. Thereafter, the controller 40 accepts the I/O request from the host system 10 (step 413). [0059]
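  • The three branches of the bus-free procedure of FIG. 4 can be summarized as a small decision function that returns the sequence of step numbers named in the text. This is a reading aid only: the function name and the boolean inputs are assumptions, and the step labels are taken from the description above rather than from the drawing itself.

```python
def bus_free_procedure(remains_in_arbitration, selected_by_host=False):
    """Return the FIG. 4 step numbers visited for a given bus outcome."""
    steps = [400, 401, 402]  # bus free, start initiator operation, arbitrate
    if remains_in_arbitration:
        # Controller 40 selects failed port 31 so the bus stays occupied,
        # adds the SCSI-ID to port 41, then resets port 31.
        return steps + [404, 405, 406, 413]
    steps.append(403)  # was controller 40 selected by a host instead?
    if selected_by_host:
        # Receive the I/O request, pause it, transfer the ID, reset port 31,
        # then execute the held I/O request.
        return steps + [407, 408, 409, 410, 413]
    # The failed controller or another SCSI device occupies the bus:
    # transfer the ID and reset port 31 while that state persists.
    return steps + [411, 412, 413]
```

  • In every branch the SCSI-ID is added to the port 41 before the port 31 is reset, and step 413 (accepting I/O from the host system 10) is reached only after both are done.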
  • Referring next to FIG. 5, description will be given of a processing procedure in a case in which the BSY signal of the SCSI bus 80 is asserted at detection of the failure of the controller 30 (step 500). [0060]
  • The controller 40 first determines whether or not the controller 40 is executing an I/O request from the host system (step 501). If this is not the case (No in step 501), the controller 40 continuously monitors the state of the SCSI bus 80 for a period of time equivalent to the period in which the arbitration phase according to the SCSI standards is changed via the selection phase to the message out phase (step 502). [0061]
  • At detection of the failure, if the controller 40 is executing an I/O operation (Y in step 501) or the controller 40 is selected by the host system during the monitor operation of the SCSI bus 80 (left branch in step 502), there is assumed a state in which the SCSI bus 80 is exclusively occupied by the controller 40 and the controller 30 has not received the I/O request. In this state, prior to reporting the termination status of the I/O execution (step 503), the controller 40 adds the SCSI-ID possessed by the SCSI port 31 to the SCSI port 41 (step 504) and then resets the SCSI port 31 (step 505). After resetting the port 31, the controller 40 notifies the I/O termination status and then terminates the I/O operation (step 506). [0062]
  • The SCSI bus 80 is set to the bus free state when the I/O execution process is terminated, and the controller 40 receives any subsequent I/O request from the host system 10. In this fashion, it is possible to continuously execute the I/O process without user intervention. [0063]
  • When the bus free state is detected during the monitor operation of the SCSI bus 80 (central branch in step 502), the process at bus free detection of FIG. 4 is executed. [0064]
  • If the controller 40 is not executing an I/O operation and the SCSI bus 80 is not released during the monitor operation (right branch in step 502), the controller 40 recognizes that the controller 30 or another SCSI device exclusively occupying the SCSI bus is executing an I/O operation. Continuing the SCSI bus monitoring operation (step 508), the controller 40 adds the SCSI-ID possessed by the SCSI port 31 to the SCSI port 41 (step 509) and then resets the SCSI port 31 (step 510). [0065]
  • When the controller 30 exclusively occupies the SCSI bus 80, the bus 80 is returned to the bus free state by resetting the SCSI port 31. When another SCSI device exclusively occupies the SCSI bus 80, the bus 80 is returned to the bus free state when the I/O operation of the SCSI device is terminated. Thereafter, the controller 40 receives the I/O request from the host system 10. If the bus is released before the SCSI port 31 is completely reset (broken line in step 508), there is executed the process at detection of the bus free state shown in FIG. 4. [0066]
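  • The bus-in-use procedure of FIG. 5 can likewise be summarized as a decision function over the step numbers given in the text. The names below are hypothetical, and the string "FIG. 4" marks the points where the description hands over to the bus-free procedure; this is a sketch of the described branching, not the flowchart itself.

```python
from enum import Enum


class MonitorResult(Enum):
    SELECTED_BY_HOST = "left branch"
    BUS_FREE = "central branch"
    BUS_STILL_BUSY = "right branch"


def bus_busy_procedure(executing_io, monitor_result=None,
                       bus_released_early=False):
    """Return the FIG. 5 steps visited; 'FIG. 4' means a hand-over."""
    steps = [500, 501]  # BSY asserted at detection; is controller 40 busy?
    if not executing_io:
        steps.append(502)  # monitor the bus for the threshold period
        if monitor_result is MonitorResult.BUS_FREE:
            return steps + ["FIG. 4"]
        if monitor_result is MonitorResult.BUS_STILL_BUSY:
            steps.append(508)  # keep monitoring during the transfer
            if bus_released_early:
                return steps + ["FIG. 4"]
            return steps + [509, 510]  # add the ID, then reset port 31
        if monitor_result is not MonitorResult.SELECTED_BY_HOST:
            raise ValueError("monitor_result is required when idle")
    # Controller 40 owns the bus: transfer the ID before reporting the
    # I/O termination status, then finish the I/O operation.
    return steps + [503, 504, 505, 506]
```

  • Holding back the termination status until the transfer is finished keeps the bus occupied in this model, which is how the description prevents the host system from issuing a request while two ports hold the same SCSI-ID.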
  • As a result of the processing procedure, the I/O request from the host system 10 can be executed by the controller 40 when a failure occurs in the controller 30, thereby preventing the permanent error. Consequently, the data processing of the system 10 can be normally continued. [0067]
  • [0068] Referring next to FIGS. 6 and 7, a description will be given of how the present invention can be implemented in a controller configuration that does not include the port address resetting facility.
  • [0069] FIG. 6 is a diagram showing the configuration obtained by removing the port address resetting facility from the controller of FIG. 1. Numerals 90 and 100 indicate controllers respectively performing the functions of the controllers 30 and 40 of FIG. 1, and numeral 50 indicates a shared memory used to pass information between the controllers 90 and 100.
  • [0070] In the internal configuration of the controller 90, numeral 34 is a microprocessor controlling overall operation of the controller, numeral 31 indicates an SCSI port which can be controlled only by the microprocessor 34, numeral 32 denotes a cache memory, numeral 33 stands for a device-side SCSI port, numeral 36 designates a data transfer controller, and numeral 37 is an array data transfer controller. The controllers 90 and 100 are of the same configuration. In the following paragraphs, description will be given of an example in which the controller 90 receives an I/O request from the host system 10 of FIG. 1 and the controller 100 receives an I/O request from the host system of FIG. 1. FIG. 7 is a diagram showing an SCSI-ID transfer processing procedure, with its abscissa representing elapsed time.
  • [0071] When a failure occurs in the controller 90, the controller 100 detects the failure and then sets, at a particular address in the shared memory 50, a failure flag indicating the occurrence of the failure in the controller 90. Thereafter, the controller 100 reads the SCSI-ID of the SCSI port 31 and control information of the controller 90 from the shared memory 50, and uses the microprocessor 44 to add the SCSI-ID to the SCSI port 41. Meanwhile, the controller 90 recognizes its own failure from the failure flag in the shared memory 50 and enters a wait state in which, using an internal timer, it suspends its own operation for a period equivalent to the time the controller 100 needs to complete the transfer processing.
  • [0072] Through the wait operation, the controller 90 determines that the processing of the controller 100 is complete and then uses the microprocessor 34 to erase the SCSI-ID possessed by the SCSI port 31. As a result, the SCSI-ID transfer process is terminated and the SCSI port 41 is enabled to receive the I/O request from the host system of FIG. 1.
  • [0073] Since the SCSI-ID transfer process can be conducted without using the port address resetting facility as described above, the present invention is also effective in the configuration not including the port address resetting facility. This configuration assumes that, when a failure occurs in the controller 90, the microprocessor 34 and the SCSI port 31 still function normally.
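The flag-and-wait handoff of FIGS. 6 and 7 can be modeled on a logical clock to show why the wait period of the failed controller must cover the transfer time of the substitute controller. The following is a minimal sketch under stated assumptions: the function name `simulate_handoff`, the dictionary standing in for the shared memory 50, the time units, and the SCSI-ID value are all hypothetical.

```python
def simulate_handoff(transfer_time, wait_time, scsi_id=3):
    """Replay the handoff and report whether the ID was ever held by no port.

    transfer_time: when controller 100 finishes adding the ID to port 41.
    wait_time: when controller 90's internal timer expires and it erases the ID.
    Returns (gap, failed_port_ids, substitute_port_ids).
    """
    # Stand-in for the shared memory 50 (layout is an assumption).
    shared_memory = {"failure_flag": False, "failed_port_id": scsi_id}
    failed_port, substitute_port = {scsi_id}, set()

    # t = 0: controller 100 detects the failure and sets the flag;
    # controller 90 reads the flag and starts its timer.
    shared_memory["failure_flag"] = True

    # The middle tuple element orders "add" before "erase" when both times are equal.
    events = [(transfer_time, 0, "add"), (wait_time, 1, "erase")]
    gap = False
    for _, _, action in sorted(events):
        if action == "add":
            substitute_port.add(shared_memory["failed_port_id"])
        else:
            failed_port.discard(scsi_id)
        # A gap means no port would answer a host request for this ID.
        if scsi_id not in failed_port | substitute_port:
            gap = True
    return gap, failed_port, substitute_port
```

With `simulate_handoff(5, 5)` the ID is always held by at least one port, whereas `simulate_handoff(5, 3)` erases the ID before the substitute port has acquired it, leaving an interval in which no port answers for that ID.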
  • [0074] While the present invention has been described with reference to the particular illustrative embodiments, it is not to be restricted by those embodiments but only by the appended claims. It is to be appreciated that those skilled in the art can change or modify the embodiments without departing from the scope and spirit of the present invention.

Claims (21)

1. A failure recovery method for use in a data processing system including at least one host system, a plurality of controllers, and an interface cable connecting said host system to said controllers in a daisy chain, said controllers respectively including therein I/O ports being connected to said interface cable and having mutually different IDs, an I/O device being controlled by a group of at least two controllers, the method comprising the steps of:
detecting, when a failure is detected in a controller of said group, a utilization state of said interface cable by a controller as a substitutive unit of a failed controller of said group;
deciding, according to the utilization state of said interface cable, a state of reception by said failed controller of an I/O request from said host system;
suppressing by a substitutive controller, when the I/O request is not yet received by said failed controller as a result of the decision, reception of the I/O request by said failed controller; adding an ID of an I/O port related to said failed controller to an I/O port of said substitutive controller; and resetting the I/O port related to said failed controller; and
adding by said substitutive controller, when the I/O request is already received by said failed controller as a result of the decision, the ID of said I/O port related to said failed controller to the I/O port of said substitutive controller and resetting the I/O port related to said failed controller before said host system recognizes a permanent error in said failed controller.
2. A failure recovery method according to claim 1, wherein, in resetting the I/O port related to said failed controller, reset is carried out by hardware resetting means in said substitutive controller.
3. A failure recovery method according to claim 1, wherein, in resetting the I/O port related to said failed controller, said substitutive controller further includes the steps of:
indicating to said failed controller to reset the I/O port related to said failed controller after lapse of a predetermined period of time; and
adding the ID of the I/O port related to said failed controller to the I/O port of said substitutive controller within said predetermined period of time.
4. A failure recovery method according to claim 1, wherein said interface cable is a Small Computer Systems Interface bus cable.
5. A data processing system, comprising:
at least one host system;
a plurality of controllers; and
an interface cable connecting said host system to said controllers in a daisy chain, said controllers respectively including therein I/O ports being connected to said interface cable and having mutually different IDs;
an I/O device being commonly controlled by a group of at least two controllers; and
a shared memory being commonly accessed from said group, each of controllers in said group including a microprocessor,
the microprocessor in each of said controllers including:
means for detecting a failure in a controller of said group according to contents of said shared memory;
means for detecting a utilization state of said interface cable via an I/O port;
means for deciding, according to the utilization state of said interface cable, a state of reception by said failed controller of an I/O request from said host system;
means for suppressing, when the I/O request is not yet received by said failed controller as a result of the decision, reception of the I/O request by said failed controller; adding an ID of the I/O port related to said failed controller to an I/O port of a controller of its own; and indicating to reset the I/O port related to said failed controller; and
means for adding, when the I/O request is already received by said failed controller as a result of the decision, the ID of the I/O port related to said failed controller to the I/O port of the controller of its own; and indicating to reset the I/O port related to said failed controller before said host system recognizes a permanent error in said failed controller.
6. A data processing system according to claim 5, wherein each of the controllers of said group includes hardware resetting means responsive to an indication from said reset indicating means for resetting the I/O port related to said failed controller.
7. A data processing system according to claim 5, wherein:
said reset indicating means writes a failure flag at a predetermined address in said shared memory, said flag indicating an occurrence of a failure;
a processor in said failed controller functions as means for reading said failure flag from said shared memory and resetting the I/O port related thereto after lapse of a predetermined period of time; and
said reset indicating means adds the ID of the I/O port related to said failed controller to the I/O port related to own controller within said predetermined period of time.
8. A data processing system according to claim 5, wherein said interface cable is an SCSI bus cable.
9. An external storage for use in a data processing system including a host system, an external storage including a plurality of controllers respectively having therein ports possessing identifiers as individual port addresses and a group of storages controlled by and shared between said plural controllers, and an interface cable connecting in a daisy chain said host system to said plural controllers having the ports therein, said plural controllers and storages being accessible from said host system,
said external storage having a function that at occurrence of a failure in a controller excepting at least one controller, a normal controller detects the failure, references a port address of a failed controller, receives control information of said failed controller, and adds control information to the port address thereof.
10. An external storage according to claim 9, further including a shared memory for each of said plural controllers for storing therein the port address and control information of each of said controllers and thereby transmitting information between said controllers.
11. An external storage in a data processing system including a host system, an external storage including a plurality of controllers respectively having therein ports possessing identifiers as individual port addresses and a group of storages controlled by and shared between said plural controllers, and an interface cable connecting in a daisy chain said host system to said plural controllers having the ports therein, said plural controllers and storages being accessible from said host system,
said external storage having a function that at occurrence of a failure in a controller excepting at least one controller, a normal controller detects the failure, references a port address of a failed controller, receives control information of said failed controller, and adds the control information to the port address thereof,
a controller having a port address resetting facility for resetting the port address of said failed controller and erasing an ID thereof in such a manner that the controller resets the port address of said failed controller, that said failed controller does not respond to subsequent I/O requests from said host system, and that said normal controller having received the port address responds to the I/O requests.
12. An external storage according to claim 11, wherein, at occurrence of the failure in the controller, in a state in which said host system has not executed an I/O request to said failed controller and said interface cable connecting said host system to said controllers is not being used,
a normal controller executes selection for said failed controller to acquire a bus mastership between said normal controller and said failed controller, thereby suppressing issuance of an I/O request from said host system to said failed controller during a transfer process of the port address by said normal controller.
13. An external storage according to claim 11, wherein, at occurrence of the failure in the controller, in a state in which said host system has not executed an I/O request to said failed controller and said normal controller is using the bus, said normal controller completes the transfer process of the port address of said failed controller during the processing of the I/O request issued from said host system and then notifies termination of the I/O request, thereby suppressing issuance of an I/O request from said host system to said failed controller during the transfer process of the port address by said normal controller.
14. An external storage according to claim 11, wherein:
said interface cable is an SCSI cable;
said normal controller monitors, when the bus is in use at occurrence of the failure in the controller, a BSY signal of the bus to determine whether or not the bus is being used by another device connected to the bus, whether or not the system is in a transit state from an arbitration phase to a selection phase according to the SCSI standards, and whether or not said failed controller already received an I/O request from said host system,
said normal controller executes, when the bus is released during the monitor operation, selection for said failed controller to attain a bus mastership between said normal and failed controllers,
said normal controller completes, when said normal controller is selected during the monitor operation, the transfer process of the port address of said failed controller during the processing of the I/O request issued from said host system and then notifies termination of the I/O request, and
said normal controller terminates during the monitoring period the transfer process of the port address of said failed controller.
15. An external storage according to claim 14, wherein the monitoring period of the bus mastership is set to be equal to or more than a period of time in which the arbitration phase is changed via the selection phase to a message out phase according to the SCSI standards so as to confirm that the BSY signal is not associated with arbitration of the bus mastership but is caused by an I/O execution process, thereby executing the transfer of the port address of said failed controller.
16. An external storage in a data processing system including a host system, an external storage including a plurality of controllers respectively having therein ports possessing identifiers as individual port addresses and a group of storages controlled by and shared between said plural controllers, and an interface cable connecting in a daisy chain said host system to said plural controllers having the ports therein, said plural controllers and storages being accessible from said host system, wherein:
at occurrence of a failure in a controller excepting at least one controller, a failed controller recognizes the failure thereof and enters a wait state without executing a control operation thereof in at least a period of time equal to time in which said normal controller conducts a transfer process of control information of said failed controller and addition of a port address;
after said normal controller which recognized the failure finishes the transfer and addition processes, said failed controller erases the port address of said failed controller; and
said normal controller which received the port address of said failed controller responds to a subsequent I/O request issued from said host system since the port address of said failed controller is already erased.
17. An external storage according to claim 16, wherein
at occurrence of the failure in the controller, in a state in which said host system has not executed an I/O request to said failed controller and said interface cable connecting said host system to said controllers is not being used, said normal controller executes selection for said failed controller to acquire a bus mastership between said normal controller and said failed controller, thereby suppressing issuance of an I/O request from said host system to said failed controller during the transfer process of the port address by said normal controller.
18. An external storage according to claim 16, wherein, at occurrence of the failure in a controller, in a state in which a host system has not executed an I/O request to said failed controller and said normal controller is using the bus, said normal controller completes the transfer process of the port address of said failed controller during the processing of the I/O request issued from said host system and then notifies termination of the I/O request, thereby suppressing issuance of an I/O request from said host system to said failed controller during the transfer process of the port address by said normal controller.
19. An external storage according to claim 16, wherein:
when the bus is in use at occurrence of the failure in the controller, said normal controller monitors a BSY signal of the bus to determine whether or not the bus is being used by another device connected to the bus, whether or not the system is in a transit state from an arbitration phase to a selection phase according to the SCSI standards, and whether or not said failed controller already received the I/O request from said host system;
when the bus is released during the monitor operation, the normal controller executes selection for said failed controller to attain a bus mastership between said normal and failed controllers;
when said normal controller is selected during the monitor operation, said normal controller completes the transfer process of the port address of said failed controller during the processing of the I/O request issued from said host system and then notifies the termination of the I/O request; and
said normal controller terminates during the monitoring period the transfer process of the port address of said failed controller.
20. An external storage according to claim 16, wherein the monitoring period of the bus mastership is set to be equal to or more than a period of time in which the arbitration phase changes via the selection phase to a message out phase so as to confirm that the BSY signal is not associated with arbitration of the bus mastership but is caused by an I/O execution process, thereby executing the transfer of the port address of said failed controller.
21. A host system and an external storage connected by an interface cable in a configuration including a host system, an external storage including a plurality of controllers respectively having therein ports possessing identifiers as individual port addresses and a group of storages controlled by and shared between said plural controllers, and an interface cable connecting in a daisy chain said host system to said plural controllers having the ports therein, said plural controllers and said storages being accessible from said host system,
said external storage having a function that at occurrence of a failure in a controller excepting at least one controller, said normal controller detects the failure, references the port address of the failed controller, receives control information of said failed controller, and adds the control information to the port address thereof,
said host system having a function that in a state in which a controller having received an I/O request issued from the host system cannot respond thereto due to occurrence of a failure in the controller, said host system monitors an I/O completion report from the controller, issues again the I/O request to said failed controller after lapse of the predetermined monitoring period, executes a recovery process including a resetting operation, recognizes a permanent error when the controller does not respond to the recovery process, and notifies the error to the application, and
said normal controller completing an operation including the reference, transfer, and additional port address processes before the permanent error is recognized, thereby preventing a report of the permanent error to an application of said host system.
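
The host-side timing recited in claim 21 can be illustrated with a simplified classification of what the host observes for a request sent at time 0 to the failed controller. This is a rough sketch only: the function name `host_outcome`, the assumption that the recovery process occupies a fixed `recovery_period` after the monitoring period, and the time units are hypothetical and are not taken from the claims.

```python
def host_outcome(takeover_done_at, monitoring_period, recovery_period):
    """Classify the host's view of a request issued at t = 0.

    takeover_done_at: when the normal controller has finished the reference,
        transfer, and port address addition.
    monitoring_period: how long the host waits for an I/O completion report
        before reissuing the request.
    recovery_period: assumed duration of the host's recovery process
        (including the resetting operation) after the reissued request.
    """
    if takeover_done_at <= monitoring_period:
        # The reissued request is answered by the normal controller.
        return "retried request succeeds"
    if takeover_done_at <= monitoring_period + recovery_period:
        # Takeover completes while the host is still attempting recovery.
        return "recovery process succeeds"
    # No controller responded before the recovery process gave up.
    return "permanent error reported to application"
```

Under this model, the claimed behavior corresponds to keeping `takeover_done_at` within `monitoring_period + recovery_period`, so that the third outcome is never reached.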
US09/835,494 1995-10-30 2001-04-17 External storage Expired - Fee Related US6412078B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/835,494 US6412078B2 (en) 1995-10-30 2001-04-17 External storage

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
JP7-282072 1995-10-30
JP07-282072 1995-10-30
JP28207295A JP3628777B2 (en) 1995-10-30 1995-10-30 External storage device
US08/738,590 US6052795A (en) 1995-10-30 1996-10-29 Recovery method and system for continued I/O processing upon a controller failure
US09/421,235 US6321346B1 (en) 1995-10-30 1999-10-20 External storage
US09/835,494 US6412078B2 (en) 1995-10-30 2001-04-17 External storage

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/421,235 Continuation US6321346B1 (en) 1995-10-30 1999-10-20 External storage

Publications (2)

Publication Number Publication Date
US20010020282A1 (en) 2001-09-06
US6412078B2 (en) 2002-06-25

Family

ID=17647772

Family Applications (3)

Application Number Title Priority Date Filing Date
US08/738,590 Expired - Lifetime US6052795A (en) 1995-10-30 1996-10-29 Recovery method and system for continued I/O processing upon a controller failure
US09/421,235 Expired - Lifetime US6321346B1 (en) 1995-10-30 1999-10-20 External storage
US09/835,494 Expired - Fee Related US6412078B2 (en) 1995-10-30 2001-04-17 External storage

Country Status (4)

Country Link
US (3) US6052795A (en)
EP (1) EP0772127B1 (en)
JP (1) JP3628777B2 (en)
DE (1) DE69608641T2 (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030167363A1 (en) * 2000-07-13 2003-09-04 Shigeo Sakaba Computer system bus interface and control method therefor
US20030212864A1 (en) * 2002-05-08 2003-11-13 Hicken Michael S. Method, apparatus, and system for preserving cache data of redundant storage controllers
US20050010838A1 (en) * 2003-04-23 2005-01-13 Dot Hill Systems Corporation Apparatus and method for deterministically performing active-active failover of redundant servers in response to a heartbeat link failure
US20050102549A1 (en) * 2003-04-23 2005-05-12 Dot Hill Systems Corporation Network storage appliance with an integrated switch
US6898732B1 (en) * 2001-07-10 2005-05-24 Cisco Technology, Inc. Auto quiesce
EP1552392A2 (en) * 2002-08-02 2005-07-13 Grass Valley (U.S.) Inc. Real-time fail-over recovery for a media area network
US20050207105A1 (en) * 2003-04-23 2005-09-22 Dot Hill Systems Corporation Apparatus and method for deterministically performing active-active failover of redundant servers in a network storage appliance
US20050246568A1 (en) * 2003-04-23 2005-11-03 Dot Hill Systems Corporation Apparatus and method for deterministically killing one of redundant servers integrated within a network storage appliance chassis
US20050262312A1 (en) * 2002-07-30 2005-11-24 Noboru Morishita Storage system for multi-site remote copy
US20050289391A1 (en) * 2004-06-29 2005-12-29 Hitachi, Ltd. Hot standby system
US20060041728A1 (en) * 2004-02-25 2006-02-23 Hitachi, Ltd. Logical unit security for clustered storage area networks
US20060195673A1 (en) * 2005-02-25 2006-08-31 International Business Machines Corporation Method, apparatus, and computer program product for coordinating error reporting and reset utilizing an I/O adapter that supports virtualization
US20090265493A1 (en) * 2008-04-16 2009-10-22 Mendu Krishna R Efficient Architecture for Interfacing Redundant Devices to a Distributed Control System
US20090319699A1 (en) * 2008-06-23 2009-12-24 International Business Machines Corporation Preventing Loss of Access to a Storage System During a Concurrent Code Load
US7849352B2 (en) 2003-08-14 2010-12-07 Compellent Technologies Virtual disk drive system and method
US7886111B2 (en) 2006-05-24 2011-02-08 Compellent Technologies System and method for raid management, reallocation, and restriping
US8468292B2 (en) 2009-07-13 2013-06-18 Compellent Technologies Solid state drive data storage system and method
US9146851B2 (en) 2012-03-26 2015-09-29 Compellent Technologies Single-level cell and multi-level cell hybrid solid state drive
US20160283303A1 (en) * 2015-03-27 2016-09-29 Intel Corporation Reliability, availability, and serviceability in multi-node systems with disaggregated memory
US9489150B2 (en) 2003-08-14 2016-11-08 Dell International L.L.C. System and method for transferring data between different raid data storage types for current data and replay data
US11128578B2 (en) * 2018-05-21 2021-09-21 Pure Storage, Inc. Switching between mediator services for a storage system

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3628777B2 (en) * 1995-10-30 2005-03-16 株式会社日立製作所 External storage device
JP3863291B2 (en) * 1998-05-28 2006-12-27 株式会社日立製作所 Database processing method, database processing system, and medium
JPH11345175A (en) * 1998-06-02 1999-12-14 Nec Kofu Ltd System and method for controlling substitutive path
JP4392877B2 (en) 1998-09-18 2010-01-06 株式会社日立製作所 Disk array controller
JP4132322B2 (en) * 1998-12-16 2008-08-13 株式会社日立製作所 Storage control device and control method thereof
US6493777B1 (en) * 1999-09-15 2002-12-10 Lucent Technologies Inc. Method for dynamically reconfiguring data bus control
JP4462697B2 (en) 2000-01-31 2010-05-12 株式会社日立製作所 Storage controller
US6766470B1 (en) * 2000-03-29 2004-07-20 Intel Corporation Enhancing reliability and robustness of a cluster
US6961765B2 (en) 2000-04-06 2005-11-01 Bbx Technologies, Inc. System and method for real time monitoring and control of networked computers
US6697905B1 (en) * 2000-04-13 2004-02-24 International Business Machines Corporation Apparatus for providing I/O support to a computer system and method of use thereof
US6708283B1 (en) 2000-04-13 2004-03-16 Stratus Technologies, Bermuda Ltd. System and method for operating a system with redundant peripheral bus controllers
US6735715B1 (en) 2000-04-13 2004-05-11 Stratus Technologies Bermuda Ltd. System and method for operating a SCSI bus with redundant SCSI adaptors
US6954881B1 (en) * 2000-10-13 2005-10-11 International Business Machines Corporation Method and apparatus for providing multi-path I/O in non-concurrent clustering environment using SCSI-3 persistent reserve
JP2002366334A (en) * 2001-06-07 2002-12-20 Komatsu Ltd Method and device for controlling a lot of processing modules
JP4796251B2 (en) * 2001-09-21 2011-10-19 株式会社日立製作所 Network storage system and control method thereof
JP2003162377A (en) * 2001-11-28 2003-06-06 Hitachi Ltd Disk array system and method for taking over logical unit among controllers
JP4014923B2 (en) * 2002-04-30 2007-11-28 株式会社日立製作所 Shared memory control method and control system
US20030220943A1 (en) * 2002-05-23 2003-11-27 International Business Machines Corporation Recovery of a single metadata controller failure in a storage area network environment
US7448077B2 (en) * 2002-05-23 2008-11-04 International Business Machines Corporation File level security for a metadata controller in a storage area network
US8140622B2 (en) 2002-05-23 2012-03-20 International Business Machines Corporation Parallel metadata service in storage area network environment
US7010528B2 (en) * 2002-05-23 2006-03-07 International Business Machines Corporation Mechanism for running parallel application programs on metadata controller nodes
US7092990B2 (en) * 2002-06-26 2006-08-15 International Business Machines Corporation Handling node address failure in a distributed nodal system of processors
US7350012B2 (en) * 2002-07-12 2008-03-25 Tundra Semiconductor Corporation Method and system for providing fault tolerance in a network
US7814050B2 (en) * 2002-10-22 2010-10-12 Brocade Communications Systems, Inc. Disaster recovery
US7480831B2 (en) * 2003-01-23 2009-01-20 Dell Products L.P. Method and apparatus for recovering from a failed I/O controller in an information handling system
JP4278444B2 (en) 2003-06-17 2009-06-17 株式会社日立製作所 Virtual port name management device
US7966294B1 (en) * 2004-01-08 2011-06-21 Netapp, Inc. User interface system for a clustered storage system
DE112004002797B4 (en) * 2004-03-19 2015-12-31 Zakrytoe Aktsionernoe Obschestvo "Intel A/O" Failover and load balancing
US7249285B2 (en) * 2004-03-25 2007-07-24 International Business Machines Corporation Address watch breakpoints in a hardware synchronization range
US7760626B2 (en) * 2004-03-31 2010-07-20 Intel Corporation Load balancing and failover
US7406625B2 (en) * 2004-08-17 2008-07-29 International Business Machines Corporation Protecting a code range in a program from breakpoints
JP4294568B2 (en) * 2004-10-04 2009-07-15 富士通株式会社 Disk array device and control method thereof
US7478265B2 (en) 2004-10-14 2009-01-13 Hewlett-Packard Development Company, L.P. Error recovery for input/output operations
US7437608B2 (en) * 2004-11-15 2008-10-14 International Business Machines Corporation Reassigning storage volumes from a failed processing system to a surviving processing system
EP1871075B1 (en) * 2004-12-24 2015-03-04 IZUTSU, Masahiro Mobile information communication apparatus, connection unit for mobile information communication apparatus, and external input/output unit for mobile information communication apparatus
US7953917B2 (en) * 2005-06-30 2011-05-31 Intel Corporation Communications protocol expander
JP4923990B2 (en) * 2006-12-04 2012-04-25 株式会社日立製作所 Failover method and its computer system.
US7681089B2 (en) * 2007-02-20 2010-03-16 Dot Hill Systems Corporation Redundant storage controller system with enhanced failure analysis capability
JP5148236B2 (en) * 2007-10-01 2013-02-20 ルネサスエレクトロニクス株式会社 Semiconductor integrated circuit and method for controlling semiconductor integrated circuit
JP5353002B2 (en) * 2007-12-28 2013-11-27 富士通株式会社 Storage system and information processing apparatus access control method
US8321622B2 (en) * 2009-11-10 2012-11-27 Hitachi, Ltd. Storage system with multiple controllers and multiple processing paths
US8201020B2 (en) 2009-11-12 2012-06-12 International Business Machines Corporation Method apparatus and system for a redundant and fault tolerant solid state disk
US9329939B2 (en) * 2011-06-08 2016-05-03 Taejin Info Tech Co., Ltd Two-way raid controller for a semiconductor storage device
JP6307847B2 (en) * 2013-11-19 2018-04-11 富士通株式会社 Information processing apparatus, control apparatus, and control program
US20240012708A1 (en) * 2022-07-06 2024-01-11 Dell Products L.P. Real-time sense data querying

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4228496A (en) * 1976-09-07 1980-10-14 Tandem Computers Incorporated Multiprocessor system
US4141066A (en) * 1977-09-13 1979-02-20 Honeywell Inc. Process control system with backup process controller
JPS5546535A (en) * 1978-09-28 1980-04-01 Chiyou Lsi Gijutsu Kenkyu Kumiai Method of manufacturing semiconductor device
JPH0618377B2 (en) * 1983-09-08 1994-03-09 株式会社日立製作所 Transmission system
JPS63178360A (en) * 1987-01-20 1988-07-22 Hitachi Ltd Constitutional system for input/output system
JPH01147727A (en) * 1987-12-04 1989-06-09 Hitachi Ltd Fault restoring method for on-line program
US5016244A (en) * 1989-09-08 1991-05-14 Honeywell Inc. Method for controlling failover between redundant network interface modules
US5091847A (en) * 1989-10-03 1992-02-25 Grumman Aerospace Corporation Fault tolerant interface station
US5140592A (en) * 1990-03-02 1992-08-18 Sf2 Corporation Disk array system
US5134619A (en) * 1990-04-06 1992-07-28 Sf2 Corporation Failure-tolerant mass storage system
US5289589A (en) * 1990-09-10 1994-02-22 International Business Machines Corporation Automated storage library having redundant SCSI bus system
JP3014494B2 (en) 1991-06-11 2000-02-28 三菱電機株式会社 Dual port disk controller
US5313584A (en) * 1991-11-25 1994-05-17 Unisys Corporation Multiple I/O processor system
WO1993018456A1 (en) * 1992-03-13 1993-09-16 Emc Corporation Multiple controller sharing in a redundant storage array
JP2868141B2 (en) * 1992-03-16 1999-03-10 株式会社日立製作所 Disk array device
US5566297A (en) * 1994-06-16 1996-10-15 International Business Machines Corporation Non-disruptive recovery from file server failure in a highly available file system for clustered computing environments
GB2290891B (en) * 1994-06-29 1999-02-17 Mitsubishi Electric Corp Multiprocessor system
US5557735A (en) * 1994-07-21 1996-09-17 Motorola, Inc. Communication system for a network and method for configuring a controller in a communication network
US5644700A (en) * 1994-10-05 1997-07-01 Unisys Corporation Method for operating redundant master I/O controllers
DE69523124T2 (en) * 1994-12-15 2002-05-29 Hewlett Packard Co Fault detection system for a mirrored memory in a duplicated controller of a disk storage system
EP0721162A2 (en) * 1995-01-06 1996-07-10 Hewlett-Packard Company Mirrored memory dual controller disk storage system
JP3732869B2 (en) * 1995-06-07 2006-01-11 株式会社日立製作所 External storage device
US5790775A (en) * 1995-10-23 1998-08-04 Digital Equipment Corporation Host transparent storage controller failover/failback of SCSI targets and associated units
JP3628777B2 (en) * 1995-10-30 2005-03-16 株式会社日立製作所 External storage device

Cited By (75)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030167363A1 (en) * 2000-07-13 2003-09-04 Shigeo Sakaba Computer system bus interface and control method therefor
US6928497B2 (en) * 2000-07-13 2005-08-09 International Business Machines Corporation Computer system bus interface and control method therefor
US6898732B1 (en) * 2001-07-10 2005-05-24 Cisco Technology, Inc. Auto quiesce
US20030212864A1 (en) * 2002-05-08 2003-11-13 Hicken Michael S. Method, apparatus, and system for preserving cache data of redundant storage controllers
US7293196B2 (en) * 2002-05-08 2007-11-06 Xiotech Corporation Method, apparatus, and system for preserving cache data of redundant storage controllers
US7685387B2 (en) 2002-07-30 2010-03-23 Hitachi, Ltd. Storage system for multi-site remote copy
US20080126725A1 (en) * 2002-07-30 2008-05-29 Noboru Morishita Storage system for multi-site remote copy
US7343461B2 (en) 2002-07-30 2008-03-11 Hitachi, Ltd. Storage system for multi-site remote copy
US20050262312A1 (en) * 2002-07-30 2005-11-24 Noboru Morishita Storage system for multi-site remote copy
US7188218B2 (en) 2002-07-30 2007-03-06 Hitachi, Ltd. Storage system for multi-remote copy
US7103727B2 (en) 2002-07-30 2006-09-05 Hitachi, Ltd. Storage system for multi-site remote copy
EP1552392A4 (en) * 2002-08-02 2006-08-16 Grass Valley Us Inc Real-time fail-over recovery for a media area network
EP1552392A2 (en) * 2002-08-02 2005-07-13 Grass Valley (U.S.) Inc. Real-time fail-over recovery for a media area network
US7308604B2 (en) 2002-08-02 2007-12-11 Thomson Licensing Real-time fail-over recovery for a media area network
US20060090094A1 (en) * 2002-08-02 2006-04-27 Mcdonnell Niall S Real-time fail-over recovery for a media area network
US7380163B2 (en) 2003-04-23 2008-05-27 Dot Hill Systems Corporation Apparatus and method for deterministically performing active-active failover of redundant servers in response to a heartbeat link failure
US7565566B2 (en) * 2003-04-23 2009-07-21 Dot Hill Systems Corporation Network storage appliance with an integrated switch
US8185777B2 (en) 2003-04-23 2012-05-22 Dot Hill Systems Corporation Network storage appliance with integrated server and redundant storage controllers
US9176835B2 (en) 2003-04-23 2015-11-03 Dot Hill Systems Corporation Network, storage appliance, and method for externalizing an external I/O link between a server and a storage controller integrated within the storage appliance chassis
US20050010838A1 (en) * 2003-04-23 2005-01-13 Dot Hill Systems Corporation Apparatus and method for deterministically performing active-active failover of redundant servers in response to a heartbeat link failure
US7676600B2 (en) 2003-04-23 2010-03-09 Dot Hill Systems Corporation Network, storage appliance, and method for externalizing an internal I/O link between a server and a storage controller integrated within the storage appliance chassis
US7661014B2 (en) 2003-04-23 2010-02-09 Dot Hill Systems Corporation Network storage appliance with integrated server and redundant storage controllers
US20070100964A1 (en) * 2003-04-23 2007-05-03 Dot Hill Systems Corporation Application server blade for embedded storage appliance
US20070100933A1 (en) * 2003-04-23 2007-05-03 Dot Hill Systems Corporation Application server blade for embedded storage appliance
US20050246568A1 (en) * 2003-04-23 2005-11-03 Dot Hill Systems Corporation Apparatus and method for deterministically killing one of redundant servers integrated within a network storage appliance chassis
US20050207105A1 (en) * 2003-04-23 2005-09-22 Dot Hill Systems Corporation Apparatus and method for deterministically performing active-active failover of redundant servers in a network storage appliance
US20050102549A1 (en) * 2003-04-23 2005-05-12 Dot Hill Systems Corporation Network storage appliance with an integrated switch
US7627780B2 (en) 2003-04-23 2009-12-01 Dot Hill Systems Corporation Apparatus and method for deterministically performing active-active failover of redundant servers in a network storage appliance
US20050010715A1 (en) * 2003-04-23 2005-01-13 Dot Hill Systems Corporation Network storage appliance with integrated server and redundant storage controllers
US20050027751A1 (en) * 2003-04-23 2005-02-03 Dot Hill Systems Corporation Network, storage appliance, and method for externalizing an internal I/O link between a server and a storage controller integrated within the storage appliance chassis
US7401254B2 (en) 2003-04-23 2008-07-15 Dot Hill Systems Corporation Apparatus and method for a server deterministically killing a redundant server integrated within the same network storage appliance chassis
US7464214B2 (en) 2003-04-23 2008-12-09 Dot Hill Systems Corporation Application server blade for embedded storage appliance
US7437604B2 (en) 2003-04-23 2008-10-14 Dot Hill Systems Corporation Network storage appliance with integrated redundant servers and storage controllers
US7464205B2 (en) 2003-04-23 2008-12-09 Dot Hill Systems Corporation Application server blade for embedded storage appliance
US7962778B2 (en) 2003-08-14 2011-06-14 Compellent Technologies Virtual disk drive system and method
US8020036B2 (en) 2003-08-14 2011-09-13 Compellent Technologies Virtual disk drive system and method
US9047216B2 (en) 2003-08-14 2015-06-02 Compellent Technologies Virtual disk drive system and method
US9489150B2 (en) 2003-08-14 2016-11-08 Dell International L.L.C. System and method for transferring data between different raid data storage types for current data and replay data
US8321721B2 (en) 2003-08-14 2012-11-27 Compellent Technologies Virtual disk drive system and method
US9436390B2 (en) 2003-08-14 2016-09-06 Dell International L.L.C. Virtual disk drive system and method
US8560880B2 (en) 2003-08-14 2013-10-15 Compellent Technologies Virtual disk drive system and method
US8473776B2 (en) 2003-08-14 2013-06-25 Compellent Technologies Virtual disk drive system and method
US9021295B2 (en) 2003-08-14 2015-04-28 Compellent Technologies Virtual disk drive system and method
US7849352B2 (en) 2003-08-14 2010-12-07 Compellent Technologies Virtual disk drive system and method
US8555108B2 (en) 2003-08-14 2013-10-08 Compellent Technologies Virtual disk drive system and method
US10067712B2 (en) 2003-08-14 2018-09-04 Dell International L.L.C. Virtual disk drive system and method
US7945810B2 (en) 2003-08-14 2011-05-17 Compellent Technologies Virtual disk drive system and method
US7941695B2 (en) 2003-08-14 2011-05-10 Compellent Technologies Virtual disk drive system and method
US7137031B2 (en) 2004-02-25 2006-11-14 Hitachi, Ltd. Logical unit security for clustered storage area networks
US20070028057A1 (en) * 2004-02-25 2007-02-01 Hitachi, Ltd. Logical unit security for clustered storage area networks
US8583876B2 (en) 2004-02-25 2013-11-12 Hitachi, Ltd. Logical unit security for clustered storage area networks
US20060041728A1 (en) * 2004-02-25 2006-02-23 Hitachi, Ltd. Logical unit security for clustered storage area networks
US7363535B2 (en) 2004-02-25 2008-04-22 Hitachi, Ltd. Logical unit security for clustered storage area networks
US7134048B2 (en) 2004-02-25 2006-11-07 Hitachi, Ltd. Logical unit security for clustered storage area networks
US7418624B2 (en) * 2004-06-29 2008-08-26 Hitachi, Ltd. Hot standby system
US20050289391A1 (en) * 2004-06-29 2005-12-29 Hitachi, Ltd. Hot standby system
US7496790B2 (en) * 2005-02-25 2009-02-24 International Business Machines Corporation Method, apparatus, and computer program product for coordinating error reporting and reset utilizing an I/O adapter that supports virtualization
US8086903B2 (en) 2005-02-25 2011-12-27 International Business Machines Corporation Method, apparatus, and computer program product for coordinating error reporting and reset utilizing an I/O adapter that supports virtualization
US20060195673A1 (en) * 2005-02-25 2006-08-31 International Business Machines Corporation Method, apparatus, and computer program product for coordinating error reporting and reset utilizing an I/O adapter that supports virtualization
US7886111B2 (en) 2006-05-24 2011-02-08 Compellent Technologies System and method for raid management, reallocation, and restriping
US10296237B2 (en) 2006-05-24 2019-05-21 Dell International L.L.C. System and method for raid management, reallocation, and restriping
US8230193B2 (en) 2006-05-24 2012-07-24 Compellent Technologies System and method for raid management, reallocation, and restriping
US9244625B2 (en) 2006-05-24 2016-01-26 Compellent Technologies System and method for raid management, reallocation, and restriping
US20090265493A1 (en) * 2008-04-16 2009-10-22 Mendu Krishna R Efficient Architecture for Interfacing Redundant Devices to a Distributed Control System
US7877625B2 (en) * 2008-04-16 2011-01-25 Invensys Systems, Inc. Efficient architecture for interfacing redundant devices to a distributed control system
US20110099416A1 (en) * 2008-04-16 2011-04-28 Mendu Krishna R Efficient Architecture for Interfacing Redundant Devices to a Distributed Control System
US8516296B2 (en) 2008-04-16 2013-08-20 Invensys Systems, Inc. Efficient architecture for interfacing redundant devices to a distributed control system
US20090319699A1 (en) * 2008-06-23 2009-12-24 International Business Machines Corporation Preventing Loss of Access to a Storage System During a Concurrent Code Load
US8819334B2 (en) 2009-07-13 2014-08-26 Compellent Technologies Solid state drive data storage system and method
US8468292B2 (en) 2009-07-13 2013-06-18 Compellent Technologies Solid state drive data storage system and method
US9146851B2 (en) 2012-03-26 2015-09-29 Compellent Technologies Single-level cell and multi-level cell hybrid solid state drive
US20160283303A1 (en) * 2015-03-27 2016-09-29 Intel Corporation Reliability, availability, and serviceability in multi-node systems with disaggregated memory
US11128578B2 (en) * 2018-05-21 2021-09-21 Pure Storage, Inc. Switching between mediator services for a storage system
US11677687B2 (en) 2018-05-21 2023-06-13 Pure Storage, Inc. Switching between fault response models in a storage system
US11757795B2 (en) 2018-05-21 2023-09-12 Pure Storage, Inc. Resolving mediator unavailability

Also Published As

Publication number Publication date
JP3628777B2 (en) 2005-03-16
EP0772127B1 (en) 2000-05-31
US6412078B2 (en) 2002-06-25
EP0772127A1 (en) 1997-05-07
US6052795A (en) 2000-04-18
DE69608641T2 (en) 2001-02-22
DE69608641D1 (en) 2000-07-06
US6321346B1 (en) 2001-11-20
JPH09128305A (en) 1997-05-16

Similar Documents

Publication Publication Date Title
US6052795A (en) Recovery method and system for continued I/O processing upon a controller failure
CA1172379A (en) Peripheral system having a data buffer for a plurality of peripheral devices, plural connections to each device and a priority of operations
US5758057A (en) Multi-media storage system
EP1019823B1 (en) Redundant controller diagnosis using a private lun
EP0747822B1 (en) External storage system with redundant storage controllers
US5491816A (en) Input/output controller providing preventive maintenance information regarding a spare I/O unit
GB1588806A (en) Input/output system for a multiprocessor system
WO2002023547A2 (en) Command read-back to verify integrity of command data written to disk drive
JPH07117903B2 (en) Disaster recovery method
US6643734B2 (en) Control device and control method for a disk array
US20040049710A1 (en) Maintaining data access during failure of a controller
US6389559B1 (en) Controller fail-over without device bring-up
JP2979771B2 (en) Information processing apparatus and bus control method thereof
US20090216930A1 (en) Information processing apparatus and control method thereof
JP4708669B2 (en) Path redundancy apparatus and method
JP3783560B2 (en) Information processing system
JP3311776B2 (en) Data transfer check method in disk subsystem
JP2001356881A (en) Multiplex storage controller
JP4499909B2 (en) Multiplexed storage controller
JP2815730B2 (en) Adapters and computer systems
JP2954078B2 (en) Data maintenance method and apparatus for disk array system
JPS58114152A (en) Back up device for magnetic disc
JPH01140357A (en) Memory access controller
JPS584365B2 (en) Reset control system
JPS6127793B2 (en)

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 8

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20140625