WO1999023562A1 - Automatic backup based on disk drive condition - Google Patents

Automatic backup based on disk drive condition

Info

Publication number
WO1999023562A1
Authority
WO
WIPO (PCT)
Prior art keywords
disk drive
drive device
backup
computer
user
Prior art date
Application number
PCT/US1998/023152
Other languages
French (fr)
Inventor
Mahmoud Assaf
Original Assignee
Gateway, Inc.
Priority date
Filing date
Publication date
Application filed by Gateway, Inc.
Priority to CA002307212A (CA2307212A1)
Priority to JP2000519357A (JP2001522089A)
Priority to AU12940/99A (AU1294099A)
Publication of WO1999023562A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1461 Backup scheduling policy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/008 Reliability or availability analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3485 Performance evaluation by tracing or monitoring for I/O devices

Definitions

  • In Figure 3, the steps taken to monitor the status of the disk drive 118 and respond are detailed. These steps may be implemented entirely in a device driver, BIOS or an application program, or spread among them. Most implementations will provide for status polling in a driver or BIOS, with other steps implemented in an application program written in any of a number of high level languages such as C++.
  • At 310, the drive registers or status bit are polled.
  • The polling interval can be user defined or preset. A shorter interval provides a better chance of recovering if a failure is quick to develop, but it should be recognized that some modes of failure are currently not predictable. The interval should be selected to ensure that significant system resources are not consumed by the polling and the further processing activity associated with each poll.
  • The register value or values, which comprise information regarding the status of the disk drive and attributes such as those listed previously, are received and compared with predefined or user defined values.
  • In some cases, only the status of the disk drive is received, which in the case of SCSI devices is a single bit indicating a potential failure condition. If a potential failure condition is either received or deduced from the attributes at 320, messages indicating that such a failure is imminent are provided to the user or a system administrator at 322. If no failure condition is detected, control is returned to polling at 310.
  • Tape backup is then attempted starting at 324, where the tape drive is checked for suitable media such as a tape cartridge. If no media is detected, the user is prompted to insert such media at 328 and a wait state is entered at 330 until such media is detected as present. Following the detection of media at 324, a normal tape backup operation is begun at 336. Such operations are well known in the art and in the past have been user initiated or periodically performed during normal operation. Status of the backup operation is provided to the user via messaging facilities, as indicated at 338, either before or during the tape backup operation. When the tape backup is completed at 344, an indication of the completion is provided to the user prior to end 346.
  • The user interacts with application agent 214 via the functions shown in the flowchart of Figure 4.
  • The user is provided an interface via command, graphical user interface, menu driven interface, voice or other constructs to enable or disable the automatic tape backup feature.
  • The user is permitted to edit the backup criteria via a similar interface. This allows a user to attempt to ensure that the data throughput of the disk drive is still sufficient to provide data fast enough to keep the tape drive operating in a streaming mode. If the data transfer rate is too slow, the tape device may only be able to write one block at a time; if the second block is not immediately available, the drive must stop, rewind and resynchronize the tape after the first block before writing the next block of data. Buffering techniques can be useful in ensuring that the tape drive operates in a streaming mode, but may not suffice if the performance of the disk drive has deteriorated too far.
  • The enable/disable and editing criteria interfaces may be combined into a single screen, which may also be combined with normal control of disk drive functions, such as via a control panel as is commonly used in personal computer operating environments or operating systems.
  • Previous backup information which has been stored is interrogated, and if the drive has been recently backed up as determined at 422, the backup feature is disabled for a selected period of time. Following this time, which is user definable but defaults to approximately 24 hours, the backup feature is enabled at 430. The user may also set values at 412 to indicate that the backup feature should not be automatically enabled. If the disk drive has been recently backed up at 422, control is returned at 432.
  • The functions provided by blocks 418, 422 and 430 may also be performed on a periodic basis, which again can be user definable at 412.
  • Although tape drives have been specified in the embodiments described as the backup device, other devices may also be used, such as semiconductor memory devices, or even other disk drives on the same computer system or on a server or other networked computer or storage facility.
  • The functions of the BIOS or the application can be provided by software, hardware or firmware as is well known to those skilled in the art, and the location of the provider of the functions is also a matter of well known design choice.
  • The present invention could be incorporated with other computer systems, such as portable computers, servers, midrange computers or other computers.
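The flow of Figures 3 and 4 described in the items above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the callables passed in (`failure_predicted`, `media_present`, `run_backup`, `notify`) and the way the hold-off period is applied are all hypothetical, and only the step numbers in the comments come from the text.

```python
import time
from typing import Callable, Optional

DEFAULT_HOLDOFF_S = 24 * 60 * 60  # "approximately 24 hours" per the text


def backup_recently_done(last_backup: Optional[float], now: float,
                         holdoff_s: float = DEFAULT_HOLDOFF_S) -> bool:
    """Figure 4 style check: was the drive backed up within the hold-off?"""
    return last_backup is not None and (now - last_backup) < holdoff_s


def monitor_once(failure_predicted: Callable[[], bool],
                 media_present: Callable[[], bool],
                 run_backup: Callable[[], None],
                 notify: Callable[[str], None],
                 last_backup: Optional[float] = None,
                 now: Optional[float] = None,
                 media_wait_s: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """One pass through the Figure 3 flow. Returns True if a backup ran."""
    now = time.time() if now is None else now
    if not failure_predicted():                   # poll and compare (310, 320)
        return False
    notify("disk drive failure may be imminent")  # warn the user (322)
    if backup_recently_done(last_backup, now):    # recent backup check (422)
        notify("recent backup found; automatic backup skipped")
        return False
    if not media_present():                       # check for media (324)
        notify("please insert backup media")      # prompt the user (328)
        while not media_present():                # wait state (330)
            sleep(media_wait_s)
    notify("backup started")                      # status message (338)
    run_backup()                                  # normal backup (336)
    notify("backup complete")                     # completion (344)
    return True


# Usage with stand-in callables; a real agent would query the drive and
# start an actual backup program instead.
log = []
ran = monitor_once(failure_predicted=lambda: True,
                   media_present=lambda: True,
                   run_backup=lambda: None,
                   notify=log.append,
                   now=0.0)
print(ran, log)
```

Passing the drive query, media check and backup step in as callables keeps the sketch independent of any particular driver, BIOS or backup program, which matches the text's point that these steps may live in different layers.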

Abstract

Backup of a personal computer is automatically initiated in response to disk drive performance monitoring software which predicts impending failure or performance degradation and provides messages to that effect to a tape backup program. The tape backup program initiates a tape backup of data on the disk drive in response to information provided by the disk drive performance monitoring software, based on user defined states of performance or other conditions which indicate an impending or possible failure. The tape backup program augments messages normally provided by the self monitoring software by indicating that the disk drive is being backed up at a particular time, and also by indicating the status of the backup and its completion.

Description

AUTOMATIC BACKUP BASED ON DISK DRIVE CONDITION
Field of the Invention
The present invention relates generally to computer systems, and in particular to automated backup of disk drive data based on the condition of the disk drive.
Background of the Invention
Hard disk drives are complex electro-mechanical devices which can suffer performance degradation or failure due to a single event or a combination of events. Some hard disk drive failures happen quickly and without advance warning. Such unpredictable failures can be caused by static electricity, handling damage, or thermal-related solder problems. Other hard disk drive failures result from the gradual degradation of the drive's ability to perform. Hard disk drive failures result in lost data and lost time to a user trying to recover the lost data.
One way to protect against data loss associated with hard disk drive failure is to use Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.). The failures that result from the degradation of performance are the type of failures that S.M.A.R.T. is designed to predict. S.M.A.R.T. capable devices monitor a variety of information internal to the device to assess reliability and predict an impending device failure. For example, a S.M.A.R.T. capable drive might monitor the fly height of the head above the magnetic media. If the head starts to fly too high or too low, it is likely that the drive could fail. Other drives may monitor different conditions, such as soft error rates, which are errors that occur sporadically and may not appear on successive attempts to read data. The monitoring techniques employed by S.M.A.R.T.-capable drives vary from one manufacturer to another.
When the S.M.A.R.T. capable drive predicts an impending failure, the drive's S.M.A.R.T. capability makes information available through an interface to the disk drive. The information may be presented to a user via drivers and supporting applications. The information reaches an application that can display a warning message to a user. The user is responsible for reacting to the warning message as desired. Thus, present devices require the user, after a warning is given, to back-up vital data and replace suspect devices prior to data loss or unscheduled down time.
However, a problem results if the user is not able to respond by backing up the data before the failure occurs. One such situation arises on workstations connected to a network if the user does not have the authority or the ability to back up the data and replace the drive. Failure of the hard disk drive results in lost data, lost time and in many cases lost money. Further problems may be caused when computers are constantly left running, such as overnight, when a user is not normally monitoring the computer. Several times during normal working hours, the user may also be away from a running computer. There is a need for addressing disk drive problems when the user is not available. There is a further need for enhancing system reliability when a user is not attending the system.
Summary of the Invention
Backup of data on a personal computer is automatically initiated in response to selected information provided by disk drive performance monitoring. In one embodiment, performance monitoring capabilities in a disk drive provide information on potential impending failure or performance degradation. The information is provided to an application such as a tape backup program. The tape backup program initiates a tape backup of data on the disk drive. The tape backup is initiated when the information is representative of predefined or user defined states of performance or other conditions which indicate an impending or possible failure. The predefined states are defined to allow a normal backup prior to a predicted failure of the disk drive, and to ensure that the disk drive has sufficient performance to allow optimal data transfer rates during such a backup. In one embodiment, the tape backup program augments information normally provided by the self monitoring functions by indicating that the disk drive is being backed up at a particular time, and also by indicating the status of the backup and its completion. If the user is not at the computer system, the tape backup program will automatically begin the backup by ensuring that suitable media, such as a tape, is in position in the tape drive. If not, it prompts the user to insert a tape. The tape backup program allows a user to continue working while data is backed up in real time to any writable media, such as tape, diskette or zip drive, until the potentially failing disk drive can be repaired. The backup program also allows a user to leave a system unattended, with some assurance that potential disk drive failures are likely to be detected and data backed up without user intervention.
In still further embodiments, other forms of nonvolatile storage devices are used as a backup device, such as another disk drive, or a writable CD ROM. In one variation, the disk drive is backed up via a network connection to a server or other device having suitable storage capabilities.
Brief Description of the Drawings
Figure 1 is a block diagram of a computer system employing the present invention.
Figure 2 is a block diagram of functional modules used in one embodiment of the present invention.
Figure 3 is a flowchart depicting steps followed by the functional modules in Figure 2 to detect a potential failure condition and initiate a backup of the data in the potentially failing device.
Figure 4 is a flowchart depicting steps followed to determine if a backup is required based on prior backup history.
Description of the Embodiments
In the following detailed description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that structural changes may be made without departing from the scope of the present invention. Therefore, the following detailed description is not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
A block diagram of a computer system 100 in Figure 1 will be described with respect to the present invention. Further details of software modules implementing the invention will be described with reference to Figure 2, and flowcharts depicting details of the process implemented by the modules and computer system will be described in Figures 3 and 4.
Computer system 100 in one embodiment is a typical personal computer and comprises a processor 110 coupled to a memory 112 and system controller 114. The system controller is also coupled to the processor 110 and both the processor 110 and system controller 114 can access data in memory 112. The system controller 114 is also coupled to a host bus 116. Host bus 116 is also coupled to a plurality of peripheral devices comprising a disk drive 118, a tape drive 120, PCI device interface 122, a graphics controller 124 which is further coupled to a display device 126, and a keyboard/mouse controller 128 which in turn is coupled to a keyboard 130. All of these elements operate together in a well known manner, with software residing in memory 112 such as RAM, BIOS, DRAM or other memory being executed in processor 110. System controller 114 provides an interface to the peripheral devices, allowing data transfers between the peripheral devices and to and from memory 112 without data having to first be routed through processor 110.
Some of the programs that processor 110 executes include an operating system, application programs, peripheral device drivers and other modules or programs. Figure 2 is a block diagram in which the blocks represent program modules and devices involved in detecting potential failures in disk drive device 118 and permitting backup of data on disk drive 118 onto tape drive 120. Predictive failure analysis functionality is provided on many disk drives available on the market today from disk drive vendors including IBM Corporation, Western Digital Corporation, Seagate and Quantum, to name a few. One industry standard for predictive failure analysis functionality is referred to as Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.), as indicated in block form at 210.
Information regarding the operational characteristics of the disk drive 118 is provided at registers, which are then polled by BIOS/Driver 212, and the information is provided to an application agent 214. Application agent 214 provides messages to a user regarding the status of the disk drive 118 and initiates a tape backup of data on the disk drive if it is determined that a failure of the disk drive is likely to occur within a set time. Application agent 214 first ensures that proper media 216 is available for use by the tape drive 120, and if not, will prompt a user to insert suitable media such as a tape. Application agent 214 then invokes operating system services 220 to start a backup program 222, which can be the same program as normally used to back up the disk drive 118. Backup program 222 initiates the backup, and data from the disk drive is transferred to the tape as represented by a bus 218, such as a PCI bus. It should be noted that backup program 222 can be used to cause backup to any suitable storage device, whether local or remote via network. Application agent 214 serves as a router between the BIOS 212 and the operating system.
Analysis block 210 monitors a range of attributes and sends attribute and threshold information to application agent 214 via registers. In normal operation, analysis block 210 then decides if an alert is warranted, and sends that message to the system, along with the attribute and threshold information. The attribute and threshold level implementation varies with each disk drive vendor, and is based on historical failure analysis of data collected from information stored in disk drives that have failed. Attribute individualism is important because drive architectures vary from model to model. Attributes and thresholds that detect failure for one model may not be functional for another model.
Predictable failures are characterized by degradation of an attribute over time, before the disk drive fails. This creates a situation where attributes can be monitored, making predictive failure analysis possible. Many mechanical failures are typically considered predictable, such as the degradation of head flying height, which would indicate a potential head crash. Certain electronic failures may show degradation before failing, but more commonly, mechanical problems are gradual and predictable.
Though attributes are drive-specific, a variety of typical characteristics can be identified: head flying height, data throughput performance, spin-up time, re-allocated sector count, seek error rate, seek time performance, spin retry count, and drive calibration retry count, to name a few. Others may be used in various disk drives depending upon the design and historical failure information.
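The comparison of attributes against thresholds can be illustrated with a short sketch. The `Attribute` record, the sample readings, and the rule that an attribute counts as failing when its value is at or below its threshold are assumptions made for illustration; the text leaves the actual attributes and thresholds to each drive vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """One monitored drive attribute (names and values are illustrative)."""
    name: str
    value: int      # current normalized value reported by the drive
    threshold: int  # vendor-defined failure threshold


def failing_attributes(attributes):
    """Return the names of attributes at or below their thresholds.

    Assumption: lower normalized values mean worse health, so an
    attribute is treated as failing when value <= threshold.
    """
    return [a.name for a in attributes if a.value <= a.threshold]


# Hypothetical readings for a single drive.
readings = [
    Attribute("head flying height", value=90, threshold=30),
    Attribute("spin-up time", value=25, threshold=30),
    Attribute("re-allocated sector count", value=100, threshold=36),
    Attribute("seek error rate", value=30, threshold=30),
]

suspect = failing_attributes(readings)
print(suspect)
```

An agent could treat a non-empty result as the potential failure condition that triggers a warning and a backup.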
There are currently two S.M.A.R.T. specifications which are being implemented in disk drives. S.M.A.R.T. emerged for the ATA/IDE environment when SFF-8035 was placed in the public domain. SCSI drives incorporate a different industry standard specification, as defined in the ANSI-SCSI Informational Exception Control (IEC) document X3T10/94-190.
The S.M.A.R.T. system technology of attributes and thresholds is similar in ATA/IDE and SCSI environments, but the reporting of information differs. In an ATA/IDE environment, software on the host interprets the alarm signal from the drive generated by the "report status" command of S.M.A.R.T. Application agent 214 polls the drive on a regular basis to check the status of this command, and if it signals imminent failure, sends an alarm to the end user or system administrator. Application agent 214 evaluates the attributes and alarms reported, in addition to the "report status" command from the S.M.A.R.T. analysis block 210.
Generally speaking, SCSI drives with reliability prediction capability only communicate a reliability condition as either good or failing. In a SCSI environment, the failure decision occurs at the disk drive as represented at analysis block 210, which notifies the user and initiates tape backup. The SCSI specification provides for a sense bit to be flagged if the disk drive determines that a reliability issue exists.
APIs are provided to set ATA registers in ATA/IDE disk drives supporting S.M.A.R.T. via BIOS/Driver 212, which is a BIOS or driver capable of sending S.M.A.R.T. commands to and receiving S.M.A.R.T. data from the ATA interface registers. Application agent 214, such as a backup program, is provided on top of the BIOS or driver to allow a user to control the S.M.A.R.T. device and monitor the status of that device. Some subcommands and their respective codes include ENABLE/DISABLE ATTRIBUTE AUTOSAVE - code D2h, ENABLE S.M.A.R.T. OPERATIONS - code D8h, DISABLE S.M.A.R.T. OPERATIONS - code D9h, and RETURN S.M.A.R.T. STATUS - code DAh. The RETURN S.M.A.R.T. STATUS subcommand is used to retrieve status information from one or more ATA registers.
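The subcommand codes can be represented as follows. This is a sketch only: the command opcode B0h, the key values 4Fh/C2h and the returned pattern F4h/2Ch are taken from the ATA S.M.A.R.T. command definition rather than from the text above, and the dictionary layout and function names are hypothetical stand-ins for whatever interface a particular BIOS or driver exposes.

```python
from enum import IntEnum

SMART_COMMAND = 0xB0  # ATA S.M.A.R.T. command opcode (from the ATA specification)


class SmartSubcommand(IntEnum):
    """Subcommand codes written to the Features register."""
    ENABLE_DISABLE_ATTRIBUTE_AUTOSAVE = 0xD2
    ENABLE_OPERATIONS = 0xD8
    DISABLE_OPERATIONS = 0xD9
    RETURN_STATUS = 0xDA


def build_task_file(subcommand):
    """Register values a hypothetical driver would write to issue a subcommand.

    The 4Fh/C2h pair is the key the ATA specification requires in the
    cylinder low/high (LBA mid/high) registers for every S.M.A.R.T. command.
    """
    return {
        "features": int(subcommand),
        "lba_mid": 0x4F,
        "lba_high": 0xC2,
        "command": SMART_COMMAND,
    }


def threshold_exceeded(lba_mid, lba_high):
    """Interpret the registers read back after RETURN STATUS.

    4Fh/C2h unchanged means no threshold has been exceeded;
    F4h/2Ch means the drive reports a threshold-exceeded condition.
    """
    if (lba_mid, lba_high) == (0x4F, 0xC2):
        return False
    if (lba_mid, lba_high) == (0xF4, 0x2C):
        return True
    raise ValueError("unexpected S.M.A.R.T. status register values")
```

A driver would write the values from `build_task_file(SmartSubcommand.RETURN_STATUS)` to the drive and pass the registers it reads back to `threshold_exceeded`.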
In Figure 3, steps taken to monitor the status of the disk drive 118 and respond are detailed. These steps may be implemented entirely in a device driver, BIOS or an application program, or spread therebetween. Most implementations will provide for status polling in a driver or BIOS, with other steps implemented in an application program written in any number of high level languages such as C++. At 310, the drive registers or bit is polled. A polling interval can be user defined or preset. A shorter interval will provide a better chance of recovering if a failure is quick to develop, but it should be recognized that there are some modes of failure that are currently not predictable. The interval should be selected to ensure that significant system resources are not consumed by the polling and the further processing associated with each poll. At 312, the register value or values, which comprise information regarding the status of the disk drive and attributes such as those listed previously, are received and compared with predefined or user defined values. In one embodiment, only the status of the disk drive is received, which in the case of SCSI devices is a single bit indicating a potential failure condition. If a potential failure condition is either received or deduced from the attributes at 320, messages indicating that such a failure condition is imminent are provided to the user or a system administrator at 322. If no failure condition is detected, control is returned to polling at 310.
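The polling steps 310 through 322 can be sketched as a simple loop. The callables `read_status`, `notify` and `start_backup`, the dictionary shape of a status reading, and the one-hour default interval are all assumptions made for illustration. A real implementation would obtain the status from a driver or BIOS.

```python
import time


def failure_predicted(status, limits):
    """Steps 312/320: decide whether a reading indicates potential failure.

    status -- dict with an optional 'failing' flag (a single good/failing
              indication, as with SCSI) and optional 'attributes' values
    limits -- predefined or user-defined limit for each attribute name
    """
    if status.get("failing"):
        return True
    attrs = status.get("attributes", {})
    return any(name in attrs and attrs[name] <= limit
               for name, limit in limits.items())


def monitor(read_status, limits, notify, start_backup,
            interval_s=3600.0, sleep=time.sleep, max_polls=None):
    """Poll until a potential failure is seen, then notify and start backup.

    Returns True if a failure condition was detected, or False if
    max_polls readings were taken without one.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        status = read_status()                              # step 310
        polls += 1
        if failure_predicted(status, limits):               # steps 312, 320
            notify("Disk drive failure may be imminent")    # step 322
            start_backup()                                  # continues at 324
            return True
        sleep(interval_s)                                   # back to 310
    return False
```

Passing `sleep` and `max_polls` as parameters keeps the loop testable. In normal use the defaults would leave it polling indefinitely at the chosen interval.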
Following detection of a potential failure condition, tape backup is attempted starting at 324, where the tape drive is checked for suitable media such as a tape cartridge. If no media is detected, the user is prompted to insert such media at 328 and a wait state is entered at 330 until such media is detected as present. Following the detection of media at 324, a normal tape backup operation is begun at 336. Such operations are well known in the art and in the past have been user initiated or periodically performed during normal operation. Status of the backup operation via messaging facilities is provided to the user as indicated at 338 either before or during the tape backup operation. When the tape backup is completed at 344, an indication of the completion is provided to the user prior to end 346.
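The media check, wait state and backup of steps 324 through 344 can be outlined as below. This is a sketch under stated assumptions: `media_present`, `prompt_user`, `do_backup` and `report` are hypothetical callables supplied by the surrounding system, and the five-second retry delay is an arbitrary choice.

```python
import time


def run_backup(media_present, prompt_user, do_backup, report,
               wait=time.sleep, retry_s=5.0):
    """Ensure media is available, then perform and report the backup."""
    if not media_present():                      # step 324: check for media
        prompt_user("Insert a tape cartridge")   # step 328: prompt the user
        while not media_present():               # step 330: wait state
            wait(retry_s)
    report("Backup starting")                    # step 338: status message
    do_backup()                                  # step 336: normal tape backup
    report("Backup complete")                    # step 344: completion notice
```

The user is prompted once and the drive is then rechecked at intervals, so the backup begins as soon as a cartridge is detected.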
Users interact with application agent 214 via the functions shown in the flowchart of Figure 4. At block 410, the user is provided an interface via command, graphical user interface, menu driven interface, voice or other constructs to enable or disable the automatic tape backup feature. At 412, the user is permitted to edit the backup criteria via a similar interface. This allows a user to attempt to ensure that the data throughput of the disk drive is still sufficient to provide data fast enough to keep the tape drive operating in a streaming mode. If the data transfer rate is too slow, the tape device may only be able to write one block at a time: if the next block is not immediately available, the tape stops, rewinds, and must be resynchronized before the next block can be written. Buffering techniques can be useful in ensuring that the tape drive operates in a streaming mode, but may not suffice if the performance of the disk drive has deteriorated too far.
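The streaming condition can be expressed as simple arithmetic. In this sketch the function names, the use of megabytes per second, and the idea of a fixed minimum streaming rate for the tape drive are assumptions. Actual tape drives and buffering schemes differ.

```python
def can_stream(disk_rate_mb_s, tape_min_rate_mb_s):
    """True if the disk can supply data at least as fast as the tape
    drive needs it to remain in streaming mode."""
    return disk_rate_mb_s >= tape_min_rate_mb_s


def buffer_lasts_s(buffer_mb, disk_rate_mb_s, tape_rate_mb_s):
    """Seconds a full buffer keeps the tape streaming when the disk is
    slower than the tape; infinite if the disk keeps up on its own."""
    deficit = tape_rate_mb_s - disk_rate_mb_s
    if deficit <= 0:
        return float("inf")
    return buffer_mb / deficit
```

For example, with an 8 MB buffer, a disk delivering 0.5 MB/s and a tape consuming 1.0 MB/s, the buffer drains at 0.5 MB/s and lasts 16 seconds, after which the tape would fall out of streaming mode.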
The enable/disable and editing criteria interfaces may be combined into a single screen, which may also be combined with normal control of disk drive functions, such as via a control panel as is commonly used in personal computer operating environments or operating systems. At 418, previous backup information which has been stored is interrogated and if the drive has been recently backed up as determined at 422, the backup feature is disabled for a selected period of time. Following this time, which is user definable but defaulted to approximately 24 hours, the backup feature is enabled at 430. The user may also set values at 412 to indicate that the backup feature should not be automatically enabled. If the disk drive has been recently backed up at 422, control is returned at 432. The functions provided by blocks 418, 422 and 430 may also be performed on a periodic basis, which again can be user definable at 412.
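The recent-backup check of blocks 418 through 430 can be sketched as follows. The function and parameter names are hypothetical, and the stored backup history is reduced here to a single timestamp. The 24-hour default reflects the approximate default period mentioned above.

```python
from datetime import datetime, timedelta


def backup_enabled(last_backup, now, user_enabled=True,
                   quiet_period=timedelta(hours=24)):
    """Decide whether automatic backup may be initiated.

    last_backup  -- time of the most recent backup, or None if none is recorded
    now          -- current time
    user_enabled -- the user's enable/disable setting (block 410)
    quiet_period -- time after a backup during which the feature stays disabled
    """
    if not user_enabled:
        return False
    if last_backup is None:
        return True
    return now - last_backup >= quiet_period
```

A drive backed up two hours ago would therefore not trigger another automatic backup, while one last backed up two days ago would.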
CONCLUSION
A system for providing automatic backup of disk drive data upon detection of potential future failure of the disk drive has been described. It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. While the invention makes use of the predictive failure analysis capabilities described in S.M.A.R.T., other predictive failure analysis capabilities may also be used to provide an automated backup function. Such capabilities can also easily be integrated into other types of devices which store data and whose potential failure can be predicted, such as CD ROM devices and other devices which may not yet even be invented. Further, while tape drives have been specified in the embodiments described as the backup device, other devices may also be used, such as semiconductor memory devices, or even other disk drives on the same computer system or on a server or other networked computer or storage facility. Many of the functions provided by BIOS or the application can be provided by software, hardware or firmware as is well known to those skilled in the art, and the location of the provider of the functions is also a matter of well known design choice. Further, the present invention could be incorporated with other computer systems, such as portable computers, servers, midrange computers or other computers.

Claims

What is claimed is:
1. A back up memory system comprising: a poller that polls a disk drive device which provides information regarding the status of the device; a comparator that compares the information with predetermined values; and a backup initiator that initiates a backup of data stored on the disk drive device.
2. The memory system of claim 1 and further comprising device registers which provide the information.
3. The memory system of claim 1 and further comprising a tape drive device coupled to the disk drive device.
4. The memory system of claim 3 and further comprising a messaging system that prompts a user of the memory system to provide suitable media for the tape drive device prior to initiating a backup of data stored on the disk drive device.
5. A computer system comprising: a disk drive device having integrated performance monitoring and status reporting capability; a tape drive device coupled to the disk drive device; a polling module that polls the disk drive to determine the current status of the disk drive device; and a tape drive module that initiates backup of data on the disk drive onto suitable media in the tape drive device based on the status of the disk drive device.
6. The computer system of claim 5 and further comprising a messaging system that prompts a user of the memory system to provide suitable media for the tape drive device prior to initiating a backup of data stored on the disk drive device.
7. The computer system of claim 5 wherein the disk drive device comprises a register, and wherein the polling module polls the register to determine the current status of the disk drive device.
8. The computer system of claim 5 wherein the tape drive module comprises a software application program.
9. The computer system of claim 5 and further comprising an interface module that provides a computer system user the ability to enable and disable initiation of backup of data.
10. A computer readable media comprising a computer program that when executed by a suitably configured computer system causes the computer system to perform the steps comprising: polling a disk drive device which provides information regarding the status of the device; comparing the information with selected values; and initiating a backup of data stored on the disk drive device.
11. The computer readable media of claim 10, wherein the computer program causes the computer to further perform the steps comprising: providing a user interface to edit the selected values.
12. The computer readable media of claim 11, wherein the computer program causes the computer to further perform the steps comprising: providing a user interface to permit a user to enable and disable initiating backup of data stored on the disk drive device regardless of the comparison of the information to the selected values.
13. The computer readable media of claim 10, wherein the computer program causes the computer to further perform the steps comprising: checking a backup device for suitable media prior to initiating the backup of data stored on the disk drive device.
14. The computer readable media of claim 10, wherein the computer program causes the computer to further perform the steps comprising: providing a plurality of messages regarding status of the backup and disk drive.
15. A computer system comprising: a processor coupled to a memory; a system controller coupled to the processor and to the memory; a system bus coupled to the system controller; a display coupled to the system bus; a disk drive device coupled to the system bus and having integrated performance monitoring and status reporting capability; a tape drive device coupled to the system bus; a polling module that polls the disk drive to determine the current status of the disk drive device; and a tape drive module that initiates backup of data on the disk drive onto suitable media in the tape drive device based on the status of the disk drive device.
16. The computer system of claim 15 and further comprising a messaging system that prompts a user of the memory system to provide suitable media for the tape drive device prior to initiating a backup of data stored on the disk drive device.
17. The computer system of claim 16 wherein the messaging system further notifies a user of the status of the backup of data.
18. The computer system of claim 15 wherein the tape drive module stores a history of backup activity and disables initiation of backup of data if a previous backup has been performed within a certain period of time.
19. The computer system of claim 15 wherein the disk drive device comprises a register, and wherein the polling module polls the register to determine the current status of the disk drive device.
20. The computer system of claim 15 and further comprising an interface module that provides a computer system user the ability to enable and disable initiation of backup of data.
PCT/US1998/023152 1997-11-03 1998-10-30 Automatic backup based on disk drive condition WO1999023562A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CA002307212A CA2307212A1 (en) 1997-11-03 1998-10-30 Automatic backup based on disk drive condition
JP2000519357A JP2001522089A (en) 1997-11-03 1998-10-30 Automatic backup based on disk drive status
AU12940/99A AU1294099A (en) 1997-11-03 1998-10-30 Automatic backup based on disk drive condition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US96262497A 1997-11-03 1997-11-03
US08/962,624 1997-11-03

Publications (1)

Publication Number Publication Date
WO1999023562A1 (en) 1999-05-14

Family

ID=25506149

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/023152 WO1999023562A1 (en) 1997-11-03 1998-10-30 Automatic backup based on disk drive condition

Country Status (4)

Country Link
JP (1) JP2001522089A (en)
AU (1) AU1294099A (en)
CA (1) CA2307212A1 (en)
WO (1) WO1999023562A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004025650A1 (en) * 2002-09-16 2004-03-25 Seagate Technology, Inc. Predictive disc drive failure methodology
US6973553B1 (en) 2000-10-20 2005-12-06 International Business Machines Corporation Method and apparatus for using extended disk sector formatting to assist in backup and hierarchical storage management
WO2010010393A1 (en) * 2008-07-22 2010-01-28 Watkin Peter M Monitoring of backup activity on a computer system
WO2010080781A1 (en) * 2009-01-06 2010-07-15 Crawford Media Services, Inc. Systems and methods for monitoring archive storage condition and preventing the loss of archived data
US9176813B2 (en) 2012-05-23 2015-11-03 Fujitsu Limited Information processing apparatus, control method
US9229821B2 (en) 2013-11-13 2016-01-05 International Business Machines Corporation Reactionary backup scheduling around meantime between failures of data origination
US10157105B2 (en) * 2016-07-28 2018-12-18 Prophetstor Data Services, Inc. Method for data protection for cloud-based service system
US20190018727A1 (en) * 2017-07-17 2019-01-17 Seagate Technology Llc Data replication in a storage system
EP3457282A1 (en) * 2017-09-15 2019-03-20 ProphetStor Data Services, Inc. Method for data protection in cloud-based service system
EP3547139A1 (en) * 2018-03-30 2019-10-02 AO Kaspersky Lab System and method of assessing and managing storage device degradation
US10783042B2 (en) 2018-03-30 2020-09-22 AO Kaspersky Lab System and method of assessing and managing storage device degradation

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007213670A (en) * 2006-02-08 2007-08-23 Funai Electric Co Ltd Hard disk device
JP6689959B2 (en) * 2016-03-30 2020-04-28 株式会社Kokusai Electric Substrate processing apparatus, processing system, and semiconductor device manufacturing method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5212784A (en) * 1990-10-22 1993-05-18 Delphi Data, A Division Of Sparks Industries, Inc. Automated concurrent data backup system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5212784A (en) * 1990-10-22 1993-05-18 Delphi Data, A Division Of Sparks Industries, Inc. Automated concurrent data backup system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"PLAYING IT S.M.A.R.T.", XP002096629, Retrieved from the Internet <URL:http://www.uglyware.com/Smart/moresmart.html> [retrieved on 19990312] *
B. TRAVIS: "Disk-drive-controller ICs provide board-level performance", EDN ELECTRICAL DESIGN NEWS., vol. 29, no. 25, 13 December 1984 (1984-12-13), NEWTON, MASSACHUSETTS US, pages 42 - 58, XP002096628 *
PENOKIE G.: "EXCEPTION HANDLING SELECTION MODE PAGE", XP002096630, Retrieved from the Internet <URL:ftp://ftp.symbios.com/pub/standards/io/t10/document.94/94-190r4pdf> [retrieved on 19950110] *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6973553B1 (en) 2000-10-20 2005-12-06 International Business Machines Corporation Method and apparatus for using extended disk sector formatting to assist in backup and hierarchical storage management
WO2004025650A1 (en) * 2002-09-16 2004-03-25 Seagate Technology, Inc. Predictive disc drive failure methodology
WO2010010393A1 (en) * 2008-07-22 2010-01-28 Watkin Peter M Monitoring of backup activity on a computer system
GB2474790A (en) * 2008-07-22 2011-04-27 Peter M Watkin Monitoring of backup activity on a computer system
GB2474790B (en) * 2008-07-22 2012-12-19 Peter M Watkin Monitoring of backup activity on a computer system
WO2010080781A1 (en) * 2009-01-06 2010-07-15 Crawford Media Services, Inc. Systems and methods for monitoring archive storage condition and preventing the loss of archived data
US9176813B2 (en) 2012-05-23 2015-11-03 Fujitsu Limited Information processing apparatus, control method
US9229821B2 (en) 2013-11-13 2016-01-05 International Business Machines Corporation Reactionary backup scheduling around meantime between failures of data origination
US10157105B2 (en) * 2016-07-28 2018-12-18 Prophetstor Data Services, Inc. Method for data protection for cloud-based service system
US20190018727A1 (en) * 2017-07-17 2019-01-17 Seagate Technology Llc Data replication in a storage system
US10783029B2 (en) * 2017-07-17 2020-09-22 Seagate Technology Llc Data replication in a storage system
EP3457282A1 (en) * 2017-09-15 2019-03-20 ProphetStor Data Services, Inc. Method for data protection in cloud-based service system
EP3547139A1 (en) * 2018-03-30 2019-10-02 AO Kaspersky Lab System and method of assessing and managing storage device degradation
US10783042B2 (en) 2018-03-30 2020-09-22 AO Kaspersky Lab System and method of assessing and managing storage device degradation

Also Published As

Publication number Publication date
CA2307212A1 (en) 1999-05-14
AU1294099A (en) 1999-05-24
JP2001522089A (en) 2001-11-13

Similar Documents

Publication Publication Date Title
US6058494A (en) Storage system with procedure for monitoring low level status codes, deriving high level status codes based thereon and taking appropriate remedial actions
US6401214B1 (en) Preventive recovery action in hard disk drives
US7991923B2 (en) Storage device condition reporting and error correction
US7340638B2 (en) Operating system update and boot failure recovery
US6263454B1 (en) Storage system
JP2005322399A (en) Maintenance method of track data integrity in magnetic disk storage device
WO1999023562A1 (en) Automatic backup based on disk drive condition
CN101582046B (en) High-available system state monitoring, forcasting and intelligent management method
US20090138740A1 (en) Method and computer device capable of dealing with power fail
US20080148094A1 (en) Managing storage stability
US8234235B2 (en) Security and remote support apparatus, system and method
US20080209254A1 (en) Method and system for error recovery of a hardware device
JP4798037B2 (en) Hard disk drive status monitoring device and hard disk drive status monitoring method
JP2008198322A5 (en)
US6684344B1 (en) Control unit of external storage, method for substituting defective block, and storage medium wherein control program for substituting defective block has been stored
JP2880701B2 (en) Disk subsystem
CN113901530A (en) Hard disk defensive early warning protection method, device, equipment and readable medium
CN115061641B (en) Disk fault processing method, device, equipment and storage medium
US20090070509A1 (en) Method of detecting and protecting falling portable computer hard disk through software monitoring driver
JP2001210027A (en) Hard disk device
EP0825537A1 (en) Error indication for a storage system with removable media
US7382559B2 (en) Recovery processing method for device specific information of medium storage device and medium storage device
US7454561B1 (en) Method for operating disk drives in a data storage system
JP3620984B2 (en) Computer automatic schedule control system, recording medium therefor, and computer automatic schedule control method
JPH0651918A (en) Semiconductor disk device

Legal Events

Date Code Title Description
AK Designated states; Kind code of ref document: A1; Designated state(s): AU CA JP
AL Designated countries for regional patents; Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE
121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase; Ref document number: 12940/99; Country of ref document: AU
ENP Entry into the national phase; Ref document number: 2307212; Country of ref document: CA; Ref country code: CA; Kind code of ref document: A; Format of ref document f/p: F
ENP Entry into the national phase; Ref country code: JP; Ref document number: 2000 519357; Kind code of ref document: A; Format of ref document f/p: F
122 Ep: pct application non-entry in european phase