CN103019894A - Reconstruction method for redundant array of independent disks - Google Patents


Info

Publication number
CN103019894A
Authority
CN
China
Prior art keywords
disk
raid
write
read
raid system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012105704971A
Other languages
Chinese (zh)
Other versions
CN103019894B (en)
Inventor
金振成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovation Technology Co., Ltd.
Shenzhen Innovation Technology Co., Ltd.
Original Assignee
Innovation And Technology Storage Technology Co Ltd
UIT STORAGE TECHNOLOGY (SHENZHEN) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovation And Technology Storage Technology Co Ltd, UIT STORAGE TECHNOLOGY (SHENZHEN) Co Ltd filed Critical Innovation And Technology Storage Technology Co Ltd
Priority to CN201210570497.1A priority Critical patent/CN103019894B/en
Publication of CN103019894A publication Critical patent/CN103019894A/en
Application granted granted Critical
Publication of CN103019894B publication Critical patent/CN103019894B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a reconstruction method for a redundant array of independent disks (RAID). The method comprises the following steps: (A) the controller of a RAID system discovers that a first disk in the RAID system cannot respond to input/output (IO) operations, powers off the first disk alone, and starts a timer of a preset duration; (B) while the timer is running, the RAID system performs normal read and write operations and records the numbers of all stripes subjected to a write operation; (C) when the timer expires, the first disk is powered on; (D) after the first disk is powered on, a read-write test operation is performed on the first disk; (E) it is judged whether the first disk reads and writes normally, and step (F) is executed if it does, otherwise step (G) is executed; (F) the data in the corresponding stripes of the first disk is recovered according to the numbers of all stripes subjected to a write operation while the first disk was powered off, and the flow ends once the data is completely recovered; and (G) the first disk is marked as a damaged disk and replaced by a second disk that serves as a hot spare, a calculation is performed according to the data and parity of the other disks in the RAID system, and the result is written to the second disk.

Description

A reconstruction method for a redundant array of independent disks
Technical field
The present application relates to the technical field of computer storage, in particular to redundant array of independent disks (RAID) technology, and more specifically to a RAID reconstruction method.
Background
RAID is a technology that combines multiple independent disks in different ways into a disk group (a logical disk), providing higher storage performance than a single disk as well as data redundancy protection. The principle of RAID is to store data and the corresponding parity information across the disks that form the RAID system, with the parity information stored on a different disk from the data it protects. When the data on one disk of the RAID system is damaged, the remaining data and the corresponding parity information are used to recover it. As a foundation and key component of networked storage systems, RAID is known for its speed, large capacity, and high reliability. Since RAID technology appeared, demand for it has been widespread in fields such as industry, the military, and education, and RAID research has remained an active topic in industry.
The different ways of forming a disk array are called RAID levels. Common RAID levels include RAID0, RAID1, RAID5, and RAID6. Different RAID levels provide different data protection schemes.
Take a RAID5 array formed from 4 disks as an example: it tolerates the failure of only one disk. Once a disk has failed, the RAID5 array no longer provides data redundancy protection, so the failed disk must be replaced as soon as possible. After the failed disk is replaced, the disk controller performs a calculation using the data and parity on the healthy disks and writes the result to the new disk. This process is called RAID reconstruction.
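The calculation from data and parity described above can be illustrated with a minimal sketch. It assumes the usual single-parity scheme in which the parity block is the byte-wise XOR of the data blocks in a stripe; the helper name `xor_blocks` and the sample bytes are hypothetical and are not taken from the patent.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of a 4-disk RAID5: three data blocks plus one parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails: XOR the surviving blocks to rebuild it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

A full reconstruction repeats this calculation for every stripe of the array, which is why it is slow compared with repairing only a few stripes.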
The purpose of reconstruction is to restore the data redundancy protection of the RAID. When a RAID disk fails, disk array vendors generally use hot spare technology to reconstruct the RAID automatically. Put simply, when the RAID system is created, a disk is designated as its hot spare; when a member disk of the RAID system fails, the hot spare automatically replaces the failed disk and triggers RAID reconstruction. As the name implies, a "hot" spare does not require read-write service on the RAID system to be interrupted when it replaces the failed disk; that is, read and write operations can still be performed on the RAID system during reconstruction.
In the prior art, when an input/output (IO) request from the upper layer cannot be answered by a member disk of the RAID system, that member disk is generally assumed to have failed, and the RAID system starts reconstruction automatically. Reconstruction is expensive and slow, and it degrades normal data IO performance. Moreover, if another disk fails during reconstruction, the RAID system generally collapses outright, which makes the RAID system very fragile. Reconstruction should therefore be avoided whenever possible.
Summary of the invention
The present application provides a RAID reconstruction method that reduces, as far as possible, the probability of having to perform a RAID reconstruction.
The RAID reconstruction method provided by the embodiments of the present application comprises:
A. The controller of the RAID system finds that a first disk in the RAID system cannot respond to IO operations, switches off the power supply of the first disk alone, and starts a timer of a predetermined duration;
B. While the timer is running, the RAID system performs normal read and write operations and records the numbers of all stripes on which a write operation occurs during this period;
C. When the timer expires, the power supply of the first disk is switched on, powering on the first disk;
D. After the first disk is powered on, a read-write test operation is performed on the first disk;
E. It is judged whether the first disk reads and writes normally; if so, step F is executed; otherwise, step G is executed;
F. According to the numbers of all stripes on which a write operation occurred, as recorded while the first disk was powered off, the data in the corresponding stripes of the first disk is recovered, and the flow ends once recovery is complete;
G. The first disk is marked as a damaged disk and replaced by a second disk serving as a hot spare; a calculation is performed according to the data and parity of the other disks in the RAID system, and the result is written to the second disk.
Preferably, the read-write test operation comprises:
D1. Check whether the first disk is online and has been loaded by the driver into the operating system; if it is not online, the first disk is a damaged disk; if it is online, continue with step D2;
D2. Send the SCSI command "TEST UNIT READY" to the disk to check whether the disk is ready to read and write; if it cannot read and write, the disk is a damaged disk; if it can, execute step D3;
D3. Write the RAID metadata corresponding to the first disk, as recorded in the operating system, to the location of the corresponding metadata on the disk; if the write fails, the first disk is judged to be a damaged disk; if the write succeeds, continue with step D4;
D4. Perform a read operation on the RAID metadata of the first disk; if the read succeeds, the first disk is confirmed to be a good disk; if the read fails, the first disk is judged to be a damaged disk.
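The four checks D1 to D4 form a short-circuit sequence: the first failing check ends the test. The sketch below illustrates that ordering only. The `FakeDisk` class and its methods are hypothetical stand-ins for whatever driver interface a real controller exposes; in particular, `test_unit_ready` merely models the outcome of the SCSI TEST UNIT READY command and does not issue it.

```python
from dataclasses import dataclass


@dataclass
class FakeDisk:
    """Hypothetical stand-in for a member disk; not a real driver API."""
    online: bool = True      # D1: visible to the operating system
    ready: bool = True       # D2: modelled result of TEST UNIT READY
    write_ok: bool = True    # D3: metadata write succeeds
    read_ok: bool = True     # D4: metadata read succeeds
    metadata: bytes = b""

    def is_online(self):
        return self.online

    def test_unit_ready(self):
        return self.ready

    def write_metadata(self, blob):
        if not self.write_ok:
            return False
        self.metadata = blob
        return True

    def read_metadata(self):
        return self.metadata if self.read_ok else None


def read_write_test(disk, os_metadata):
    """Run D1-D4 in order; return (passed, name_of_failed_step)."""
    if not disk.is_online():
        return False, "D1"
    if not disk.test_unit_ready():
        return False, "D2"
    if not disk.write_metadata(os_metadata):
        return False, "D3"
    if disk.read_metadata() is None:
        return False, "D4"
    return True, None
```

For example, `read_write_test(FakeDisk(ready=False), b"meta")` stops at D2 without attempting the metadata write, so a disk that is not ready is never written to.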
As can be seen from the above technical solution, when a disk of the RAID system cannot respond to IO operations, it is first powered off. While it is powered off, the application layer is allowed to read and write the RAID normally, and the numbers of all stripes on which a write operation occurs during this period are recorded. The disk is then powered on and tested to see whether it can read and write normally. If it can, recovery of the data in the corresponding stripes of the disk begins, according to the recorded stripe numbers; otherwise, the disk is marked as a damaged disk and the conventional reconstruction process is started. In most cases, this method restores the disk of the RAID system to normal operation without a reconstruction.
Brief description of the drawings
Fig. 1 is a flowchart of a reconstruction method for a redundant array of independent disks provided by an embodiment of the present application.
Detailed description of the embodiments
In most cases, when an IO request from the upper layer cannot be answered by a member disk of the RAID system, it is not because that member disk is really damaged. According to statistics from the disk manufacturer Seagate, when a disk cannot respond to IO requests, 95% of cases are caused by software errors such as firmware or checksum errors, and in these cases the disk can be made usable again by a simple repair operation; only in 5% of cases is the disk really damaged. Therefore, starting reconstruction of the RAID system when the disk is not actually damaged greatly increases the operation and maintenance cost of the RAID system.
The present application provides a reconstruction method for a redundant array of independent disks. Its basic idea is as follows: each disk slot interface in the RAID system is provided with a circuit that powers the disk in that slot on and off independently. When a disk of the RAID system cannot respond to IO operations, it is first powered off. While it is powered off, the application layer is allowed to read and write the RAID normally, and the numbers of all stripes on which a write operation occurs during this period are recorded. The disk is then powered on and tested to see whether it can read and write normally. If it can, recovery of the data in the corresponding stripes of the disk begins, according to the recorded stripe numbers; otherwise, the disk is marked as a damaged disk and the conventional reconstruction process is started.
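The saving comes from recomputing only the stripes written while the disk was powered off, instead of every stripe. The following is a minimal sketch of that idea under stated assumptions: single XOR parity, an in-memory array modelled as one list of blocks per disk, and hypothetical names (`DegradedWriteLog`, `resync`) that do not appear in the patent.

```python
class DegradedWriteLog:
    """Records the numbers of stripes written while one member disk is powered off."""

    def __init__(self):
        self._stripes = set()

    def record_write(self, stripe_no):
        self._stripes.add(stripe_no)

    def stripes_to_recover(self):
        return sorted(self._stripes)


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def resync(array, offline_idx, log):
    """Recompute only the logged stripes of the returning disk; return how many."""
    stripes = log.stripes_to_recover()
    for s in stripes:
        others = [disk[s] for i, disk in enumerate(array) if i != offline_idx]
        array[offline_idx][s] = xor_blocks(others)
    return len(stripes)


# Three disks, four stripes; disk 2 holds the XOR of disks 0 and 1 in this toy layout.
d0 = [b"\x01", b"\x02", b"\x03", b"\x04"]
d1 = [b"\x10", b"\x20", b"\x30", b"\x40"]
d2 = [xor_blocks([a, b]) for a, b in zip(d0, d1)]
array = [d0, d1, d2]
log = DegradedWriteLog()

# Disk 1 is powered off. A write aimed at its block in stripe 3 lands only in parity.
new_block = b"\x99"
d2[3] = xor_blocks([d0[3], new_block])
log.record_write(3)

# A write to disk 0's block in stripe 1 updates parity by read-modify-write.
old = d0[1]
d0[1] = b"\x7f"
d2[1] = xor_blocks([d2[1], old, d0[1]])
log.record_write(1)

recovered = resync(array, offline_idx=1, log=log)
```

Here only two of the four stripes are recomputed. Stripes 0 and 2 were never written during the outage, so the blocks already on the returning disk are still valid.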
To make the technical principles, features, and technical effects of the technical solution of the present application clearer, the solution is described in detail below with reference to specific embodiments.
The flow of the reconstruction method for a redundant array of independent disks provided by the embodiment of the present application is shown in Fig. 1 and comprises the following steps:
Step 101: The controller of the RAID system finds that a disk in the RAID system cannot respond to IO operations, switches off the power supply of that disk alone so that the disk is powered off, and starts a timer of a predetermined duration. This disk is referred to below as the first disk.
Step 102: While the timer is running (that is, while the first disk is powered off), the RAID system performs normal read and write operations and records the numbers of all stripes on which a write operation occurs during this period.
Step 103: When the timer expires, the power supply of the first disk is switched on, powering on the first disk.
Step 104: After the first disk is powered on, a read-write test operation is performed on the first disk.
In this embodiment, the read-write test performs the following operations:
D1. Check whether the first disk is online and has been loaded by the driver into the operating system; if it is not online, the first disk is a damaged disk; if it is online, continue with step D2;
D2. Send the SCSI command "TEST UNIT READY" to the disk to check whether the disk is ready to read and write; if it cannot read and write, the disk is a damaged disk; if it can, execute step D3;
D3. Write the RAID metadata corresponding to the first disk, as recorded in the operating system, to the location of the corresponding metadata on the disk; if the write fails, the first disk is judged to be a damaged disk; if the write succeeds, continue with step D4;
D4. Perform a read operation on the RAID metadata of the first disk; if the read succeeds, the first disk is confirmed to be a good disk; if the read fails, the first disk is judged to be a damaged disk.
Step 105: It is judged whether the first disk reads and writes normally; if so, step 106 is executed; otherwise, step 107 is executed.
Step 106: According to the numbers of all stripes on which a write operation occurred, as recorded while the first disk was powered off, the data in the corresponding stripes of the first disk is recovered, and the flow ends once recovery is complete.
Step 107: The first disk is marked as a damaged disk and replaced by a second disk serving as a hot spare; a calculation is performed according to the data and parity of the other disks in the RAID system, and the result is written to the second disk.
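Steps 101 to 107 can be read as a small decision procedure. The sketch below shows one possible shape for it, with every hardware interaction injected as a callable so that the control flow can be exercised without real disks. All names (`handle_unresponsive_disk`, `FakeSlot`, the callback parameters) and the 30-second duration are hypothetical; the patent does not specify an API or a timer value.

```python
def handle_unresponsive_disk(slot, hold_seconds, wait, written_stripes,
                             read_write_test, resync_stripes, rebuild_to_spare):
    """Power-cycle a disk, then either resync the logged stripes or rebuild to a spare."""
    slot.power_off()                          # step 101: cut power to this slot only
    wait(hold_seconds)                        # steps 101-102: timer runs, array serves IO
    slot.power_on()                           # step 103: timer expired
    stripes = sorted(set(written_stripes()))  # stripe numbers logged during the outage
    if read_write_test(slot):                 # steps 104-105
        resync_stripes(slot, stripes)         # step 106: repair only the logged stripes
        return "recovered"
    slot.mark_damaged()                       # step 107: fall back to conventional rebuild
    rebuild_to_spare(slot)
    return "rebuilt"


class FakeSlot:
    """Hypothetical disk slot that records the power and status actions applied to it."""

    def __init__(self, healthy_after_cycle):
        self.healthy_after_cycle = healthy_after_cycle
        self.events = []

    def power_off(self):
        self.events.append("off")

    def power_on(self):
        self.events.append("on")

    def mark_damaged(self):
        self.events.append("damaged")


def run_demo(healthy):
    """Drive the flow once with stub callbacks and report what happened."""
    slot = FakeSlot(healthy)
    actions = []
    outcome = handle_unresponsive_disk(
        slot,
        hold_seconds=30,
        wait=lambda seconds: None,            # skip the real delay in the demo
        written_stripes=lambda: [7, 3, 7],
        read_write_test=lambda s: s.healthy_after_cycle,
        resync_stripes=lambda s, stripes: actions.append(("resync", stripes)),
        rebuild_to_spare=lambda s: actions.append(("rebuild", None)),
    )
    return outcome, slot.events, actions
```

The sketch snapshots the write log once, for brevity. A real controller would have to keep logging writes until the resync in step 106 has finished, because the array continues to serve IO throughout.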
The above is only a preferred embodiment of the present application and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the technical solution of the present application shall fall within the scope of protection of the present application.

Claims (2)

1. A reconstruction method for a redundant array of independent disks (RAID), characterized in that it comprises:
A. The controller of the RAID system finds that a first disk in the RAID system cannot respond to IO operations, switches off the power supply of the first disk alone, and starts a timer of a predetermined duration;
B. While the timer is running, the RAID system performs normal read and write operations and records the numbers of all stripes on which a write operation occurs during this period;
C. When the timer expires, the power supply of the first disk is switched on, powering on the first disk;
D. After the first disk is powered on, a read-write test operation is performed on the first disk;
E. It is judged whether the first disk reads and writes normally; if so, step F is executed; otherwise, step G is executed;
F. According to the numbers of all stripes on which a write operation occurred, as recorded while the first disk was powered off, the data in the corresponding stripes of the first disk is recovered, and the flow ends once recovery is complete;
G. The first disk is marked as a damaged disk and replaced by a second disk serving as a hot spare; a calculation is performed according to the data and parity of the other disks in the RAID system, and the result is written to the second disk.
2. The method according to claim 1, characterized in that the read-write test operation comprises:
D1. Check whether the first disk is online and has been loaded by the driver into the operating system; if it is not online, the first disk is a damaged disk; if it is online, continue with step D2;
D2. Send the SCSI command "TEST UNIT READY" to the disk to check whether the disk is ready to read and write; if it cannot read and write, the disk is a damaged disk; if it can, execute step D3;
D3. Write the RAID metadata corresponding to the first disk, as recorded in the operating system, to the location of the corresponding metadata on the disk; if the write fails, the first disk is judged to be a damaged disk; if the write succeeds, continue with step D4;
D4. Perform a read operation on the RAID metadata of the first disk; if the read succeeds, the first disk is confirmed to be a good disk; if the read fails, the first disk is judged to be a damaged disk.
CN201210570497.1A 2012-12-25 2012-12-25 Reconstruction method for redundant array of independent disks Active CN103019894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210570497.1A CN103019894B (en) 2012-12-25 2012-12-25 Reconstruction method for redundant array of independent disks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210570497.1A CN103019894B (en) 2012-12-25 2012-12-25 Reconstruction method for redundant array of independent disks

Publications (2)

Publication Number Publication Date
CN103019894A true CN103019894A (en) 2013-04-03
CN103019894B CN103019894B (en) 2015-03-04

Family

ID=47968524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210570497.1A Active CN103019894B (en) 2012-12-25 2012-12-25 Reconstruction method for redundant array of independent disks

Country Status (1)

Country Link
CN (1) CN103019894B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103513942A (en) * 2013-10-21 2014-01-15 华为技术有限公司 Method and device for reconstructing independent redundancy array of inexpensive disks
CN103699855A (en) * 2013-12-05 2014-04-02 华为技术有限公司 Data processing method and data processing device
CN104111880A (en) * 2013-04-16 2014-10-22 华中科技大学 Quick single-disk failure recovery method for triple-erasure-correcting codes
CN105892950A (en) * 2016-04-01 2016-08-24 浪潮电子信息产业股份有限公司 Disk array reconstruction method and disk array reconstruction system
CN107301106A (en) * 2017-06-28 2017-10-27 郑州云海信息技术有限公司 The restoration methods and device of a kind of RAID system failure
WO2017220013A1 (en) * 2016-06-23 2017-12-28 中兴通讯股份有限公司 Service processing method and apparatus, and storage medium
CN114968129A (en) * 2022-07-28 2022-08-30 苏州浪潮智能科技有限公司 Disk array redundancy method, system, computer device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060156059A1 (en) * 2005-01-13 2006-07-13 Manabu Kitamura Method and apparatus for reconstructing data in object-based storage arrays
CN101840311A (en) * 2009-12-30 2010-09-22 创新科存储技术有限公司 Self-repairing method suitable for RAID system and RAID system
CN101840360A (en) * 2009-10-28 2010-09-22 创新科存储技术有限公司 Rapid reconstruction method and device of RAID (Redundant Array of Independent Disk) system

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111880A (en) * 2013-04-16 2014-10-22 华中科技大学 Quick single-disk failure recovery method for triple-erasure-correcting codes
CN104111880B (en) * 2013-04-16 2016-03-02 华中科技大学 A kind of forms data dish inefficacy fast reconstructing method holding three dish inefficacy correcting and eleting codes
WO2015058542A1 (en) * 2013-10-21 2015-04-30 华为技术有限公司 Reconstruction method and device for redundant array of independent disks
CN103513942B (en) * 2013-10-21 2016-06-29 华为技术有限公司 The reconstructing method of raid-array and device
CN103513942A (en) * 2013-10-21 2014-01-15 华为技术有限公司 Method and device for reconstructing independent redundancy array of inexpensive disks
CN103699855B (en) * 2013-12-05 2018-04-27 华为技术有限公司 A kind of data processing method and device
CN103699855A (en) * 2013-12-05 2014-04-02 华为技术有限公司 Data processing method and data processing device
CN105892950A (en) * 2016-04-01 2016-08-24 浪潮电子信息产业股份有限公司 Disk array reconstruction method and disk array reconstruction system
WO2017220013A1 (en) * 2016-06-23 2017-12-28 中兴通讯股份有限公司 Service processing method and apparatus, and storage medium
CN107544874A (en) * 2016-06-23 2018-01-05 南京中兴新软件有限责任公司 Method for processing business and device
CN107301106A (en) * 2017-06-28 2017-10-27 郑州云海信息技术有限公司 The restoration methods and device of a kind of RAID system failure
CN114968129A (en) * 2022-07-28 2022-08-30 苏州浪潮智能科技有限公司 Disk array redundancy method, system, computer device and storage medium
CN114968129B (en) * 2022-07-28 2022-12-06 苏州浪潮智能科技有限公司 Disk array redundancy method, system, computer equipment and storage medium

Also Published As

Publication number Publication date
CN103019894B (en) 2015-03-04


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 518057 Shenzhen Software Park, No. 9, 501, 502, Science and Technology Middle Road, Nanshan District, Shenzhen City, Guangdong Province

Co-patentee after: Innovation Technology Co., Ltd.

Patentee after: Shenzhen Innovation Technology Co., Ltd.

Address before: 518057 Shenzhen Software Park, No. 9, 501, 502, Science and Technology Middle Road, Nanshan District, Shenzhen City, Guangdong Province

Co-patentee before: Innovation and Technology Storage Technology Co., Ltd.

Patentee before: UIT Storage Technology (Shenzhen) Co., Ltd.
