CN102214210A - Method, device and system for processing repeating data - Google Patents

Method, device and system for processing repeating data

Info

Publication number
CN102214210A
Authority
CN
China
Prior art keywords
data
block
block data
data object
piecemeal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011101257404A
Other languages
Chinese (zh)
Other versions
CN102214210B (en
Inventor
段雨梅
谢勇
徐君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Huawei Technology Co Ltd
Original Assignee
Huawei Symantec Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Symantec Technologies Co Ltd
Priority to CN201110125740.4A
Publication of CN102214210A
Application granted
Publication of CN102214210B
Legal status: Active
Anticipated expiration

Abstract

Embodiments of the invention provide a method, device and system for processing repeating (duplicate) data. The method comprises: blocking a data object with a sliding window to obtain each data block, wherein, to obtain each data block, the sliding start position of the sliding window is reached by jumping forward by a minimum block length from the end position of the previous data block, the start position of each data block is the end position of the previous data block, and the length of each data block equals the sum of the minimum block length and the sliding length of the sliding window during the corresponding blocking process; and matching each data block against the data blocks already stored in a storage device and, if a data block is already stored in the storage device, deleting that data block and using the stored data block as the data block in the data object. Embodiments of the invention can improve the efficiency of data deduplication and meet the storage requirements of steadily growing data volumes.

Description

Duplicate data processing method, device and system
Technical field
Embodiments of the present invention relate to storage technology, and in particular to a duplicate data processing method, device and system.
Background
Data deduplication, also known as intelligent compression or single-instance storage, is a storage technology that automatically searches for duplicate data, keeps only one unique copy of identical data, and replaces the other duplicate copies with pointers to that single copy, thereby eliminating redundant data and reducing the required storage capacity.
In the prior art, data deduplication can use the variable-length blocking algorithm known as Content-Defined Chunking (hereinafter: CDC). Specifically, the method uses a fingerprint algorithm to compute the fingerprint of the part of the data object inside a sliding window; if the fingerprint satisfies a predetermined condition, the start and end positions of the sliding window are taken as the boundaries of a data block. By sliding the window continuously and computing fingerprints, the data object is divided into blocks. For each data block obtained, it is first necessary to judge whether the block is longer than a lower length limit. If it is, the fingerprint information of the block, for example a hash value, is computed and compared with the fingerprint information stored in the storage device. If the fingerprint information of the block is identical to some fingerprint information stored in the storage device, the block is a duplicate: the storage device already holds an identical block, so the data object can reference the stored block. If the storage device holds no fingerprint information identical to that of the block, the block and its fingerprint information are stored in the storage device for use in subsequent duplicate checks.
However, the inventors found that deduplication with the CDC algorithm is inefficient and cannot keep up with the storage demands of gradually growing data volumes.
Summary of the invention
Embodiments of the present invention provide a duplicate data processing method, device and system, to solve the prior-art problem that deduplication with the CDC algorithm is inefficient and cannot keep up with the storage demands of gradually growing data volumes.
An embodiment of the present invention provides a duplicate data processing method, comprising:
blocking a data object with a sliding window to obtain each data block, wherein, to obtain each data block, the sliding start position of the sliding window is the position reached by jumping forward by the minimum block length from the end position of the previous data block, the start position of each data block is the end position of the previous data block, and the length of each data block equals the sum of the minimum block length and the sliding length of the sliding window during the corresponding blocking process;
matching each data block against the data blocks already stored in a storage device and, if a data block is already stored in the storage device, deleting that data block and using the data block stored in the storage device as the data block in the data object.
An embodiment of the present invention provides a duplicate data processing device, comprising:
a blocking module, configured to block a data object with a sliding window to obtain each data block, wherein, to obtain each data block, the sliding start position of the sliding window is the position reached by jumping forward by the minimum block length from the end position of the previous data block, the start position of each data block is the end position of the previous data block, and the length of each data block equals the sum of the minimum block length and the sliding length of the sliding window during the corresponding blocking process;
a data processing module, configured to match each data block against the data blocks already stored in a storage device and, if a data block is already stored in the storage device, to delete that data block and use the data block stored in the storage device as the data block in the data object.
An embodiment of the present invention provides a duplicate data processing system, comprising a storage device and the duplicate data processing device described above, the storage device being configured to store data blocks and the fingerprint information corresponding to the data blocks.
In embodiments of the present invention, the sliding window is made to jump forward by the minimum block length before each blocking process and only then performs the blocking, so that every data block obtained is at least as long as the minimum block length. When a blocking process ends there is therefore no need to judge again whether the data block is longer than the minimum block length; the next blocking process can start directly, which saves processing time and keeps successive blocking processes continuous. Moreover, the sliding window no longer has to traverse every byte of the data object, because a certain number of bytes are skipped before each blocking process. Embodiments of the present invention can therefore improve the processing efficiency of data deduplication and meet the storage demands of steadily growing data volumes.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the duplicate data processing method of the present invention;
Fig. 2 is a schematic diagram of the fifth blocking of a data object using the method embodiment shown in Fig. 1;
Fig. 3 is a flowchart of another embodiment of the duplicate data processing method of the present invention;
Fig. 4 is a schematic structural diagram of an embodiment of the duplicate data processing device of the present invention;
Fig. 5 is a schematic structural diagram of another embodiment of the duplicate data processing device of the present invention;
Fig. 6 is a schematic structural diagram of an embodiment of the duplicate data processing system of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a flowchart of an embodiment of the duplicate data processing method of the present invention. As shown in Fig. 1, the method of this embodiment may comprise:
Step 101: block the data object with a sliding window to obtain each data block. To obtain each data block, the sliding start position of the sliding window is the position reached by jumping forward by the minimum block length from the end position of the previous data block; the start position of each data block is the end position of the previous data block; and the length of each data block equals the sum of the minimum block length and the sliding length of the sliding window during the corresponding blocking process.
In this embodiment, a duplicate data processing device blocks the data object with a sliding window. The device may be, for example, a storage controller. This embodiment limits neither the kind of equipment used as the duplicate data processing device nor its concrete structure, as long as it can perform the data processing.
Specifically, in the existing CDC algorithm the sliding window has to traverse every byte of the data object as it slides. In addition, at the end of each blocking process the duplicate data processing device has to judge whether the length of the resulting data block is greater than the minimum block length. Only a data block longer than the minimum block length satisfies the preset blocking condition, which controls the granularity at which data blocks are later compared with stored data. The minimum block length is the lower limit on the length of each data block obtained by blocking: the smaller the minimum block length, the finer the blocking granularity. Because this judgment is performed after every blocking process and the sliding window has to traverse all bytes of the data object, deduplication is inefficient and cannot keep up with the storage demands of gradually growing data volumes.
By contrast, in this embodiment, before each blocking of the data object the duplicate data processing device makes the sliding window start sliding from the position reached by skipping forward by the minimum block length from the end position of the previous data block. The length of each data block therefore necessarily reaches the minimum block length, which removes the prior-art step of judging, after each blocking process, whether the data block is longer than the minimum block length, and keeps successive blocking processes continuous. Note that this embodiment does not limit the minimum block length; those skilled in the art can set it as required, for example according to factors such as the blocking granularity, the blocking efficiency and the size of the data object.
Fig. 2 is a schematic diagram of the fifth blocking of a data object using the method embodiment shown in Fig. 1. As shown in Fig. 2, L1 is the preset minimum block length; L2 is the sliding length of the sliding window during the fifth blocking process; L3 is the fifth data block obtained by the fifth blocking process; P1 is the end position of the fourth data block, which is also the start position of the fifth data block and the end position of the sliding window after the fourth blocking process; P2 is the sliding start position of the sliding window for the fifth blocking process, reached by jumping forward by the minimum block length L1 from the end position P1 of the fourth data block; and P3 is the end position of the sliding window after the fifth blocking process. The length of the fifth data block therefore equals the sum of the minimum block length L1 and the sliding length L2 of the sliding window during the fifth blocking process.
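The geometry of Fig. 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the window width `WINDOW`, the value of `MIN_BLOCK` (playing the role of L1), the mask-based boundary test and the toy `window_fingerprint` function are all hypothetical choices made for readability, and the fingerprint is recomputed from scratch at each position rather than rolled.

```python
from typing import Iterator, Tuple

WINDOW = 16      # sliding-window width in bytes (assumed value)
MIN_BLOCK = 64   # L1, the minimum block length (assumed value)
MASK = 0x3F      # a boundary is declared when fingerprint & MASK == 0 (assumed rule)

# The window must fit inside the skipped prefix of each block.
assert MIN_BLOCK >= WINDOW


def window_fingerprint(window: bytes) -> int:
    """Toy polynomial fingerprint of the window contents (illustrative stand-in)."""
    h = 0
    for b in window:
        h = (h * 31 + b) & 0xFFFFFFFF
    return h


def split_blocks(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) offsets of each block; end is exclusive."""
    n = len(data)
    p1 = 0                      # P1: end of the previous block, start of this one
    while p1 < n:
        p3 = p1 + MIN_BLOCK     # P2: the window jumps forward by L1 before sliding
        while p3 < n:
            # Fingerprint of the WINDOW bytes that end at the current position.
            if window_fingerprint(data[p3 - WINDOW:p3]) & MASK == 0:
                break           # boundary found: P3 is the end of this block
            p3 += 1             # slide one byte; the slides add up to L2
        p3 = min(p3, n)         # the last block may be cut short by end of data
        yield p1, p3            # block length = L1 + L2 (except possibly the last)
        p1 = p3
```

Because the inner loop starts at `p1 + MIN_BLOCK`, no length check is needed after a boundary is found, and the first `MIN_BLOCK - WINDOW` bytes of each block are never fed to the fingerprint function.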
The first blocking process differs from the subsequent ones only in that the sliding start position of the sliding window is the position reached by jumping forward by the minimum block length from the start position of the data object. The rest of the process is similar to the blocking described above and is not repeated here.
Step 102: match each data block against the data blocks already stored in the storage device; if a data block is already stored in the storage device, delete that data block and use the data block stored in the storage device as the data block in the data object.
For example, this embodiment can determine whether a data block is already stored in the storage device by matching fingerprint information. After obtaining a data block, the duplicate data processing device computes first fingerprint information of the data block, which characterizes the data block. This embodiment does not limit which check algorithm the duplicate data processing device uses to compute the first fingerprint information. Those skilled in the art need only compute the first fingerprint information of the data block to be stored with the same algorithm as, or an algorithm matching, the one used for the first fingerprint information of the data blocks already stored in the storage device. For example, the first fingerprint information may be a hash value obtained by hashing the data block, or a CRC check code obtained by applying a cyclic redundancy check (hereinafter: CRC) to the data block.
When a data block of the data object is processed, the storage device already holds a large number of data blocks and their corresponding first fingerprint information. The duplicate data processing device can therefore judge whether the data block is already stored in the storage device by comparing the first fingerprint information of the data block with the fingerprint information stored in the storage device. If the storage device already holds the first fingerprint information of the data block, the data block itself is already stored there; to avoid storing it twice, the duplicate data processing device uses the data block in the storage device as the data block of the data object to be stored, and the data object's own copy of the block can be deleted. This embodiment does not limit the form in which the stored data block stands in for the data block of the data object currently being stored; those skilled in the art can use existing techniques, for example a pointer. If the storage device does not hold the fingerprint information of the data block, the data block is not stored there, and the duplicate data processing device can store the data block together with its first fingerprint information in the storage device, either after the whole data object has been processed or as soon as the data block has been divided out, for use in later judgments of whether other data blocks are already stored. Note that the statement that the storage device has stored the first fingerprint information of the data block may mean either that a stored fingerprint is equal to the first fingerprint information or that it matches it; this embodiment does not limit this, as long as whether the data block is stored in the storage device can be determined from the first fingerprint information of the data block and the stored first fingerprint information. This embodiment also does not limit the type or application of the storage device.
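The matching step can be illustrated with a small in-memory sketch. The text leaves the fingerprint algorithm and the reference mechanism open, so everything specific here is an assumption: SHA-256 stands in for the first fingerprint information, a Python dictionary stands in for the storage device, and the fingerprint string itself serves as the pointer. The names `BlockStore`, `store_object` and `restore_object` are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


class BlockStore:
    """Toy stand-in for the storage device: fingerprint -> stored block."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        """Store the block only if its fingerprint is new; return the reference."""
        fp = hashlib.sha256(block).hexdigest()   # first fingerprint (assumed: SHA-256)
        if fp not in self.blocks:
            self.blocks[fp] = block              # unseen block: keep block and fingerprint
        # Seen before: the caller's copy is simply dropped, only the reference is kept.
        return fp


def store_object(store: BlockStore, blocks: Iterable[bytes]) -> List[str]:
    """Represent a data object as a list of references into the store."""
    return [store.put(b) for b in blocks]


def restore_object(store: BlockStore, refs: Iterable[str]) -> bytes:
    """Rebuild the data object by following its references."""
    return b"".join(store.blocks[r] for r in refs)
```

In this sketch a repeated block costs one extra reference rather than a second copy, which is the effect the paragraph above describes; a real system would also have to decide how to handle fingerprint collisions, which the sketch ignores.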
Experiments show that with an average block length of 32 KB and a minimum block length of 16 KB, the method of this embodiment can improve blocking efficiency by 50%.
In this embodiment, the sliding window is made to jump forward by the minimum block length before each blocking process and only then performs the blocking, so that every data block obtained is at least as long as the minimum block length. When a blocking process ends there is therefore no need to judge again whether the data block is longer than the minimum block length; the next blocking process can start directly, which saves processing time and keeps successive blocking processes continuous. Moreover, the sliding window no longer has to traverse every byte of the data object, because a certain number of bytes are skipped before each blocking process. This embodiment can therefore improve the processing efficiency of data deduplication and meet the storage demands of steadily growing data volumes.
In a specific implementation of the method embodiment shown in Fig. 1, the blocking of the data can be realized in either of the following two modes:
Mode one: a sliding window slides from one end of the data object to the other end, blocking the data object to obtain each data block.
Specifically, blocking mode one uses a single sliding window that slides serially from one end of the whole data object to the other end to block the data object.
Mode two: sliding windows slide in parallel from one end of each data area of the data object to the other end, blocking the data object to obtain each data block. Each data area corresponds to one sliding window, and the data areas are obtained by dividing the data object.
Specifically, blocking mode two divides the data object into a plurality of data areas and, for each data area, uses one sliding window that slides from one end of the data area to the other end. Blocking mode two thus differs from blocking mode one in that several sliding windows divide the data into blocks concurrently, so blocking mode two is more efficient than blocking mode one.
Note that this embodiment does not limit how the data object is divided into data areas. For example, it can be divided randomly, divided evenly according to the length of the data object, or divided according to the concurrent processing capability of the processor.
This embodiment also does not limit the sliding direction of the sliding window for each data area. For example, all sliding windows may slide from the left end of their data area to the right end, or all from the right end to the left end, or some may slide from the left end to the right end while others slide from the right end to the left end.
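Blocking mode two can be sketched as follows, assuming an even division of the data object into areas and one worker per area. The boundary test `is_boundary` is a deliberately trivial stand-in for the window fingerprint test, and `MIN_BLOCK`, `split_region` and `split_parallel` are hypothetical names. Note that in CPython, threads do not run CPU-bound Python code truly in parallel; a thread pool is used here only to keep the sketch short and self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

MIN_BLOCK = 8  # minimum block length (assumed value)


def is_boundary(data: bytes, pos: int) -> bool:
    """Toy stand-in for the window-fingerprint boundary test."""
    return data[pos - 1] % 16 == 0


def split_region(data: bytes, start: int, end: int) -> List[Tuple[int, int]]:
    """Block one data area [start, end), left to right, with the minimum-length jump."""
    out = []
    p1 = start
    while p1 < end:
        p = min(p1 + MIN_BLOCK, end)          # jump forward by the minimum block length
        while p < end and not is_boundary(data, p):
            p += 1
        out.append((p1, p))
        p1 = p
    return out


def split_parallel(data: bytes, regions: int) -> List[Tuple[int, int]]:
    """Divide the object evenly into areas and block every area with its own window."""
    size = max(1, -(-len(data) // regions))   # ceiling division, at least 1
    bounds = [(s, min(s + size, len(data))) for s in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=regions) as pool:
        parts = list(pool.map(lambda b: split_region(data, b[0], b[1]), bounds))
    # pool.map preserves input order, so the blocks come out in object order.
    return [blk for part in parts for blk in part]
```

One consequence of this design is that every area boundary is also a block boundary, so the blocks produced near an area edge can differ from those a single serial window would produce.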
When blocking mode one or mode two above is applied to the method embodiment shown in Fig. 1, matching each data block against the data blocks already stored in the storage device can be done in either of the following two modes:
Mode one: while the data object is being blocked, data blocks are divided out and matched at the same time; that is, the blocking process and the matching process can run in parallel.
Mode two: after the whole data object has been blocked, all the data blocks are matched.
From these two modes it follows that matching mode one suits the case where each data block obtained by blocking can be matched directly, without considering other data blocks, while matching mode two suits the case where the data blocks obtained by blocking may not be matched directly and all of them have to be considered together. This overall consideration may involve, for example, whether any data block exceeds the maximum block length and whether any data blocks need to be merged. Optionally, therefore, in embodiments of the present invention, if blocking the data object yields a data block longer than the maximum block length, the duplicate data processing device can further split that data block, so that the length of every data block is kept within the maximum block length.
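The optional splitting of over-long blocks could look like the sketch below. The text does not say how an over-long block is split, so cutting it into fixed-size pieces of at most `max_len` bytes is purely an assumption, and `enforce_max_length` is a hypothetical helper name.

```python
from typing import List


def enforce_max_length(blocks: List[bytes], max_len: int) -> List[bytes]:
    """Split every block longer than max_len into pieces of at most max_len bytes."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    out: List[bytes] = []
    for block in blocks:
        # Blocks within the limit produce exactly one piece: themselves.
        for i in range(0, len(block), max_len):
            out.append(block[i:i + max_len])
    return out
```

With this particular rule the final piece of a split block may be shorter than the minimum block length, which is one reason an implementation might prefer a different splitting strategy.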
From the description of the above technical solutions it follows that, whether duplicate data processing is performed on the whole data object or on one data area of the data object, the processing of an individual data block is similar. Only the processing of one data block is therefore described in detail below.
Fig. 3 is a flowchart of another embodiment of the duplicate data processing method of the present invention. As shown in Fig. 3, the method of this embodiment is again described using the process shown in Fig. 2, and may comprise:
Step 301: the start position of the sliding window jumps to P2.
Step 302: slide the sliding window and compute second fingerprint information of the part of the data object inside the sliding window.
The second fingerprint information is used to judge whether the sliding window has slid to the edge of the current data block. Specifically, taking the fifth data block shown in Fig. 2 as the example again, this embodiment uses the second fingerprint information obtained by a fingerprint algorithm to determine whether the sliding window has reached the edge of the fifth data block in the data object. The second fingerprint information is obtained with a fingerprint algorithm, which those skilled in the art can implement with existing techniques and which is not described further here.
In a specific implementation, the sliding step of the sliding window can be preset. Within one blocking process, if the sliding step is small, a single slide may not reach the edge of the data block, and the sliding window may have to be slid several times by the preset step before it reaches the edge of the data block for that blocking process.
Step 303: judge from the second fingerprint information whether the sliding window has slid to the edge of the fifth data block; if so, perform step 304; otherwise, perform step 302.
After each slide of the sliding window, the duplicate data processing device can judge from the second fingerprint information of the part of the data object currently inside the sliding window whether the sliding window has slid to the edge of a data block. In a specific implementation, this embodiment can preset a blocking condition, which can be set according to a continuity feature value of the data object. When the sliding window slides to the position where two data blocks meet, the continuity feature value of the data inside the sliding window is small; when the sliding window is at a position inside a data block, the continuity feature value of the data inside the sliding window is large. Therefore, if the computed second fingerprint information does not satisfy the preset blocking condition, that is, if the second fingerprint information is less than the preset continuity feature value, the sliding window has slid to the junction of two data blocks, in other words to the edge of the current data block, and the duplicate data processing device knows that this blocking process is finished. If the second fingerprint information is greater than the preset continuity feature value, the sliding window may still be inside a data block, and the duplicate data processing device continues to slide the window and repeats the above steps until the edge of the fifth data block is found.
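Steps 302 and 303 amount to a slide-and-test loop, which can be sketched as follows. The text does not fix the fingerprint algorithm for the second fingerprint information, so using `zlib.crc32` over the window contents is an assumption, as are the window width `WINDOW`, the preset value `THRESHOLD` and the helper names. The sketch follows the wording above: an edge is declared when the fingerprint is less than the preset value.

```python
import zlib

WINDOW = 48             # sliding-window width in bytes (assumed value)
THRESHOLD = 1 << 24     # preset value compared against the fingerprint (assumed)


def window_fingerprint(data: bytes, pos: int) -> int:
    """Second fingerprint: CRC-32 of the WINDOW bytes ending at pos (assumed choice)."""
    return zlib.crc32(data[pos - WINDOW:pos])


def at_block_edge(data: bytes, pos: int) -> bool:
    """Step 303: the window is at a block edge when the fingerprint is below the preset value."""
    return window_fingerprint(data, pos) < THRESHOLD


def find_edge(data: bytes, p2: int, step: int = 1) -> int:
    """Slide from P2 in increments of step until an edge is found or the data ends."""
    pos = p2
    while pos < len(data):
        # Only test positions where a full window of data is available.
        if pos >= WINDOW and at_block_edge(data, pos):
            return pos          # P3: edge of the current block
        pos += step             # step 302: slide again by the preset step
    return len(data)            # no edge found: the block runs to the end of the data
```

A larger `step` reduces the number of fingerprint computations but only lets blocks end at multiples of the step measured from P2; since CRC-32 values lie in the range 0 to 2^32 - 1, the threshold chosen here makes roughly one position in 256 qualify for random-looking data.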
Step 304: take the part of the data object between P1 and P3 as the fifth data block.
Step 305: compute the first fingerprint information of the fifth data block.
Step 306: judge whether the first fingerprint information is already stored in the storage device; if so, perform step 307; otherwise, perform step 308.
Step 307: delete the fifth data block from the data object, use the stored data block corresponding to the first fingerprint information as the fifth data block, and proceed to the next blocking process.
Step 308: store the fifth data block and its corresponding first fingerprint information in the storage device.
In this embodiment, the data object is blocked by means of a fingerprint algorithm, and after each blocking process the sliding window is made to jump forward by the minimum block length before it starts sliding for the next blocking process. When a blocking process ends there is therefore no need to judge again whether the data block is longer than the minimum block length; the next blocking process can start directly, which saves processing time and keeps successive blocking processes continuous. Moreover, the sliding window no longer has to traverse every byte of the data object, because a certain number of bytes are skipped before each blocking process, which improves the processing efficiency of data deduplication and meets the storage demands of steadily growing data volumes.
Fig. 4 is a schematic structural diagram of an embodiment of the duplicate data processing device of the present invention. As shown in Fig. 4, the device of this embodiment may comprise a blocking module 11 and a data processing module 12, wherein:
the blocking module 11 is configured to block a data object with a sliding window to obtain each data block, wherein, to obtain each data block, the sliding start position of the sliding window is the position reached by jumping forward by the minimum block length from the end position of the previous data block, the start position of each data block is the end position of the previous data block, and the length of each data block equals the sum of the minimum block length and the sliding length of the sliding window during the corresponding blocking process;
the data processing module 12 is configured to match each data block against the data blocks already stored in a storage device and, if a data block is already stored in the storage device, to delete that data block and use the data block stored in the storage device as the data block in the data object.
The device of this embodiment can be used to perform the method of the method embodiment shown in Fig. 1; its implementation principle and technical effect are similar and are not repeated here.
Fig. 5 is a schematic structural diagram of another embodiment of the duplicate data processing device of the present invention. As shown in Fig. 5, on the basis of the device structure shown in Fig. 4, the blocking module 11 of the device of this embodiment further comprises a first blocking unit 111 and a second blocking unit 112, wherein:
the first blocking unit 111 is configured to slide a sliding window from one end of the data object to the other end, blocking the data object to obtain each data block;
the second blocking unit 112 is configured to slide sliding windows in parallel from one end of each data area of the data object to the other end, blocking the data object to obtain each data block, wherein each data area corresponds to one sliding window and the data areas are obtained by dividing the data object.
The device of this embodiment may further comprise a data area determination module 13, configured to determine each data area in the data object according to the concurrent processing capability and/or the size of the data object. Optionally, the device of this embodiment may also comprise a data block splitting module 14, configured to split data blocks that exceed the maximum block length. In a specific implementation, the first blocking unit 111 or the second blocking unit 112 is specifically configured to: slide the sliding window and compute second fingerprint information of the part of the data object inside the sliding window; determine from the second fingerprint information whether the sliding window has slid to the edge of the data block of the current blocking process; if so, take the part of the data object from the end position of the previous data block to the current end position of the sliding window as the data block obtained by the current blocking process; otherwise, continue to perform the step of sliding the sliding window and computing the second fingerprint information of the part of the data object inside the sliding window, until the sliding window has slid to the edge of the data block of the current blocking process.
In the device of this embodiment, the data processing module 12 may further comprise a fingerprint calculation unit 121 and a deletion processing unit 122, wherein:
The fingerprint calculation unit 121 is configured to calculate first fingerprint information of the block data in the data object;
The deletion processing unit 122 is configured to, if the storage device has already stored the first fingerprint information, delete the block of data corresponding to the first fingerprint information from the data object, and use the stored block of data corresponding to the first fingerprint information as the deleted block of data in the data object.
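The cooperation of the two units can be sketched as a fingerprint lookup against an index kept by the storage device. In the sketch below, SHA-256 stands in for the first fingerprint information, and `BlockStore`, `first_fingerprint`, `deduplicate` and `restore` are hypothetical names introduced for illustration; the text does not name a hash function or a storage interface.

```python
import hashlib
from typing import Dict, List


class BlockStore:
    """Hypothetical stand-in for the storage device: maps a block's
    first fingerprint to the stored block data."""

    def __init__(self) -> None:
        self.by_fingerprint: Dict[str, bytes] = {}


def first_fingerprint(block: bytes) -> str:
    """Fingerprint of a whole block. SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(block).hexdigest()


def deduplicate(blocks: List[bytes], store: BlockStore) -> List[str]:
    """Store only blocks whose fingerprint is not yet known and return the
    list of fingerprints that describes the data object."""
    recipe: List[str] = []
    for block in blocks:
        fp = first_fingerprint(block)
        if fp not in store.by_fingerprint:
            store.by_fingerprint[fp] = block  # new block: keep its data
        # Known block: its data is dropped and the stored copy stands in for it.
        recipe.append(fp)
    return recipe


def restore(recipe: List[str], store: BlockStore) -> bytes:
    """Rebuild a data object from its fingerprint list."""
    return b"".join(store.by_fingerprint[fp] for fp in recipe)
```

Comparing fingerprints alone assumes that collisions are negligible; a more cautious variant could compare the block bytes as well before discarding a block.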
It should be noted that the blocking processing module 11 may also comprise only one of the first blocking processing unit 111 and the second blocking processing unit 112. The first blocking processing unit 111 may be used to implement blocking processing mode one described above, and the second blocking processing unit 112 may be used to implement blocking processing mode two described above; their implementation principles and technical effects are similar and are not repeated here. The fingerprint calculation unit 121 and the deletion processing unit 122 may be used to implement matching processing mode one or matching processing mode two described above; their implementation principles and technical effects are similar and are not repeated here. In a specific implementation, the device of this embodiment may be used to perform the method of the method embodiment shown in Fig. 3; its implementation principle and technical effect are similar and are not repeated here.
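The second blocking processing unit, together with the data region determination module, can be pictured as follows: the data object is divided into contiguous regions according to the available parallelism and the object size, and each region is blocked independently. This is a minimal sketch under stated assumptions: the even split, the `min_region` threshold, the thread pool, and the names `determine_regions`, `fixed_blocker` and `block_in_parallel` are all hypothetical, and `fixed_blocker` is only a placeholder for a real sliding-window blocker.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def determine_regions(size: int, workers: int, min_region: int) -> List[Tuple[int, int]]:
    """Divide [0, size) into at most `workers` contiguous regions, using fewer
    regions when the object is too small to give each one `min_region` bytes."""
    if size == 0:
        return []
    count = max(1, min(workers, size // min_region))
    base, extra = divmod(size, count)
    regions: List[Tuple[int, int]] = []
    start = 0
    for i in range(count):
        end = start + base + (1 if i < extra else 0)
        regions.append((start, end))
        start = end
    return regions


def fixed_blocker(region: bytes) -> List[bytes]:
    """Placeholder blocker (fixed 4-byte pieces); a sliding-window blocker
    would be substituted here."""
    return [region[i:i + 4] for i in range(0, len(region), 4)]


def block_in_parallel(
    data: bytes,
    workers: int,
    min_region: int,
    blocker: Callable[[bytes], List[bytes]] = fixed_blocker,
) -> List[bytes]:
    """Block each region with its own worker and concatenate the results in order."""
    regions = determine_regions(len(data), workers, min_region)
    with ThreadPoolExecutor(max_workers=max(1, workers)) as pool:
        # map() yields results in submission order, so block order is preserved.
        per_region = list(pool.map(lambda r: blocker(data[r[0]:r[1]]), regions))
    return [block for blocks in per_region for block in blocks]
```

In CPython, threads do not speed up CPU-bound hashing because of the global interpreter lock, so the thread pool here only illustrates the one-window-per-region structure. Region borders also become block borders in this sketch, which can shorten the blocks adjacent to each border.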
Fig. 6 is a schematic structural diagram of an embodiment of the repeating data processing system of the present invention. As shown in Fig. 6, the system of this embodiment may comprise a storage device 1 and a repeating data processing device 2. The storage device 1 is configured to store block data and the first fingerprint information corresponding to the block data. The repeating data processing device 2 may adopt the structure shown in Fig. 4 or Fig. 5 and may be used to perform the method of the method embodiment shown in Fig. 1 or Fig. 3; its implementation principle and technical effect are similar and are not repeated here.
Persons of ordinary skill in the art will understand that all or part of the steps of the foregoing method embodiments may be implemented by program instructions controlling the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes any medium capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A method for processing repeating data, characterized by comprising:
performing blocking processing on a data object by using a sliding window to obtain each block of data, wherein, to obtain each block of data, the sliding start position of the sliding window is the position reached by jumping backward by a minimum block length from the end position of the previous block of data, the start position of each block of data is the end position of the previous block of data, and the length of each block of data is equal to the sum of the minimum block length and the sliding length of the sliding window in the corresponding blocking process; and
matching each block of data against the block data already stored in a storage device, and, if a block of data is already stored in the storage device, deleting that block of data and using the block data stored in the storage device as the block of data in the data object.
2. The method according to claim 1, characterized in that performing blocking processing on the data object by using a sliding window to obtain each block of data comprises:
sliding the sliding window from one end of the data object to the other end, so as to perform blocking processing on the data object and obtain each block of data; or
sliding sliding windows in parallel from one end of each data region of the data object to the other end, so as to perform blocking processing on the data object and obtain each block of data, wherein each data region corresponds to one sliding window and the data regions are obtained by dividing the data object.
3. The method according to claim 2, characterized in that, before sliding the sliding windows in parallel from one end of each data region of the data object to the other end to perform blocking processing on the data object and obtain each block of data, the method further comprises:
determining each data region in the data object according to the concurrent processing capability and/or the size of the data object.
4. The method according to claim 2, characterized in that, after obtaining each block of data, the method further comprises:
splitting any block of data that exceeds the maximum block length.
5. The method according to any one of claims 1 to 4, characterized in that matching each block of data against the block data already stored in the storage device, and, if a block of data is already stored in the storage device, deleting that block of data and using the block data stored in the storage device as the block of data in the data object, comprises:
performing the following operations either during the blocking processing of the data object or after all blocking processing of the data object has finished:
calculating first fingerprint information of the block data in the data object; and
if the storage device has already stored the first fingerprint information, deleting the block of data corresponding to the first fingerprint information from the data object, and using the stored block of data corresponding to the first fingerprint information as the deleted block of data in the data object.
6. A repeating data processing device, characterized by comprising:
a blocking processing module, configured to perform blocking processing on a data object by using a sliding window to obtain each block of data, wherein, to obtain each block of data, the sliding start position of the sliding window is the position reached by jumping backward by a minimum block length from the end position of the previous block of data, the start position of each block of data is the end position of the previous block of data, and the length of each block of data is equal to the sum of the minimum block length and the sliding length of the sliding window in the corresponding blocking process; and
a data processing module, configured to match each block of data against the block data already stored in a storage device, and, if a block of data is already stored in the storage device, delete that block of data and use the block data stored in the storage device as the block of data in the data object.
7. The device according to claim 6, characterized in that the blocking processing module comprises at least one of the following units:
a first blocking processing unit, configured to slide a sliding window from one end of the data object to the other end, so as to perform blocking processing on the data object and obtain each block of data;
a second blocking processing unit, configured to slide sliding windows in parallel from one end of each data region of the data object to the other end, so as to perform blocking processing on the data object and obtain each block of data, wherein each data region corresponds to one sliding window and the data regions are obtained by dividing the data object.
8. The device according to claim 7, characterized by further comprising:
a data region determination module, configured to determine each data region in the data object according to the concurrent processing capability and/or the size of the data object.
9. The device according to claim 7, characterized by further comprising:
a data block splitting module, configured to split any block of data that exceeds the maximum block length.
10. The device according to any one of claims 6 to 10, characterized in that the data processing module comprises:
a fingerprint calculation unit, configured to calculate first fingerprint information of the block data in the data object; and
a deletion processing unit, configured to, if the storage device has already stored the first fingerprint information, delete the block of data corresponding to the first fingerprint information from the data object, and use the stored block of data corresponding to the first fingerprint information as the deleted block of data in the data object.
11. A repeating data processing system, characterized by comprising a storage device and the repeating data processing device according to any one of claims 6 to 10;
wherein the storage device is configured to store block data and first fingerprint information corresponding to the block data.
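The positional relationship recited in claims 1 and 6 can be written compactly. The symbols below do not appear in the original and are introduced only for illustration: $e_{i-1}$ is the end position of the previous block, $L_{\min}$ is the minimum block length, $s_i$ is the sliding start position of the window for block $i$, $d_i$ is the distance the window slides while block $i$ is being found, and $b_i$, $e_i$ and $|c_i|$ are the start position, end position and length of block $i$. "Jumping backward" is read here as moving toward later offsets in the data object, which is the reading consistent with the stated block length.

```latex
\begin{aligned}
s_i   &= e_{i-1} + L_{\min}, \\
b_i   &= e_{i-1}, \\
e_i   &= s_i + d_i, \\
|c_i| &= e_i - b_i = L_{\min} + d_i .
\end{aligned}
```

For example, with $e_{i-1} = 1000$, $L_{\min} = 4096$ and $d_i = 300$ (arbitrary numbers chosen for this illustration), the window starts at offset 5096, the block ends at offset 5396, and its length is $4096 + 300 = 4396$.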
CN201110125740.4A 2011-05-16 2011-05-16 Method, device and system for processing repeating data Active CN102214210B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110125740.4A CN102214210B (en) 2011-05-16 2011-05-16 Method, device and system for processing repeating data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110125740.4A CN102214210B (en) 2011-05-16 2011-05-16 Method, device and system for processing repeating data

Publications (2)

Publication Number Publication Date
CN102214210A true CN102214210A (en) 2011-10-12
CN102214210B CN102214210B (en) 2013-03-13

Family

ID=44745518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110125740.4A Active CN102214210B (en) 2011-05-16 2011-05-16 Method, device and system for processing repeating data

Country Status (1)

Country Link
CN (1) CN102214210B (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831222A (en) * 2012-08-24 2012-12-19 华中科技大学 Differential compression method based on data de-duplication
CN103049263A (en) * 2012-12-12 2013-04-17 华中科技大学 Document classification method based on similarity
CN103078709A (en) * 2013-01-05 2013-05-01 中国科学院深圳先进技术研究院 Data redundancy identifying method
CN103150260A (en) * 2011-11-25 2013-06-12 华为数字技术(成都)有限公司 Method and device for deleting repeating data
WO2013159631A1 (en) * 2012-04-23 2013-10-31 华为技术有限公司 Method and device for data block
CN103530310A (en) * 2012-07-03 2014-01-22 国际商业机器公司 Sub-block partitioning method and system for hash-based deduplication
CN103942124A (en) * 2014-04-24 2014-07-23 深圳市中博科创信息技术有限公司 Method and device for data backup
CN104012055A (en) * 2012-12-13 2014-08-27 华为技术有限公司 Method and apparatus processing data
CN104169917A (en) * 2014-02-14 2014-11-26 华为技术有限公司 A method for locating data stream break points based on a server and the server
CN104408154A (en) * 2014-12-04 2015-03-11 华为技术有限公司 Repeated data deletion method and device
CN104753626A (en) * 2013-12-25 2015-07-01 华为技术有限公司 Data compression method, equipment and system
CN104936045A (en) * 2015-06-03 2015-09-23 无锡天脉聚源传媒科技有限公司 HTML5-based video file processing method and apparatus
CN104994441A (en) * 2015-07-06 2015-10-21 无锡天脉聚源传媒科技有限公司 Method and device for transmitting video files
CN105653209A (en) * 2015-12-31 2016-06-08 浪潮(北京)电子信息产业有限公司 Object storage data transmitting method and device
CN105808169A (en) * 2016-03-14 2016-07-27 联想(北京)有限公司 Data deduplication method, apparatus and system
US9906577B2 (en) 2014-02-14 2018-02-27 Huawei Technologies Co., Ltd. Method and server for searching for data stream dividing point based on server
CN108089816A (en) * 2017-11-14 2018-05-29 西北工业大学 A kind of query formulation data de-duplication method and device based on load balancing
CN108249240A (en) * 2018-01-18 2018-07-06 上海三荣电梯制造有限公司 A kind of method that can detect record elevator status data automatically
CN110633257A (en) * 2019-09-20 2019-12-31 中国银行股份有限公司 Real-time synchronization method and system for bank parameter files in private cloud environment
CN111722787A (en) * 2019-03-22 2020-09-29 华为技术有限公司 Blocking method and device
CN113632059A (en) * 2020-03-06 2021-11-09 华为技术有限公司 Apparatus and method for eliminating defragmentation in deduplication
CN111158948B (en) * 2019-12-30 2024-04-09 深信服科技股份有限公司 Data storage and verification method and device based on deduplication and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6744388B1 (en) * 2002-06-19 2004-06-01 Xilinx, Inc. Hardware-friendly general purpose data compression/decompression algorithm
CN101546320A (en) * 2008-03-27 2009-09-30 林兆祥 Data difference analysis method based on sliding window
CN101706825A (en) * 2009-12-10 2010-05-12 华中科技大学 Replicated data deleting method based on file content types
CN101931495A (en) * 2009-06-18 2010-12-29 华为技术有限公司 Data processing method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6744388B1 (en) * 2002-06-19 2004-06-01 Xilinx, Inc. Hardware-friendly general purpose data compression/decompression algorithm
CN101546320A (en) * 2008-03-27 2009-09-30 林兆祥 Data difference analysis method based on sliding window
CN101931495A (en) * 2009-06-18 2010-12-29 华为技术有限公司 Data processing method and device
CN101706825A (en) * 2009-12-10 2010-05-12 华中科技大学 Replicated data deleting method based on file content types

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150260A (en) * 2011-11-25 2013-06-12 华为数字技术(成都)有限公司 Method and device for deleting repeating data
CN103150260B (en) * 2011-11-25 2016-06-08 华为数字技术(成都)有限公司 Data de-duplication method and device
WO2013159631A1 (en) * 2012-04-23 2013-10-31 华为技术有限公司 Method and device for data block
CN103530310B (en) * 2012-07-03 2016-12-28 国际商业机器公司 The method and system of sub-block segmentation is heavily carried out for based on hash disappearing
CN103530310A (en) * 2012-07-03 2014-01-22 国际商业机器公司 Sub-block partitioning method and system for hash-based deduplication
US9471620B2 (en) 2012-07-03 2016-10-18 International Business Machines Corporation Sub-block partitioning for hash-based deduplication
CN102831222A (en) * 2012-08-24 2012-12-19 华中科技大学 Differential compression method based on data de-duplication
CN102831222B (en) * 2012-08-24 2014-12-31 华中科技大学 Differential compression method based on data de-duplication
CN103049263A (en) * 2012-12-12 2013-04-17 华中科技大学 Document classification method based on similarity
CN103049263B (en) * 2012-12-12 2015-06-10 华中科技大学 Document classification method based on similarity
CN104012055A (en) * 2012-12-13 2014-08-27 华为技术有限公司 Method and apparatus processing data
CN104012055B (en) * 2012-12-13 2017-04-12 华为技术有限公司 Method and apparatus processing data
CN103078709B (en) * 2013-01-05 2016-04-13 中国科学院深圳先进技术研究院 Data redundancy recognition methods
CN103078709A (en) * 2013-01-05 2013-05-01 中国科学院深圳先进技术研究院 Data redundancy identifying method
CN104753626A (en) * 2013-12-25 2015-07-01 华为技术有限公司 Data compression method, equipment and system
US10264045B2 (en) 2014-02-14 2019-04-16 Huawei Technologies Co., Ltd. Method and server for searching for data stream dividing point based on server
US10542062B2 (en) 2014-02-14 2020-01-21 Huawei Technologies Co., Ltd. Method and server for searching for data stream dividing point based on server
CN104169917A (en) * 2014-02-14 2014-11-26 华为技术有限公司 A method for locating data stream break points based on a server and the server
US9967304B2 (en) 2014-02-14 2018-05-08 Huawei Technologies Co., Ltd. Method and server for searching for data stream dividing point based on server
US9906577B2 (en) 2014-02-14 2018-02-27 Huawei Technologies Co., Ltd. Method and server for searching for data stream dividing point based on server
CN104169917B (en) * 2014-02-14 2016-08-24 华为技术有限公司 A kind of method based on whois lookup data flow point cutpoint and server
CN103942124A (en) * 2014-04-24 2014-07-23 深圳市中博科创信息技术有限公司 Method and device for data backup
CN104408154B (en) * 2014-12-04 2018-05-29 华为技术有限公司 Data de-duplication method and device
CN104408154A (en) * 2014-12-04 2015-03-11 华为技术有限公司 Repeated data deletion method and device
CN104936045A (en) * 2015-06-03 2015-09-23 无锡天脉聚源传媒科技有限公司 HTML5-based video file processing method and apparatus
CN104936045B (en) * 2015-06-03 2018-05-15 无锡天脉聚源传媒科技有限公司 A kind of video file processing method and processing device based on HTML5
CN104994441B (en) * 2015-07-06 2018-09-25 无锡天脉聚源传媒科技有限公司 A kind of method and device of transmitting video files
CN104994441A (en) * 2015-07-06 2015-10-21 无锡天脉聚源传媒科技有限公司 Method and device for transmitting video files
CN105653209A (en) * 2015-12-31 2016-06-08 浪潮(北京)电子信息产业有限公司 Object storage data transmitting method and device
CN105808169A (en) * 2016-03-14 2016-07-27 联想(北京)有限公司 Data deduplication method, apparatus and system
CN108089816A (en) * 2017-11-14 2018-05-29 西北工业大学 A kind of query formulation data de-duplication method and device based on load balancing
CN108089816B (en) * 2017-11-14 2021-05-11 西北工业大学 Query type repeated data deleting method and device based on load balancing
CN108249240A (en) * 2018-01-18 2018-07-06 上海三荣电梯制造有限公司 A kind of method that can detect record elevator status data automatically
CN111722787B (en) * 2019-03-22 2021-12-03 华为技术有限公司 Blocking method and device
CN111722787A (en) * 2019-03-22 2020-09-29 华为技术有限公司 Blocking method and device
US11755540B2 (en) 2019-03-22 2023-09-12 Huawei Technologies Co., Ltd. Chunking method and apparatus
CN110633257A (en) * 2019-09-20 2019-12-31 中国银行股份有限公司 Real-time synchronization method and system for bank parameter files in private cloud environment
CN111158948B (en) * 2019-12-30 2024-04-09 深信服科技股份有限公司 Data storage and verification method and device based on deduplication and storage medium
CN113632059A (en) * 2020-03-06 2021-11-09 华为技术有限公司 Apparatus and method for eliminating defragmentation in deduplication

Also Published As

Publication number Publication date
CN102214210B (en) 2013-03-13


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent of invention or patent application
CB02 Change of applicant information

Address after: 611731 Chengdu high tech Zone, Sichuan, West Park, Qingshui River

Applicant after: HUAWEI DIGITAL TECHNOLOGIES (CHENG DU) Co.,Ltd.

Address before: 611731 Chengdu high tech Zone, Sichuan, West Park, Qingshui River

Applicant before: CHENGDU HUAWEI SYMANTEC TECHNOLOGIES Co.,Ltd.

COR Change of bibliographic data

Free format text: CORRECT: APPLICANT; FROM: CHENGDU HUAWEI SYMANTEC TECHNOLOGIES CO., LTD. TO: HUAWEI DIGITAL TECHNOLOGY (CHENGDU) CO., LTD.

C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220908

Address after: No. 1899 Xiyuan Avenue, high tech Zone (West District), Chengdu, Sichuan 610041

Patentee after: Chengdu Huawei Technologies Co.,Ltd.

Address before: 611731 Qingshui River District, Chengdu hi tech Zone, Sichuan, China

Patentee before: HUAWEI DIGITAL TECHNOLOGIES (CHENG DU) Co.,Ltd.

TR01 Transfer of patent right