Embodiment
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an embodiment of the duplicate data processing method of the present invention. As shown in Fig. 1, the method of this embodiment may include:
Step 101: Use a sliding window to chunk the data object to be processed, obtaining data blocks. To obtain each data block, the sliding start position of the sliding window is the position reached by jumping forward by the minimum block length from the end position of the previous data block. The start position of each data block is the end position of the previous data block, and the length of each data block equals the sum of the minimum block length and the distance the sliding window slides in the corresponding chunking pass.
In this embodiment, a duplicate data processing apparatus may use the sliding window to chunk the data object. The apparatus may be a device such as a storage controller. This embodiment limits neither the kind of device used as the apparatus nor its specific structure, provided that it can perform the data processing.
Specifically, in the existing CDC algorithm, the sliding window must traverse every byte of the data object while sliding. In addition, at the end of each chunking pass, the apparatus must check whether the length of the resulting data block is greater than the minimum block length. Only a data block longer than the minimum block length satisfies the preset chunking condition; this controls the granularity at which data blocks are subsequently compared with stored data. The minimum block length is the lower bound on the length of each data block produced by chunking: the smaller the minimum block length, the finer the chunking granularity. However, because this check is performed after every chunking pass and the sliding window must traverse all bytes of the data object, deduplication is inefficient and cannot keep up with storage demands as data volumes grow.
By contrast, in this embodiment, before each chunking pass, the apparatus can make the sliding window start sliding from a position that skips forward by the minimum block length from the end position of the previous data block. The length of each data block is therefore necessarily greater than the minimum block length. This eliminates the prior-art step of checking, after each chunking pass, whether the data block is longer than the minimum block length, and ensures continuity between successive chunking passes. Note that this embodiment does not limit the minimum block length; a person skilled in the art can set it as needed, based on factors such as the chunking granularity, the chunking efficiency, and the size of the data object.
Fig. 2 is a schematic diagram of the 5th chunking pass performed on a data object using the method embodiment shown in Fig. 1. As shown in Fig. 2, L1 is the preset minimum block length; L2 is the distance the sliding window slides during the 5th chunking pass; L3 is the 5th data block obtained by the 5th chunking pass; P1 is the end position of the 4th data block, which is also the start position of the 5th data block and the end position of the sliding window after the 4th chunking pass; P2 is the sliding start position of the sliding window for the 5th chunking pass, reached by jumping forward by the minimum block length L1 from the end position P1 of the 4th data block; and P3 is the end position of the sliding window after the 5th chunking pass. It follows that the length of the 5th data block equals the sum of the minimum block length L1 and the sliding distance L2 of the sliding window in the 5th chunking pass.
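The relationship among P1, P2, P3, L1 and L2 in Fig. 2 can be checked with a small calculation. The following Python sketch is illustrative only: the function name `block_positions` and the numbers in the usage lines are assumptions and do not come from the embodiment, and positions are treated as plain byte offsets into the data object.

```python
def block_positions(prev_end: int, min_len: int, slide_len: int) -> dict:
    """Return the offsets of Fig. 2 for one chunking pass (illustrative)."""
    p1 = prev_end        # end of the previous block = start of this block
    p2 = p1 + min_len    # window jumps forward by the minimum block length L1
    p3 = p2 + slide_len  # window stops after sliding a distance L2
    return {"P1": p1, "P2": p2, "P3": p3, "length": p3 - p1}


# Hypothetical numbers: the previous block ended at offset 4096,
# L1 is 16 KB, and the window slid 5000 bytes before finding the edge.
pos = block_positions(prev_end=4096, min_len=16 * 1024, slide_len=5000)
print(pos["length"])  # equals L1 + L2
```

Because P2 is always L1 bytes past P1 and the window slides at least once, the block length P3 - P1 cannot fall to or below L1, which is why no separate length check is needed.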
The 1st chunking pass differs from subsequent passes only in that the sliding start position of the sliding window is the position reached by jumping forward by the minimum block length from the start position of the data object. The rest of the process is similar to the chunking described above and is not repeated here.
Step 102: Match each data block against the data blocks already stored in the storage device. If a data block is already stored in the storage device, delete that data block and use the stored data block in the storage device as the corresponding data block of the data object.
For example, this embodiment may use fingerprint matching to determine whether a data block is stored in the storage device. After obtaining a data block, the apparatus can compute first fingerprint information for it; the first fingerprint information characterizes the data block. This embodiment does not limit which checksum algorithm the apparatus uses to compute the first fingerprint information. A person skilled in the art need only compute the first fingerprint information of the data block to be stored using the same algorithm as, or an algorithm matching, the one used for the first fingerprint information of the data blocks already stored in the storage device. For example, the first fingerprint information may be a hash value obtained by hashing the data block, or a CRC code obtained by performing a cyclic redundancy check (CRC) on the data block.
When a data block of the data object is processed, the storage device already holds a large number of data blocks and their corresponding first fingerprint information. The apparatus can therefore determine whether the data block is already stored by comparing its first fingerprint information with the fingerprint information stored in the storage device. If the first fingerprint information of the data block already exists in the storage device, the data block is already stored there. To avoid storing it again, the apparatus can use the data block in the storage device as the corresponding data block of the data object to be stored, and delete the data block of the data object itself. This embodiment does not limit how the stored data block is made to stand in for the data block of the current data object; a person skilled in the art can use existing techniques, for example a pointer. If the fingerprint information of the data block does not exist in the storage device, the data block is not yet stored there. In that case, the apparatus can store the data block and its first fingerprint information in the storage device, either after the whole data object has been processed or as soon as the data block has been produced, so that they can be used later to determine whether other data blocks are already stored. Note that the first fingerprint information of a data block counts as already stored either when it equals fingerprint information stored in the storage device or when it matches that information; this embodiment does not limit which, provided that whether the data block is stored in the storage device can be determined from the first fingerprint information of the data block and the stored first fingerprint information. This embodiment also does not limit the type or application of the storage device.
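The fingerprint comparison described above can be sketched as a lookup table keyed by fingerprint. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the class name `BlockStore`, the in-memory dictionary standing in for the storage device, and the choice of SHA-256 as the first fingerprint are all hypothetical (the text only requires a hash value, a CRC code, or a similar fingerprint).

```python
import hashlib
from typing import Dict, Tuple


class BlockStore:
    """In-memory stand-in for the storage device (illustrative only)."""

    def __init__(self) -> None:
        # Maps first fingerprint -> stored data block.
        self._blocks: Dict[str, bytes] = {}

    def put(self, block: bytes) -> Tuple[str, bool]:
        """Store a block unless its fingerprint is already present.

        Returns (fingerprint, was_duplicate). The caller keeps only the
        fingerprint as a reference to the stored block.
        """
        fingerprint = hashlib.sha256(block).hexdigest()  # assumed fingerprint
        if fingerprint in self._blocks:
            return fingerprint, True       # duplicate: nothing new is stored
        self._blocks[fingerprint] = block  # new block: store block and fingerprint
        return fingerprint, False

    def get(self, fingerprint: str) -> bytes:
        """Fetch the stored block that a fingerprint refers to."""
        return self._blocks[fingerprint]

    def __len__(self) -> int:
        return len(self._blocks)


store = BlockStore()
ref_a, dup_a = store.put(b"same payload")
ref_b, dup_b = store.put(b"same payload")
print(dup_a, dup_b, len(store))
```

Here the returned fingerprint plays the role of the pointer mentioned above: the data object keeps the reference, and the block bytes are held once in the store.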
Experiments show that, with an average block length of 32 KB and a minimum block length of 16 KB, the method of this embodiment can improve chunking efficiency by 50%.
In this embodiment, before each chunking pass the sliding window first jumps forward by the minimum block length and only then performs chunking, so that every resulting data block is longer than the minimum block length. As a result, there is no need to check at the end of each chunking pass whether the data block is longer than the minimum block length, and the next chunking pass can begin directly. This saves processing time and ensures continuity between successive chunking passes. Moreover, the sliding window no longer needs to traverse all bytes of the data object; it skips a certain number of bytes before each chunking pass. This embodiment can therefore improve deduplication efficiency and meet the storage demands of continuously growing data volumes.
In a specific implementation of the method embodiment shown in Fig. 1, chunking can be performed in either of the following two modes:
Mode 1: Slide a sliding window from one end of the data object to the other, chunking the data object to obtain the data blocks.
Specifically, chunking mode 1 uses only one sliding window, which slides serially from one end of the whole data object to the other to chunk the data object.
Mode 2: Slide sliding windows in parallel, each from one end of a data region of the data object to the other, chunking the data object to obtain the data blocks. Each data region corresponds to one sliding window, and the data regions are obtained by dividing the data object.
Specifically, chunking mode 2 divides the data object into multiple data regions and, for each data region, slides one sliding window from one end of the region to the other. Chunking mode 2 thus differs from chunking mode 1 in that multiple sliding windows divide the data into blocks in parallel, so chunking mode 2 is more efficient than chunking mode 1.
Note that this embodiment does not limit how the data object is divided into data regions. For example, it may be divided randomly, divided evenly according to the length of the data object, or divided according to the concurrent processing capability of the processor.
In addition, this embodiment does not limit the sliding direction of the sliding window for each data region. For example, every sliding window may slide from the left end of its data region to the right end, or from the right end to the left end; alternatively, some sliding windows may slide from left to right while others slide from right to left.
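Chunking mode 2 can be sketched as follows. This is a hedged illustration rather than the embodiment's implementation: the even split into regions, the left-to-right sliding direction, the constants `MIN_LEN`, `WINDOW` and `DIVISOR`, and the toy boundary test (sum of the window bytes divisible by `DIVISOR`) are all assumptions chosen to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

MIN_LEN = 8    # assumed minimum block length, in bytes
WINDOW = 4     # assumed sliding-window width, in bytes
DIVISOR = 16   # assumed parameter of the toy boundary test


def chunk_region(region: bytes) -> List[bytes]:
    """Chunk one data region with its own sliding window, left to right."""
    blocks, start, n = [], 0, len(region)
    while start < n:
        pos = start + MIN_LEN            # jump over the minimum block length
        while pos < n:
            pos += 1                     # slide the window by one byte
            window = region[pos - WINDOW:pos]
            if sum(window) % DIVISOR == 0:   # toy edge test (assumption)
                break
        end = min(pos, n)                # the last block of a region may be short
        blocks.append(region[start:end])
        start = end
    return blocks


def split_regions(data: bytes, count: int) -> List[bytes]:
    """Divide the data object evenly into at most `count` regions."""
    size = max(1, -(-len(data) // count))    # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_chunk(data: bytes, region_count: int = 4) -> List[bytes]:
    """Chunk every region concurrently; one sliding window per region."""
    regions = split_regions(data, region_count)
    with ThreadPoolExecutor(max_workers=region_count) as pool:
        per_region = list(pool.map(chunk_region, regions))  # keeps region order
    return [block for blocks in per_region for block in blocks]


data = bytes(range(256)) * 4
blocks = parallel_chunk(data, region_count=4)
print(len(blocks), b"".join(blocks) == data)
```

The thread pool here mainly shows the structure of one window per region. In CPython, CPU-bound chunking would likely need a process pool or native code to run truly in parallel.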
When chunking mode 1 or mode 2 above is applied in the method embodiment shown in Fig. 1, matching each data block against the data blocks stored in the storage device can be performed in either of the following two modes:
Mode 1: While the data object is being chunked, match each data block as it is produced; that is, chunking and matching can run in parallel.
Mode 2: After the whole data object has been chunked, match all of the data blocks.
As these two modes show, matching mode 1 suits cases where each data block can be matched directly after it is produced, without regard to other data blocks. Matching mode 2 suits cases where the data blocks cannot necessarily be matched directly and all of the resulting data blocks must first be considered together. Such consideration may involve whether any data block exceeds the maximum block length, whether any data blocks need to be merged, and so on. Optionally, therefore, in the embodiments of the present invention, if chunking the data object yields a data block longer than the maximum block length, the apparatus can further split that data block so that the length of every data block does not exceed the maximum block length.
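The optional splitting of over-long data blocks can be sketched as a post-processing pass over the chunking result. The function name `enforce_max_length` and the fixed-size splitting rule are assumptions; the text does not specify how an over-long data block is split.

```python
from typing import List


def enforce_max_length(blocks: List[bytes], max_len: int) -> List[bytes]:
    """Split every block longer than max_len into pieces of at most max_len bytes."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    result = []
    for block in blocks:
        # Blocks within the limit pass through as a single piece.
        for offset in range(0, len(block), max_len):
            result.append(block[offset:offset + max_len])
    return result


pieces = enforce_max_length([b"abcdefgh", b"xy"], max_len=3)
print(pieces)
```

Note that this naive rule can leave a tail piece shorter than the minimum block length; whether that is acceptable, or whether tails should be merged with a neighbor, is a design choice the text leaves open.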
As the above description shows, whether duplicate data processing is performed on the whole data object or on one data region of it, the processing of an individual data block is similar. The following therefore describes in detail only the processing of a single data block.
Fig. 3 is a flowchart of another embodiment of the duplicate data processing method of the present invention. As shown in Fig. 3, the method of this embodiment is again described using the process shown in Fig. 2, and may include:
Step 301: The start position of the sliding window jumps to P2.
Step 302: Slide the sliding window and compute second fingerprint information of the data in the sliding window.
The second fingerprint information is used to determine whether the sliding window has slid to the edge of the current data block. Specifically, taking the 5th data block shown in Fig. 2 as the example again, this embodiment uses second fingerprint information obtained with a fingerprint algorithm to determine whether the sliding window has reached the edge of the 5th data block in the data object. A person skilled in the art can obtain the second fingerprint information with a fingerprint algorithm using existing techniques, so this is not described further here.
In a specific implementation, the sliding step of the sliding window can be preset. Within one chunking pass, if the step is small, a single slide may not reach the edge of the data block; the sliding window may then need to slide several times by the preset step before it reaches the edge of the data block for that pass.
Step 303: Determine, based on the second fingerprint information, whether the sliding window has slid to the edge of the 5th data block. If so, perform step 304; otherwise, perform step 302.
After each slide of the sliding window, the apparatus can determine from the second fingerprint information of the data currently in the window whether the window has slid to the edge of a data block. In a specific implementation, a chunking condition can be preset based on a continuity characteristic value of the data object. When the sliding window is at the boundary between two data blocks, the continuity characteristic value of the data in the window is small; when the sliding window is at a position inside a data block, the continuity characteristic value of the data in the window is large. Therefore, if the computed second fingerprint information does not satisfy the preset chunking condition, that is, if it is less than the preset continuity characteristic value, the sliding window has slid to the boundary between two data blocks, i.e. to the edge of the current data block, and the apparatus knows that this chunking pass is finished. If the second fingerprint information is greater than the preset continuity characteristic value, the sliding window may still be inside a data block; the apparatus then continues to slide the window and repeats the above steps until the edge of the 5th data block is found.
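Steps 301 to 303 can be sketched as a loop that jumps to P2, slides by a preset step, and stops when the second fingerprint falls below a threshold. This is only a sketch under assumptions: the function names, the use of the low 16 bits of CRC-32 as the second fingerprint, and the parameter values in the usage lines are hypothetical, since the text leaves the fingerprint algorithm and the continuity characteristic value open.

```python
import zlib


def window_fingerprint(window: bytes) -> int:
    """Second fingerprint of the window contents (assumed: low 16 bits of CRC-32)."""
    return zlib.crc32(window) & 0xFFFF


def find_block_end(data: bytes, prev_end: int, min_len: int,
                   window_size: int, step: int, threshold: int) -> int:
    """Return P3, the end offset of the block that starts at prev_end (P1)."""
    n = len(data)
    pos = prev_end + min_len                 # step 301: jump to P2
    while pos < n:
        pos = min(pos + step, n)             # step 302: slide by the preset step
        window = data[max(pos - window_size, 0):pos]
        if window_fingerprint(window) < threshold:   # step 303: edge reached?
            return pos
    return min(pos, n)                       # end of data closes the last block


data = bytes(range(256)) * 8
p3 = find_block_end(data, prev_end=0, min_len=64,
                    window_size=16, step=1, threshold=0x0400)
print(p3, p3 - 0 > 64)
```

A larger threshold makes edges more frequent and blocks shorter on average; a threshold above the largest possible fingerprint makes every slide an edge, so each block is exactly the minimum block length plus one step.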
Step 304: Take the data between P1 and P3 as the 5th data block.
Step 305: Compute the first fingerprint information of the 5th data block.
Step 306: Determine whether the first fingerprint information is already stored in the storage device. If so, perform step 307; otherwise, perform step 308.
Step 307: Delete the 5th data block from the data object, use the stored data block corresponding to the first fingerprint information as the 5th data block, and proceed to the next chunking pass.
Step 308: Store the 5th data block and its corresponding first fingerprint information in the storage device.
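The flow of steps 301 to 308 over a whole data object can be sketched as follows. Everything named here is an assumption made for illustration: `deduplicate`, the dictionary standing in for the storage device, SHA-256 as the first fingerprint, and `zero_byte_end`, a deliberately simple stand-in for the fingerprint-based edge search of steps 301 to 303 that treats a zero byte as a block edge.

```python
import hashlib
from typing import Callable, Dict, List


def zero_byte_end(data: bytes, p1: int, min_len: int = 4) -> int:
    """Stand-in edge finder: jump min_len bytes, then slide to the next zero byte."""
    pos = p1 + min_len
    while pos < len(data) and data[pos - 1] != 0:
        pos += 1
    return min(pos, len(data))


def deduplicate(data: bytes, store: Dict[str, bytes],
                find_end: Callable[[bytes, int], int] = zero_byte_end) -> List[str]:
    """Chunk data and return its recipe: one stored-block fingerprint per block."""
    recipe, p1 = [], 0
    while p1 < len(data):
        p3 = find_end(data, p1)                  # steps 301-303: locate the edge
        block = data[p1:p3]                      # step 304: block is P1..P3
        fingerprint = hashlib.sha256(block).hexdigest()   # step 305
        if fingerprint not in store:             # step 306
            store[fingerprint] = block           # step 308: store the new block
        recipe.append(fingerprint)               # step 307: keep only a reference
        p1 = p3                                  # next pass starts at this edge
    return recipe


store: Dict[str, bytes] = {}
data = b"abcdefg\x00" * 3
recipe = deduplicate(data, store)
restored = b"".join(store[fp] for fp in recipe)
print(len(recipe), len(store), restored == data)
```

In this usage the three 8-byte blocks are identical, so the recipe holds three references while the store holds a single block, and the data object can still be rebuilt from the recipe.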
In this embodiment, the data object is chunked using a fingerprint algorithm, and after each chunking pass the sliding window jumps forward by the minimum block length before sliding for the next pass. There is therefore no need to check at the end of a chunking pass whether the data block is longer than the minimum block length, and the next pass can begin directly, which saves processing time and ensures continuity between successive chunking passes. Moreover, the sliding window does not need to traverse all bytes of the data object but skips a certain number of bytes before each chunking pass, which improves deduplication efficiency and meets the storage demands of continuously growing data volumes.
Fig. 4 is a schematic structural diagram of an embodiment of the duplicate data processing apparatus of the present invention. As shown in Fig. 4, the apparatus of this embodiment may include a chunking module 11 and a data processing module 12, where:
The chunking module 11 is configured to chunk the data object using a sliding window to obtain data blocks. To obtain each data block, the sliding start position of the sliding window is the position reached by jumping forward by the minimum block length from the end position of the previous data block; the start position of each data block is the end position of the previous data block, and the length of each data block equals the sum of the minimum block length and the distance the sliding window slides in the corresponding chunking pass;
The data processing module 12 is configured to match each data block against the data blocks already stored in the storage device and, if a data block is already stored in the storage device, to delete that data block and use the stored data block in the storage device as the corresponding data block of the data object.
The apparatus of this embodiment can be used to perform the method of the method embodiment shown in Fig. 1. Its implementation principle and technical effect are similar and are not repeated here.
Fig. 5 is a schematic structural diagram of another embodiment of the duplicate data processing apparatus of the present invention. As shown in Fig. 5, building on the apparatus structure shown in Fig. 4, the chunking module 11 of the apparatus of this embodiment further includes a first chunking unit 111 and a second chunking unit 112, where:
The first chunking unit 111 is configured to slide a sliding window from one end of the data object to the other, chunking the data object to obtain the data blocks;
The second chunking unit 112 is configured to slide sliding windows in parallel, each from one end of a data region of the data object to the other, chunking the data object to obtain the data blocks. Each data region corresponds to one sliding window, and the data regions are obtained by dividing the data object.
The apparatus of this embodiment may further include a data region determination module 13, configured to determine the data regions of the data object according to the concurrent processing capability and/or the size of the data object. Optionally, the apparatus of this embodiment may also include a data block splitting module 14, configured to split data blocks that exceed the maximum block length. In a specific implementation, the first chunking unit 111 or the second chunking unit 112 is specifically configured to: slide the sliding window and compute second fingerprint information of the data in the sliding window; determine, based on the second fingerprint information, whether the sliding window has slid to the edge of the data block for the current chunking pass; if so, take the data from the end position of the previous data block to the current end position of the sliding window as the data block obtained by the current chunking pass; otherwise, continue to slide the sliding window and compute the second fingerprint information of the data in the window until the sliding window slides to the edge of the data block for the current chunking pass.
In the apparatus of this embodiment, the data processing module 12 may further include a fingerprint computing unit 121 and a deletion processing unit 122, where:
The fingerprint computing unit 121 is configured to compute the first fingerprint information of a data block in the data object;
The deletion processing unit 122 is configured to, if the storage device already stores the first fingerprint information, delete from the data object the data block corresponding to the first fingerprint information, and use the stored data block corresponding to the first fingerprint information in place of the data block deleted from the data object.
Note that the chunking module 11 may also include only one of the first chunking unit 111 and the second chunking unit 112. The first chunking unit 111 can implement chunking mode 1 described above, and the second chunking unit 112 can implement chunking mode 2 described above; their implementation principles and technical effects are similar and are not repeated here. The fingerprint computing unit 121 and the deletion processing unit 122 can implement matching mode 1 or matching mode 2 described above; their implementation principles and technical effects are similar and are not repeated here. In a specific implementation, the apparatus of this embodiment can be used to perform the method of the method embodiment shown in Fig. 3; its implementation principle and technical effect are similar and are not repeated here.
Fig. 6 is a schematic structural diagram of an embodiment of the duplicate data processing system of the present invention. As shown in Fig. 6, the system of this embodiment may include a storage device 1 and a duplicate data processing apparatus 2. The storage device 1 is configured to store data blocks and the first fingerprint information corresponding to the data blocks. The duplicate data processing apparatus 2 may use the structure shown in Fig. 4 or Fig. 5 and can be used to perform the method of the method embodiment shown in Fig. 1 or Fig. 3. Its implementation principle and technical effect are similar and are not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.