US9037621B2 - Efficient reconstruction of virtual disk hierarchies across storage domains - Google Patents
- Publication number: US9037621B2
- Application number: US13/934,127
- Authority: US (United States)
- Prior art keywords: component, datastore, virtual disk, disk, target
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F17/30115—
Definitions
- Delta disks, also referred to as “redo logs,” “diff files,” etc., may be used to customize a base disk image for a virtual disk.
- a discussion of the use of such delta disks for virtual computer systems is provided in U.S. Pat. No. 7,356,679, entitled “Computer Image Capture, Customization and Deployment,” which issued on Apr. 8, 2008.
- Each delta disk contains changes to the base disk image to provide customization and data retention for each user.
- the combination of base and delta disks makes up a virtual disk hierarchy, virtualized by virtualization software so that it appears to each user (or virtual machine) as a single physical disk.
- Each virtual disk may be organized in a manner similar to conventional physical disks, i.e., into discrete addressable disk blocks.
- the delta disk is accessed to determine if the portion of the virtual disk being accessed is contained within the delta disk. For example, if a particular disk block of the virtual disk includes modifications since creation of the delta disk, then that disk block will be present in the delta disk. If the disk block is not present in the delta disk, then the corresponding disk block is accessed in the base image, from which the requested information is retrieved.
- writes to the virtual disk are directed to the delta disk. If the write is directed to a previously modified disk block present in the delta disk, then the previously modified disk block is overwritten, otherwise the delta disk is augmented to include the newly modified disk block.
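The read and write behavior described above is the classic copy-on-write pattern. A minimal Python sketch (not the patented implementation; the block layout and names are invented for illustration) shows how a read falls through a delta chain to the base image, while writes land only in the terminal, writable delta:

```python
# Illustrative copy-on-write resolution over a delta-disk chain.
# A "Disk" here stands for one component (base image or delta disk).

class Disk:
    def __init__(self, blocks=None, parent=None):
        self.blocks = dict(blocks or {})  # block number -> block contents
        self.parent = parent              # base image or intermediate delta

    def read(self, n):
        # Walk up the chain: the first component containing block n wins.
        disk = self
        while disk is not None:
            if n in disk.blocks:
                return disk.blocks[n]
            disk = disk.parent
        return None  # block never written anywhere in the hierarchy

    def write(self, n, data):
        # Writes always land in this (terminal, writable) delta; earlier
        # components in the chain are never modified.
        self.blocks[n] = data

base = Disk(blocks={0: "os", 1: "apps"})
delta = Disk(parent=base)
delta.write(1, "apps-v2")    # overlays block 1 in the delta only
print(delta.read(0))         # "os": falls through to the base image
print(delta.read(1))         # "apps-v2": served from the delta
print(base.read(1))          # "apps": base image remains unmodified
```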
- For example, a base disk image for a company may contain the operating system and installed software usable by each employee, such as an email client.
- A plurality of departmental delta disks may include software suitable for each department: accounting software for the accounting department, computer-aided design software for the engineering department, etc.
- Finally, individual deltas may be maintained by individual users.
- a method and software recreates on a target datastore a set of hierarchical files that are present on a source datastore, the set including a parent component and a child component.
- a content identifier (ID) is maintained for each component of the set of hierarchical files. The content ID is updated when the contents of a corresponding one of the components are modified.
- the child component contains changes to the parent component and is writable, whereas the parent component is read-only.
- the child component is copied from the source datastore to the target datastore.
- the content ID corresponding with the parent component on the source datastore is compared with content IDs corresponding to files present on the target datastore.
- a matching file is identified, the matching file being a file on the target datastore that corresponds to the matching content ID.
- the matching file is associated with the copied child component so that the matching file becomes a new parent component to the copied child component, thereby recreating the set of hierarchical files on the target datastore using the matching file.
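The claimed reconstruction can be summarized in a few lines. The sketch below uses a hypothetical in-memory layout (dicts mapping file names to a content ID and a parent pointer); a real system would track this in a management database, and the unmatched-parent case is omitted here:

```python
# Sketch of the core method: copy only the writable child to the target,
# then locate a target file whose content ID matches the parent's and
# associate it as the copied child's new parent. Names are hypothetical.

def copy_child_and_reparent(child, source, target):
    """source/target map file name -> {"cid": ..., "parent": ...}."""
    parent_cid = source[source[child]["parent"]]["cid"]
    target[child] = dict(source[child])          # copy only the child
    for name, f in target.items():
        if name != child and f["cid"] == parent_cid:
            target[child]["parent"] = name       # matching file becomes
            return name                          # the new parent
    return None  # no match: the parent would have to be copied as well
```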
- FIG. 1 shows a logical representation of an exemplary virtualized computer system.
- FIG. 2 shows a schematic diagram of an exemplary hierarchical disk structure.
- FIG. 3 shows a schematic diagram illustrating, by way of example, a system for the management of virtual machines.
- FIGS. 4A and 4B show schematic diagrams illustrating the transfer of linked clones across datastores.
- FIG. 5 shows a flowchart illustrating by way of example a procedure for transferring a virtual disk hierarchy across datastores, in accordance with an example embodiment.
- FIG. 1 shows a logical representation of an exemplary virtualized computer system 100 .
- Virtualized computer system 100 includes a physical hardware platform 110 , virtualization software 120 running on hardware platform 110 , and one or more virtual machines 130 running on hardware platform 110 by way of virtualization software 120 .
- Virtualization software 120 is therefore logically interposed between the physical hardware of hardware platform 110 and guest system software 132 running “in” virtual machine 130 .
- Hardware platform 110 may be a general purpose computing system having one or more system buses 111 that place various hardware platform components in data communication with one another.
- processors 114 are placed in data communication with a memory 115 using system bus(es) 111 .
- Memory 115 may comprise a system of memories including read only memory (ROM), random access memory (RAM), cache memories, and various register memories.
- Non-volatile data storage 117 may include one or more disk drives or other machine-readable media or mass data storage systems for storing software or data.
- Memory 115 and/or non-volatile data storage 117 may store virtualization software 120 and guest system software 132 running in virtual machine 130 .
- User interface 112 may be provided including a keyboard controller (not shown), a mouse controller (not shown), a video controller (not shown), and an audio controller (not shown), each of which may be connected to corresponding user devices (not shown).
- virtualized computer system 100 may or may not include user interface devices, or such devices may not be connected directly to hardware platform 110 . Instead, user interaction may be automated or occur remotely, as is generally known in the field of data center administration.
- Network interface 116 enables data communication over a network 140 .
- network interface 116 may facilitate communication using a network protocol, such as TCP/IP or Fibre Channel.
- Virtualization software 120 is well known in the field of computer virtualization. Virtualization software 120 performs system resource management and virtual machine resource emulation. Virtual machine resource emulation may be performed by a virtual machine monitor (VMM) component (not shown). In typical implementations, each virtual machine 130 (only one shown) has a corresponding VMM instance. Depending on implementation, virtualization software 120 may be unhosted or hosted. Unhosted virtualization software generally relies on a specialized virtualization kernel for managing system resources, whereas hosted virtualization software relies on a commodity operating system—the “host operating system”—such as Windows, Mac OS X, or Linux to manage system resources. In a hosted virtualization system, the host operating system may be considered as part of virtualization software 120 .
- Virtual machine 130 conceptually comprises the state of virtual hardware devices (as emulated by virtualization software 120 ) and contents of guest system software 132 . Hardware emulation is performed by virtualization software 120 .
- guest system software 132 includes a guest operating system 134 and guest applications 136 .
- Guest operating system 134 may be a commodity operating system such as Windows or Linux.
- Virtualization software 120 is responsible for managing inputs and outputs to and from virtual machine 130 .
- Guest system software 132 is stored on a virtual disk, which may be maintained on non-volatile data storage device 117 or in datastore 150 .
- The term “datastore” is intended to be broadly interpreted to mean a container for data, and may be implemented as a container or storage volume for virtual machines formatted with a file system such as NFS or VMFS.
- the virtual disk image is maintained in a base disk image file 155 and one or more delta disks 160 on external datastore 150 .
- Delta disks 160 include at least one delta disk particular to VM 130 as well as delta disks for other VMs (not shown).
- FIG. 2 shows a schematic diagram of an exemplary hierarchical disk structure, which includes a base disk image 172 and a number of delta disks 174 - 184 .
- Each of delta disks 174 - 184 and base disk image 172 is a disk image defined by one or more files stored on one or more datastores, such as datastore 150 shown in FIG. 1 .
- Delta disks 176 , 178 , 180 , and 184 are “terminal delta disks” in that they are the last in the chain of delta disks. Each delta corresponds to a virtual disk image.
- base disk image 172 includes content that may be common to a plurality of different disk images.
- Intermediate delta disks 174 and 182 contain changes to base disk image 172 .
- Terminal delta disks 176 , 178 contain changes to intermediate delta disk 174 .
- terminal delta disk 184 contains changes to intermediate delta disk 182 . There may be any number of intermediate delta disks, including zero, for any terminal delta disk.
- Each hierarchical disk component can be thought of as being a child component and/or a parent component in a chain.
- Each terminal delta disk 176 , 178 , 180 , 184 is a child component since it depends on either an intermediate delta disk or a base disk image.
- Base disk image 172 is a parent when it has one or more delta disks depending on it.
- Each intermediate delta disk 174 , 182 is a parent of either another intermediate delta disk or a terminal delta disk.
- Each intermediate delta disk 174 , 182 is also a child of either the base disk image, or another intermediate delta disk.
- base disk image 172 may include an installation of an operating system such as Microsoft Windows and an office productivity suite, including a word processor, email client, spreadsheet, etc.
- Intermediate delta disks 174 , 182 may include additional installed applications needed for users of a particular group in an organization, such as accountants or engineers.
- a delta disk such as delta 174 is created, which initially appears to the computer as an exact copy of base disk image 172 , since no changes have yet been written to the delta disk.
- the virtual machine is launched using the delta disk image, essentially launching the operating system installed on base disk image 172 .
- the various departmental applications may be installed to the virtual disk formed by the hierarchical disk structure formed by the delta disk and base disk image.
- the virtual machine may then be powered down if needed.
- a snapshot of the VM is created, which then makes the delta an intermediate delta disk.
- Terminal deltas pointing to the just-created intermediate delta can then be created for a plurality of users.
- Each terminal delta may be individualized with configurations necessary for the corresponding virtual machines to coexist on a network, e.g., unique machine numbers, MAC addresses, etc., which are managed using well-understood techniques, described, for example, in U.S. Pat. No. 7,356,679, entitled “Computer Image Capture, Customization and Deployment,” which issued on Apr. 8, 2008.
- If delta disk 184 were to be deleted, then the image can be reverted to the state prior to the changes embodied in terminal delta disk 184 by simply referencing intermediate delta disk 182 , which becomes a terminal delta disk (and has its read-only protection removed) since no additional delta disks depend from it.
- Other protections for intermediate delta disks and base disk image files may be provided in addition to, or instead of file system “read-only” tagging.
- a database may be used to track interrelations between components of a hierarchical disk, and software accessing these components can be written to ensure that base disk images and intermediate delta disks are never written to.
- FIG. 3 is a schematic diagram illustrating by way of example system 300 for the management of virtual machines.
- system 300 includes virtual machine (VM) manager 302 which is an application that executes on a management server 301 .
- VM manager 302 can be an implementation of vCenter, a product commercially available from VMware, Inc.
- VM hosts 305 a , 305 b may be members of a common cluster of VM hosts that share a datastore, but they may also be in separate clusters or not in any cluster.
- VM manager 302 has access to a database 304 , which might also run on the management server 301 , or could run in a separate database server (not shown).
- management server 301 can be implemented as a virtual machine that runs in one of VM hosts 305 a , 305 b , or an additional VM host (not shown).
- Management server 301 is connected to VM hosts 305 a , 305 b , via network 320 , which may be, for example, a network such as a LAN, WAN, Internet, or the like, or a combination of different networks.
- VM hosts 305 a and 305 b each execute a hypervisor 202 a , 202 b , respectively, which in turn each implement one or more VMs.
- commands flow from the virtual machine manager 302 to the hypervisors 202 a and 202 b , and information flows from hypervisors 202 a and 202 b to virtual machine manager 302 .
- API 303 provides an interface to access the functionality provided by VM manager 302 .
- API 303 is implemented as a web service receiving information or requests in XML format.
- An interface of this type is described in the VMware VI 3.0 SDK Reference Manual (Revision 20060906, Version 2.0.1, Item: SDK-ENG-Q306-291).
- Hosts 305 a and 305 b are connected via network 325 to datastores 306 a and 306 b .
- Network 325 may be an Ethernet local area network (LAN), Fibre Channel network, or the like.
- datastores 306 a , 306 b are connected to communications network 320 rather than a separate storage network.
- Datastores may be implemented as network attached storage (NAS) or as a storage area network or a combination thereof.
- Each datastore 306 a , 306 b may be a logical storage volume (backed by a physical device called “logical unit number” (LUN), a mount point like NFS or a physical disk available on the host) and may, as would be understood by those skilled in the art, include or reside on one or more physical storage devices connected to a management or control interface (not shown). Since they may be logical volumes, it is possible that datastores 306 a , 306 b are maintained on a common storage array or separate storage arrays. Also, although not shown in FIG. 3 , there may be an intermediary storage controller or devices that act on behalf of the VM hosts connected to the network. Finally, it is possible that each datastore reside within a corresponding VM host and be connected using a standard ATA or SCSI connection.
- Database 304 stores a content ID for each component of one or more hierarchical disk structures such as that shown in FIG. 2 .
- Database 304 may additionally store other data, e.g., configurations, settings, and status related to VM hosts 305 a , 305 b .
- Database 304 may, e.g., be a relational database, an object-oriented database, an object-relational database, etc.
- a content ID may be stored in a database associated with a host server such as 305 a and 305 b , in a distributed system of hypervisors.
- a distributed system of hypervisors may have a virtual machine manager 302 , but one is not required. It will be appreciated that this alternative embodiment promotes scalability and fault tolerance if redundant copies of a content ID are persistently stored, albeit at the expense of additional communication regarding content IDs between the hypervisors themselves or between the hypervisors and the virtual machine manager, if present.
- each content ID is a 128-bit number that is randomly generated, e.g., using a pseudorandom or random number generator. It is also possible to sequentially assign content IDs. In this respect, one might regard a content ID as somewhat similar to a Universally Unique Identifier (UUID) or a Globally Unique Identifier (GUID).
- VM manager 302 assigns a content ID to a component (i.e., a delta disk or base disk image) in a virtual disk hierarchy when the hypervisor operating in conjunction with the virtual machine associated with the component performs a “file open” operation and an initial “file write” operation on the component. The content ID remains unchanged during additional “file write” operations by the virtual machine.
- the system e.g., VM manager 302 , will assign a new content ID to the component.
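The content-ID lifecycle just described can be sketched as follows. This is an illustrative model, not the actual hypervisor logic; the `Component` class and its methods are invented names:

```python
# Sketch of the content-ID lifecycle: a fresh ID is assigned on the first
# write following a file open, and later writes within the same session
# leave the ID unchanged. IDs are random 128-bit values, UUID-style.

import secrets

class Component:
    def __init__(self):
        self.cid = secrets.randbits(128)
        self._dirty = False

    def open(self):
        # New open session: the next write will refresh the content ID.
        self._dirty = False

    def write(self):
        if not self._dirty:
            # Initial write after open: contents diverge, so a new
            # content ID is assigned.
            self.cid = secrets.randbits(128)
            self._dirty = True
        # Additional writes in this session: content ID stays the same.
```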
- Content ID collisions can occur if two components happen to be given the same content ID even though the contents are not identical. Such a collision, although extremely unlikely, could cause serious data corruption.
- One approach to prevent content ID collisions would be to monitor any content ID changes in the system and look for collisions. If the content ID of a given disk component is changed to a particular value, and another disk component in the system already has a content ID of that value, then you can conservatively assume that this is a collision, since it is very unlikely that a disk write caused the content to suddenly become the same as another disk component. In the case of a collision, a new content ID is assigned to the changed disk component.
- An alternate method can be used, if deemed necessary, for newly recognized disk images, i.e., disk images that have no content ID assigned. For example, a file length comparison and the contents of a disk block at a random file offset can be used to rule out every other disk component as a match, in which case a new content ID would be assigned.
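The collision handling described above might look like the following sketch, under the assumption that the system can enumerate the content IDs of every other disk component (the function name and data layout are hypothetical):

```python
# Sketch of content-ID collision detection: when a component's content ID
# changes to a value another component already uses, conservatively treat
# it as a collision and assign a fresh ID instead.

import secrets

def update_content_id(component, new_cid, other_cids):
    """other_cids: content IDs of every other disk component in the system."""
    if new_cid in other_cids:
        # It is very unlikely a disk write made two components identical,
        # so assume a collision and draw fresh IDs until one is unused.
        new_cid = secrets.randbits(128)
        while new_cid in other_cids:
            new_cid = secrets.randbits(128)
    component["cid"] = new_cid
    return new_cid
```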
- API 303 may be accessed remotely using a web service protocol such as SOAP. Messages may be sent and received using a script or program executing on a web service client machine (not shown). The client can cause the virtual machine manager 302 to issue a command to one of the hosts 305 a or 305 b , directing it to transfer (e.g., a copy operation or a copy operation and a delete operation) VM 404 from VM host 305 a to VM host 305 b , as illustrated.
- the hypervisor may include programming to carry out this functionality, or another software component (not shown) within host 305 a , 305 b may carry out the transfer.
- datastore 306 a may be a disk volume controlled by hypervisor 202 a and is only readily accessible by hypervisor 202 a whereas datastore 306 b is a volume that is controlled by and readily accessible by hypervisor 202 b .
- VM 404 accesses a virtual disk image implemented by a disk hierarchy including base disk 155 and delta disk 162 .
- both the base disk image 155 and delta disk 162 might be a VMDK file, i.e., contents of a virtual machine's hard disk drive may be encapsulated using the VMDK file format.
- VM 404 itself may be embodied in VMX file 157 .
- VMX files are data files for storing the configuration settings for VMs.
- VMX file 157 may be transferred to target datastore 306 b along with the virtual disk image associated with VM 404 . Additional files associated with the migrating VM may also be transferred.
- Arrows 312 represent transfer of VM 404 from source VM host 305 a to target VM host 305 b and delta disk 162 from source datastore 306 a to target datastore 306 b . Since delta disk 162 relies on base disk 155 , a copy of base disk 155 needs to be present on target datastore 306 b as well.
- On receiving a request via API 303 to move a particular VM 404 from VM host 305 a to VM host 305 b , VM manager 302 issues a query to database 304 to identify the components of the disk hierarchy associated with the specified VM. VM manager 302 then checks to see which components, if any, are already present on target datastore 306 b , which contains files for VMs on VM host 305 b.
- the check is performed by accessing the content ID of hierarchical disk components present on datastore 306 b , and comparing the content ID of base disk image 155 with those of disk components already on datastore 306 b . If datastore 306 b includes a copy of base disk image 155 having a content ID that matches the value of the content ID of base disk image 155 on datastore 306 a , an extremely strong inference arises that the two base disk images have the same contents. In this case, VM manager 302 does not copy base disk image 155 from datastore 306 a , but merely associates delta disk 162 on target datastore 306 b with the base disk image 155 on target datastore 306 b .
- By “associate” is meant that the newly associated base disk image on the target datastore would then be referenced for disk reads when the information sought is not present in the delta disk. The reference may be written to a field within, or property of, the delta disk itself, or it may be maintained separately.
- If base disk image 155 is not present on datastore 306 b , i.e., there is no disk component on datastore 306 b having a matching content ID, then base disk image 155 is copied from datastore 306 a to datastore 306 b , as shown by broken arrow 314 .
- VM manager 302 may command the hypervisor 202 a or 202 b to copy only delta disk 162 to datastore 306 b and then associate copied delta disk 162 on datastore 306 b with the copy of base disk image 155 already on datastore 306 b .
- VM manager 302 may be in communication with datastores 306 a , 306 b , and perform the move and/or association directly rather than so commanding one of the VM hosts.
- the VM being transferred might be in a “powered off” or “suspended” state, such that the virtual machine is not currently executing or in a scheduling queue.
- the VM may be executing during migration of the disk, e.g., as described in United States Patent Application Publication 2009/0037680, which published Feb. 5, 2009.
- the virtual machine may be migrated “live,” i.e., without significant interruption of the operation of the VM along with disk migration. Live migration of a VM is described in U.S. Pat. No. 7,484,208, which issued Jan. 27, 2009. Live migration might be performed when VM manager 302 engages in dynamic load balancing of virtual machines or some other form of distributed resource management.
- Two virtual machines are referred to as “linked clones” when they share a common base disk image or intermediate delta disk.
- Some hypervisors support linked clones.
- a linked clone includes a specific type of virtual disk hierarchy, e.g., a virtual disk hierarchy with at least one virtual delta disk associated with a virtual base disk.
- a linked clone might be created by a disk snapshot, and facilitate the rapid deployment of the same guest system software in a multitude of virtual machines.
- FIGS. 4A and 4B are schematic diagrams illustrating by way of example the transfer of linked clones across datastores.
- a pair of linked clones includes two virtual machines, VM A 605 and VM B 606 , each of which is associated with the same virtual base disk 602 through corresponding virtual delta disks 603 , 604 , respectively.
- for each component, a count (not shown) is maintained, e.g., in database 304 ( FIG. 3 ), that identifies the number of delta components that depend from the component.
- Base disk 602 has two delta components, delta A 603 and delta B 604 , so its reference count would be two. When additional delta disks are created from base disk 602 , the count is incremented.
- a VM manager receives a request to transfer VM B 606 from source datastore 620 to target datastore 630 . It will be appreciated that this transfer involves, among other things, copying the virtual base disk 602 with content ID X to target datastore 630 , as well as the delta disk 604 .
- the reference count for base disk 602 on target datastore 630 is initialized at one, since newly copied delta B 604 depends on base disk 602 .
- VM B 606 , i.e., the file or files containing configuration and state information, such as the VMX file described above, and delta B 604 may be deleted from source datastore 620 after the virtual machine's transfer to target datastore 630 .
- When delta B 604 is deleted, the reference count for base disk 602 on source datastore 620 is decremented from two to one, indicating that only one delta disk (delta A 603 ) depends from base disk 602 on source datastore 620 .
- the reference count may be consulted to identify whether delta disks depend on base disk image 602 . If the reference count is zero, then no delta disks depend from base disk image 602 , and it may be deleted from source datastore 620 . In the present example, base disk image 602 is not deleted from source datastore 620 , since at this time the reference count is equal to a number greater than zero, i.e., one.
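The per-datastore reference counting described above might look like the following sketch (a hypothetical `Datastore` class; the patent keeps such counts in database 304):

```python
# Sketch of reference counting for hierarchical disk components: each base
# or intermediate component tracks how many delta disks depend on it, and
# may be deleted only once that count reaches zero.

class Datastore:
    def __init__(self):
        self.refcount = {}  # component name -> number of dependent deltas

    def add_delta(self, parent):
        self.refcount[parent] = self.refcount.get(parent, 0) + 1

    def remove_delta(self, parent):
        self.refcount[parent] -= 1

    def can_delete(self, parent):
        return self.refcount.get(parent, 0) == 0

source = Datastore()
source.add_delta("base")          # delta A depends on base
source.add_delta("base")          # delta B depends on base
source.remove_delta("base")       # delta B migrated away and deleted
print(source.can_delete("base"))  # False: delta A still depends on base
```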
- VM A 605 still resides in source datastore 620 and relies on the presence of a copy of base disk image 602 in source datastore 620 .
- delta B 604 and base disk 602 are copied from source datastore 620 to target datastore 630 , they each retain their respective content ID such that both copies of base disk 602 have the same content ID.
- the system transfers VM A 605 and its delta disk 603 to target datastore 630 as shown in FIG. 4B .
- a check of content IDs of components present in target datastore 630 reveals that a copy of base disk image 602 is already present on target datastore 630 . Therefore, the system does not create a second copy of that virtual base disk, saving both copying time and storage space on target datastore 630 .
- the system associates (or “reparents”) delta disk 603 on target datastore 630 with the base disk image 602 already present on target datastore 630 , thereby re-creating the linked clones on target datastore 630 .
- When delta A 603 is reparented to base disk 602 , the reference count for the copy of base disk 602 on target datastore 630 is incremented from one to two.
- the disk hierarchy originally present on source datastore 620 has been recreated on target datastore 630 .
- each component may be deleted from source datastore 620 , as indicated by the broken outlines of these components in FIG. 4B .
- When delta A 603 is deleted from source datastore 620 , the reference count for base disk 602 is decremented from one to zero, indicating there are no longer any delta disks depending on base disk 602 .
- After base disk 602 has been copied to target datastore 630 , a check of the reference count on source datastore 620 reveals that base disk 602 may be deleted from source datastore 620 , since the reference count is now zero, indicating no delta disks remain on source datastore 620 that depend on base disk 602 .
- deletion of hierarchical disk components from source datastore 620 is optional.
- the copy operation may be performed for data redundancy, i.e., as a “backup” operation, in which case maintaining original copies of the components on source datastore 620 would be desirable.
- FIG. 5 shows a flowchart 700 illustrating by way of example a procedure for transferring a virtual disk hierarchy across datastores.
- the procedure begins as indicated by start block 702 and flows to operation 704 , wherein a VM manager receives a request (e.g., through an API call to a virtual machine manager) to move a virtual disk hierarchy from a source datastore to a target datastore.
- the request may include one or more calls to the API, where the one or more calls request a copying of a virtual machine and a virtual disk hierarchy from the source datastore to the target datastore and the one or more calls further indicate that a sharing of a non-writable component of the virtual disk hierarchy is allowable.
- the procedure flows to operation 706 , wherein the terminal delta disk is copied from the source datastore to the target datastore and the terminal delta on the source datastore is optionally deleted.
Operation 714 may be implemented as a check against error. If the "current delta," which should actually be a base disk, is identified in the database or elsewhere as a delta disk, then the procedure flows to operation 718, wherein an error is generated and the procedure ends. However, if the base disk is not identified as a delta disk, then the procedure flows to done block 720 and the procedure is complete.
Abstract
A method and software are described for recreating on a target datastore a set of hierarchical files that are present on a source datastore. A content identifier (ID) is maintained for each component of the set of hierarchical files. The content ID of a component is updated when its contents are modified. The child component is copied from the source datastore to the target datastore. The content ID corresponding to the parent component on the source datastore is compared with content IDs corresponding to files present on the target datastore. When a matching content ID is discovered, it is inferred that a copy of the parent component exists on the target datastore. The matching file on the target datastore is associated with the copied child component so that the matching file becomes a new parent component to the copied child component, thereby recreating the set of hierarchical files on the target datastore.
Description
This application is a continuation of and claims the benefit of U.S. patent application Ser. No. 12/469,577, filed May 20, 2009, issued on Jul. 2, 2013 as U.S. Pat. No. 8,478,801, which is hereby incorporated by reference.
In certain computing applications, it is desirable to separately maintain a data file and changes to the data file instead of writing the changes directly to the data file. For example, one might want to have the ability to “undo” the changes and return to the original data file. Alternatively, there might be a desire to make multiple independent changes to an initial file, without having to copy the initial file for each independent change.
In the field of computer virtualization, and particularly desktop virtualization, delta disks, also referred to as "redo logs," "diff files," etc., may be used to customize a base disk image for a virtual disk. A discussion of the use of such delta disks for virtual computer systems is provided in U.S. Pat. No. 7,356,679, entitled "Computer Image Capture, Customization and Deployment," which issued on Apr. 8, 2008. Each delta disk contains changes to the base disk image to provide customization and data retention for each user. The combination of base and delta disks makes up a virtual disk hierarchy, which is virtualized by virtualization software so that it appears to each user (or virtual machine) as a single physical disk. Each virtual disk may be organized in a manner similar to conventional physical disks, i.e., into discrete addressable disk blocks. When the virtual disk is read, the delta disk is first accessed to determine whether the portion of the virtual disk being accessed is contained within the delta disk. For example, if a particular disk block of the virtual disk includes modifications made since creation of the delta disk, then that disk block will be present in the delta disk. If the disk block is not present in the delta disk, then the corresponding disk block is accessed in the base image, from which the requested information is retrieved. Writes to the virtual disk are directed to the delta disk. If a write is directed to a previously modified disk block present in the delta disk, then the previously modified disk block is overwritten; otherwise, the delta disk is augmented to include the newly modified disk block.
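The read/write semantics described above can be sketched as a small block-level model. This is an illustrative sketch only; the class and method names (`BaseDisk`, `DeltaDisk`) are invented for the example and are not from the patent or any hypervisor implementation.

```python
class BaseDisk:
    """The read-only base image: a mapping of block number -> data."""
    def __init__(self, blocks):
        self.blocks = dict(blocks)

    def read(self, n):
        return self.blocks.get(n, b"\x00")  # unwritten blocks read as zeros

class DeltaDisk:
    """A delta disk: stores only blocks modified since its creation."""
    def __init__(self, parent):
        self.parent = parent   # base disk, or another delta disk in a chain
        self.blocks = {}

    def read(self, n):
        # Serve the block from the delta if it was modified since the delta
        # was created; otherwise fall through to the parent in the chain.
        if n in self.blocks:
            return self.blocks[n]
        return self.parent.read(n)

    def write(self, n, data):
        # All writes land in the delta; the parent is never modified.
        self.blocks[n] = data

base = BaseDisk({0: b"boot", 1: b"os"})
delta = DeltaDisk(base)
delta.write(1, b"patched")
print(delta.read(0))  # b'boot'    -- falls through to the base image
print(delta.read(1))  # b'patched' -- served from the delta
```

Because `read` recurses through `parent`, the same model covers chains of intermediate deltas: a `DeltaDisk` can be constructed on top of another `DeltaDisk`.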
In some cases, it may be desirable to have one or more chains of delta disks extending from a single base. In an enterprise environment, for example, there may be a base disk image for a company containing the operating system and installed software usable by each employee, such as an email client. From this base disk image, a plurality of departmental delta disks may be created, each including software suitable for a particular department: accounting software for the accounting department, computer-aided design software for the engineering department, and so on. From each of these departmental deltas, individual deltas may then be maintained by individual users.
In ordinary use of computer disk images, data storage requirements increase over time, making it necessary at some point to move one or more disk images from one datastore to another. In the field of virtualization, it is sometimes necessary to migrate a virtual machine from one datastore to another, e.g., for load balancing, or to take a physical computer out of service. However, where a virtual disk is made up of a base disk image and one or more deltas, each of the parent delta images and the base image must be copied along with the delta, or else some logical connection must be maintained across datastores, which in some cases is not possible.
A method and software recreates on a target datastore a set of hierarchical files that are present on a source datastore, the set including a parent component and a child component. A content identifier (ID) is maintained for each component of the set of hierarchical files. The content ID is updated when the contents of a corresponding one of the components are modified. In one embodiment, the child component contains changes to the parent component and is writable, whereas the parent component is read-only. The child component is copied from the source datastore to the target datastore. The content ID corresponding to the parent component on the source datastore is compared with content IDs corresponding to files present on the target datastore. When a matching content ID is discovered, a matching file is identified, the matching file being a file on the target datastore that corresponds to the matching content ID. The matching file is associated with the copied child component so that the matching file becomes a new parent component to the copied child component, thereby recreating the set of hierarchical files on the target datastore using the matching file.
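The method summarized above can be condensed into a short sketch. The datastore model (a dict of file name to content ID) and all names here are assumptions made for illustration, not the patent's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class Component:
    """A hierarchical disk component with its content identifier."""
    name: str
    content_id: int

def recreate_hierarchy(child, parent, target):
    """Copy `child` to the `target` datastore, then either reuse a file
    already present whose content ID matches `parent`, or copy `parent`.
    Returns the name of the file the copied child is parented to."""
    target[child.name] = child.content_id            # copy the child component
    for name, cid in target.items():
        if name != child.name and cid == parent.content_id:
            return name                              # match found: reuse it
    target[parent.name] = parent.content_id          # no match: copy the parent
    return parent.name

base = Component("base.vmdk", 0xA1)
delta = Component("delta.vmdk", 0xB2)
# The target already holds a file with the same contents as the base:
tgt = {"other-base.vmdk": 0xA1}
assert recreate_hierarchy(delta, base, tgt) == "other-base.vmdk"
```

In the matching case only the child is transferred, which is the source of the copying-time and storage savings described later in the document.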
Each hierarchical disk component can be thought of as being a child component and/or a parent component in a chain. Each terminal delta disk 176, 178, 180, 184 is a child component since it depends on either an intermediate delta disk or a base disk image. Base disk image 172 is a parent when it has one or more delta disks depending on it. Each intermediate delta disk 174, 182 is a parent of either another intermediate delta disk or a terminal delta disk. Each intermediate delta disk 174, 182 is also a child of either the base disk image, or another intermediate delta disk.
By way of example, suppose base disk image 172 included an installation of an operating system such as Microsoft Windows and an office productivity suite, including a word processor, email client, spreadsheet, etc. Intermediate delta disks 174, 182 may include additional installed applications needed for users of a particular group in an organization, such as accountants or engineers. To create an intermediate delta disk, a delta disk such as delta 174 is created, which initially appears to the computer as an exact copy of base disk image 172, since no changes have yet been written to the delta disk. Then the virtual machine is launched using the delta disk image, essentially launching the operating system installed on base disk image 172. Then, the various departmental applications may be installed to the virtual disk formed by the hierarchical disk structure comprising the delta disk and base disk image. The virtual machine may then be powered down if needed. A snapshot of the VM is created, which then makes the delta an intermediate delta disk. Terminal deltas pointing to the just-created intermediate delta can then be created for a plurality of users. Each terminal delta may be individualized with the configurations necessary for the corresponding virtual machines to coexist on a network, e.g., unique machine numbers, MAC addresses, etc., which are managed using well-understood techniques, described, for example, in U.S. Pat. No. 7,356,679, entitled "Computer Image Capture, Customization and Deployment," which issued on Apr. 8, 2008.
Modifying a base disk image or intermediate delta disk that has delta disks depending from it could corrupt the virtual disk images represented by the terminal delta disks. Referring to FIG. 2, if intermediate delta disk 182 were written to, it could corrupt the image provided by terminal delta disk 184, since terminal delta disk 184 represents changes to the image presented by intermediate delta disk 182, and these changes can be inconsistent with any changes made directly to intermediate delta disk 182. Therefore, intermediate delta disks and the base disk image are generally locked. In one embodiment, they are locked by tagging the file using a "read only" tag provided by the file system, which thereafter prevents any application from writing to these files. If terminal delta disk 184 were deleted, then the image could be reverted to the state prior to the changes embodied in terminal delta disk 184 by simply referencing intermediate delta disk 182, which becomes a terminal delta disk (with its read-only protection removed) since no additional delta disks depend from intermediate delta disk 182. Other protections for intermediate delta disks and base disk image files may be provided in addition to, or instead of, file system "read-only" tagging. For example, a database may be used to track interrelations between components of a hierarchical disk, and software accessing these components can be written to ensure that base disk images and intermediate delta disks are never written to.
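A minimal sketch of the locking and revert behavior just described, using invented names; as the text notes, a real system would enforce this through file system attributes or a tracking database rather than in-memory flags.

```python
class LockableDisk:
    """A disk component that is locked once it gains a dependent delta."""
    def __init__(self, parent=None):
        self.parent = parent
        self.read_only = False
        if parent is not None:
            parent.read_only = True  # parent now has a dependent: lock it

    def write(self, data):
        if self.read_only:
            raise PermissionError("locked: component has dependent deltas")
        return len(data)             # stand-in for the actual write

def revert(terminal):
    """Delete the terminal delta; its parent becomes the new terminal
    delta and is made writable again (assuming, as in the example in the
    text, that no other deltas depend on the parent)."""
    terminal.parent.read_only = False
    return terminal.parent

base = LockableDisk()
delta = LockableDisk(base)   # creating the delta locks the base
```

After `revert(delta)`, the parent is writable again, mirroring the example of deleting terminal delta disk 184 to fall back to intermediate delta disk 182.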
In an alternative embodiment, a content ID may be stored in a database associated with a host server such as 305 a and 305 b, in a distributed system of hypervisors. Such a distributed system of hypervisors may have a virtual machine manager 302, but one is not required. It will be appreciated that this alternative embodiment promotes scalability and fault tolerance if redundant copies of a content ID are persistently stored, albeit at the expense of additional communication regarding content IDs between the hypervisors themselves or between the hypervisors and the virtual machine manager, if present.
In one embodiment, each content ID is a 128-bit number that is randomly generated, e.g., using a pseudorandom or random number generator. It is also possible to assign content IDs sequentially. In this respect, one might regard a content ID as somewhat similar to a Universally Unique Identifier (UUID) or a Globally Unique Identifier (GUID). However, in one embodiment, VM manager 302 assigns a content ID to a component (i.e., a delta disk or base disk image) in a virtual disk hierarchy when the hypervisor operating in conjunction with the virtual machine associated with the component performs a "file open" operation and an initial "file write" operation on the component. The content ID remains unchanged during additional "file write" operations by the virtual machine. However, if the virtual machine performs a "file close" operation on the component and then performs another "file open" and "file write" operation on the component, the system, e.g., VM manager 302, will assign a new content ID to the component.
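The content-ID lifecycle in this paragraph might be modeled as follows. `TrackedComponent` and its methods are hypothetical names, and `secrets.randbits(128)` merely stands in for whatever 128-bit generator an implementation would use.

```python
import secrets

class TrackedComponent:
    """Assigns a new 128-bit content ID on the first write after a file
    open; the ID then stays stable until the file is closed and reopened."""
    def __init__(self):
        self.content_id = None
        self._written_this_session = False

    def open(self):
        self._written_this_session = False

    def write(self, data):
        if not self._written_this_session:
            self.content_id = secrets.randbits(128)  # initial write: new ID
            self._written_this_session = True
        # ... additional writes in the same session keep the same ID ...

    def close(self):
        self._written_this_session = False

c = TrackedComponent()
c.open(); c.write(b"a")
first = c.content_id
c.write(b"b")                       # same open/write session: ID unchanged
assert c.content_id == first
c.close(); c.open(); c.write(b"c")  # close, reopen, write: new ID assigned
assert c.content_id != first
```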
Content ID collisions can occur if two components happen to be given the same content ID even though their contents are not identical. Such a collision, although extremely unlikely, could cause serious data corruption. One approach to preventing content ID collisions is to monitor any content ID changes in the system and look for collisions. If the content ID of a given disk component is changed to a particular value, and another disk component in the system already has a content ID of that value, then it can conservatively be assumed that this is a collision, since it is very unlikely that a disk write caused the content to suddenly become the same as that of another disk component. In the case of a collision, a new content ID is assigned to the changed disk component. An alternate method can be used, if deemed necessary, for newly recognized disk images, i.e., disk images that have no content ID assigned. For example, a file length comparison and the contents of a disk block at a random file offset can be used to eliminate every other disk component as a match, in which case a new content ID would be assigned.
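The collision-avoidance rule above amounts to re-drawing an ID whenever the proposed value is already in use by a different component. A hedged sketch, with invented names:

```python
import secrets

def assign_content_id(component, proposed_id, ids_in_use):
    """If `proposed_id` collides with an ID already in use, conservatively
    assume the contents differ and draw a fresh 128-bit ID instead."""
    while proposed_id in ids_in_use:
        proposed_id = secrets.randbits(128)  # re-draw on collision
    ids_in_use.add(proposed_id)
    component["content_id"] = proposed_id
    return proposed_id

in_use = {42}            # some other component already holds ID 42
comp = {}
new_id = assign_content_id(comp, 42, in_use)
assert new_id != 42      # the collision was detected and resolved
```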
For purposes of illustration, suppose VM 404 accesses a virtual disk image implemented by a disk hierarchy including base disk 155 and delta disk 162. In an embodiment where the hypervisor is provided by VMware Inc., both base disk image 155 and delta disk 162 might be VMDK files, i.e., the contents of a virtual machine's hard disk drive may be encapsulated using the VMDK file format. Also, VM 404 itself may be embodied in VMX file 157. VMX files are data files for storing the configuration settings for VMs. Thus, VMX file 157 may be transferred to target datastore 306 b along with the virtual disk image associated with VM 404. Additional files associated with the migrating VM may also be transferred.
In one embodiment, on receiving a request via API 303 to move a particular VM 404 from VM host 305 a to VM host 305 b, VM manager 302 issues a query to database 304 to identify the components of the disk hierarchy associated with the specified VM. VM manager 302 then checks to see which components, if any, are already present on target datastore 306 b, which contains files for VMs on VM host 305 b.
The check is performed by accessing the content IDs of hierarchical disk components present on datastore 306 b, and comparing the content ID of base disk image 155 with those of the disk components already on datastore 306 b. If datastore 306 b includes a base disk image having a content ID that matches the value of the content ID of base disk image 155 on datastore 306 a, an extremely strong inference arises that the two base disk images have the same contents. In this case, VM manager 302 does not copy base disk image 155 from datastore 306 a, but merely associates delta disk 162 on target datastore 306 b with the copy of base disk image 155 on target datastore 306 b. By "associate," it is meant that the newly associated base disk image on the target datastore is referenced for disk reads when the information sought is not present in the delta disk. The reference may be written to a field within, or property of, the delta disk itself, or it may be maintained separately. On the other hand, if base disk image 155 is not present on datastore 306 b, i.e., there is no disk component on datastore 306 b having a matching content ID, then base disk image 155 is copied from datastore 306 a to datastore 306 b, as shown by broken arrow 314.
The transfer of files as described above may be carried out on behalf of VM manager 302. For example, VM manager 302 may command the hypervisor 202 a or 202 b to copy only delta disk 162 to datastore 306 b and then associate copied delta disk 162 on datastore 306 b with the copy of base disk image 155 already on datastore 306 b. In an alternative embodiment, VM manager 302 may be in communication with datastores 306 a, 306 b, and perform the move and/or association directly rather than so commanding one of the VM hosts.
In one embodiment, the VM being transferred might be in a “powered off” or “suspended” state, such that the virtual machine is not currently executing or in a scheduling queue. In another embodiment, the VM may be executing during migration of the disk, e.g., as described in United States Patent Application Publication 2009/0037680, which published Feb. 5, 2009. In another embodiment, the virtual machine may be migrated “live,” i.e., without significant interruption of the operation of the VM along with disk migration. Live migration of a VM is described in U.S. Pat. No. 7,484,208, which issued Jan. 27, 2009. Live migration might be performed when VM manager 302 engages in dynamic load balancing of virtual machines or some other form of distributed resource management.
Two virtual machines are referred to as "linked clones" when they share a common base disk image or intermediate delta disk. Some hypervisors support linked clones. It will be appreciated that a linked clone includes a specific type of virtual disk hierarchy, e.g., a virtual disk hierarchy with at least one virtual delta disk associated with a virtual base disk. A linked clone might be created from a disk snapshot, and facilitates the rapid deployment of the same guest system software in a multitude of virtual machines.
At some point in time, a VM manager (not shown) receives a request to transfer VM B 606 from source datastore 620 to target datastore 630. It will be appreciated that this transfer involves, among other things, copying the virtual base disk 602 with content ID X to target datastore 630, as well as the delta disk 604. The reference count for base disk 602 on target datastore 630 is initialized at one, since newly copied delta B 604 depends on base disk 602. As indicated by dashed outlines, VM B, i.e., the file or files containing configuration and state information, such as the VMX file described above, and delta B 604 may be deleted from source datastore 620 after the virtual machine's transfer to target datastore 630. When delta B 604 is deleted, the reference count for base disk 602 on source datastore 620 is decremented from two to one, indicating that only one delta disk (delta A 603) depends from base disk 602 on source datastore 620. The reference count may be consulted to identify whether any delta disks depend on base disk image 602. If the reference count is zero, then no delta disks depend from base disk image 602, and it may be deleted from source datastore 620. In the present example, base disk image 602 is not deleted from source datastore 620, since at this time the reference count is one, i.e., greater than zero. VM A 605 still resides on source datastore 620 and relies on the presence of a copy of base disk image 602 on source datastore 620. When delta B 604 and base disk 602 are copied from source datastore 620 to target datastore 630, they each retain their respective content IDs, such that both copies of base disk 602 have the same content ID.
At a later point in time, the system transfers VM A 605 and its delta disk 603 to target datastore 630 as shown in FIG. 4B . A check of content IDs of components present in target datastore 630 reveals that a copy of base disk image 602 is already present on target datastore 630. Therefore, the system does not create a second copy of that virtual base disk, saving both copying time and storage space on target datastore 630. To complete the transfer of the linked clones, the system associates (or “reparents”) delta disk 603 on target datastore 630 with the base disk image 602 already present on target datastore 630, thereby re-creating the linked clones on target datastore 630. When delta A 603 is parented to base disk 602, the reference count for the copy of base disk 602 on target datastore 630 is incremented from one to two. At this time the disk hierarchy originally present on source datastore 620 has been recreated on target datastore 630.
Having copied VM A 605 and delta disk A 603 to target datastore 630, and having recognized the presence of a copy of base disk image 602 on target datastore 630, each component may be deleted from source datastore 620, as indicated by the broken outlines of these components in FIG. 4B. When delta A 603 is deleted from source datastore 620, the reference count for base disk 602 is decremented from one to zero, indicating there are no longer any delta disks depending on base disk 602. Because base disk 602 was previously copied to target datastore 630, a check of the reference count on source datastore 620 reveals that base disk 602 may be deleted from source datastore 620, since the reference count is now zero, indicating no delta disks remain on source datastore 620 that depend on base disk 602. Note that deletion of hierarchical disk components from source datastore 620 is optional. For example, the copy operation may be performed for data redundancy, i.e., as a "backup" operation, in which case maintaining original copies of the components on source datastore 620 would be desirable.
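The reference-count bookkeeping in the example above can be sketched as follows, with illustrative names: a base disk is deletable from a datastore only once no delta disks on that datastore depend on it.

```python
class RefCountedBase:
    """Tracks how many delta disks on a datastore depend on a base disk."""
    def __init__(self):
        self.refcount = 0

    def attach_delta(self):
        self.refcount += 1   # a delta is parented to this base

    def detach_delta(self):
        self.refcount -= 1   # a dependent delta is deleted or moved away

    def can_delete(self):
        return self.refcount == 0

src = RefCountedBase()
src.attach_delta(); src.attach_delta()  # delta A and delta B depend on it
src.detach_delta()                      # delta B transferred to the target
assert not src.can_delete()             # delta A still depends on the base
src.detach_delta()                      # delta A transferred as well
assert src.can_delete()                 # base may now be deleted
```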
In operation 708, the system then enters a loop and determines whether the current virtual delta disk (e.g., the virtual delta disk that was copied to the target datastore and deleted from the source datastore) is dependent on an intermediate delta disk or base disk on the source datastore. If the determination in operation 708 is negative, the system exits the loop and proceeds to operation 714. If the determination in operation 708 is positive, the system proceeds with operation 710 wherein it is determined whether the target datastore contains a copy of the intermediate delta disk or base disk on which the current delta disk depends. As described above with reference to FIG. 3 , this determination involves comparing a content ID of the dependent disk with hierarchical disk components already present on the target datastore. If there is no match, then the determination in operation 710 is negative, and the system proceeds to operation 712.
In operation 712, the intermediate delta disk or base disk on which the current delta disk depends is copied to the target datastore, and the current delta is reparented to the just copied intermediate delta or base disk. By “reparented,” it is meant that current delta becomes the child of the just-copied intermediate delta or base disk. If the intermediate delta/base disk on the source datastore no longer has any dependent deltas, then it may be deleted from the source datastore. The procedure then traverses to the next component in the source hierarchy, such that the previous dependent intermediate delta/base disk becomes the new “current delta.” The procedure then loops back to operation 708 which continues as previously described.
In operation 710, if the target datastore does contain a copy of the intermediate delta/base disk, then the procedure flows to operation 716, wherein the copied current delta on the target datastore is reparented to the copy of the intermediate delta/base disk already present on the target datastore. The source datastore (or database) may be checked to determine whether the copy of the intermediate delta/base disk on the source datastore has any dependent delta disks. If not, then the copy of the intermediate delta/base disk on the source datastore may optionally be deleted. The procedure then ends as indicated by done block 720.
Returning to operation 708, if the current delta is not dependent on an intermediate delta/base disk, then the current delta is by definition actually a base disk, and the procedure flows to operation 714.
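Operations 706 through 716 can be condensed into a loop that walks up the chain from the terminal delta. This sketch uses assumed data structures (a `Node` with `content_id` and `parent`) and models the target datastore as a set of content IDs already present there.

```python
class Node:
    """A hierarchical disk component; parent is None for a base disk."""
    def __init__(self, content_id, parent=None):
        self.content_id = content_id
        self.parent = parent

def transfer_hierarchy(terminal, target_ids):
    """Walk up from the terminal delta, copying each ancestor until a
    content-ID match is found on the target. Returns the components that
    must be copied."""
    copied = [terminal]                      # operation 706: copy terminal
    current = terminal
    while current.parent is not None:        # operation 708: has a parent?
        parent = current.parent
        if parent.content_id in target_ids:  # operation 710: match found
            return copied                    # operation 716: reparent; done
        copied.append(parent)                # operation 712: copy the parent
        current = parent                     # parent becomes "current delta"
    return copied                            # reached the base disk
```

If the first parent already matches, only the terminal delta is copied, which is the highlighted fast path described next.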
The path in flowchart 700 leading through operations 708, 710, and 716, then done block 720, is highlighted in FIG. 5. This highlighted path shows that when a content ID match is found, the just-copied delta is reparented to the matched file and the procedure is completed. No other ancestors need to be copied, irrespective of how many levels of components remain in the hierarchy. This is very useful, as finding a match for the first parent means that only the terminal delta needs to be copied. Thus, the relocation completes very quickly compared to prior methods of disk migration. In addition to the faster completion time, the above-described procedure reduces the amount of data to be copied when VMs share base disks and significantly reduces storage requirements on the target datastore. It will be appreciated that the deletion operations mentioned above with reference to operations 706, 712, and 716 are optional, insofar as there might be circumstances in which a system administrator might want to create multiple versions of the same virtual disk hierarchy, e.g., for purposes of redundancy (i.e., backups), fault tolerance, etc.
Any of the operations described herein that form part of the inventions are useful machine operations. The inventions also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for that purpose or it may be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines may be used with computer programs written in accordance with the teachings herein or it may be more convenient to construct a more specialized apparatus to perform the operations.
The inventions can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
It will be further appreciated that the instructions represented by the operations in the above figures are not required to be performed in the order illustrated and that all of the processing represented by the operations might not be necessary to practice the inventions. Further, the processes described in any of the above figures can also be implemented in software stored in any one of or combinations of the RAM, the ROM, or the hard disk drive.
Although the foregoing inventions have been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications can be practiced within the scope of the appended claims. In this regard, it will be appreciated that there are many other possible orderings of the operations in the processes described above and many possible modularizations of those orderings. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims. In the appended claims, elements and/or operations do not imply any particular order of operation, unless explicitly stated in the claims or implicitly required by the disclosure.
Claims (15)
1. Software encoded in one or more non-transitory storage media for execution by a processor and when executed operable to:
receive one or more calls to an application programming interface, wherein the one or more calls request a copying of a virtual machine and a virtual disk hierarchy from a source datastore to a target datastore and wherein the one or more calls indicate that a sharing of a non-writable component of the virtual disk hierarchy is allowable;
copy a child component of the virtual disk hierarchy to the target datastore without copying the parent component of the virtual disk hierarchy if a copy of the parent component is already present on the target datastore, as determined by matching a content identifier for the parent component with a content identifier for a component stored on the target datastore; and
associate the copied child component of the virtual disk hierarchy with the copy of the parent component,
wherein, in the source datastore and in the target datastore, virtual disks associated with one or more virtual machines are maintained in virtual disk hierarchies, each virtual disk hierarchy including, as components of the virtual disk hierarchy, a base disk and one or more delta disks which contain changes to the base disk and which are particular to respective virtual machines associated with the virtual disk, and each component of each virtual disk hierarchy having a corresponding content identifier that is updated in response to contents of the component being modified as a result of one or more write operations to the component.
2. The software of claim 1 , wherein a new content identifier is assigned to each of the components on the target datastore each time a file open and file write is performed on the component.
3. The software of claim 2 , wherein the new content identifier is randomly generated.
4. The software of claim 1 , wherein the software when executed is further operable to:
delete the child component on the source datastore;
determine whether there is an additional child component present on the source datastore that is dependent on the parent component; and
delete the parent component from the source datastore when there is no additional child component present on the source datastore that is dependent on the parent component.
5. The software of claim 4 , wherein the determining as to whether there is an additional child component present on the source datastore that is dependent on the parent component comprises checking a reference count associated with the parent component on the source datastore.
6. The software of claim 1 , wherein the software when executed is further operable to:
if the copy of the parent component is not already present on the target datastore, copy the parent component from the source datastore to the target datastore and associate the copied child component with the copied parent component so that the copied parent component becomes the parent component to the copied child component at the target datastore.
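Claim 6's fallback path, sketched under the same illustrative assumptions: when no component on the target matches the parent's content identifier, the parent is copied over first, and the copied child is then associated with that parent copy. The dict-based representation is hypothetical.

```python
def copy_with_parent_fallback(child, parent, target_by_cid):
    """Copy `child` to the target; if no component there matches the
    parent's content identifier, copy the parent first (claim 6).
    `target_by_cid` is a hypothetical dict of content id -> component."""
    parent_copy = target_by_cid.get(parent["cid"])
    if parent_copy is None:
        parent_copy = dict(parent)                 # transfer the parent
        target_by_cid[parent_copy["cid"]] = parent_copy
    child_copy = dict(child)
    child_copy["parent"] = parent_copy["cid"]      # re-link on the target
    target_by_cid[child_copy["cid"]] = child_copy
    return child_copy
```

Either way the child ends up linked to a parent copy on the target, so the reconstructed hierarchy is self-contained there.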
7. A method for recreating a virtual disk hierarchy that is present on a source datastore on a target datastore, the method comprising:
receiving one or more calls to an application programming interface, wherein the one or more calls request a copying of a virtual machine and the virtual disk hierarchy from the source datastore to the target datastore and wherein the one or more calls indicate that a sharing of a non-writable component of the virtual disk hierarchy is allowable;
copying a child component of the virtual disk hierarchy to the target datastore without copying the parent component of the virtual disk hierarchy if a copy of the parent component is already present on the target datastore, as determined by matching a content identifier for the parent component with a content identifier for a component stored on the target datastore; and
associating the copied child component of the virtual disk hierarchy with the copy of the parent component,
wherein, in the source datastore and in the target datastore, virtual disks associated with one or more virtual machines are maintained in virtual disk hierarchies, each virtual disk hierarchy including, as components, a base disk and one or more delta disks which contain changes to the base disk and which are particular to respective virtual machines associated with the virtual disk, and each component of each virtual disk hierarchy having a corresponding content identifier that is updated in response to contents of the component being modified as a result of one or more write operations to the component.
8. The method of claim 7 , wherein a new content identifier is assigned to each of the components on the target datastore each time a file open and file write is performed on the component.
9. The method of claim 8 , wherein the new content identifier is randomly generated.
10. The method of claim 7 , the method further comprising:
deleting the child component on the source datastore;
determining whether there is an additional child component present on the source datastore that is dependent on the parent component; and
deleting the parent component from the source datastore when there is no additional child component present on the source datastore that is dependent on the parent component.
11. The method of claim 10 , wherein the determining as to whether there is an additional child component present on the source datastore that is dependent on the parent component comprises checking a reference count associated with the parent component on the source datastore.
12. The method of claim 7 , the method further comprising:
if the copy of the parent component is not already present on the target datastore, copying the parent component from the source datastore to the target datastore and associating the copied child component with the copied parent component so that the copied parent component becomes the parent component to the copied child component at the target datastore.
13. A system comprising:
a source datastore including or residing on one or more physical storage devices;
a target datastore including or residing on one or more physical storage devices; and
one or more computers in which virtual machines are executed, each computer comprising:
a processor, and
a memory, wherein the memory includes a program configured to perform operations comprising:
receiving one or more calls to an application programming interface, wherein the one or more calls request a copying of a virtual machine and a virtual disk hierarchy from the source datastore to the target datastore and wherein the one or more calls indicate that a sharing of a non-writable component of the virtual disk hierarchy is allowable,
copying a child component of the virtual disk hierarchy to the target datastore without copying the parent component of the virtual disk hierarchy if a copy of the parent component is already present on the target datastore, as determined by matching a content identifier for the parent component with a content identifier for a component stored on the target datastore, and
associating the copied child component of the virtual disk hierarchy with the copy of the parent component,
wherein, in the source datastore and in the target datastore, virtual disks associated with one or more virtual machines are maintained in virtual disk hierarchies, each virtual disk hierarchy including, as components, a base disk and one or more delta disks which contain changes to the base disk and which are particular to respective virtual machines associated with the virtual disk, and each component of each virtual disk hierarchy having a corresponding content identifier that is updated in response to contents of the component being modified as a result of one or more write operations to the component.
14. The system of claim 13 , wherein a new content identifier is assigned to each of the components on the target datastore each time a file open and file write is performed on the component.
15. The system of claim 13 , the operations further comprising:
deleting the child component on the source datastore;
determining whether there is an additional child component present on the source datastore that is dependent on the parent component; and
deleting the parent component from the source datastore when there is no additional child component present on the source datastore that is dependent on the parent component.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/934,127 US9037621B2 (en) | 2009-05-20 | 2013-07-02 | Efficient reconstruction of virtual disk hierarchies across storage domains |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/469,577 US8478801B2 (en) | 2009-05-20 | 2009-05-20 | Efficient reconstruction of virtual disk hierarchies across storage domains |
US13/934,127 US9037621B2 (en) | 2009-05-20 | 2013-07-02 | Efficient reconstruction of virtual disk hierarchies across storage domains |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/469,577 Continuation US8478801B2 (en) | 2009-05-20 | 2009-05-20 | Efficient reconstruction of virtual disk hierarchies across storage domains |
Publications (2)
Publication Number | Publication Date |
---|---|
US20130298125A1 (en) | 2013-11-07 |
US9037621B2 (en) | 2015-05-19 |
Family
ID=43125282
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/469,577 Active 2031-01-15 US8478801B2 (en) | 2009-05-20 | 2009-05-20 | Efficient reconstruction of virtual disk hierarchies across storage domains |
US13/934,127 Active US9037621B2 (en) | 2009-05-20 | 2013-07-02 | Efficient reconstruction of virtual disk hierarchies across storage domains |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/469,577 Active 2031-01-15 US8478801B2 (en) | 2009-05-20 | 2009-05-20 | Efficient reconstruction of virtual disk hierarchies across storage domains |
Country Status (1)
Country | Link |
---|---|
US (2) | US8478801B2 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150113531A1 (en) * | 2013-10-18 | 2015-04-23 | Power-All Networks Limited | System for migrating virtual machine and method thereof |
US9621654B2 (en) * | 2013-11-14 | 2017-04-11 | Vmware, Inc. | Intelligent data propagation using performance monitoring |
CN106874066A (en) * | 2017-01-20 | 2017-06-20 | 中兴通讯股份有限公司 | A kind of virtual machine migration method and device, electronic equipment |
US20170279797A1 (en) * | 2016-03-22 | 2017-09-28 | International Business Machines Corporation | Container Independent Secure File System for Security Application Containers |
US10114570B2 (en) | 2017-01-27 | 2018-10-30 | Red Hat Israel, Ltd. | Deleting disks while maintaining snapshot consistency in a virtualized data-center |
Families Citing this family (53)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8086811B2 (en) | 2008-02-25 | 2011-12-27 | International Business Machines Corporation | Optimizations of a perform frame management function issued by pageable guests |
US8984505B2 (en) * | 2008-11-26 | 2015-03-17 | Red Hat, Inc. | Providing access control to user-controlled resources in a cloud computing environment |
US8943203B1 (en) * | 2009-07-10 | 2015-01-27 | Netapp, Inc. | System and method for storage and deployment of virtual machines in a virtual server environment |
EP2461248A4 (en) | 2009-07-31 | 2013-02-13 | Nec Corp | Control server, service-providing system, and method of providing a virtual infrastructure |
US9959131B2 (en) * | 2009-08-03 | 2018-05-01 | Quantum Corporation | Systems and methods for providing a file system viewing of a storeage environment |
US9495190B2 (en) * | 2009-08-24 | 2016-11-15 | Microsoft Technology Licensing, Llc | Entropy pools for virtual machines |
US8473531B2 (en) | 2009-09-03 | 2013-06-25 | Quantum Corporation | Presenting a file system for a file containing items |
US8161077B2 (en) | 2009-10-21 | 2012-04-17 | Delphix Corp. | Datacenter workflow automation scenarios using virtual databases |
US8150808B2 (en) | 2009-10-21 | 2012-04-03 | Delphix Corp. | Virtual database system |
US9106591B2 (en) | 2009-12-24 | 2015-08-11 | Delphix Corporation | Adaptive resource management using survival minimum resources for low priority consumers |
US9037547B1 (en) * | 2010-09-15 | 2015-05-19 | Symantec Corporation | Backup time deduplication of common virtual disks from virtual machine backup images |
US10284437B2 (en) * | 2010-09-30 | 2019-05-07 | Efolder, Inc. | Cloud-based virtual machines and offices |
US9665594B2 (en) * | 2011-01-14 | 2017-05-30 | Apple Inc. | Local backup |
US8856486B2 (en) * | 2011-02-23 | 2014-10-07 | Citrix Systems, Inc. | Deploying a copy of a disk image from source storage to target storage |
US9542215B2 (en) * | 2011-09-30 | 2017-01-10 | V3 Systems, Inc. | Migrating virtual machines from a source physical support environment to a target physical support environment using master image and user delta collections |
US9519496B2 (en) * | 2011-04-26 | 2016-12-13 | Microsoft Technology Licensing, Llc | Detecting and preventing virtual disk storage linkage faults |
US9176744B2 (en) * | 2011-05-20 | 2015-11-03 | Citrix Systems, Inc. | Quickly provisioning a virtual machine by identifying a path to a differential file during pre-boot |
US8863124B1 (en) | 2011-08-10 | 2014-10-14 | Nutanix, Inc. | Architecture for managing I/O and storage for a virtualization environment |
US8549518B1 (en) | 2011-08-10 | 2013-10-01 | Nutanix, Inc. | Method and system for implementing a maintenanece service for managing I/O and storage for virtualization environment |
US9009106B1 (en) | 2011-08-10 | 2015-04-14 | Nutanix, Inc. | Method and system for implementing writable snapshots in a virtualized storage environment |
US9652265B1 (en) | 2011-08-10 | 2017-05-16 | Nutanix, Inc. | Architecture for managing I/O and storage for a virtualization environment with multiple hypervisor types |
US9747287B1 (en) | 2011-08-10 | 2017-08-29 | Nutanix, Inc. | Method and system for managing metadata for a virtualization environment |
US8850130B1 (en) | 2011-08-10 | 2014-09-30 | Nutanix, Inc. | Metadata for managing I/O and storage for a virtualization |
US8601473B1 (en) | 2011-08-10 | 2013-12-03 | Nutanix, Inc. | Architecture for managing I/O and storage for a virtualization environment |
US8583852B1 (en) | 2011-09-01 | 2013-11-12 | Symantec Operation | Adaptive tap for full virtual machine protection |
US9483358B1 (en) | 2011-09-30 | 2016-11-01 | EMC IP Holding Company LLC | Synthetic block based backup |
US8898407B1 (en) | 2011-09-30 | 2014-11-25 | Emc Corporation | Incremental block based backup |
US8769224B1 (en) | 2011-09-30 | 2014-07-01 | Emc Corporation | Discovering new physical machines for physical to virtual conversion |
US8738870B1 (en) | 2011-09-30 | 2014-05-27 | Emc Corporation | Block based backup |
US8856078B2 (en) * | 2012-02-21 | 2014-10-07 | Citrix Systems, Inc. | Dynamic time reversal of a tree of images of a virtual hard disk |
US9244717B2 (en) * | 2012-03-29 | 2016-01-26 | Vmware, Inc. | Method and system for visualizing linked clone trees |
US9223501B2 (en) * | 2012-04-23 | 2015-12-29 | Hitachi, Ltd. | Computer system and virtual server migration control method for computer system |
US9772866B1 (en) | 2012-07-17 | 2017-09-26 | Nutanix, Inc. | Architecture for implementing a virtualization environment and appliance |
US9778860B2 (en) | 2012-09-12 | 2017-10-03 | Microsoft Technology Licensing, Llc | Re-TRIM of free space within VHDX |
US9116737B2 (en) * | 2013-04-30 | 2015-08-25 | Vmware, Inc. | Conversion of virtual disk snapshots between redo and copy-on-write technologies |
US10025674B2 (en) * | 2013-06-07 | 2018-07-17 | Microsoft Technology Licensing, Llc | Framework for running untrusted code |
US9043576B2 (en) * | 2013-08-21 | 2015-05-26 | Simplivity Corporation | System and method for virtual machine conversion |
WO2015066702A2 (en) * | 2013-11-04 | 2015-05-07 | Falconstor, Inc. | Write performance preservation with snapshots |
US9268836B2 (en) * | 2013-11-14 | 2016-02-23 | Vmware, Inc. | Intelligent data propagation in a highly distributed environment |
US9723065B2 (en) * | 2014-10-13 | 2017-08-01 | Vmware, Inc. | Cross-cloud object mapping for hybrid clouds |
US9569110B2 (en) * | 2014-11-18 | 2017-02-14 | International Business Machines Corporation | Efficient management of cloned data |
US9507623B2 (en) * | 2014-12-15 | 2016-11-29 | Vmware, Inc. | Handling disk state inheritance for forked virtual machines |
US10684876B2 (en) | 2015-05-14 | 2020-06-16 | Netapp, Inc. | Migration of virtual machine data using native data paths |
US10628194B2 (en) * | 2015-09-30 | 2020-04-21 | Netapp Inc. | Techniques for data migration |
US10133874B1 (en) * | 2015-12-28 | 2018-11-20 | EMC IP Holding Company LLC | Performing snapshot replication on a storage system not configured to support snapshot replication |
US10235061B1 (en) | 2016-09-26 | 2019-03-19 | EMC IP Holding Company LLC | Granular virtual machine snapshots |
US10817321B2 (en) * | 2017-03-21 | 2020-10-27 | International Business Machines Corporation | Hardware independent interface for cognitive data migration |
US10809935B2 (en) * | 2018-12-17 | 2020-10-20 | Vmware, Inc. | System and method for migrating tree structures with virtual disks between computing environments |
US11609775B2 (en) | 2019-04-30 | 2023-03-21 | Rubrik, Inc. | Systems and methods for continuous data protection comprising storage of completed I/O requests intercepted from an I/O stream using touch points |
US11663092B2 (en) | 2019-04-30 | 2023-05-30 | Rubrik, Inc. | Systems and methods for continuous data protection |
US11500664B2 (en) * | 2019-04-30 | 2022-11-15 | Rubrik, Inc. | Systems and method for continuous data protection and recovery by implementing a set of algorithms based on the length of I/O data streams |
US11663089B2 (en) | 2019-04-30 | 2023-05-30 | Rubrik, Inc. | Systems and methods for continuous data protection |
US11106482B2 (en) * | 2019-05-31 | 2021-08-31 | Microsoft Technology Licensing, Llc | Connectivity migration in a virtual execution system |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5930806A (en) | 1997-05-07 | 1999-07-27 | Fujitsu Limited | Method and system for data migration from network database to relational database |
US5970496A (en) * | 1996-09-12 | 1999-10-19 | Microsoft Corporation | Method and system for storing information in a computer system memory using hierarchical data node relationships |
US20030182325A1 (en) | 2002-03-19 | 2003-09-25 | Manley Stephen L. | System and method for asynchronous mirroring of snapshots at a destination using a purgatory directory and inode mapping |
US20040243571A1 (en) * | 1999-08-26 | 2004-12-02 | Microsoft Corporation | Method and system for detecting object inconsistency in a loosely consistent replicated directory service |
US20060101041A1 (en) * | 2001-09-28 | 2006-05-11 | Oracle International Corporation | Providing a consistent hierarchical abstraction of relational data |
US7356679B1 (en) | 2003-04-11 | 2008-04-08 | Vmware, Inc. | Computer image capture, customization and deployment |
US20080098154A1 (en) | 2002-07-11 | 2008-04-24 | Microsoft Corporation | Method for forking or migrating a virtual machine |
US20080215796A1 (en) * | 2003-12-08 | 2008-09-04 | The Board Of Trustees Of The Leland Stanford Junior University | Virtual Appliance Management |
US7484208B1 (en) | 2002-12-12 | 2009-01-27 | Michael Nelson | Virtual machine migration |
US20090037680A1 (en) | 2007-07-31 | 2009-02-05 | Vmware, Inc. | Online virtual machine disk migration |
US20090222691A1 (en) | 2008-02-29 | 2009-09-03 | Riemers Bill C | Data Migration Manager |
US20100049930A1 (en) * | 2008-08-25 | 2010-02-25 | Vmware, Inc. | Managing Backups Using Virtual Machines |
US20100057759A1 (en) | 2008-08-28 | 2010-03-04 | Make Technologies, Inc. | Linking of Parent-Child Data Records in a Legacy software Modernization System |
US20100205303A1 (en) * | 2009-02-10 | 2010-08-12 | Pradeep Kumar Chaturvedi | Virtual machine software license management |
US20100205224A1 (en) * | 2009-02-12 | 2010-08-12 | Oracle International Corporation | System and method for creating and managing universally unique identifiers for services |
US20100262586A1 (en) | 2009-04-10 | 2010-10-14 | PHD Virtual Technologies | Virtual machine data replication |
US8429360B1 (en) * | 2009-09-28 | 2013-04-23 | Network Appliance, Inc. | Method and system for efficient migration of a storage object between storage servers based on an ancestry of the storage object in a network storage system |
- 2009-05-20: US application US12/469,577 filed; patent US8478801B2 (en), status Active
- 2013-07-02: US application US13/934,127 filed; patent US9037621B2 (en), status Active
Patent Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5970496A (en) * | 1996-09-12 | 1999-10-19 | Microsoft Corporation | Method and system for storing information in a computer system memory using hierarchical data node relationships |
US5930806A (en) | 1997-05-07 | 1999-07-27 | Fujitsu Limited | Method and system for data migration from network database to relational database |
US20040243571A1 (en) * | 1999-08-26 | 2004-12-02 | Microsoft Corporation | Method and system for detecting object inconsistency in a loosely consistent replicated directory service |
US20060101041A1 (en) * | 2001-09-28 | 2006-05-11 | Oracle International Corporation | Providing a consistent hierarchical abstraction of relational data |
US20030182325A1 (en) | 2002-03-19 | 2003-09-25 | Manley Stephen L. | System and method for asynchronous mirroring of snapshots at a destination using a purgatory directory and inode mapping |
US7225204B2 (en) | 2002-03-19 | 2007-05-29 | Network Appliance, Inc. | System and method for asynchronous mirroring of snapshots at a destination using a purgatory directory and inode mapping |
US20080098154A1 (en) | 2002-07-11 | 2008-04-24 | Microsoft Corporation | Method for forking or migrating a virtual machine |
US7484208B1 (en) | 2002-12-12 | 2009-01-27 | Michael Nelson | Virtual machine migration |
US7356679B1 (en) | 2003-04-11 | 2008-04-08 | Vmware, Inc. | Computer image capture, customization and deployment |
US20080215796A1 (en) * | 2003-12-08 | 2008-09-04 | The Board Of Trustees Of The Leland Stanford Junior University | Virtual Appliance Management |
US20090037680A1 (en) | 2007-07-31 | 2009-02-05 | Vmware, Inc. | Online virtual machine disk migration |
US20090222691A1 (en) | 2008-02-29 | 2009-09-03 | Riemers Bill C | Data Migration Manager |
US20100049930A1 (en) * | 2008-08-25 | 2010-02-25 | Vmware, Inc. | Managing Backups Using Virtual Machines |
US20100057759A1 (en) | 2008-08-28 | 2010-03-04 | Make Technologies, Inc. | Linking of Parent-Child Data Records in a Legacy software Modernization System |
US20100205303A1 (en) * | 2009-02-10 | 2010-08-12 | Pradeep Kumar Chaturvedi | Virtual machine software license management |
US20100205224A1 (en) * | 2009-02-12 | 2010-08-12 | Oracle International Corporation | System and method for creating and managing universally unique identifiers for services |
US20100262586A1 (en) | 2009-04-10 | 2010-10-14 | PHD Virtual Technologies | Virtual machine data replication |
US20100262585A1 (en) | 2009-04-10 | 2010-10-14 | PHD Virtual Technologies | Virtual machine file-level restoration |
US8429360B1 (en) * | 2009-09-28 | 2013-04-23 | Network Appliance, Inc. | Method and system for efficient migration of a storage object between storage servers based on an ancestry of the storage object in a network storage system |
Non-Patent Citations (3)
Title |
---|
"VMware Infrastructure SDK Programming Guide," Revision 20060906, Version 2.0.1 Item: SDK-ENG-Q306-291. |
Article entitled "Understanding and Using Microsoft Windows Server 2008 Hyper-V Snapshots", by Carbone, dated Jul. 23, 2008. * |
Article entitled "Workstation User's Manual," Copyright 2007, by VMware. |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150113531A1 (en) * | 2013-10-18 | 2015-04-23 | Power-All Networks Limited | System for migrating virtual machine and method thereof |
US9621654B2 (en) * | 2013-11-14 | 2017-04-11 | Vmware, Inc. | Intelligent data propagation using performance monitoring |
US20170279797A1 (en) * | 2016-03-22 | 2017-09-28 | International Business Machines Corporation | Container Independent Secure File System for Security Application Containers |
US10498726B2 (en) * | 2016-03-22 | 2019-12-03 | International Business Machines Corporation | Container independent secure file system for security application containers |
US11159518B2 (en) | 2016-03-22 | 2021-10-26 | International Business Machines Corporation | Container independent secure file system for security application containers |
CN106874066A (en) * | 2017-01-20 | 2017-06-20 | 中兴通讯股份有限公司 | A kind of virtual machine migration method and device, electronic equipment |
CN106874066B (en) * | 2017-01-20 | 2021-01-26 | 中兴通讯股份有限公司 | Virtual machine migration method and device and electronic equipment |
US10114570B2 (en) | 2017-01-27 | 2018-10-30 | Red Hat Israel, Ltd. | Deleting disks while maintaining snapshot consistency in a virtualized data-center |
Also Published As
Publication number | Publication date |
---|---|
US8478801B2 (en) | 2013-07-02 |
US20130298125A1 (en) | 2013-11-07 |
US20100299368A1 (en) | 2010-11-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9037621B2 (en) | Efficient reconstruction of virtual disk hierarchies across storage domains | |
US11726845B2 (en) | Private page cache-based sharing of access to application image layers by application containers | |
US11409705B2 (en) | Log-structured storage device format | |
US11809753B2 (en) | Virtual disk blueprints for a virtualized storage area network utilizing physical storage devices located in host computers | |
US10061520B1 (en) | Accelerated data access operations | |
AU2014311869B2 (en) | Partition tolerance in cluster membership management | |
US10025806B2 (en) | Fast file clone using copy-on-write B-tree | |
US8719767B2 (en) | Utilizing snapshots to provide builds to developer computing devices | |
US9305014B2 (en) | Method and system for parallelizing data copy in a distributed file system | |
US10860536B2 (en) | Graph driver layer management | |
US10303499B2 (en) | Application aware graph driver | |
US8473463B1 (en) | Method of avoiding duplicate backups in a computing system | |
US10007533B2 (en) | Virtual machine migration | |
US8819357B2 (en) | Method and system for ensuring cache coherence of metadata in clustered file systems | |
US20130185509A1 (en) | Computing machine migration | |
WO2019061352A1 (en) | Data loading method and device | |
US10599360B2 (en) | Concurrent and persistent reservation of data blocks during data migration | |
US10740039B2 (en) | Supporting file system clones in any ordered key-value store | |
US9787525B2 (en) | Concurrency control in a file system shared by application hosts | |
US11263252B2 (en) | Supporting file system clones in any ordered key-value store using inode back pointers | |
CN115878374A (en) | Backing up data for namespaces assigned to tenants | |
US20230289263A1 (en) | Hybrid data transfer model for virtual machine backup and recovery | |
US10831520B2 (en) | Object to object communication between hypervisor and virtual machines | |
CN117093332B (en) | Method and device for realizing cloning of virtual machine |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |