US9244626B2 - System and method for hijacking inodes based on replication operations received in an arbitrary order - Google Patents

System and method for hijacking inodes based on replication operations received in an arbitrary order

Info

Publication number
US9244626B2
Authority
US
United States
Prior art keywords
metadata
destination
inode
replication
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US14/160,770
Other versions
US20140136805A1
Inventor
Devang K. Shah
Alan S. Driscoll
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NetApp Inc
Original Assignee
NetApp Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NetApp Inc filed Critical NetApp Inc
Priority to US14/160,770
Publication of US20140136805A1
Priority to US15/004,470 (US9858001B2)
Application granted
Publication of US9244626B2
Priority to US15/850,538 (US10852958B2)
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065Replication mechanisms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/178Techniques for file synchronisation in file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2365Ensuring data consistency and integrity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F17/30174
    • G06F17/30371
    • G06F17/30575
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • a network storage system is a processing system that is used to store and retrieve data on behalf of one or more hosts on a network.
  • a storage system operates on behalf of one or more hosts to store and manage data in a set of mass storage devices, such as magnetic or optical storage-based disks or tapes.
  • Some storage systems are designed to service file-level requests from hosts, as is commonly the case with file servers used in a network attached storage (NAS) environment.
  • Other storage systems are designed to service block-level requests from hosts, as with storage systems used in a storage area network (SAN) environment.
  • Still other storage systems are capable of servicing both file-level requests and block-level requests, as is the case with certain storage servers made by NetApp, Inc. of Sunnyvale, Calif.
  • Data replication is a technique for backing up data in which a given data set at a source is replicated at a destination that is often geographically remote from the source.
  • the replica data set created at the destination storage system is called a “mirror” of the original data set on the source storage system.
  • replication involves the use of at least two storage systems, e.g., one at the source and another at the destination, which communicate with each other through a computer network or other type of data interconnect.
  • Each data block in a given set of data can be represented by both a physical block, pointed to by a corresponding physical block pointer, and a logical block, pointed to by a corresponding logical block pointer. These two blocks are actually the same data block.
  • the physical block pointer indicates the actual physical location of the data block on a storage medium
  • the logical block pointer indicates the logical position of the data block within the data set (e.g., a file) relative to other data blocks.
  • replication is done at a logical block level.
  • the replica at the destination storage system has the identical structure of logical block pointers as the original data set at the source storage system, but it may (and typically does) have a different structure of physical block pointers than the original data set at the source storage system.
  • the file system of the source storage system is analyzed to determine changes that have occurred to the file system. The changes are transferred to the destination storage system. This typically includes “walking” the directory trees at the source storage system to determine the changes to various file system objects within each directory tree, as well as identifying the changed file system object's location within the directory tree structure.
  • the changes are then sent to the destination storage system in a certain order (e.g., directories before subdirectories, and subdirectories before files, etc.) so that the directory tree structure of the source storage system is preserved at the destination storage system.
  • Updates to directories of the source file system are received and processed at the destination storage system before updates to the files in each of the directories can be received and processed.
  • the destination system will be directed to create a directory at a file system location that already contains a file, resulting in an error condition.
  • the replication system will send a first message indicating the create operation and a second message indicating the modify operation. If the messages are received out of order, the destination system will be directed to modify a file at an unused file system location, resulting in an error condition.
  • the present disclosure is directed to an apparatus and method for hijacking inodes based on file system replication operations (hereinafter referred to as “replication operations”) received in an arbitrary order.
  • the replication operations may be received at a destination storage system from a source storage system as part of a replication process.
  • the order in which the replication operations are received is referred to as “arbitrary” because the order is not restricted by chronological order, file system hierarchy, or any other ordering requirement.
  • the system determines an inode (i.e., a metadata container) on the destination storage system that the replication operation is intended to modify or replace (referred to as the “destination inode”).
  • the system looks for an inconsistency between the replication operation and the destination inode based on the type of the operation or by comparing the destination inode's metadata to the data in the replication operation. If an inconsistency is detected, the system determines that the replication operation is a replacement operation.
  • a “replacement operation” is a type of replication operation that is received in a chronologically different order from the order the corresponding change occurred on the source storage system and must be handled specially.
  • the system “hijacks” the destination inode; i.e., in response to the inconsistency, it replaces at least a part of the inode's metadata contents based on the replication operation.
  • the replication operation does not include enough information to fully populate the destination inode's metadata.
  • the system deletes metadata that was not replaced and/or initializes the metadata to default values and waits for a second replication operation that contains the remaining metadata.
  • the system also frees any data blocks associated with the previous version of the inode. Freeing data blocks means removing references to the data blocks in the destination inode and may also include making the data blocks available to be written to.
  • By detecting inconsistencies and hijacking the destination inode where appropriate, the hijack system enables the replication process to function without requiring replication operations to be sent in a particular order. Thus, the hijack system avoids the problems discussed above, which occur when the replication system is required to transmit changes based on the file system hierarchy. According to the system introduced here, inconsistent operations are detected before they are applied to the file system of the destination storage system. The system then modifies the destination inode in place, without having to wait for a delete (or create) operation to be provided. Thus, the system avoids the need for the destination storage system to buffer replication operations to wait for other related operations to arrive.
  • When the superseded delete operation is later received, the system can ignore the operation, reducing the number of operations that the destination storage system has to execute during a replication.
  • the source storage system may omit transmitting the delete operations entirely. This reduces processing on the source storage system and network bandwidth on the interconnect between the storage systems.
  • the hijack system can also partially initialize an inode based on a first out-of-order operation and complete the initialization when a later replication operation is received, such as when create and modify operations for a particular inode are received out of order.
  • FIG. 1 is a network environment in which multiple network storage systems cooperate.
  • FIG. 2 is an example of the hardware architecture of a storage system.
  • FIG. 3 is a block diagram of a storage operating system.
  • FIG. 4 depicts a buffer tree of a file.
  • FIG. 5 depicts a buffer tree including an inode file.
  • FIG. 6 is a logical block diagram of an inode hijack system.
  • FIG. 7 is a flow chart of a process for executing the inode hijack system.
  • FIG. 8 is an example structure of a replication operation.
  • the hijack system may be used to assist a data replication process from a source storage system to a destination storage system.
  • the source storage system determines a set of changes made between two points in time and transmits replication operations based on the changes in an arbitrary order.
  • Each replication operation specifies the type of operation (e.g., create, modify, delete) and related information, including a target inode number for the operation.
  • the target inode number identifies the inode of the logical data container (e.g., file, directory, or logical unit number (LUN)) that is the target of the replication operation.
  • When the system receives a replication operation, it looks up a destination inode corresponding to the target inode number in the replication operation. The system then determines whether the replication operation is inconsistent with the destination inode. This may be determined based on the type of operation or by comparing data in the replication operation to the destination inode's metadata. For example, an inconsistency exists if the replication operation is directed to a first inode type while the target inode has a second inode type. Similarly, an inconsistency exists if the replication operation is a modify operation that is directed to an unused inode (i.e., an inode that is not associated with a file system object). An inconsistency also exists if the replication operation specifies an inode generation number (defined below) that differs from the destination inode's generation number.
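For illustration only, here is a minimal Python sketch of the inconsistency checks just described; the class and field names (ReplicationOp, DestinationInode, op_type, and so on) are readability assumptions, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class ReplicationOp:          # hypothetical shape of a received operation
    op_type: str              # "create", "modify", or "delete"
    target_inode: int
    generation: int
    inode_type: str           # "file", "directory", ...

@dataclass
class DestinationInode:       # hypothetical view of the destination inode
    number: int
    unused: bool              # True if not associated with a file system object
    generation: int = 0
    inode_type: str = ""

def is_replacement_operation(op: ReplicationOp, dest: DestinationInode) -> bool:
    """Return True if the operation is inconsistent with the destination inode."""
    if op.op_type == "create" and not dest.unused:
        return True                         # create aimed at an inode already in use
    if op.op_type == "modify" and dest.unused:
        return True                         # modify aimed at an unused inode
    if not dest.unused and op.inode_type != dest.inode_type:
        return True                         # inode type mismatch
    if not dest.unused and op.generation != dest.generation:
        return True                         # generation number mismatch
    return False

# Example: a modify operation arriving before the create it depends on.
op = ReplicationOp("modify", target_inode=97, generation=3, inode_type="file")
print(is_replacement_operation(op, DestinationInode(97, unused=True)))  # True
```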
  • the system determines that the replication operation is a replacement operation.
  • the system hijacks the destination inode by replacing the destination inode's metadata with data determined based on the replication operation. As a part of this process, the system frees data blocks previously associated with the destination inode and replaces the metadata.
  • the system may also change the generation number and/or type of the inode.
  • the replication operation does not include enough information to fully populate the destination inode's metadata. In these cases, the system deletes metadata that was not replaced and waits for a second replication operation that contains the remaining metadata.
  • the system may also initialize some or all of the deleted metadata to default values (e.g., zero or null values).
  • FIG. 1 depicts a configuration of network storage systems in which the techniques being introduced here can be implemented according to an illustrative embodiment.
  • a source storage system 2 A is coupled to a source storage subsystem 4 A and to a set of hosts 1 through an interconnect 3 .
  • the interconnect 3 may be, for example, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a global area network such as the Internet, a Fibre Channel fabric, or any combination of such interconnects.
  • Each of the hosts 1 may be, for example, a conventional personal computer (PC), server-class computer, workstation, handheld computing/communication device, or other computing/communications device.
  • the source storage system 2 A includes a storage operating system 7 A, a storage manager 10 A, a snapshot differential module 12 , and a source replication engine 8 A.
  • Each of the storage operating system 7 A, the storage manager 10 A, the snapshot differential module 12 , and the source replication engine 8 A is a component of the storage system that can be implemented as special purpose hardware circuitry (e.g., “hardwired”), general purpose hardware circuitry that is programmed with software and/or firmware, or any combination thereof.
  • Storage of data in the source storage subsystem 4 A is managed by the storage manager 10 A of the source storage system 2 A.
  • the source storage system 2 A and the source storage subsystem 4 A are collectively referred to as a source storage system.
  • the storage manager 10 A receives and responds to various read and write requests from the hosts 1 , directed to data stored in or to be stored in the source storage subsystem 4 A.
  • the storage manager 10 A may be implemented as a part of the storage operating system 7 A or as a separate component, as shown in FIG. 1 .
  • the source storage subsystem 4 A includes a number of nonvolatile mass storage devices 5 , which can be, for example, magnetic disks, optical disks, tape drives, solid-state memory, such as flash memory, or any combination of such devices.
  • the mass storage devices 5 in the source storage subsystem 4 A can be organized as a RAID group, in which case the source storage system 2 A can access the source storage subsystem 4 A using a conventional RAID algorithm for redundancy.
  • the storage manager 10 A processes write requests from the hosts 1 and stores data to unused storage locations in the mass storage devices 5 of the source storage subsystem 4 A.
  • the storage manager 10 A implements a “write anywhere” file system such as the proprietary Write Anywhere File Layout (WAFL™) file system developed by Network Appliance, Inc., Sunnyvale, Calif.
  • Such a file system is not constrained to write any particular data or metadata to a particular storage location or region. Rather, such a file system can write to any unallocated block on any available mass storage device and does not overwrite data on the devices. If a data block on disk is updated or modified with new data, the data block is thereafter stored (written) to a new location on disk instead of modifying the block in place to optimize write performance.
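The write-anywhere behavior described above can be pictured with a small, hypothetical sketch (not WAFL internals): an update never overwrites the old physical block; it allocates a fresh block and repoints the file's block map at it.

```python
# Assumed, simplified model of a write-anywhere update.
class WriteAnywhereVolume:
    def __init__(self, nblocks):
        self.blocks = [None] * nblocks      # physical block storage
        self.free = list(range(nblocks))    # unallocated physical block numbers
        self.block_map = {}                 # (file_id, fbn) -> physical block number

    def write(self, file_id, fbn, data):
        new_pbn = self.free.pop(0)          # always pick an unallocated block
        self.blocks[new_pbn] = data
        old_pbn = self.block_map.get((file_id, fbn))
        self.block_map[(file_id, fbn)] = new_pbn
        return old_pbn                      # old block is left intact (e.g., for snapshots)

vol = WriteAnywhereVolume(8)
vol.write("f1", 0, b"v1")
print(vol.write("f1", 0, b"v2"))            # rewrite of FBN 0 lands in a new physical block
```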
  • the storage manager 10 A of the source storage system 2 A is responsible for managing storage of data in the source storage subsystem 4 A, servicing requests from the hosts 1 , and performing various other types of storage-related operations.
  • the storage manager 10 A, the source replication engine 8 A, and the snapshot differential module 12 are logically on top of the storage operating system 7 A.
  • the source replication engine 8 A operates in cooperation with a remote destination replication engine 8 B, described below, to perform logical replication of data stored in the source storage subsystem 4 A.
  • one or more of the storage manager 10 A, the source replication engine 8 A and the snapshot differential module 12 may be implemented as elements within the storage operating system 7 A.
  • the source storage system 2 A is connected to a destination storage system 2 B through an interconnect 6 , for purposes of replicating data.
  • the interconnect 6 may include one or more intervening devices and/or may include one or more networks.
  • the destination storage system 2 B includes a storage operating system 7 B, the destination replication engine 8 B and a storage manager 10 B.
  • the storage manager 10 B controls storage-related operations on the destination storage system 2 B.
  • the storage manager 10 B and the destination replication engine 8 B are logically on top of the storage operating system 7 B.
  • the storage manager 10 B and the destination replication engine 8 B may be implemented as elements within the storage operating system 7 B.
  • the destination storage system 2 B and the destination storage subsystem 4 B are collectively referred to as the destination storage system.
  • the destination replication engine 8 B works in cooperation with the source replication engine 8 A to replicate data from the source storage system to the destination storage system.
  • the storage operating systems 7 A and 7 B, replication engines 8 A and 8 B, storage managers 10 A and 10 B, and snapshot differential module 12 are all implemented in the form of software. In other embodiments, however, any one or more of these elements may be implemented in hardware alone (e.g., specially designed dedicated circuitry), firmware, or any combination of hardware, software and firmware.
  • the storage systems 2 A and 2 B each may be, for example, a storage system that provides file-level data access services to the hosts 1 , such as commonly done in a NAS environment, or block-level data access services, such as commonly done in a SAN environment, or each may be capable of providing both file-level and block-level data access services to the hosts 1 .
  • Although the storage systems 2 are illustrated as monolithic systems in FIG. 1 , they can have a distributed architecture.
  • the storage systems 2 each can be designed as physically separate network modules (e.g., “N-module”) and data modules (e.g., “D-module”) (not shown), which communicate with each other over a physical interconnect.
  • Such an architecture allows convenient scaling, such as by deploying two or more N-modules and D-modules, all capable of communicating with each other over the interconnect.
  • FIG. 2 is a high-level block diagram of an illustrative embodiment of a storage system 2 .
  • the storage system 2 includes one or more processors 130 and a memory 124 coupled to an interconnect bus 125 .
  • the interconnect bus 125 shown in FIG. 2 is an abstraction that represents any one or more separate physical interconnect buses, point-to-point connections, or both, connected by appropriate bridges, adapters, and/or controllers.
  • the interconnect bus 125 may include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus or PCI-Express bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), IIC (I2C) bus, or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus, also called “Firewire.”
  • the processor(s) 130 is/are the central processing unit(s) (CPU) of the storage systems 2 and, therefore, control the overall operation of the storage systems 2 . In certain embodiments, the processor(s) 130 accomplish this by executing software or firmware stored in the memory 124 .
  • the processor(s) 130 may be, or may include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), trusted platform modules (TPMs), or the like, or a combination of such devices.
  • the memory 124 is or includes the main memory of the storage systems 2 .
  • the memory 124 represents any form of random access memory (RAM), read-only memory (ROM), flash memory, or the like, or any combination of such devices.
  • Also connected to the processor(s) 130 through the interconnect bus 125 are a network adapter 126 and a storage adapter 128 .
  • the network adapter 126 provides the storage systems 2 with the ability to communicate with remote devices, such as the hosts 1 , over the interconnect 3 of FIG. 1 , and may be, for example, an Ethernet adapter or Fibre Channel adapter.
  • the storage adapter 128 allows the storage systems 2 to access storage subsystems 4 A or 4 B, and may be, for example, a Fibre Channel adapter or SCSI adapter.
  • FIG. 3 is a block diagram of a storage operating system according to an illustrative embodiment.
  • The term “storage operating system” generally refers to the computer-executable code operable on a computer to perform a storage function that manages data access and other related functions.
  • Storage operating system 7 can be implemented as a microkernel, an application program operating over a general-purpose operating system such as UNIX® or Windows NT®, or as a general-purpose operating system configured for the storage applications as described herein.
  • the storage operating system includes a network protocol stack 310 having a series of software layers including a network driver layer 350 (e.g., an Ethernet driver), a network protocol layer 360 (e.g., an Internet Protocol layer and its supporting transport mechanisms: the TCP layer and the User Datagram Protocol layer), and a file system protocol server layer 370 (e.g., a CIFS server, a NFS server, etc.).
  • the storage operating system 7 includes a storage access layer 320 that implements a storage media protocol such as a RAID protocol, and a media driver layer 330 that implements a storage media access protocol such as, for example, a Small Computer Systems Interface (SCSI) protocol.
  • the storage access layer 320 may alternatively be implemented as a parity protection RAID module and embodied as a separate hardware component such as a RAID controller.
  • Bridging the storage media software layers with the network and file system protocol layers is the storage manager 10 that implements one or more file system(s) 340 .
  • a file system is a structured (e.g., hierarchical) set of stored files, directories, and/or other data containers.
  • the storage manager 10 implements data layout algorithms that improve read and write performance to the mass storage devices 5 , such as WAFL systems discussed above.
  • data is stored in the form of volumes, where each volume contains one or more directories, subdirectories, and/or files.
  • the term “aggregate” is used to refer to a pool of physical storage that combines one or more physical mass storage devices (e.g., disks), or parts thereof, into a single storage object. An aggregate also contains or provides storage for one or more other data sets at a higher level of abstraction, such as volumes.
  • a “volume” is a set of stored data associated with a collection of mass storage devices, such as disks, which obtains its storage from (i.e., is contained within) an aggregate, and which is managed as an independent administrative unit, such as a complete file system.
  • a volume includes one or more file systems, such as an active file system and, optionally, one or more persistent point-in-time images of the active file system captured at various instances in time.
  • a “file system” is an independently managed, self-contained, organized structure of data units (e.g., files, blocks, or LUNs). Although a volume or file system (as those terms are used herein) may store data in the form of files, that is not necessarily the case. That is, a volume or file system may store data in the form of other units of data, such as blocks or LUNs.
  • each aggregate uses a physical volume block number (PVBN) space that defines the physical storage space of blocks provided by the storage devices of the physical volume, and likewise, each volume uses a virtual volume block number (VVBN) space to organize those blocks into one or more higher-level objects, such as directories, subdirectories, and files.
  • a PVBN therefore, is an address of a physical block in the aggregate
  • a VVBN is an address of a block in a volume (the same block as referenced by the corresponding PVBN), i.e., the offset of the block within the volume.
  • the storage manager 10 tracks information for all of the VVBNs and PVBNs in each storage system 2 .
  • the storage manager 10 may manage multiple volumes on a common set of physical storage in the aggregate.
  • data within the storage system is managed at a logical block level.
  • the storage manager maintains a logical block number (LBN) for each data block.
  • the LBNs are called file block numbers (FBNs).
  • Each FBN indicates the logical position of the block within a file, relative to other blocks in the file, i.e., the offset of the block within the file.
  • For example, FBN 0 represents the first logical block in a particular file, FBN 1 represents the second logical block in the file, and so forth.
  • the PVBN and VVBN of a data block are independent of the FBN(s) that refer to that block.
  • the FBN of a block of data at the logical block level is assigned to a PVBN-VVBN pair.
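As a rough illustration of the mapping just described, the sketch below shows one assumed way a per-file FBN could resolve to a PVBN-VVBN pair; the names and layout are illustrative, not drawn from the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockPointer:
    pvbn: int   # physical block number: offset within the aggregate
    vvbn: int   # virtual block number: offset within the volume (same data block)

# Per-file logical view: FBN 0 is the first block of the file, FBN 1 the second, ...
file_block_map = {
    0: BlockPointer(pvbn=10452, vvbn=873),
    1: BlockPointer(pvbn=99017, vvbn=874),   # physically distant, logically adjacent
}

def resolve(fbn: int) -> BlockPointer:
    """Translate a logical position in the file to its physical/virtual addresses."""
    return file_block_map[fbn]

print(resolve(1))
```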
  • each file is represented in the storage system in the form of a hierarchical structure called a buffer tree.
  • A buffer tree is defined as a hierarchical metadata structure containing references (or pointers) to logical blocks of data in the file system.
  • a buffer tree is a hierarchical structure which is used to store file data as well as metadata about a file, including pointers for use in locating the data blocks for the file.
  • a buffer tree includes one or more levels of indirect blocks (called “L1 blocks”, “L2 blocks”, etc.), each of which contains one or more pointers to lower-level indirect blocks and/or to the direct blocks (called “L0 blocks”) of the file. All of the data in the file is stored only at the lowest level (L0) blocks.
  • the root of a buffer tree is the “inode” of the file.
  • An inode is a metadata container that is used to store metadata about the file, such as ownership, access permissions, file size, file type, and pointers to the highest level of indirect blocks for the file.
  • Each file has its own inode.
  • the inode is stored in a separate inode file, which may itself be structured as a buffer tree.
  • This essentially results in buffer trees within buffer trees, where subdirectories are nested within higher-level directories and entries of the directories point to files, which also have their own buffer trees of indirect and direct blocks.
  • Directory entries include the name of a file in the file system, and directories are said to point to (reference) that file. Alternatively, a directory entry can point to another directory in the file system. In such a case, the directory with the entry is said to be the “parent directory,” while the directory that is referenced by the directory entry is said to be the “child directory” or “subdirectory.”
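The buffer-tree and directory relationships described above might be modeled with structures like the following sketch; all class and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class IndirectBlock:                    # "L1" block
    pointers: List[Tuple[int, int]]     # (PVBN, VVBN) pairs to L0 data blocks

@dataclass
class Inode:
    number: int
    generation: int
    inode_type: str                     # "file" or "directory"
    owner: str = ""
    size: int = 0
    indirect_blocks: List[IndirectBlock] = field(default_factory=list)
    # directory inodes only: name -> (child inode number, child generation)
    entries: Dict[str, Tuple[int, int]] = field(default_factory=dict)

root = Inode(number=64, generation=1, inode_type="directory",
             entries={"notes.txt": (97, 3)})
notes = Inode(number=97, generation=3, inode_type="file", size=8192,
              indirect_blocks=[IndirectBlock([(10452, 873), (99017, 874)])])
print(root.entries["notes.txt"], notes.indirect_blocks[0].pointers[0])
```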
  • FIG. 4 depicts a buffer tree 400 of a file according to an illustrative embodiment.
  • a file is assigned an inode 422 , which references Level 1 (L1) indirect blocks 424 A and 424 B.
  • Each indirect block 424 stores at least one PVBN and a corresponding VVBN for each PVBN.
  • a PVBN is a block number in an aggregate (i.e., offset from the beginning of the storage locations in an aggregate)
  • a VVBN is a block number in a volume (offset from the beginning of the storage locations in a volume); however, there is only one copy of the L0 data block physically stored in the physical mass storage of the storage system.
  • the PVBN in each indirect block 424 references a physical block 427 A and 427 B, respectively, in the storage device (i.e., in the aggregate L0 blocks 433 ), and the corresponding VVBN references a logical block 428 A and 428 B, respectively, in the storage device (i.e., in volume L0 blocks 431 ).
  • volumes can also be represented by files called “container files.” In such a case, the VVBN references a block number offset from the beginning of the container file representing the volume.
  • Physical blocks 427 and logical blocks 428 are actually the same L0 data for any particular PVBN-VVBN pair; however, they are accessed in different ways: the PVBN is accessed directly in the aggregate, while the VVBN is accessed virtually via the container file representing the volume.
  • FIG. 5 depicts a buffer tree 500 including an inode file 541 according to an illustrative embodiment.
  • the inodes of the files and directories in that volume are stored in the inode file 541 .
  • a separate inode file 541 is maintained for each volume.
  • the inode file 541 in one embodiment, is a data structure representing a master list of file system objects (e.g., directories, subdirectories and files) of the file system in the storage system and each inode entry identifies a particular file system object within the file system.
  • Each inode 422 in the inode file 541 is the root of a buffer tree 400 of the file corresponding to the inode 422 .
  • the location of the inode file 541 for each volume is stored in a volume information (“VolumeInfo”) block 542 associated with that volume.
  • the VolumeInfo block 542 is a metadata container that contains metadata that applies to the volume as a whole. Examples of such metadata include, for example, the volume's name, its type, its size, any space guarantees to apply to the volume, and the VVBN of the inode file of the volume.
  • File system objects can be, for example, files, directories, subdirectories, and/or LUNs of the file system.
  • File system object inodes are arranged sequentially in the inode file, and a file system object's position in the inode file is given by its inode number.
  • An inode includes a master location catalog for the file, directory, or other file system object and various bits of information about the file system object called metadata.
  • the metadata includes, for example, the file system object's creation date, security information such as the file system object's owner and/or protection levels, and its size.
  • the metadata also includes a “type” designation to identify the type of the file system object.
  • Directory inodes include a directory entry for each file system object contained in the directory (referred to as “child” objects). Each directory entry then includes the name of the child file system object the directory entry references and the object's inode and generation numbers. In addition to inodes associated with file system objects, the file system may also maintain “unused” inodes for each inode number that is not associated with a file system object.
  • the metadata also includes the “generation number” of the file system object. As time goes by, file system objects are created or deleted, and slots in the inode file are recycled. When a file system object is created, its inode is given a new generation number, which is guaranteed to be different from (e.g., larger than) the previous file system object at that inode number (if any). If repeated accesses are made to the file system object by its inode number (e.g., from clients, applications, etc.), the generation number can be checked to avoid inadvertently accessing a different file system object after the original file system object was deleted.
  • the metadata also includes “parent information,” which includes the inode number of the file system object's parent directory. A file system object can have multiple parent directories.
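A hedged sketch of the generation-number check described above: a cached reference remembers the generation it saw, and a later access is rejected if the inode slot has since been recycled for a new object. The helper and class names are invented for this example.

```python
from dataclasses import dataclass

@dataclass
class Inode:
    number: int
    generation: int

@dataclass
class CachedHandle:              # what a client might cache after a lookup
    inode_number: int
    generation: int

class StaleHandleError(Exception):
    pass

def dereference(inode_file, handle):
    """Reject a cached handle if its inode slot was recycled for a new object."""
    inode = inode_file[handle.inode_number]
    if inode.generation != handle.generation:
        raise StaleHandleError("inode slot reused; handle is stale")
    return inode

inode_file = {97: Inode(97, generation=4)}          # generation 3 was deleted and reused
try:
    dereference(inode_file, CachedHandle(97, generation=3))
except StaleHandleError as e:
    print(e)
```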
  • the data set is a file system of the storage system, and replication is performed using snapshots.
  • a “snapshot” is a persistent image (usually read-only) of the file system at a point in time and can be generated by the snapshot differential module 12 .
  • At a point in time, the snapshot differential module 12 generates a first snapshot of the file system of the source storage system, referred to as the baseline snapshot. This baseline snapshot is then provided to the source replication engine 8 A for a baseline replication process.
  • the system executes the baseline replication process by generating a set of replication operations corresponding to the file system objects in the baseline snapshot.
  • the replication operation will be executed on the destination storage system 2 B to replicate the initial state of the storage system.
  • the system may generate one or more replication operations for each file system object on the source storage system 2 A.
  • the replication operations may be sent in any arbitrary order and are not restricted to chronological order or the file system hierarchy.
  • the snapshot differential module 12 generates additional snapshots of the file system from time to time.
  • the source replication engine 8 A executes another replication process (which may be at the request of the destination replication engine 8 B). To do so, the source replication engine 8 A needs to be updated with the changes to the file system of the source storage system since a previous replication process was performed.
  • the snapshot differential module 12 compares the most recent snapshot of the file system of the source storage system to the snapshot of a previous replication process to determine differences between a recent snapshot and the previous snapshot.
  • the snapshot differential module 12 identifies any data that has been added or modified since the previous snapshot operation, and sends those additions or modifications to the source replication engine 8 A for replication.
  • the source replication engine 8 A then generates replication operations for each of the additions or modifications.
  • the replication operations are transmitted to the destination replication engine 8 B for execution on the destination storage system 2 B. As with the baseline replication process, the replication operations may be sent in any arbitrary order.
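A simplified, inode-level sketch of the differential step described above; the real snapshot differential module compares snapshots of the file system, and the function and field names here are assumptions made for illustration.

```python
def diff_snapshots(previous, current):
    """Each snapshot is a dict: inode number -> (generation, metadata)."""
    ops = []
    for ino, (gen, meta) in current.items():
        if ino not in previous:
            ops.append(("create", ino, gen, meta))
        elif previous[ino] != (gen, meta):
            ops.append(("modify", ino, gen, meta))
    for ino in previous:
        if ino not in current:
            ops.append(("delete", ino, previous[ino][0], None))
    return ops   # may be transmitted in any arbitrary order

prev = {64: (1, "dir v1"), 97: (3, "file v1")}
curr = {64: (1, "dir v1"), 97: (4, "file v2"), 98: (1, "new file")}
print(diff_snapshots(prev, curr))
```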
  • a replication process transfers information about a set of replication operations from a source file system to the replica destination file system.
  • a replication operation includes data operations, directory operations, and inode operations.
  • a “data operation” transfers 1) a block of file data, 2) the inode number of the block of data, 3) the generation number of the file, 4) the position of the block within the file (e.g., FBN), and 5) the type of the file.
  • a “directory operation” transfers 1) the inode number of the directory, 2) the generation number of the directory, and 3) enough information to reconstitute an entry in that directory, including 1) the name, 2) inode number, and 3) generation number of the file system object the directory entry points to.
  • an “inode operation” transfers 1) the metadata of an inode, 2) its inode number, and 3) the generation of the inode.
  • the source storage system sends a sequence of data operations, directory operations, and inode operations to the destination, which is expected to process the operations and send acknowledgments to the source.
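The three operation kinds just listed might be represented by records along these lines; the field names are assumptions chosen to mirror the listed contents, not the actual wire format.

```python
from dataclasses import dataclass

@dataclass
class DataOp:            # carries one block of file data
    inode_number: int
    generation: int
    file_type: str
    fbn: int             # position of the block within the file
    block: bytes

@dataclass
class DirectoryOp:       # enough to reconstitute one directory entry
    inode_number: int    # the directory's inode number
    generation: int      # the directory's generation number
    entry_name: str
    entry_inode: int     # inode number the entry points to
    entry_generation: int

@dataclass
class InodeOp:           # transfers an inode's metadata
    inode_number: int
    generation: int
    metadata: dict       # e.g., type, size, owner, timestamps, parent information

ops = [DataOp(97, 3, "file", fbn=0, block=b"..."),
       DirectoryOp(64, 1, "notes.txt", 97, 3),
       InodeOp(97, 3, {"type": "file", "size": 8192})]
```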
  • the inode number (or numbers) in each replication operation is referred to as the “target inode number.”
  • a “destination inode” is an inode on the destination storage system having the same inode number as the target inode number in a received replication operation.
  • a replication of a file system may be either an “initialization,” in which the destination file system starts from scratch with no files or directories, or an “update,” in which the destination file system already has some files and directories from an earlier replication process of an earlier version of the source.
  • the source file system does not need to send every file and directory to the destination; rather, it sends only the changes that have taken place since the earlier version was replicated.
  • an inode operation may be used to indicate that a file has been deleted, and also possibly that another file has been created at the same inode number.
  • Inode operations have various types, including delete (where the file system object associated with the inode number is deleted), create (where a new file system object is created at the target inode number), and modify (where the contents or metadata of the file system object are modified). Similarly, in an initialization, the system sends create and modify operations to build the file and directory structure.
  • the destination storage system may receive the replication operations in an arbitrary order. This simplifies processing for the source replication engine 8 A by allowing it to send replication operations as they are created, rather than imposing additional timing requirements.
  • the arbitrary order results in the destination replication engine 8 B receiving replication operations that are inconsistent with the existing file system on the destination storage system. This may result when the source storage system deleted a file system object (freeing its inode) and created a new file system object having the same inode number. If the destination replication engine 8 B receives the create operation before the delete operation, it determines that an inconsistency exists because the create operation is directed to an inode number that is already in use.
  • An inconsistency may also result if the source storage system created a new file system object at an unused inode and later modified the inode. If the operations are received out of order, the destination replication engine 8 B determines that an inconsistency exists because the modify operation is directed to an unused inode. This also occurs when the system receives a replication operation directed to a first inode type (e.g., a directory) while the target inode is a second inode type (e.g., a file).
  • One possible solution would require the destination replication engine 8 B to store the inconsistent operations until the corresponding delete operation is received. However, this would be inefficient and would defeat the purpose of providing the replication operations in an arbitrary order.
  • FIG. 6 illustrates a logical block diagram of the hijack system 600 .
  • the system 600 can be implemented by the destination replication engine 8 B ( FIG. 1 ) executing on the destination storage system or by other hardware that has access to the file system of the destination storage subsystem 4 B. Aspects of the system may be implemented as special purpose hardware circuitry, programmable circuitry, or a combination of these.
  • the system 600 includes a number of modules to facilitate the functions of the system.
  • Although the various modules are described as residing in a single system, they are not necessarily physically co-located. In some embodiments, the various modules could be distributed over multiple physical devices, and the functionality implemented by the modules may be provided by calls to remote services. Similarly, the data structures could be stored in local storage or remote storage and distributed in one or more physical devices. Assuming a programmable implementation, the code to support the functionality of this system may be stored on a computer-readable medium such as an optical drive, flash memory, or a hard drive. One skilled in the art will appreciate that at least some of these individual components and subcomponents may be implemented using ASICs, PLDs, or a general-purpose processor configured with software and/or firmware.
  • the system 600 includes a network interface 604 , which is configured to receive replication operations from the source storage system 2 A.
  • the network interface 604 may be implemented using the network adapter 126 ( FIG. 2 ).
  • the system 600 also includes a storage interface 606 , which is configured to communicate with a destination storage subsystem 4 B to execute the replication operations and which can be the storage adapter 128 in FIG. 2 .
  • the system 600 has a processing component 602 , which processes received replication operations and controls the destination storage subsystem based on the operations.
  • the processing component 602 could be implemented by the processor 130 of FIG. 2 .
  • each replication operation includes an inode number of a file system object that is created, modified, or deleted by the operation.
  • the replication operation may also include inode numbers of one or more parent inodes of the file system object to be created.
  • the processing component 602 includes a lookup component 610 , which is configured to determine one or more destination inodes on the destination storage system corresponding to the target inode numbers in the replication operation.
  • the lookup component 610 determines the target inode numbers based on the replication operation and accesses the file system to retrieve information stored in the corresponding destination inodes.
  • This information includes file system object metadata, such as type, generation, creation date, modification date, etc.
  • the processing component 602 also includes an evaluation component 612 , which is configured to detect an inconsistency between the replication operation and the destination inode. Based on the detected inconsistency, the evaluation component 612 determines that a replacement operation has occurred. As discussed above, an inconsistency exists when the system receives a replication operation that cannot properly be executed on the target inode. Inconsistencies may be detected for various reasons. Examples of inconsistencies include a replication operation directed to a first inode type while the destination inode has a second inode type, a modify operation directed to an unused inode, a create operation directed to an inode that is already in use, and a replication operation whose generation number differs from the destination inode's generation number.
  • the processing component 602 includes a hijack component 614 , which is configured to hijack the destination inode based on the information in the replication operation.
  • During a hijack process, the system replaces metadata in the destination inode based on the metadata in the replication operation.
  • the hijack operation often implicitly supersedes a delete operation that will arrive at some later point in the replication process.
  • the source storage system 2 A may elect not to send the delete operation. Advantages of this include saving processing on the source storage system 2 A and saving network bandwidth on the interconnect 6 .
  • the hijack component 614 frees any data blocks associated with the destination inode (if the destination inode is a file). In one embodiment, the hijack component 614 frees data blocks by modifying the destination inode to replace references to the data blocks with null references. The hijack component 614 may also direct the storage manager 10 B to allow the data blocks to be written to. The hijack component 614 then replaces the file system metadata in the destination inode with metadata received in the replication operation. The hijack component 614 may also delete metadata that cannot be replaced based on the information in the received replication operation and/or replace the metadata with default values. The system can fill in the metadata at a later time when additional replication operations are received.
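A minimal sketch, under assumed names, of the hijack steps just described: free the old block references (letting the storage manager reuse the blocks), then overwrite the inode's metadata from the operation and reset anything it omits so no stale metadata survives.

```python
class _StorageManager:                       # stand-in for the real storage manager
    def mark_reusable(self, blocks):
        print("blocks freed:", blocks)

class _Inode:                                # stand-in destination inode
    def __init__(self):
        self.inode_type = "file"
        self.size = 8192
        self.owner = "alice"
        self.block_pointers = [10452, 99017]

def hijack_inode(dest, op_metadata, storage_manager,
                 fields=("inode_type", "size", "owner")):
    # 1. Free the data blocks previously associated with the inode (files only),
    #    by nulling the references and allowing the blocks to be rewritten.
    if dest.inode_type == "file":
        storage_manager.mark_reusable([p for p in dest.block_pointers if p])
        dest.block_pointers = [None] * len(dest.block_pointers)
    # 2. Replace metadata carried by the operation; anything it omits is reset
    #    to a default so no metadata from the previous inode version survives.
    for f in fields:
        setattr(dest, f, op_metadata.get(f))

inode = _Inode()
hijack_inode(inode, {"inode_type": "directory", "owner": "bob"}, _StorageManager())
print(inode.inode_type, inode.size, inode.owner)   # size left at None until a later op
```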
  • Partial population of the metadata may occur, for example, during a baseline replication when the source replication engine 8 A generates two separate operations directed at a specific inode number: a create operation and a modify operation. If the modify operation is received first, the system determines that an inconsistency exists because the modify operation is directed to an unused inode. However, the system may be unable to completely fill in the metadata associated with the new inode. In this situation, the hijack component 614 hijacks the destination inode and fills in the data included in the modify operation while erasing remaining data from the prior inode. The system 600 can then replace the remainder of the metadata when the create operation is received.
  • the processing component 602 also includes a file system control component 616 , which is configured to execute various file system cleanup operations after the hijack process has been executed.
  • the file system control component 616 is configured to invalidate any file handles that are currently pointing to the destination inode after it is hijacked. This is done because the hosts 1 frequently cache file handles pointing to a particular inode to avoid having to make repeated requests to the file system for a file handle.
  • the file system control component 616 invalidates these file handles to avoid generating file system errors when the host 1 attempts to use the handle.
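One assumed way the components could be wired together (not the patent's actual module boundaries): look up the destination inode, evaluate the operation against it, hijack and clean up if a replacement is detected, otherwise apply the operation normally.

```python
from types import SimpleNamespace

class HijackPipeline:
    def __init__(self, lookup, evaluate, hijack, cleanup, apply_normally):
        self.lookup, self.evaluate = lookup, evaluate
        self.hijack, self.cleanup = hijack, cleanup
        self.apply_normally = apply_normally

    def process(self, replication_op):
        dest_inode = self.lookup(replication_op.target_inode)
        if self.evaluate(replication_op, dest_inode):       # replacement operation?
            self.hijack(dest_inode, replication_op)         # replace metadata in place
            self.cleanup(dest_inode)                        # e.g., invalidate cached handles
        else:
            self.apply_normally(replication_op, dest_inode)

# Toy wiring to show the control flow only.
pipe = HijackPipeline(
    lookup=lambda ino: SimpleNamespace(number=ino),
    evaluate=lambda op, dest: op.op_type == "modify",
    hijack=lambda dest, op: print("hijack inode", dest.number),
    cleanup=lambda dest: print("invalidate handles for inode", dest.number),
    apply_normally=lambda op, dest: print("apply", op.op_type),
)
pipe.process(SimpleNamespace(op_type="modify", target_inode=97))
```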
  • FIG. 7 is a flow chart of a process 700 for executing the inode hijack system, which may be executed by the system 600 .
  • the process 700 operates to detect inconsistencies in replication operations received through the network interface 604 and to hijack a destination inode where necessary.
  • Processing begins in step 704 , where the system receives a replication operation.
  • replication operations are received in an arbitrary order that is not restricted by chronological order or file system hierarchy.
  • FIG. 8 is an example structure of a replication operation. As shown in FIG. 8 , the replication operation data structure includes information defining the replication operation, including the operation type 802 , target inode number 804 , target inode generation 806 , and metadata associated with the operation 808 (e.g., create/modify time, inode type, and parent information).
  • the replication operation data structure may also include other fields as needed to support the replication process.
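As an illustration only, a hypothetical wire encoding of the FIG. 8 fields (operation type 802, target inode number 804, target inode generation 806, and metadata 808); the patent does not specify the replication protocol's actual format, so the layout below is an assumption.

```python
import json
import struct

OP_TYPES = {"create": 1, "modify": 2, "delete": 3}

def encode_op(op_type, target_inode, generation, metadata):
    meta = json.dumps(metadata).encode()
    header = struct.pack("!BQQI", OP_TYPES[op_type], target_inode,
                         generation, len(meta))
    return header + meta

def decode_op(buf):
    op, ino, gen, mlen = struct.unpack_from("!BQQI", buf)
    meta = json.loads(buf[struct.calcsize("!BQQI"):][:mlen])
    rev = {v: k for k, v in OP_TYPES.items()}
    return rev[op], ino, gen, meta

msg = encode_op("modify", 97, 4, {"type": "file", "mtime": 1694649600})
print(decode_op(msg))
```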
  • In step 706 , the system determines one or more destination inodes corresponding to the replication operation. As described above, this includes looking up inodes based on the target inode number(s). In some embodiments, the destination inodes include unused inodes corresponding to the target inode number(s). The system then provides the destination inodes to the remaining components in the system for processing.
  • Processing then proceeds to step 708 , where the system attempts to detect an inconsistency between the replication operation and the destination inode(s).
  • the system may determine that an inconsistency exists based on the type of operation (e.g., a create operation directed to an existing inode, a modify operation directed to an unused inode, etc.).
  • the system can also detect inconsistencies by comparing information in the destination inode(s) (e.g., inode generation, inode type, etc.) to information from the replication operation, where an inconsistency is identified if the information does not match.
  • the system then proceeds to decision step 710 , where it determines whether the replication operation is a replacement operation. A replacement operation is identified when the system has detected an inconsistency between the replication operation and the destination inode. If the system determines that the replication operation is not a replacement operation, the process 700 ends and the replication operation is executed.
  • In step 712 , the system executes the hijack operation.
  • the system hijacks the destination inode by replacing existing data with data from the replication operation.
  • the system first proceeds to step 714 , where it frees any data blocks associated with the destination inode if the destination inode is a file.
  • the system frees data blocks by modifying the metadata in the destination inode to replace references to each data block with a null or default reference.
  • the system may also notify the storage manager 10 that the data blocks can be reused.
  • the processing of step 714 may be executed synchronously or asynchronously.
  • In synchronous processing, the process 700 pauses execution until every data block associated with the destination inode has been freed by the file system.
  • In asynchronous processing, the system does not have to wait for the file system to free each block.
  • the system copies the data block references stored in the destination inode's buffer tree to a temporary file.
  • the system then directs the file system to free the data associated with the temporary file as a background process.
  • If the hijack operation is directed to a directory inode, the system may also delete the contents of the directory.
  • the system frees all data blocks within the directory.
  • the system may delete the inodes for all file system objects within the directory.
  • the source storage system 2 A can then omit delete operations directed to the file system objects within the directory, which reduces processing at the source storage system 2 A and bandwidth use on the interconnect 6 .
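A sketch of the asynchronous variant described above, with invented helper names: the block references are copied aside so the hijack can proceed immediately, and the actual freeing runs in the background.

```python
import threading

def free_blocks_async(dest_inode, storage_manager):
    # Copy the inode's block references into a temporary container and clear them,
    # analogous to copying the buffer tree's references to a temporary file.
    temp_refs = [p for p in dest_inode.block_pointers if p is not None]
    dest_inode.block_pointers = [None] * len(dest_inode.block_pointers)

    def background_free():
        storage_manager.mark_reusable(temp_refs)    # done lazily, off the hot path

    threading.Thread(target=background_free).start()
    return len(temp_refs)       # the hijack can proceed without waiting

# Toy demonstration with stand-in objects.
from types import SimpleNamespace
inode = SimpleNamespace(block_pointers=[10452, 99017, None])
mgr = SimpleNamespace(mark_reusable=lambda blocks: print("freed", blocks))
print(free_blocks_async(inode, mgr))
```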
  • In step 716 , the system replaces the metadata in the destination inode with metadata from the replication operation.
  • If the replication operation is a file create or file modify operation, this includes associating a new set of data blocks with the inode.
  • the system may also erase any metadata associated with the original inode that is not directly replaced. As discussed above, in some cases the metadata needed to fully populate the target inode is contained in multiple replication operations. In order to avoid any inconsistency within the inode, the system erases the contents of the previous inode and/or replaces the contents with default values rather than having metadata from two distinct versions of the inode reside simultaneously in the same inode.
  • In step 718 , the system increments the generation number of the destination inode.
  • the generation number allows hosts 1 to determine whether a file handle is pointing to the same inode that it originally referenced. If the generation numbers differ, the host knows that the handle is no longer valid (i.e., that the file system object previously referenced by the handle no longer exists). However, this step may be skipped in some cases where it is not necessary to change the generation number of the target inode. After incrementing the generation number, the process 700 ends.
  • the system receives multiple replication operations to fully populate the metadata for a particular inode. This may occur, for example, when the system receives a modify operation before it receives a create operation for the same inode (e.g., during a baseline replication). In these cases, the system replaces the inode's metadata where possible and replaces the remaining data with default values or erases the data. At a later time, the system receives a second replication operation that provides the remainder of the file system metadata. After determining that the second replication operation includes the remaining metadata, the system replaces the remaining data in the inode with data from the replication operation. For this type of operation, there is no hijack—i.e., the system does not free data blocks or change the generation number or type of the inode.
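A small sketch of the two-phase population described above, with assumed field handling: the first operation fills what it can and leaves defaults; a later operation for the same inode supplies the rest without another hijack (no data blocks are freed, and the generation number and type are left alone).

```python
DEFAULT = None   # placeholder for metadata not yet supplied

def apply_partial_metadata(inode_metadata, op_metadata):
    """Fill in only the fields this operation carries; keep earlier values."""
    for key, value in op_metadata.items():
        inode_metadata[key] = value
    return inode_metadata

# Modify operation arrives first: partial metadata, rest defaulted.
meta = {"type": "file", "size": 4096, "owner": DEFAULT, "create_time": DEFAULT}
# Create operation arrives later and completes the picture.
apply_partial_metadata(meta, {"owner": "alice", "create_time": 1694649600})
print(meta)
```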

Abstract

A system and method for hijacking inodes based on replication operations received in an arbitrary order is used to assist a data replication operation from a source storage system to a destination storage system. The source storage system generates a set of replication operations as part of a replication process and transmits the replication operations in an arbitrary order. After receiving a replication operation, the system determines whether the operation is inconsistent with a corresponding destination inode. If an inconsistency exists, the system hijacks the destination inode by replacing the destination inode's metadata with data determined based on the replication operation. The system may also delete metadata from the inode and/or initialize metadata to default values if the metadata was not replaced based on the replication operation. The system then waits for a second replication operation that contains the remaining metadata and replaces the metadata based on the second replication operation. In addition, data blocks associated with the previous version of the inode are freed.

Description

PRIORITY CLAIM
This application is a continuation of U.S. patent application Ser. No. 12/559,483, entitled “SYSTEM AND METHOD FOR HIJACKING INODES BASED ON REPLICATION OPERATIONS RECEIVED IN AN ARBITRARY ORDER”, which was filed on Sep. 14, 2009, and which is incorporated by reference herein in its entirety.
BACKGROUND
A network storage system is a processing system that is used to store and retrieve data on behalf of one or more hosts on a network. A storage system operates on behalf of one or more hosts to store and manage data in a set of mass storage devices, such as magnetic or optical storage-based disks or tapes. Some storage systems are designed to service file-level requests from hosts, as is commonly the case with file servers used in a network attached storage (NAS) environment. Other storage systems are designed to service block-level requests from hosts, as with storage systems used in a storage area network (SAN) environment. Still other storage systems are capable of servicing both file-level requests and block-level requests, as is the case with certain storage servers made by NetApp, Inc. of Sunnyvale, Calif.
One common use of storage systems is data replication. Data replication is a technique for backing up data in which a given data set at a source is replicated at a destination that is often geographically remote from the source. The replica data set created at the destination storage system is called a “mirror” of the original data set on the source storage system. Typically replication involves the use of at least two storage systems, e.g., one at the source and another at the destination, which communicate with each other through a computer network or other type of data interconnect.
Each data block in a given set of data, such as a file in a storage system, can be represented by both a physical block, pointed to by a corresponding physical block pointer, and a logical block, pointed to by a corresponding logical block pointer. These two blocks are actually the same data block. However, the physical block pointer indicates the actual physical location of the data block on a storage medium, whereas the logical block pointer indicates the logical position of the data block within the data set (e.g., a file) relative to other data blocks.
In some replication systems, replication is done at a logical block level. In these systems, the replica at the destination storage system has the identical structure of logical block pointers as the original data set at the source storage system, but it may (and typically does) have a different structure of physical block pointers than the original data set at the source storage system. To execute a logical replication, the file system of the source storage system is analyzed to determine changes that have occurred to the file system. The changes are transferred to the destination storage system. This typically includes “walking” the directory trees at the source storage system to determine the changes to various file system objects within each directory tree, as well as identifying the changed file system object's location within the directory tree structure. The changes are then sent to the destination storage system in a certain order (e.g., directories before subdirectories, and subdirectories before files, etc.) so that the directory tree structure of the source storage system is preserved at the destination storage system. Updates to directories of the source file system are received and processed at the destination storage system before updates to the files in each of the directories can be received and processed.
A number of problems exist if the changes are received at the destination storage system in an order that is not consistent with the file system hierarchy. For example, if updates to data in files are received before the updates to the directories that contain the files, then the files are essentially “orphaned” because the destination storage system does not know which directory should be used to store the updates. That is, updates to the data in the file cannot be processed correctly before the directory referencing the file exists on the destination storage system. Similarly, if a file is deleted on the source storage system and a new directory is created at the same file system address, the replication system will send one message indicating the delete operation and another message indicating the create operation. If the messages are received out of order, the destination system will be directed to create a directory at a file system location that already contains a file, resulting in an error condition. In another case, if a file is created at an unused file system address and then modified, the replication system will send a first message indicating the create operation and a second message indicating the modify operation. If the messages are received out of order, the destination system will be directed to modify a file at an unused file system location, resulting in an error condition.
SUMMARY
The present disclosure is directed to an apparatus and method for hijacking inodes based on file system replication operations (hereinafter referred to as “replication operations”) received in an arbitrary order. The replication operations may be received at a destination storage system from a source storage system as part of a replication process. The order in which the replication operations are received is referred to as “arbitrary” because the order is not restricted by chronological order, file system hierarchy, or any other ordering requirement. After receiving a replication operation, the system determines an inode (i.e., a metadata container) on the destination storage system that the replication operation is intended to modify or replace (referred to as the “destination inode”). The system then looks for an inconsistency between the replication operation and the destination inode based on the type of the operation or by comparing the destination inode's metadata to the data in the replication operation. If an inconsistency is detected, the system determines that the replication operation is a replacement operation. As used herein, a “replacement operation” is a type of replication operation that is received in a chronologically different order from the order the corresponding change occurred on the source storage system and must be handled specially. In response to detecting the replacement operation, the system “hijacks” the destination inode; i.e., in response to the inconsistency, it replaces at least a part of the inode's metadata contents based on the replication operation. In some cases, the replication operation does not include enough information to fully populate the destination inode's metadata. In these cases, the system deletes metadata that was not replaced and/or initializes the metadata to default values and waits for a second replication operation that contains the remaining metadata. The system also frees any data blocks associated with the previous version of the inode. Freeing data blocks means removing references to the data blocks in the destination inode and may also include making the data blocks available to be written to.
By detecting inconsistencies and hijacking the destination inode where appropriate, the hijack system enables the replication process to function without requiring replication operations to be sent in a particular order. Thus, the hijack system avoids the problems discussed above, which occur when the replication system is required to transmit changes based on the file system hierarchy. According to the system introduced here, inconsistent operations are detected before they are applied to the file system of the destination storage system. The system then modifies the destination inode in place, without having to wait for a delete (or create) operation to be provided. Thus, the system avoids the need for the destination storage system to buffer replication operations to wait for other related operations to arrive. As a result, when the delete operation is later received, the system can ignore the operation, reducing the number of operations that the destination storage system has to execute during a replication. Alternatively, the source storage system may omit transmitting the delete operations entirely. This reduces processing on the source storage system and network bandwidth on the interconnect between the storage systems. The hijack system can also partially initialize an inode based on a first out-of-order operation and complete the initialization when a later replication operation is received, such as when create and modify operations for a particular inode are received out of order.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a network environment in which multiple network storage systems cooperate.
FIG. 2 is an example of the hardware architecture of a storage system.
FIG. 3 is a block diagram of a storage operating system.
FIG. 4 depicts a buffer tree of a file.
FIG. 5 depicts a buffer tree including an inode file.
FIG. 6 is a logical block diagram of an inode hijack system.
FIG. 7 is a flow chart of a process for executing the inode hijack system.
FIG. 8 is an example structure of a replication operation.
DETAILED DESCRIPTION
A system and method for hijacking inodes based on replication operations received in an arbitrary order is disclosed (hereinafter referred to as “the hijack system” or “the system”). The system may be used to assist a data replication process from a source storage system to a destination storage system. The source storage system determines a set of changes made between two points in time and transmits replication operations based on the changes in an arbitrary order. Each replication operation specifies the type of operation (e.g., create, modify, delete) and related information, including a target inode number for the operation. The target inode number identifies the inode of the logical data container (e.g., file, directory, or logical unit number (LUN)) that is the target of the replication operation.
When the system receives a replication operation, it looks up a destination inode corresponding to the target inode number in the replication operation. The system then determines whether the replication operation is inconsistent with the destination inode. This may be determined based on the type of operation or by comparing data in the replication operation to the destination inode's metadata. For example, an inconsistency exists if the replication operation is directed to a first inode type while the target inode has a second inode type. Similarly, an inconsistency exists if the replication operation is a modify operation that is directed to an unused inode (i.e., an inode that is not associated with a file system object). An inconsistency also exists if the replication operation specifies an inode generation number (defined below) that differs from the destination inode's generation number.
If an inconsistency exists, the system determines that the replication operation is a replacement operation. In response to determining that the replication operation is a replacement operation, the system hijacks the destination inode by replacing the destination inode's metadata with data determined based on the replication operation. As a part of this process, the system frees data blocks previously associated with the destination inode and replaces the metadata. The system may also change the generation number and/or type of the inode. In some cases, the replication operation does not include enough information to fully populate the destination inode's metadata. In these cases, the system deletes metadata that was not replaced and waits for a second replication operation that contains the remaining metadata. The system may also initialize some or all of the deleted metadata to default values (e.g., zero or null values).
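The decision flow just described can be summarized in a short sketch. The following Python is illustrative only and is not part of the disclosed embodiments; the record layouts (Inode, ReplicationOp), the field names, and the simplified delete handling are assumptions chosen for readability.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified records; the patent does not prescribe these layouts.
@dataclass
class Inode:
    number: int
    itype: str = "unused"          # "file", "directory", or "unused"
    generation: int = 0
    metadata: dict = field(default_factory=dict)
    block_refs: list = field(default_factory=list)

@dataclass
class ReplicationOp:
    op_type: str                   # "create", "modify", or "delete"
    target_inode: int
    target_generation: int
    target_type: str               # inode type the operation expects
    metadata: dict

def process_op(inode_file: dict, op: ReplicationOp) -> None:
    """Handle one replication operation received in arbitrary order."""
    if op.op_type == "delete":
        return                     # delete handling omitted; a hijack often supersedes it

    dest = inode_file.setdefault(op.target_inode, Inode(op.target_inode))
    inconsistent = (
        (op.op_type == "create" and dest.itype != "unused")
        or (op.op_type == "modify" and dest.itype == "unused")
        or (dest.itype != "unused" and op.target_type != dest.itype)
        or (dest.itype != "unused" and op.target_generation != dest.generation)
    )

    if inconsistent:
        # Hijack: reuse the destination inode in place instead of buffering
        # the operation until a matching delete or create arrives.
        dest.block_refs.clear()              # free the previous data blocks
        dest.metadata = dict(op.metadata)    # keep only what this op supplies
        dest.itype = op.target_type
        dest.generation += 1                 # invalidates stale file handles
    else:
        if op.op_type == "create":
            dest.itype = op.target_type
        dest.metadata.update(op.metadata)    # ordinary, in-order update
```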
FIG. 1 depicts a configuration of network storage systems in which the techniques being introduced here can be implemented according to an illustrative embodiment. In FIG. 1, a source storage system 2A is coupled to a source storage subsystem 4A and to a set of hosts 1 through an interconnect 3. The interconnect 3 may be, for example, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a global area network such as the Internet, a Fibre Channel fabric, or any combination of such interconnects. Each of the hosts 1 may be, for example, a conventional personal computer (PC), server-class computer, workstation, handheld computing/communication device, or other computing/communications device.
In one embodiment, the source storage system 2A includes a storage operating system 7A, a storage manager 10A, a snapshot differential module 12, and a source replication engine 8A. Each of the storage operating system 7A, the storage manager 10A, the snapshot differential module 12, and the source replication engine 8A is a computer hardware component of the storage system, which can be implemented as special purpose hardware circuitry (e.g., “hardwired”), general purpose hardware circuitry that is programmed with software and/or firmware, or any combination thereof. Storage of data in the source storage subsystem 4A is managed by the storage manager 10A of the source storage system 2A. The source storage system 2A and the source storage subsystem 4A are collectively referred to as a source storage system. The storage manager 10A receives and responds to various read and write requests from the hosts 1, directed to data stored in or to be stored in the source storage subsystem 4A. The storage manager 10A may be implemented as a part of the storage operating system 7A or as a separate component, as shown in FIG. 1. The source storage subsystem 4A includes a number of nonvolatile mass storage devices 5, which can be, for example, magnetic disks, optical disks, tape drives, solid-state memory, such as flash memory, or any combination of such devices. The mass storage devices 5 in the source storage subsystem 4A can be organized as a RAID group, in which case the source storage system 2A can access the source storage subsystem 4A using a conventional RAID algorithm for redundancy.
The storage manager 10A processes write requests from the hosts 1 and stores data to unused storage locations in the mass storage devices 5 of the source storage subsystem 4A. In one embodiment, the storage manager 10A implements a “write anywhere” file system such as the proprietary Write Anywhere File Layout (WAFL™) file system developed by Network Appliance, Inc., Sunnyvale, Calif. Such a file system is not constrained to write any particular data or metadata to a particular storage location or region. Rather, such a file system can write to any unallocated block on any available mass storage device and does not overwrite data on the devices. If a data block on disk is updated or modified with new data, the data block is thereafter stored (written) to a new location on disk instead of modifying the block in place, to optimize write performance.
The storage manager 10A of the source storage system 2A is responsible for managing storage of data in the source storage subsystem 4A, servicing requests from the hosts 1, and performing various other types of storage-related operations. In one embodiment, the storage manager 10A, the source replication engine 8A, and the snapshot differential module 12 are logically on top of the storage operating system 7A. The source replication engine 8A operates in cooperation with a remote destination replication engine 8B, described below, to perform logical replication of data stored in the source storage subsystem 4A. Note that in other embodiments, one or more of the storage manager 10A, the source replication engine 8A and the snapshot differential module 12 may be implemented as elements within the storage operating system 7A.
The source storage system 2A is connected to a destination storage system 2B through an interconnect 6, for purposes of replicating data. Although illustrated as a direct connection, the interconnect 6 may include one or more intervening devices and/or may include one or more networks. In the illustrated embodiment, the destination storage system 2B includes a storage operating system 7B, the destination replication engine 8B and a storage manager 10B. The storage manager 10B controls storage-related operations on the destination storage system 2B. In one embodiment, the storage manager 10B and the destination replication engine 8B are logically on top of the storage operating system 7B. In other embodiments, the storage manager 10B and the destination replication engine 8B may be implemented as elements within the storage operating system 7B. The destination storage system 2B and the destination storage subsystem 4B are collectively referred to as the destination storage system.
The destination replication engine 8B works in cooperation with the source replication engine 8A to replicate data from the source storage system to the destination storage system. In certain embodiments, the storage operating systems 7A and 7B, replication engines 8A and 8B, storage managers 10A and 10B, and snapshot differential module 12 are all implemented in the form of software. In other embodiments, however, any one or more of these elements may be implemented in hardware alone (e.g., specially designed dedicated circuitry), firmware, or any combination of hardware, software and firmware.
The storage systems 2A and 2B each may be, for example, a storage system that provides file-level data access services to the hosts 1, such as commonly done in a NAS environment, or block-level data access services, such as commonly done in a SAN environment, or each may be capable of providing both file-level and block-level data access services to the hosts 1. Further, although the storage systems 2 are illustrated as monolithic systems in FIG. 1, they can have a distributed architecture. For example, the storage systems 2 each can be designed as physically separate network modules (e.g., “N-module”) and data modules (e.g., “D-module”) (not shown), which communicate with each other over a physical interconnect. Such an architecture allows convenient scaling, such as by deploying two or more N-modules and D-modules, all capable of communicating with each other over the interconnect.
FIG. 2 is a high-level block diagram of an illustrative embodiment of a storage system 2. The storage system 2 includes one or more processors 130 and a memory 124 coupled to an interconnect bus 125. The interconnect bus 125 shown in FIG. 2 is an abstraction that represents any one or more separate physical interconnect buses, point-to-point connections, or both, connected by appropriate bridges, adapters, and/or controllers. The interconnect bus 125, therefore, may include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus or PCI-Express bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), IIC (I2C) bus, or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus, also called “Firewire.”
The processor(s) 130 is/are the central processing unit(s) (CPU) of the storage systems 2 and, therefore, control the overall operation of the storage systems 2. In certain embodiments, the processor(s) 130 accomplish this by executing software or firmware stored in the memory 124. The processor(s) 130 may be, or may include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), trusted platform modules (TPMs), or the like, or a combination of such devices. The memory 124 is or includes the main memory of the storage systems 2.
The memory 124 represents any form of random access memory (RAM), read-only memory (ROM), flash memory, or the like, or any combination of such devices. Also connected to the processor(s) 130 through the interconnect bus 125 is a network adapter 126 and a storage adapter 128. The network adapter 126 provides the storage systems 2 with the ability to communicate with remote devices, such as the hosts 1, over the interconnect 3 of FIG. 1, and may be, for example, an Ethernet adapter or Fibre Channel adapter. The storage adapter 128 allows the storage systems 2 to access storage subsystems 4A or 4B, and may be, for example, a Fibre Channel adapter or SCSI adapter.
FIG. 3 is a block diagram of a storage operating system according to an illustrative embodiment. As used herein, the term “storage operating system” generally refers to the computer-executable code operable on a computer to perform a storage function that manages data access and other related functions. Storage operating system 7 can be implemented as a microkernel, an application program operating over a general-purpose operating system such as UNIX® or Windows NT®, or as a general-purpose operating system configured for the storage applications as described herein. In the illustrated embodiment, the storage operating system includes a network protocol stack 310 having a series of software layers including a network driver layer 350 (e.g., an Ethernet driver), a network protocol layer 360 (e.g., an Internet Protocol layer and its supporting transport mechanisms: the TCP layer and the User Datagram Protocol layer), and a file system protocol server layer 370 (e.g., a CIFS server, a NFS server, etc.). In addition, the storage operating system 7 includes a storage access layer 320 that implements a storage media protocol such as a RAID protocol, and a media driver layer 330 that implements a storage media access protocol such as, for example, a Small Computer Systems Interface (SCSI) protocol. Any and all of the modules of FIG. 3 can be implemented as a separate hardware component. For example, the storage access layer 320 may alternatively be implemented as a parity protection RAID module and embodied as a separate hardware component such as a RAID controller. Bridging the storage media software layers with the network and file system protocol layers is the storage manager 10 that implements one or more file system(s) 340. For the purposes of this disclosure, a file system is a structured (e.g., hierarchical) set of stored files, directories, and/or other data containers. In one embodiment, the storage manager 10 implements data layout algorithms that improve read and write performance to the mass storage devices 5, such as WAFL systems discussed above.
It is useful now to consider how data can be structured and organized by storage systems 2A and 2B in certain embodiments. Reference is now made to FIGS. 4 and 5 in this regard. In at least one embodiment, data is stored in the form of volumes, where each volume contains one or more directories, subdirectories, and/or files. The term “aggregate” is used to refer to a pool of physical storage that combines one or more physical mass storage devices (e.g., disks), or parts thereof, into a single storage object. An aggregate also contains or provides storage for one or more other data sets at a higher level of abstraction, such as volumes. A “volume” is a set of stored data associated with a collection of mass storage devices, such as disks, which obtains its storage from (i.e., is contained within) an aggregate, and which is managed as an independent administrative unit, such as a complete file system. A volume includes one or more file systems, such as an active file system and, optionally, one or more persistent point-in-time images of the active file system captured at various instances in time. A “file system” is an independently managed, self-contained, organized structure of data units (e.g., files, blocks, or LUNs). Although a volume or file system (as those terms are used herein) may store data in the form of files, that is not necessarily the case. That is, a volume or file system may store data in the form of other units of data, such as blocks or LUNs.
In certain embodiments, each aggregate uses a physical volume block number (PVBN) space that defines the physical storage space of blocks provided by the storage devices of the physical volume, and likewise, each volume uses a virtual volume block number (VVBN) space to organize those blocks into one or more higher-level objects, such as directories, subdirectories, and files. A PVBN, therefore, is an address of a physical block in the aggregate, and a VVBN is an address of a block in a volume (the same block as referenced by the corresponding PVBN), i.e., the offset of the block within the volume. The storage manager 10 tracks information for all of the VVBNs and PVBNs in each storage system 2. The storage manager 10 may manage multiple volumes on a common set of physical storage in the aggregate.
In addition, data within the storage system is managed at a logical block level. At the logical block level, the storage manager maintains a logical block number (LBN) for each data block. If the storage system stores data in the form of files, the LBNs are called file block numbers (FBNs). Each FBN indicates the logical position of the block within a file, relative to other blocks in the file, i.e., the offset of the block within the file. For example, FBN 0 represents the first logical block in a particular file, while FBN 1 represents the second logical block in the file, and so forth. Note that the PVBN and VVBN of a data block are independent of the FBN(s) that refer to that block. In one embodiment, the FBN of a block of data at the logical block level is assigned to a PVBN-VVBN pair.
In certain embodiments, each file is represented in the storage system in the form of a hierarchical structure called a buffer tree. As used herein, the term “buffer tree” is defined as a hierarchical metadata structure containing references (or pointers) to logical blocks of data in the file system. A buffer tree is a hierarchical structure which is used to store file data as well as metadata about a file, including pointers for use in locating the data blocks for the file. A buffer tree includes one or more levels of indirect blocks (called “L1 blocks”, “L2 blocks”, etc.), each of which contains one or more pointers to lower-level indirect blocks and/or to the direct blocks (called “L0 blocks”) of the file. All of the data in the file is stored only at the lowest level (L0) blocks. The root of a buffer tree is the “inode” of the file. An inode is a metadata container that is used to store metadata about the file, such as ownership, access permissions, file size, file type, and pointers to the highest level of indirect blocks for the file. Each file has its own inode. The inode is stored in a separate inode file, which may itself be structured as a buffer tree. In hierarchical (or nested) directory file systems, this essentially results in buffer trees within buffer trees, where subdirectories are nested within higher-level directories and entries of the directories point to files, which also have their own buffer trees of indirect and direct blocks. Directory entries include the name of a file in the file system, and directories are said to point to (reference) that file. Alternatively, a directory entry can point to another directory in the file system. In such a case, the directory with the entry is said to be the “parent directory,” while the directory that is referenced by the directory entry is said to be the “child directory” or “subdirectory.”
FIG. 4 depicts a buffer tree 400 of a file according to an illustrative embodiment. In the illustrated embodiment, a file is assigned an inode 422, which references Level 1 (L1) indirect blocks 424A and 424B. Each indirect block 424 stores at least one PVBN and a corresponding VVBN for each PVBN. There is a one-to-one mapping between each VVBN and PVBN. Note that a PVBN is a block number in an aggregate (i.e., offset from the beginning of the storage locations in an aggregate), and a VVBN is a block number in a volume (offset from the beginning of the storage locations in a volume); however, there is only one copy of the L0 data block physically stored in the physical mass storage of the storage system. Also, to simplify description, only one PVBN-VVBN pair is shown in each indirect block 424 in FIG. 4; however, an actual implementation would likely include multiple PVBN-VVBN pairs in each indirect block 424. Each PVBN references a physical block 427A and 427B, respectively, in the storage device (i.e., in the aggregate L0 blocks 433), and the corresponding VVBN references a logical block 428A and 428B, respectively, in the storage device (i.e., in volume L0 blocks 431). In addition, volumes can also be represented by files called “container files.” In such a case, the VVBN references a block number offset from the beginning of the container file representing the volume. Physical blocks 427 and logical blocks 428 are actually the same L0 data for any particular PVBN-VVBN pair; however, they are accessed in different ways: the PVBN is accessed directly in the aggregate, while the VVBN is accessed virtually via the container file representing the volume.
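To make the buffer-tree structure concrete, the following Python sketch models an inode whose L1 indirect blocks hold PVBN/VVBN pairs for the file's L0 data blocks. The class names and the single level of indirection are simplifications assumed for illustration; an actual buffer tree has more levels and richer metadata.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical layout loosely following FIG. 4.
@dataclass
class IndirectBlock:                 # an L1 block
    # each entry pairs a physical address (PVBN) with its volume address (VVBN)
    pvbn_vvbn_pairs: List[Tuple[int, int]] = field(default_factory=list)

@dataclass
class Inode:                         # root of the buffer tree
    number: int
    generation: int
    metadata: dict
    l1_blocks: List[IndirectBlock] = field(default_factory=list)

def data_blocks_of(inode: Inode) -> List[Tuple[int, int]]:
    """Walk the (single-level) buffer tree and return all PVBN/VVBN pairs,
    i.e., the addresses of the file's L0 data blocks."""
    pairs = []
    for l1 in inode.l1_blocks:
        pairs.extend(l1.pvbn_vvbn_pairs)
    return pairs

# Example: a small file whose FBN 0 and FBN 1 map to two PVBN/VVBN pairs.
f = Inode(number=42, generation=7, metadata={"type": "file", "size": 8192},
          l1_blocks=[IndirectBlock([(1001, 11), (1002, 12)])])
print(data_blocks_of(f))             # [(1001, 11), (1002, 12)]
```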
FIG. 5 depicts a buffer tree 500 including an inode file 541 according to an illustrative embodiment. In FIG. 5, for each volume managed by the storage system 2, the inodes of the files and directories in that volume are stored in the inode file 541. A separate inode file 541 is maintained for each volume. The inode file 541, in one embodiment, is a data structure representing a master list of file system objects (e.g., directories, subdirectories and files) of the file system in the storage system and each inode entry identifies a particular file system object within the file system. Each inode 422 in the inode file 541 is the root of a buffer tree 400 of the file corresponding to the inode 422. The location of the inode file 541 for each volume is stored in a volume information (“VolumeInfo”) block 542 associated with that volume. The VolumeInfo block 542 is a metadata container that contains metadata that applies to the volume as a whole. Examples of such metadata include, for example, the volume's name, its type, its size, any space guarantees to apply to the volume, and the VVBN of the inode file of the volume.
File system objects can be, for example, files, directories, subdirectories, and/or LUNs of the file system. File system object inodes are arranged sequentially in the inode file, and a file system object's position in the inode file is given by its inode number. An inode includes a master location catalog for the file, directory, or other file system object and various bits of information about the file system object called metadata. The metadata includes, for example, the file system object's creation date, security information such as the file system object's owner and/or protection levels, and its size. The metadata also includes a “type” designation to identify the type of the file system object. The type could be at least one of the following types: 1) a “file”; 2) a “directory”; 3) “unused”; or 4) “not yet known.” Directory inodes include a directory entry for each file system object contained in the directory (referred to as “child” objects). Each directory entry then includes the name of the child file system object the directory entry references and the object's inode and generation numbers. In addition to inodes associated with file system objects, the file system may also maintain “unused” inodes for each inode number that is not associated with a file system object.
The metadata also includes the “generation number” of the file system object. As time goes by, file system objects are created or deleted, and slots in the inode file are recycled. When a file system object is created, its inode is given a new generation number, which is guaranteed to be different from (e.g., larger than) the previous file system object at that inode number (if any). If repeated accesses are made to the file system object by its inode number (e.g., from clients, applications, etc.), the generation number can be checked to avoid inadvertently accessing a different file system object after the original file system object was deleted. The metadata also includes “parent information,” which includes the inode number of the file system object's parent directory. A file system object can have multiple parent directories.
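A hypothetical sketch of the generation-number check follows; the FileHandle and Inode records are assumptions used only to show how a host can detect that an inode slot has been recycled for a different file system object.

```python
from dataclasses import dataclass

@dataclass
class Inode:
    number: int
    generation: int
    itype: str            # "file", "directory", "unused", ...

@dataclass
class FileHandle:         # what a host caches after a lookup
    inode_number: int
    generation: int

def handle_is_valid(handle: FileHandle, inode_file: dict) -> bool:
    """A cached handle is stale if the inode slot was recycled for a new object,
    which is detectable because the generation number changed."""
    inode = inode_file.get(handle.inode_number)
    return inode is not None and inode.generation == handle.generation

inode_file = {7: Inode(7, generation=3, itype="file")}
h = FileHandle(inode_number=7, generation=3)
print(handle_is_valid(h, inode_file))     # True
inode_file[7].generation += 1             # slot recycled (e.g., after a hijack)
print(handle_is_valid(h, inode_file))     # False: the host must re-resolve the path
```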
For various reasons, it may be desirable to maintain a replica of a data set of the source storage system. For example, in the event of a power failure or other type of failure, data lost at the source storage system can be recovered from the replica stored in the destination storage system. In at least one embodiment, the data set is a file system of the storage system, and replication is performed using snapshots. A “snapshot” is a persistent image (usually read-only) of the file system at a point in time and can be generated by the snapshot differential module 12. At a point in time, the snapshot differential module 12 generates a first snapshot of the file system of the source storage system, referred to as the baseline snapshot. This baseline snapshot is then provided to the source replication engine 8A for a baseline replication process. The system executes the baseline replication process by generating a set of replication operations corresponding to the file system objects in the baseline snapshot. The replication operations will be executed on the destination storage system 2B to replicate the initial state of the source storage system. The system may generate one or more replication operations for each file system object on the source storage system 2A. The replication operations may be sent in any arbitrary order and are not restricted to chronological order or the file system hierarchy.
At some later time, the source replication engine 8A executes another replication process (which may be at the request of the destination replication engine 8B). To do so, the source replication engine 8A needs to be updated with the changes to the file system of the source storage system since a previous replication process was performed. The snapshot differential module 12 compares the most recent snapshot of the file system of the source storage system to the snapshot of a previous replication process to determine differences between a recent snapshot and the previous snapshot. The snapshot differential module 12 identifies any data that has been added or modified since the previous snapshot operation, and sends those additions or modifications to the source replication engine 8A for replication. The source replication engine 8A then generates replication operations for each of the additions or modifications. The replication operations are transmitted to the destination replication engine 8B for execution on the destination storage system 2B. As with the baseline replication process, the replication operations may be sent in any arbitrary order.
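One simplistic way to picture the differential step is sketched below. The real snapshot differential module 12 walks snapshot metadata of the source file system and may emit separate delete and create operations for a recycled inode number; here the two snapshots are just dictionaries keyed by inode number, and the output records are hypothetical.

```python
def snapshot_diff(prev: dict, curr: dict):
    """Yield replication operations (as plain dicts) describing how to move a
    replica from the state in `prev` to the state in `curr`. Order is arbitrary."""
    for num, (gen, meta) in curr.items():
        if num not in prev:
            yield {"op": "create", "inode": num, "generation": gen, "metadata": meta}
        elif prev[num] != (gen, meta):
            yield {"op": "modify", "inode": num, "generation": gen, "metadata": meta}
    for num in prev:
        if num not in curr:
            yield {"op": "delete", "inode": num}

prev = {1: (1, {"size": 0}), 2: (1, {"size": 4096})}
curr = {1: (1, {"size": 0}), 2: (2, {"size": 100}), 3: (1, {"size": 512})}
print(sorted(op["op"] for op in snapshot_diff(prev, curr)))   # ['create', 'modify']
```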
A replication process transfers information about a set of replication operations from a source file system to the replica destination file system. In one embodiment, a replication operation includes data operations, directory operations, and inode operations. A “data operation” transfers 1) a block of file data, 2) the inode number of the block of data, 3) the generation number of the file, 4) the position of the block within the file (e.g., FBN), and 5) the type of the file. A “directory operation” transfers 1) the inode number of the directory, 2) the generation number of the directory, and 3) enough information to reconstitute an entry in that directory, including 1) the name, 2) inode number, and 3) generation number of the file system object the directory entry points to. Finally, an “inode operation” transfers 1) the metadata of an inode, 2) its inode number, and 3) the generation of the inode. To perform a replication of an entire file system, the source storage system sends a sequence of data operations, directory operations, and inode operations to the destination, which is expected to process the operations and send acknowledgments to the source. As used herein, the inode number (or numbers) in each replication operation is referred to as the “target inode number.” A “destination inode” is an inode on the destination storage system having the same inode number as the target inode number in a received replication operation.
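A rough Python rendering of the three operation kinds is shown below. The field names are assumptions, and an actual implementation would use a wire format rather than in-memory objects; the sketch only mirrors the items each kind of operation transfers.

```python
from dataclasses import dataclass

# Hypothetical payloads mirroring the three operation kinds described above.
@dataclass
class DataOp:                 # one block of file data
    inode_number: int
    generation: int
    file_type: str
    fbn: int                  # position of the block within the file
    block: bytes

@dataclass
class DirectoryOp:            # one directory entry
    dir_inode_number: int
    dir_generation: int
    entry_name: str
    child_inode_number: int
    child_generation: int

@dataclass
class InodeOp:                # inode-level metadata (create/modify/delete)
    inode_number: int
    generation: int
    metadata: dict            # e.g., {"type": "file", "size": ..., "owner": ...}

# A replication stream is just a sequence of these, sent in arbitrary order.
stream = [
    DataOp(10, 1, "file", fbn=0, block=b"hello"),
    DirectoryOp(2, 1, "greeting.txt", child_inode_number=10, child_generation=1),
    InodeOp(10, 1, {"type": "file", "size": 5}),
]
```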
A replication of a file system may be either an “initialization,” in which the destination file system starts from scratch with no files or directories, or an “update,” in which the destination file system already has some files and directories from an earlier replication process of an earlier version of the source. In an update, the source file system does not need to send every file and directory to the destination; rather, it sends only the changes that have taken place since the earlier version was replicated. In an update, an inode operation may be used to indicate that a file has been deleted, and also possibly that another file has been created at the same inode number. Inode operations have various types, including delete (where the file system object associated with the inode number is deleted), create (where a new file system object is created at the target inode number), and modify (where the contents or metadata of the file system object are modified). Similarly, in an initialization, the system sends create and modify operations to build the file and directory structure.
As noted above, the destination storage system may receive the replication operations in an arbitrary order. This simplifies processing for the source replication engine 8A by allowing it to send replication operations as they are created, rather than imposing additional timing requirements. However, in many cases, the arbitrary order results in the destination replication engine 8B receiving replication operations that are inconsistent with the existing file system on the destination storage system. This may result when the source storage system deleted a file system object (freeing its inode) and created a new file system object having the same inode number. If the destination replication engine 8B receives the create operation before the delete operation, it determines that an inconsistency exists because the create operation is directed to an inode number that is already in use. An inconsistency may also result if the source storage system created a new file system object at an unused inode and later modified the inode. If the operations are received out of order, the destination replication engine 8B determines that an inconsistency exists because the modify operation is directed to an unused inode. This also occurs when the system receives a replication operation directed to a first inode type (e.g., a directory) while the target inode is a second inode type (e.g., a file). One possible solution would require the destination replication engine 8B to store the inconsistent operations until the corresponding delete operation is received. However, this would be inefficient and would defeat the purpose of providing the replication operations in an arbitrary order.
Instead, the current system solves this problem by “hijacking” the target inode. As used herein, “hijacking” occurs when the destination replication engine 8B detects an inconsistency between the replication operation and the target inode and replaces metadata in the target inode with data from the replication operation. FIG. 6 illustrates a logical block diagram of the hijack system 600. The system 600 can be implemented by the destination replication engine 8B (FIG. 1) executing on the destination storage system or by other hardware that has access to the file system of the destination storage subsystem 4B. Aspects of the system may be implemented as special purpose hardware circuitry, programmable circuitry, or a combination of these. As will be discussed in additional detail herein, the system 600 includes a number of modules to facilitate the functions of the system. Although the various modules are described as residing in a single system, the modules are not necessarily physically co-located. In some embodiments, the various modules could be distributed over multiple physical devices, and the functionality implemented by the modules may be provided by calls to remote services. Similarly, the data structures could be stored in local storage or remote storage and distributed in one or more physical devices. Assuming a programmable implementation, the code to support the functionality of this system may be stored on a computer-readable medium such as an optical drive, flash memory, or a hard drive. One skilled in the art will appreciate that at least some of these individual components and subcomponents may be implemented using ASICs, PLDs, or a general-purpose processor configured with software and/or firmware.
As shown in FIG. 6, the system 600 includes a network interface 604, which is configured to receive replication operations from the source storage system 2A. The network interface 604 may be implemented using the network adapter 126 (FIG. 2). The system 600 also includes a storage interface 606, which is configured to communicate with a destination storage subsystem 4B to execute the replication operations and which can be the storage adapter 128 in FIG. 2.
The system 600 has a processing component 602, which processes received replication operations and controls the destination storage subsystem based on the operations. The processing component 602 could be implemented by the processor 130 of FIG. 2. As discussed above, each replication operation includes an inode number of a file system object that is created, modified, or deleted by the operation. For create operations, the replication operation may also include inode numbers of one or more parent inodes of the file system object to be created.
The processing component 602 includes a lookup component 610, which is configured to determine one or more destination inodes on the destination storage system corresponding to the target inode numbers in the replication operation. Thus, the lookup component 610 determines the target inode numbers based on the replication operation and accesses the file system to retrieve information stored in the corresponding destination inodes. This information includes file system object metadata, such as type, generation, creation date, modification date, etc.
The processing component 602 also includes an evaluation component 612, which is configured to detect an inconsistency between the replication operation and the destination inode. Based on the detected inconsistency, the evaluation component 612 determines that a replacement operation has occurred. As discussed above, an inconsistency exists when the system receives a replication operation that cannot properly be executed on the target inode. Inconsistencies may be detected for various reasons. Examples of inconsistencies include:
    • The system receives a create operation directed to an inode that already exists on the destination storage system 2B;
    • The system receives a replication operation including a target inode type that differs from the inode type of the corresponding destination inode;
    • The system receives a replication operation including a target inode generation number that differs from the generation number of the corresponding destination inode; and
    • The system receives a replication operation that is inconsistent with the type of the destination inode (e.g., the replication operation adds data blocks while the corresponding destination inode is a directory, or the replication operation adds a directory entry while the corresponding destination inode is a file).
      In a special case of the last example above, an inconsistency exists when the system receives a modify operation directed to an unused inode (i.e., an inode having a type of “unused”). This may occur during a replication process when the source storage system 2A generates a create operation and a modify operation directed to an unused inode and the modify operation is received before the create operation.
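The checks enumerated above might be expressed as a single predicate, sketched below. The dictionary keys (op, target_type, target_generation, adds_data_blocks, adds_directory_entry) are hypothetical names, not the names used by the evaluation component 612.

```python
def is_inconsistent(op: dict, dest: dict) -> bool:
    """Return True if `op` cannot properly be executed against the destination
    inode `dest`, i.e., the operation must be treated as a replacement operation."""
    dest_unused = dest["type"] == "unused"

    # Create directed at an inode that already exists.
    if op["op"] == "create" and not dest_unused:
        return True
    # Modify directed at an unused inode (e.g., modify received before create).
    if op["op"] == "modify" and dest_unused:
        return True
    # Target inode type differs from the destination inode type.
    if not dest_unused and op["target_type"] != dest["type"]:
        return True
    # Target generation number differs from the destination generation number.
    if not dest_unused and op["target_generation"] != dest["generation"]:
        return True
    # Operation payload is inconsistent with the destination type, e.g., adding
    # data blocks to a directory or a directory entry to a file.
    if op.get("adds_data_blocks") and dest["type"] == "directory":
        return True
    if op.get("adds_directory_entry") and dest["type"] == "file":
        return True
    return False

dest = {"type": "file", "generation": 4}
print(is_inconsistent({"op": "modify", "target_type": "file", "target_generation": 4}, dest))  # False
print(is_inconsistent({"op": "create", "target_type": "file", "target_generation": 5}, dest))  # True
```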
The processing component 602 includes a hijack component 614, which is configured to hijack the destination inode based on the information in the replication operation. During a hijack process, the system replaces metadata in the destination inode based on the metadata in the replication operation. The hijack operation often implicitly supersedes a delete operation that will arrive at some later point in the replication process. Thus, when the system receives a replication operation containing the delete operation at a later time, the system can ignore the operation. Optionally, the source storage system 2A may elect not to send the delete operation. Advantages of this include saving processing on the source storage system 2A and saving network bandwidth on the interconnect 6. During operation, the hijack component 614 frees any data blocks associated with the destination inode (if the destination inode is a file). In one embodiment, the hijack component 614 frees data blocks by modifying the destination inode to replace references to the data blocks with null references. The hijack component 614 may also direct the storage manager 10B to allow the data blocks to be written to. The hijack component 614 then replaces the file system metadata in the destination inode with metadata received in the replication operation. The hijack component 614 may also delete metadata that cannot be replaced based on the information in the received replication operation and/or replace the metadata with default values. The system can fill in the metadata at a later time when additional replication operations are received. This may occur, for example, during a baseline replication when the source replication engine 8A generates two separate operations directed at a specific inode number: a create operation and a modify operation. If the modify operation is received first, the system determines that an inconsistency exists because the modify operation is directed to an unused inode. However, the system may be unable to completely fill in the metadata associated with the new inode. In this situation, the hijack component 614 hijacks the destination inode and fills in the data included in the modify operation while erasing remaining data from the prior inode. The system 600 can then replace the remainder of the metadata when the create operation is received.
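The hijack step itself might look like the following sketch, which frees the old block references, overlays the operation's metadata on default values so that nothing from the prior object survives, and bumps the generation number. The DEFAULT_METADATA fields and the dictionary layout are assumptions for illustration, not the hijack component's actual data structures.

```python
DEFAULT_METADATA = {"size": 0, "owner": None, "permissions": None,
                    "create_time": None, "modify_time": None}

def hijack(dest: dict, op: dict) -> None:
    """Reuse the destination inode in place for the object described by `op`."""
    # 1) Free data blocks referenced by the previous version of the inode.
    if dest["type"] == "file":
        dest["block_refs"] = []           # references replaced with null references

    # 2) Start from defaults so no stale metadata from the old object survives,
    #    then overlay whatever metadata this particular operation carries.
    #    Fields the op does not carry stay at their defaults until a later
    #    operation (e.g., the matching create) supplies them.
    dest["metadata"] = {**DEFAULT_METADATA, **op.get("metadata", {})}

    # 3) Adopt the type implied by the operation and invalidate cached handles.
    dest["type"] = op["target_type"]
    dest["generation"] += 1

dest = {"type": "file", "generation": 9, "block_refs": [(1001, 11)],
        "metadata": {"size": 4096, "owner": "alice"}}
hijack(dest, {"op": "modify", "target_type": "directory", "target_generation": 1,
              "metadata": {"modify_time": 1262304000}})
print(dest["type"], dest["generation"], dest["metadata"]["owner"])  # directory 10 None
```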
The processing component 602 also includes a file system control component 616, which is configured to execute various file system cleanup operations after the hijack process has been executed. In particular, the file system control component 616 is configured to invalidate any file handles that are currently pointing to the destination inode after it is hijacked. This is done because the hosts 1 frequently cache file handles pointing to a particular inode to avoid having to make repeated requests to the file system for a file handle. The file system control component 616 invalidates these file handles to avoid generating file system errors when the host 1 attempts to use the handle.
FIG. 7 is a flow chart of a process 700 for executing the inode hijack system, which may be executed by the system 600. The process 700 operates to detect inconsistencies in replication operations received through the network interface 604 and to hijack a destination inode where necessary. Processing begins in step 704, where the system receives a replication operation. As discussed above, replication operations are received in an arbitrary order that is not restricted by chronological order or file system hierarchy. FIG. 8 is an example structure of a replication operation. As shown in FIG. 8, the replication operation data structure includes information defining the replication operation, including the operation type 802, target inode number 804, target inode generation 806, and metadata associated with the operation 808 (e.g., create/modify time, inode type, and parent information). One skilled in the art will appreciate that the replication operation data structure may also include other fields as needed to support the replication process.
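For reference, the FIG. 8 fields could be pictured as a simple record such as the one below. The numbers in the comments are the figure's reference numerals (802-808), not values carried by the operation, and the concrete field values are made up for illustration.

```python
# Hypothetical rendering of the FIG. 8 fields as a plain record.
replication_op = {
    "op_type": "modify",                  # 802: create, modify, delete, ...
    "target_inode_number": 10,            # 804
    "target_inode_generation": 3,         # 806
    "metadata": {                         # 808: metadata associated with the operation
        "create_time": 1262304000,
        "modify_time": 1262390400,
        "inode_type": "file",
        "parent_inode_number": 2,
    },
}
```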
Processing then proceeds to step 706, where the system determines one or more destination inodes corresponding to the replication operation. As described above, this includes looking up inodes based on the target inode number(s). In some embodiments, the destination inodes include unused inodes corresponding to the target inode number(s). The system then provides the destination inodes to the remaining components in the system for processing.
Processing then proceeds to step 708, where the system attempts to detect an inconsistency between the replication operation and the destination inode(s). As discussed above, the system may determine that an inconsistency exists based on the type of operation (e.g., a create operation directed to an existing inode, a modify operation directed to an unused inode, etc.). The system can also detect inconsistencies by comparing information in the destination inode(s) (e.g., inode generation, inode type, etc.) to information from the replication operation, where an inconsistency is identified if the information does not match. The system then proceeds to decision step 710, where it determines whether the replication operation is a replacement operation. A replacement operation is identified when the system has detected an inconsistency between the replication operation and the destination inode. If the system determines that the replication operation is not a replacement operation, the process 700 ends and the replication operation is executed.
If the system determines that the replication operation is a replacement operation, processing proceeds to subprocess 712, in which the system executes the hijack operation. In subprocess 712, the system hijacks the destination inode by replacing existing data with data from the replication operation. In particular, the system first proceeds to step 714, where it frees any data blocks associated with the destination inode if the destination inode is a file. As discussed above, the system frees data blocks by modifying the metadata in the destination inode to replace references to each data block with a null or default reference. The system may also notify the storage manager 10 that the data blocks can be reused. The processing of step 714 may be executed synchronously or asynchronously. In a synchronous operation, the process 700 pauses execution until every data block associated with the destination inode has been freed by the file system. Alternatively, in an asynchronous operation, the system does not have to wait for the file system to free each block. For an asynchronous operation, the system copies the data block references stored in the destination inode's buffer tree to a temporary file. The system then directs the file system to free the data associated with the temporary file as a background process. If the hijack operation is directed to a directory inode, the system may also delete the contents of the directory. In some embodiments, the system frees all data blocks within the directory. Alternatively, the system may delete the inodes for all file system objects within the directory. In this embodiment, the source storage system 2A can then omit delete operations directed to the file system objects within the directory, which reduces processing at the source storage system 2A and bandwidth use on the interconnect 6.
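The asynchronous variant of step 714 can be sketched as follows: the block references are copied out of the destination inode into a temporary container and reclaimed by a background worker, so the replication stream does not stall. StorageManager and free_block are stand-in names assumed for the sketch, not APIs of the storage manager 10B.

```python
import threading

class StorageManager:                                # stand-in for storage manager 10B
    def free_block(self, ref):
        print("freed", ref)

def free_blocks_async(dest: dict, storage_manager: StorageManager) -> threading.Thread:
    """Free a hijacked inode's old data blocks without stalling the replication
    stream: copy the block references into a temporary container, detach them
    from the destination inode, and reclaim them in the background."""
    temp_refs = list(dest["block_refs"])             # copy the buffer-tree references
    dest["block_refs"] = []                          # destination inode is clean now

    def reclaim():                                   # background cleanup of the "temp file"
        for ref in temp_refs:
            storage_manager.free_block(ref)          # block may be written to again

    worker = threading.Thread(target=reclaim)
    worker.start()
    return worker

dest = {"block_refs": [(1001, 11), (1002, 12)]}
free_blocks_async(dest, StorageManager()).join()     # join here only to show the output
```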
Processing then proceeds to step 716, where the system replaces the metadata in the destination inode with metadata from the replication operation. If the replication operation is a file create or file modify operation, this includes associating a new set of data blocks with the inode. The system may also erase any metadata associated with the original inode that is not directly replaced. As discussed above, in some cases the metadata needed to fully populate the target inode is contained in multiple replication operations. In order to avoid any inconsistency within the inode, the system erases the contents of the previous inode and/or replaces the contents with default values rather than having metadata from two distinct versions of the inode reside simultaneously in the same inode.
The system then proceeds to step 718, where it increments the generation number of the destination inode. As discussed above, the generation number allows hosts 1 to determine whether a file handle is pointing to the same inode that it originally referenced. If the generation numbers differ, the host knows that the handle is no longer valid (i.e., that the file system object previously referenced by the handle no longer exists). However, this step may be skipped in some cases where it is not necessary to change the generation number of the target inode. After incrementing the generation number, the process 700 ends.
In some cases the system receives multiple replication operations to fully populate the metadata for a particular inode. This may occur, for example, when the system receives a modify operation before it receives a create operation for the same inode (e.g., during a baseline replication). In these cases, the system replaces the inode's metadata where possible and replaces the remaining data with default values or erases the data. At a later time, the system receives a second replication operation that provides the remainder of the file system metadata. After determining that the second replication operation includes the remaining metadata, the system replaces the remaining data in the inode with data from the replication operation. For this type of operation, there is no hijack—i.e., the system does not free data blocks or change the generation number or type of the inode.
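A sketch of this two-step population (with no second hijack) is shown below; apply_partial, the UNSET marker, and the field list are assumptions used only to show how a later create operation fills in fields the earlier modify operation did not carry.

```python
UNSET = None   # marker for metadata fields not yet supplied by any operation

def apply_partial(dest: dict, op_metadata: dict, fields: tuple) -> None:
    """Fill in whatever metadata this operation carries; leave the rest unset."""
    for name in fields:
        if name in op_metadata:
            dest["metadata"][name] = op_metadata[name]
        elif name not in dest["metadata"]:
            dest["metadata"][name] = UNSET

FIELDS = ("type", "size", "owner", "create_time", "modify_time")

# Modify arrives first (out of order): partial fill, defaults for the rest.
dest = {"generation": 1, "metadata": {}}
apply_partial(dest, {"size": 100, "modify_time": 1262390400}, FIELDS)

# Create arrives later: only the still-missing fields are filled in; no blocks
# are freed and the generation number is left alone (no second hijack).
apply_partial(dest, {"type": "file", "owner": "alice", "create_time": 1262304000}, FIELDS)

print(dest["metadata"]["size"], dest["metadata"]["owner"])   # 100 alice
```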
From the foregoing, it will be appreciated that specific embodiments of the invention have been described herein for purposes of illustration, but that various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited except as by the appended claims.

Claims (18)

The invention claimed is:
1. A method for processing storage operations in a network storage system, the method comprising:
receiving multiple replication operations for execution on a destination storage system in an arbitrary order, wherein each replication operation includes a target metadata container identifier;
detecting an inconsistency between an individual replication operation and a destination metadata container on the destination storage system when the destination storage system cannot properly execute the individual replication operation, wherein the destination metadata container corresponds to the target metadata container identifier;
in response to detecting the inconsistency, determining that a replacement operation has occurred; and
replacing a portion of a metadata section of the destination metadata container based on the replication operation;
wherein the multiple replication operations include a first replication operation defining a modify operation directed to a metadata container type that differs from the destination metadata container type and a second replication operation defining a create operation, and wherein the hijack component is further configured to:
store metadata in a first part of the destination metadata container based on the first replication operation; and
store metadata in a second part of the destination metadata container based on the second replication operation.
2. The method of claim 1, wherein detecting the inconsistency comprises:
determining that the replication operation is directed to create the destination metadata container that already exists in the destination storage system; or
determining that the replication operation is directed to modify the destination metadata container that is unused; or
determining that a target type of the replication operation differs from a metadata container type of the destination metadata container; or
determining that a target generation of the target metadata container identifier differs from a destination metadata container generation of the destination metadata container; or
determining that the replication operation is inconsistent with the type of the destination metadata container.
3. The method of claim 2, wherein the replication operation inconsistent with the type of the destination metadata container is a replication operation to add data blocks to the destination metadata container of a directory, or a replication operation to add a directory entry to the destination metadata container of a file.
4. The method of claim 1, wherein the destination metadata container is a file and wherein replacing the portion of the metadata section comprises freeing a first set of metadata blocks associated with the file and associating with the file a second set of metadata blocks determined based on the replication operation.
5. The method of claim 1, wherein replacing the portion of the metadata section comprises replacing a first subsection of the metadata section with metadata included in the individual replication operation and replacing a second subsection of the metadata section with a default value.
6. The method of claim 1, wherein the individual replication operation is a first replication operation, wherein replacing the portion of the metadata section comprises replacing a first subsection of the metadata section with metadata included in the first replication operation and erasing a second subsection of the metadata section or replacing the second subsection with a default value, and further comprising:
replacing the second subsection of the metadata section based on metadata included in a second replication operation of the multiple replication operations.
7. The method of claim 1, wherein the arbitrary order is not based on chronological order or file system hierarchy.
8. An apparatus for processing storage operations in a storage system, the apparatus comprising:
a storage interface configured to communicate with a destination storage system;
a storage operation interface configured to receive, in an arbitrary order from a remote device, multiple replication operations to be executed on the destination storage system, wherein each replication operation includes a target metadata container address;
a processor;
a memory;
a lookup component configured to find a matching metadata container on the destination storage system corresponding to an individual target metadata container address;
an evaluation component configured to detect an inconsistency between an individual replication operation corresponding to the individual target metadata container address and the matching metadata container and to determine based on the detection that the individual replication operation is a replacement operation; and
a hijack component configured to replace a part of the metadata in the matching metadata container based on an individual replication operation corresponding to the individual target metadata container address;
wherein the multiple replication operations include a first replication operation defining a modify operation directed to a metadata container type that differs from the matching metadata container type and a second replication operation defining a create operation, and wherein the hijack component is further configured to:
store metadata in a first part of the matching metadata container based on the first replication operation; and
store metadata in a second part of the matching metadata container based on the second replication operation.
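One hypothetical way to organize the lookup, evaluation, and hijack components named in claim 8 as cooperating objects is sketched below; the class layout, field names, and the simple generation-based test are illustrative assumptions rather than the claimed design.

```python
class LookupComponent:
    """Finds the matching metadata container for a target address."""
    def __init__(self, inode_table):
        self.inode_table = inode_table      # inode number -> metadata container

    def find(self, target):
        return self.inode_table.get(target["number"])

class EvaluationComponent:
    """Decides whether an operation must be treated as a replacement."""
    def is_replacement(self, op, match):
        return match is not None and op["target"]["generation"] != match["generation"]

class HijackComponent:
    """Replaces part of the metadata in the matching container."""
    def replace(self, op, match):
        match["generation"] = op["target"]["generation"]
        match["metadata"].update(op["metadata"])

class DestinationReplicator:
    """Wires the components together for operations arriving in any order."""
    def __init__(self, inode_table):
        self.lookup = LookupComponent(inode_table)
        self.evaluation = EvaluationComponent()
        self.hijack = HijackComponent()

    def receive(self, op):
        match = self.lookup.find(op["target"])
        if self.evaluation.is_replacement(op, match):
            self.hijack.replace(op, match)

# Inode 7 was reused at the source, so the incoming operation carries generation 9.
table = {7: {"generation": 2, "metadata": {"kind": "file"}}}
DestinationReplicator(table).receive(
    {"target": {"number": 7, "generation": 9}, "metadata": {"kind": "directory"}})
```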
9. The apparatus of claim 8, wherein the evaluation component is configured to:
determine that the replication operation is directed to create the matching metadata container that already exists in the destination storage system; or
determine that one of the replication operations is directed to modify the matching metadata container that is unused; or
determine that a target type of one of the replication operations differs from a metadata container type of the matching metadata container; or
determine that a target generation of one of the replication operations differs from a metadata container generation of the matching metadata container; or
determine that one of the replication operations is inconsistent with the type of the matching metadata container.
10. The apparatus of claim 8, wherein:
the individual target metadata container address includes a target metadata container indicator and a target metadata container generation;
the matching metadata container includes a destination metadata container indicator, a destination metadata container generation, and a destination metadata container type; and
the evaluation component is configured to detect an inconsistency if the destination metadata container indicator is the same as the target metadata container indicator and if the destination generation differs from the target generation or the destination metadata container type differs from the target metadata container type.
11. The apparatus of claim 8, wherein the hijack component is configured to replace the part of the metadata in the matching metadata container by:
freeing data blocks associated with the matching metadata container; and
storing metadata from the multiple replication operations into a metadata section of the matching metadata container.
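Claim 11's replacement step, freeing the data blocks of the matching container and then storing metadata from the replication operations into its metadata section, might be sketched as follows; the free list standing in for the block allocator is an assumption made for the example.

```python
def hijack_container(container, ops, free_list):
    """Free the container's data blocks, then rebuild its metadata section
    from the replication operations (illustrative sketch only)."""
    free_list.extend(container["blocks"])    # return the old blocks to the allocator
    container["blocks"] = []
    for op in ops:
        container["metadata"].update(op["metadata"])

free_list = []
container = {"blocks": [101, 102, 103], "metadata": {"kind": "file", "size": 12288}}
hijack_container(container, [{"metadata": {"kind": "directory", "size": 0}}], free_list)
```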
12. The apparatus of claim 8, wherein the multiple replication operations are received asynchronously.
13. A method for replicating data in a destination network storage system, the method comprising:
receiving information defining multiple replication operations, wherein the information specifies a target inode identifier, an operation type, and metadata for each replication operation;
determining a destination inode on the destination network storage system based on at least a portion of the target inode identifier for an individual replication operation, wherein the destination inode includes destination inode metadata;
detecting a replacement operation by comparing the destination inode metadata to at least one of the target inode identifier, the operation type, and the metadata, wherein a replacement operation exists when the individual replication operation cannot be executed on the destination inode; and
in response to detecting the replacement operation, storing at least a part of the destination inode metadata in the destination inode with the metadata associated with the target inode identifier;
wherein the individual replication operation is a first replication operation, wherein the operation type is a modify operation directed to an inode type that differs from a type of the destination inode, wherein storing at least part of the metadata comprises storing a part of the metadata in a first section of the destination inode, wherein the information defining multiple replication operations includes a second replication operation having a create type, and further comprising:
storing metadata in a second section of the destination inode based on the second replication operation.
14. The method of claim 13, wherein the destination inode is a first inode type and the replication operation is directed to a second inode type, wherein the second inode type differs from the first inode type.
15. The method of claim 13, further comprising:
receiving information defining a second replication operation, wherein the information defines a delete operation applied to the target inode identifier; and
discarding the information defining the second replication operation without executing the second replication operation.
16. The method of claim 13, wherein the target inode identifier includes a target inode indicator, a target inode generation, and a target inode type, wherein the destination inode includes a destination inode indicator, a destination inode generation, and a destination inode type, and wherein a replacement operation is detected if the destination inode indicator is the same as the target inode indicator and if the destination inode generation differs from the target inode generation or the destination inode type differs from the target inode type.
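The rule stated in claim 16 reduces to a comparison of identifier tuples: a replacement is detected when the inode indicators match but the generation or the inode type does not. A minimal sketch with hypothetical field names:

```python
def replacement_detected(target, dest):
    """Same indicator, but a differing generation or inode type (claim 16's test)."""
    return (target["indicator"] == dest["indicator"]
            and (target["generation"] != dest["generation"]
                 or target["type"] != dest["type"]))

# Inode 42 was freed and recreated at the source, bumping its generation from 4 to 5.
print(replacement_detected({"indicator": 42, "generation": 5, "type": "file"},
                           {"indicator": 42, "generation": 4, "type": "file"}))  # True
```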
17. The method of claim 13, wherein storing at least part of the metadata comprises:
freeing data blocks associated with the destination inode; and
storing the at least part of the metadata into a metadata section of the destination inode.
18. The method of claim 13, wherein the individual replication operation is a replication operation to add data blocks to the destination inode of a directory, or a replication operation to add a directory entry to the destination inode of a file.
US14/160,770 2009-09-14 2014-01-22 System and method for hijacking inodes based on replication operations received in an arbitrary order Active US9244626B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US14/160,770 US9244626B2 (en) 2009-09-14 2014-01-22 System and method for hijacking inodes based on replication operations received in an arbitrary order
US15/004,470 US9858001B2 (en) 2009-09-14 2016-01-22 System and method for hijacking inodes based on replication operations received in an arbitrary order
US15/850,538 US10852958B2 (en) 2009-09-14 2017-12-21 System and method for hijacking inodes based on replication operations received in an arbitrary order

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/559,483 US8671072B1 (en) 2009-09-14 2009-09-14 System and method for hijacking inodes based on replication operations received in an arbitrary order
US14/160,770 US9244626B2 (en) 2009-09-14 2014-01-22 System and method for hijacking inodes based on replication operations received in an arbitrary order

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US12/559,483 Continuation US8671072B1 (en) 2009-09-14 2009-09-14 System and method for hijacking inodes based on replication operations received in an arbitrary order

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/004,470 Continuation US9858001B2 (en) 2009-09-14 2016-01-22 System and method for hijacking inodes based on replication operations received in an arbitrary order

Publications (2)

Publication Number Publication Date
US20140136805A1 US20140136805A1 (en) 2014-05-15
US9244626B2 true US9244626B2 (en) 2016-01-26

Family

ID=50192837

Family Applications (4)

Application Number Title Priority Date Filing Date
US12/559,483 Active 2030-09-02 US8671072B1 (en) 2009-09-14 2009-09-14 System and method for hijacking inodes based on replication operations received in an arbitrary order
US14/160,770 Active US9244626B2 (en) 2009-09-14 2014-01-22 System and method for hijacking inodes based on replication operations received in an arbitrary order
US15/004,470 Active US9858001B2 (en) 2009-09-14 2016-01-22 System and method for hijacking inodes based on replication operations received in an arbitrary order
US15/850,538 Active 2030-11-08 US10852958B2 (en) 2009-09-14 2017-12-21 System and method for hijacking inodes based on replication operations received in an arbitrary order

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US12/559,483 Active 2030-09-02 US8671072B1 (en) 2009-09-14 2009-09-14 System and method for hijacking inodes based on replication operations received in an arbitrary order

Family Applications After (2)

Application Number Title Priority Date Filing Date
US15/004,470 Active US9858001B2 (en) 2009-09-14 2016-01-22 System and method for hijacking inodes based on replication operations received in an arbitrary order
US15/850,538 Active 2030-11-08 US10852958B2 (en) 2009-09-14 2017-12-21 System and method for hijacking inodes based on replication operations received in an arbitrary order

Country Status (1)

Country Link
US (4) US8671072B1 (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8655848B1 (en) 2009-04-30 2014-02-18 Netapp, Inc. Unordered idempotent logical replication operations
US8484164B1 (en) * 2009-10-23 2013-07-09 Netapp, Inc. Method and system for providing substantially constant-time execution of a copy operation
US9910904B2 (en) 2011-08-30 2018-03-06 International Business Machines Corporation Replication of data objects from a source server to a target server
US8972344B2 (en) * 2012-06-07 2015-03-03 Wal-Mart Stores, Inc. Sequence engine
US9286320B2 (en) * 2013-03-06 2016-03-15 Infinidat Ltd. System and method for maintaining consistency among metadata elements of filesystem's logical objects
US9225596B2 (en) * 2013-05-02 2015-12-29 Citrix Systems, Inc. Undifferentiated service domains
US11018795B2 (en) * 2014-09-29 2021-05-25 The Regents Of The University Of California Methods and apparatus for coding for interference network
US9697227B2 (en) 2014-10-27 2017-07-04 Cohesity, Inc. Concurrent access and transactions in a distributed file system
US20160335198A1 (en) * 2015-05-12 2016-11-17 Apple Inc. Methods and system for maintaining an indirection system for a mass storage device
US10725708B2 (en) 2015-07-31 2020-07-28 International Business Machines Corporation Replication of versions of an object from a source storage to a target storage
US10769024B2 (en) * 2015-07-31 2020-09-08 Netapp Inc. Incremental transfer with unused data block reclamation
US10127243B2 (en) * 2015-09-22 2018-11-13 International Business Machines Corporation Fast recovery using self-describing replica files in a distributed storage system
US20170161150A1 (en) * 2015-12-07 2017-06-08 Dell Products L.P. Method and system for efficient replication of files using shared null mappings when having trim operations on files
US10558622B2 (en) * 2016-05-10 2020-02-11 Nasuni Corporation Network accessible file server
US10585860B2 (en) 2017-01-03 2020-03-10 International Business Machines Corporation Global namespace for a hierarchical set of file systems
CN108108467B (en) * 2017-12-29 2021-08-20 北京奇虎科技有限公司 Data deleting method and device
US11593315B2 (en) * 2018-06-26 2023-02-28 Hulu, LLC Data cluster migration using an incremental synchronization
CN209025978U (en) * 2018-06-27 2019-06-25 蔚来汽车有限公司 Captive nut, battery component and vehicle
CN109344206B (en) * 2018-12-03 2021-07-16 天津电气科学研究院有限公司 OLAP metadata conflict automatic repairing method based on query reasoning
US11436189B2 (en) * 2019-02-19 2022-09-06 International Business Machines Corporation Performance- and cost-efficient archiving of small objects
US11157455B2 (en) 2019-03-19 2021-10-26 Netapp Inc. Inofile management and access control list file handle parity
US10852985B2 (en) 2019-03-19 2020-12-01 Netapp Inc. Persistent hole reservation
US11086551B2 (en) 2019-03-19 2021-08-10 Netapp, Inc. Freeing and utilizing unused inodes
US11151162B2 (en) 2019-03-19 2021-10-19 Netapp Inc. Timestamp consistency for synchronous replication
US11841825B2 (en) 2021-11-30 2023-12-12 Dell Products L.P. Inode clash resolution during file system migration
US20230169033A1 (en) * 2021-11-30 2023-06-01 Dell Products, L.P. Efficient Transparent Switchover of File System Consolidation Migrations

Family Cites Families (89)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5544347A (en) 1990-09-24 1996-08-06 Emc Corporation Data storage system controlled remote data mirroring with respectively maintained data indices
ATE409907T1 (en) 1993-06-03 2008-10-15 Network Appliance Inc METHOD AND DEVICE FOR DESCRIBING ANY AREAS OF A FILE SYSTEM
US5504861A (en) 1994-02-22 1996-04-02 International Business Machines Corporation Remote data duplexing
JP2894676B2 (en) 1994-03-21 1999-05-24 インターナショナル・ビジネス・マシーンズ・コーポレイション Asynchronous remote copy system and asynchronous remote copy method
US5592618A (en) 1994-10-03 1997-01-07 International Business Machines Corporation Remote copy secondary data copy validation-audit function
US5682513A (en) 1995-03-31 1997-10-28 International Business Machines Corporation Cache queue entry linking for DASD record updates
US6144999A (en) 1998-05-29 2000-11-07 Sun Microsystems, Incorporated Method and apparatus for file system disaster recovery
US6539396B1 (en) 1999-08-31 2003-03-25 Accenture Llp Multi-object identifier system and method for information service pattern environment
US7203732B2 (en) 1999-11-11 2007-04-10 Miralink Corporation Flexible remote data mirroring
JP3983451B2 (en) * 2000-04-07 2007-09-26 シャープ株式会社 Digital signal sampling frequency converter
US6985499B2 (en) 2000-04-20 2006-01-10 Symmetricom, Inc. Precise network time transfer
US7334098B1 (en) 2000-06-06 2008-02-19 Quantum Corporation Producing a mass storage backup using a log of write commands and time information
US6711693B1 (en) 2000-08-31 2004-03-23 Hewlett-Packard Development Company, L.P. Method for synchronizing plurality of time of year clocks in partitioned plurality of processors where each partition having a microprocessor configured as a multiprocessor backplane manager
WO2002019097A1 (en) 2000-09-01 2002-03-07 International Interactive Commerce, Ltd. System and method for collaboration using web browsers
US6725342B1 (en) 2000-09-26 2004-04-20 Intel Corporation Non-volatile mass storage cache coherency apparatus
US7302634B2 (en) 2001-03-14 2007-11-27 Microsoft Corporation Schema-based services for identity-based data access
US7016963B1 (en) 2001-06-29 2006-03-21 Glow Designs, Llc Content management and transformation system for digital content
US7117266B2 (en) 2001-07-17 2006-10-03 Bea Systems, Inc. Method for providing user-apparent consistency in a wireless device
US6912645B2 (en) 2001-07-19 2005-06-28 Lucent Technologies Inc. Method and apparatus for archival data storage
US7136882B2 (en) 2001-07-31 2006-11-14 Hewlett-Packard Development Company, L.P. Storage device manager
US20070094466A1 (en) 2001-12-26 2007-04-26 Cisco Technology, Inc., A Corporation Of California Techniques for improving mirroring operations implemented in storage area networks and network based virtualization
US7114091B2 (en) 2002-03-18 2006-09-26 National Instruments Corporation Synchronization of distributed systems
US6943610B2 (en) 2002-04-19 2005-09-13 Intel Corporation Clock distribution network using feedback for skew compensation and jitter filtering
US6983353B2 (en) 2002-04-29 2006-01-03 Emc Corporation Method and apparatus for enhancing operations in disk array storage devices
US6842825B2 (en) 2002-08-07 2005-01-11 International Business Machines Corporation Adjusting timestamps to preserve update timing information for cached data objects
US7076508B2 (en) 2002-08-12 2006-07-11 International Business Machines Corporation Method, system, and program for merging log entries from multiple recovery log files
US7734681B2 (en) 2002-08-20 2010-06-08 Symantec Operating Corporation Inter-process messaging using multiple client-server pairs
US7028147B2 (en) 2002-12-13 2006-04-11 Sun Microsystems, Inc. System and method for efficiently and reliably performing write cache mirroring
US7334014B2 (en) 2003-01-03 2008-02-19 Availigent, Inc. Consistent time service for fault-tolerant distributed systems
US7024584B2 (en) 2003-01-09 2006-04-04 International Business Machines Corporation Method, system, and article of manufacture for maintaining data integrity
US7055009B2 (en) 2003-03-21 2006-05-30 International Business Machines Corporation Method, system, and program for establishing and maintaining a point-in-time copy
US7539976B1 (en) 2003-03-25 2009-05-26 Electric Cloud, Inc. System and method for intelligently distributing source files within a distributed program build architecture
EP1671200A4 (en) 2003-04-24 2007-10-17 Secureinfo Corp Automated electronic software distribution and management method and system
US7152077B2 (en) 2003-05-16 2006-12-19 Hewlett-Packard Development Company, L.P. System for redundant storage of data
US7380081B2 (en) 2003-06-06 2008-05-27 Hewlett-Packard Development Company, L.P. Asynchronous data redundancy technique
US7467168B2 (en) 2003-06-18 2008-12-16 International Business Machines Corporation Method for mirroring data at storage locations
US7065589B2 (en) 2003-06-23 2006-06-20 Hitachi, Ltd. Three data center remote copy system with journaling
US7660833B2 (en) 2003-07-10 2010-02-09 Microsoft Corporation Granular control over the authority of replicated information via fencing and unfencing
US8200775B2 (en) 2005-02-01 2012-06-12 Newsilike Media Group, Inc Enhanced syndication
US20050050115A1 (en) 2003-08-29 2005-03-03 Kekre Anand A. Method and system of providing cascaded replication
US7278049B2 (en) 2003-09-29 2007-10-02 International Business Machines Corporation Method, system, and program for recovery from a failure in an asynchronous data copying system
US7325109B1 (en) 2003-10-24 2008-01-29 Network Appliance, Inc. Method and apparatus to mirror data at two separate sites without comparing the data at the two sites
US7590807B2 (en) 2003-11-03 2009-09-15 Netapp, Inc. System and method for record retention date in a write once read many storage system
US7054960B1 (en) 2003-11-18 2006-05-30 Veritas Operating Corporation System and method for identifying block-level write operations to be transferred to a secondary site during replication
JP5166735B2 (en) 2003-12-19 2013-03-21 ネットアップ,インコーポレイテッド System and method capable of synchronous data replication in a very short update interval
US7039661B1 (en) 2003-12-29 2006-05-02 Veritas Operating Corporation Coordinated dirty block tracking
US20050154786A1 (en) 2004-01-09 2005-07-14 International Business Machines Corporation Ordering updates in remote copying of data
US7624109B2 (en) 2004-02-27 2009-11-24 Texas Memory Systems, Inc. Distributed asynchronous ordered replication
US20050278382A1 (en) 2004-05-28 2005-12-15 Network Appliance, Inc. Method and apparatus for recovery of a current read-write unit of a file system
GB0412609D0 (en) 2004-06-05 2004-07-07 Ibm Storage system with inhibition of cache destaging
GB0416074D0 (en) 2004-07-17 2004-08-18 Ibm Controlling data consistency guarantees in storage apparatus
US8090880B2 (en) 2006-11-09 2012-01-03 Microsoft Corporation Data consistency within a federation infrastructure
US7386676B2 (en) 2005-01-21 2008-06-10 International Business Machines Corporation Data coherence system
US7562077B2 (en) 2005-03-28 2009-07-14 Netapp, Inc. Method and apparatus for generating and describing block-level difference information about two snapshots
TWI275101B (en) 2005-05-24 2007-03-01 Prolific Technology Inc Flash memory storage system
US7676539B2 (en) 2005-06-09 2010-03-09 International Business Machines Corporation Methods, apparatus and computer programs for automated problem solving in a distributed, collaborative environment
US7467265B1 (en) 2005-06-30 2008-12-16 Symantec Operating Corporation System and method for block conflict resolution within consistency interval marker based replication
JP2009501382A (en) 2005-07-14 2009-01-15 ヨッタ ヨッタ, インコーポレイテッド Maintaining writing order fidelity in multi-writer systems
US7376796B2 (en) * 2005-11-01 2008-05-20 Network Appliance, Inc. Lightweight coherency control protocol for clustered storage system
US7653668B1 (en) 2005-11-23 2010-01-26 Symantec Operating Corporation Fault tolerant multi-stage data replication with relaxed coherency guarantees
US7651593B2 (en) * 2005-12-19 2010-01-26 Commvault Systems, Inc. Systems and methods for performing data replication
US7617253B2 (en) * 2005-12-19 2009-11-10 Commvault Systems, Inc. Destination systems and methods for performing data replication
US7617262B2 (en) 2005-12-19 2009-11-10 Commvault Systems, Inc. Systems and methods for monitoring application data in a data replication system
US7496786B2 (en) 2006-01-10 2009-02-24 Stratus Technologies Bermuda Ltd. Systems and methods for maintaining lock step operation
US7702870B2 (en) 2006-01-19 2010-04-20 Network Appliance Inc. Method and apparatus for defragmentation and for detection of relocated blocks
US7864817B2 (en) 2006-01-19 2011-01-04 Ciena Corporation Transport systems and methods incorporating absolute time references and selective buildout delays
US20070208790A1 (en) 2006-03-06 2007-09-06 Reuter James M Distributed data-storage system
US7644308B2 (en) 2006-03-06 2010-01-05 Hewlett-Packard Development Company, L.P. Hierarchical timestamps
US20070214194A1 (en) 2006-03-07 2007-09-13 James Reuter Consistency methods and systems
US7571268B2 (en) 2006-04-06 2009-08-04 International Business Machines Corporation Consistent updates across storage subsystems coupled to a plurality of primary and secondary units at selected times
US7478210B2 (en) 2006-06-09 2009-01-13 Intel Corporation Memory reclamation with optimistic concurrency
US7562203B2 (en) 2006-09-27 2009-07-14 Network Appliance, Inc. Storage defragmentation based on modified physical address and unmodified logical address
US7726236B2 (en) 2007-02-20 2010-06-01 Pizza Hut, Inc. Sandwich maker
US7925629B2 (en) 2007-03-28 2011-04-12 Netapp, Inc. Write ordering style asynchronous replication utilizing a loosely-accurate global clock
US8290899B2 (en) 2007-03-28 2012-10-16 Netapp, Inc. Group stamping style asynchronous replication utilizing a loosely-accurate global clock
US8150800B2 (en) 2007-03-28 2012-04-03 Netapp, Inc. Advanced clock synchronization technique
US7900003B2 (en) 2007-04-20 2011-03-01 International Business Machines Corporation System, method and computer program product for storing an information block
JP5026213B2 (en) 2007-09-28 2012-09-12 株式会社日立製作所 Storage apparatus and data deduplication method
US7814074B2 (en) 2008-03-14 2010-10-12 International Business Machines Corporation Method and system for assuring integrity of deduplicated data
US7937371B2 (en) 2008-03-14 2011-05-03 International Business Machines Corporation Ordering compression and deduplication of data
US7984022B2 (en) 2008-04-18 2011-07-19 International Business Machines Corporation Space recovery with storage management coupled with a deduplicating storage system
US7996371B1 (en) 2008-06-10 2011-08-09 Netapp, Inc. Combining context-aware and context-independent data deduplication for optimal space savings
US8099571B1 (en) 2008-08-06 2012-01-17 Netapp, Inc. Logical block replication with deduplication
US8655840B2 (en) 2008-12-03 2014-02-18 Nokia Corporation Method, apparatus and computer program product for sub-file level synchronization
US7962447B2 (en) 2008-12-30 2011-06-14 International Business Machines Corporation Accessing a hierarchical database using service data objects (SDO) via a data access service (DAS)
US8321380B1 (en) 2009-04-30 2012-11-27 Netapp, Inc. Unordered idempotent replication operations
US8356017B2 (en) 2009-08-11 2013-01-15 International Business Machines Corporation Replication of deduplicated data
US8799367B1 (en) 2009-10-30 2014-08-05 Netapp, Inc. Using logical block addresses with generation numbers as data fingerprints for network deduplication
US8473690B1 (en) 2009-10-30 2013-06-25 Netapp, Inc. Using logical block addresses with generation numbers as data fingerprints to provide cache coherency

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5812773A (en) * 1996-07-12 1998-09-22 Microsoft Corporation System and method for the distribution of hierarchically structured data
US6993539B2 (en) * 2002-03-19 2006-01-31 Network Appliance, Inc. System and method for determining changes in two snapshots and for transmitting changes to destination snapshot
US7243115B2 (en) * 2002-03-19 2007-07-10 Network Appliance, Inc. System and method for asynchronous mirroring of snapshots at a destination using a purgatory directory and inode mapping
US7237076B2 (en) * 2003-03-18 2007-06-26 Hitachi, Ltd. Method of maintaining a plurality of snapshots, server apparatus and storage apparatus
US20060095480A1 (en) * 2004-10-29 2006-05-04 Microsoft Corporation Method and subsystem for performing subset computation for replication topologies
US20060106895A1 (en) * 2004-11-12 2006-05-18 Microsoft Corporation Method and subsystem for performing metadata cleanup for replication topologies
US20070256055A1 (en) * 2004-11-19 2007-11-01 Adrian Herscu Method for building component-software for execution in a standards-compliant programming environment
US7885923B1 (en) * 2006-06-30 2011-02-08 Symantec Operating Corporation On demand consistency checkpoints for temporal volumes within consistency interval marker based replication
US20100250497A1 (en) * 2007-01-05 2010-09-30 Redlich Ron M Electromagnetic pulse (EMP) hardened information infrastructure with extractor, cloud dispersal, secure storage, content analysis and classification and method therefor

Cited By (92)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11650976B2 (en) 2011-10-14 2023-05-16 Pure Storage, Inc. Pattern matching using hash tables in storage system
US11652884B2 (en) 2014-06-04 2023-05-16 Pure Storage, Inc. Customized hash algorithms
US10838633B2 (en) 2014-06-04 2020-11-17 Pure Storage, Inc. Configurable hyperconverged multi-tenant storage system
US11500552B2 (en) 2014-06-04 2022-11-15 Pure Storage, Inc. Configurable hyperconverged multi-tenant storage system
US11385799B2 (en) 2014-06-04 2022-07-12 Pure Storage, Inc. Storage nodes supporting multiple erasure coding schemes
US11822444B2 (en) 2014-06-04 2023-11-21 Pure Storage, Inc. Data rebuild independent of error detection
US11310317B1 (en) 2014-06-04 2022-04-19 Pure Storage, Inc. Efficient load balancing
US11593203B2 (en) 2014-06-04 2023-02-28 Pure Storage, Inc. Coexisting differing erasure codes
US11138082B2 (en) 2014-06-04 2021-10-05 Pure Storage, Inc. Action determination based on redundancy level
US11671496B2 (en) 2014-06-04 2023-06-06 Pure Storage, Inc. Load balacing for distibuted computing
US11079962B2 (en) 2014-07-02 2021-08-03 Pure Storage, Inc. Addressable non-volatile random access memory
US10817431B2 (en) 2014-07-02 2020-10-27 Pure Storage, Inc. Distributed storage addressing
US11604598B2 (en) 2014-07-02 2023-03-14 Pure Storage, Inc. Storage cluster with zoned drives
US11886308B2 (en) 2014-07-02 2024-01-30 Pure Storage, Inc. Dual class of service for unified file and object messaging
US11385979B2 (en) 2014-07-02 2022-07-12 Pure Storage, Inc. Mirrored remote procedure call cache
US11922046B2 (en) 2014-07-02 2024-03-05 Pure Storage, Inc. Erasure coded data within zoned drives
US11392522B2 (en) 2014-07-03 2022-07-19 Pure Storage, Inc. Transfer of segmented data
US11550752B2 (en) 2014-07-03 2023-01-10 Pure Storage, Inc. Administrative actions via a reserved filename
US11494498B2 (en) 2014-07-03 2022-11-08 Pure Storage, Inc. Storage data decryption
US11928076B2 (en) 2014-07-03 2024-03-12 Pure Storage, Inc. Actions for reserved filenames
US11204830B2 (en) 2014-08-07 2021-12-21 Pure Storage, Inc. Die-level monitoring in a storage cluster
US11544143B2 (en) 2014-08-07 2023-01-03 Pure Storage, Inc. Increased data reliability
US11442625B2 (en) 2014-08-07 2022-09-13 Pure Storage, Inc. Multiple read data paths in a storage system
US11656939B2 (en) 2014-08-07 2023-05-23 Pure Storage, Inc. Storage cluster memory characterization
US11620197B2 (en) 2014-08-07 2023-04-04 Pure Storage, Inc. Recovering error corrected data
US11188476B1 (en) 2014-08-20 2021-11-30 Pure Storage, Inc. Virtual addressing in a storage system
US11734186B2 (en) 2014-08-20 2023-08-22 Pure Storage, Inc. Heterogeneous storage with preserved addressing
US11775428B2 (en) 2015-03-26 2023-10-03 Pure Storage, Inc. Deletion immunity for unreferenced data
US11240307B2 (en) 2015-04-09 2022-02-01 Pure Storage, Inc. Multiple communication paths in a storage system
US11722567B2 (en) 2015-04-09 2023-08-08 Pure Storage, Inc. Communication paths for storage devices having differing capacities
US11144212B2 (en) 2015-04-10 2021-10-12 Pure Storage, Inc. Independent partitions within an array
US11675762B2 (en) 2015-06-26 2023-06-13 Pure Storage, Inc. Data structures for key management
US11704073B2 (en) 2015-07-13 2023-07-18 Pure Storage, Inc Ownership determination for accessing a file
US11740802B2 (en) 2015-09-01 2023-08-29 Pure Storage, Inc. Error correction bypass for erased pages
US11893023B2 (en) 2015-09-04 2024-02-06 Pure Storage, Inc. Deterministic searching using compressed indexes
US11838412B2 (en) 2015-09-30 2023-12-05 Pure Storage, Inc. Secret regeneration from distributed shares
US11567917B2 (en) 2015-09-30 2023-01-31 Pure Storage, Inc. Writing data and metadata into storage
US11489668B2 (en) 2015-09-30 2022-11-01 Pure Storage, Inc. Secret regeneration in a storage system
US11582046B2 (en) 2015-10-23 2023-02-14 Pure Storage, Inc. Storage system communication
US11204701B2 (en) 2015-12-22 2021-12-21 Pure Storage, Inc. Token based transactions
US11550473B2 (en) 2016-05-03 2023-01-10 Pure Storage, Inc. High-availability storage array
US11847320B2 (en) 2016-05-03 2023-12-19 Pure Storage, Inc. Reassignment of requests for high availability
US11861188B2 (en) 2016-07-19 2024-01-02 Pure Storage, Inc. System having modular accelerators
US11409437B2 (en) 2016-07-22 2022-08-09 Pure Storage, Inc. Persisting configuration information
US11886288B2 (en) 2016-07-22 2024-01-30 Pure Storage, Inc. Optimize data protection layouts based on distributed flash wear leveling
US11604690B2 (en) 2016-07-24 2023-03-14 Pure Storage, Inc. Online failure span determination
US11734169B2 (en) 2016-07-26 2023-08-22 Pure Storage, Inc. Optimizing spool and memory space management
US11030090B2 (en) 2016-07-26 2021-06-08 Pure Storage, Inc. Adaptive data migration
US11886334B2 (en) 2016-07-26 2024-01-30 Pure Storage, Inc. Optimizing spool and memory space management
US11797212B2 (en) 2016-07-26 2023-10-24 Pure Storage, Inc. Data migration for zoned drives
US11340821B2 (en) 2016-07-26 2022-05-24 Pure Storage, Inc. Adjustable migration utilization
US11922033B2 (en) 2016-09-15 2024-03-05 Pure Storage, Inc. Batch data deletion
US11656768B2 (en) 2016-09-15 2023-05-23 Pure Storage, Inc. File deletion in a distributed system
US11922070B2 (en) 2016-10-04 2024-03-05 Pure Storage, Inc. Granting access to a storage device based on reservations
US11842053B2 (en) 2016-12-19 2023-12-12 Pure Storage, Inc. Zone namespace
US11307998B2 (en) 2017-01-09 2022-04-19 Pure Storage, Inc. Storage efficiency of encrypted host system data
US11289169B2 (en) 2017-01-13 2022-03-29 Pure Storage, Inc. Cycled background reads
US10942869B2 (en) 2017-03-30 2021-03-09 Pure Storage, Inc. Efficient coding in a storage system
US11592985B2 (en) 2017-04-05 2023-02-28 Pure Storage, Inc. Mapping LUNs in a storage memory
US11722455B2 (en) 2017-04-27 2023-08-08 Pure Storage, Inc. Storage cluster address resolution
US11869583B2 (en) 2017-04-27 2024-01-09 Pure Storage, Inc. Page write requirements for differing types of flash memory
US11782625B2 (en) 2017-06-11 2023-10-10 Pure Storage, Inc. Heterogeneity supportive resiliency groups
US11689610B2 (en) 2017-07-03 2023-06-27 Pure Storage, Inc. Load balancing reset packets
US11190580B2 (en) 2017-07-03 2021-11-30 Pure Storage, Inc. Stateful connection resets
US11714708B2 (en) 2017-07-31 2023-08-01 Pure Storage, Inc. Intra-device redundancy scheme
US11604585B2 (en) 2017-10-31 2023-03-14 Pure Storage, Inc. Data rebuild when changing erase block sizes during drive replacement
US11704066B2 (en) 2017-10-31 2023-07-18 Pure Storage, Inc. Heterogeneous erase blocks
US11086532B2 (en) 2017-10-31 2021-08-10 Pure Storage, Inc. Data rebuild with changing erase block sizes
US11074016B2 (en) 2017-10-31 2021-07-27 Pure Storage, Inc. Using flash storage devices with different sized erase blocks
US11741003B2 (en) 2017-11-17 2023-08-29 Pure Storage, Inc. Write granularity for storage system
US11797211B2 (en) 2018-01-31 2023-10-24 Pure Storage, Inc. Expanding data structures in a storage system
US11442645B2 (en) 2018-01-31 2022-09-13 Pure Storage, Inc. Distributed storage system expansion mechanism
US11847013B2 (en) 2018-02-18 2023-12-19 Pure Storage, Inc. Readable data determination
US11836348B2 (en) 2018-04-27 2023-12-05 Pure Storage, Inc. Upgrade for system with differing capacities
US11846968B2 (en) 2018-09-06 2023-12-19 Pure Storage, Inc. Relocation of data for heterogeneous storage systems
US11354058B2 (en) 2018-09-06 2022-06-07 Pure Storage, Inc. Local relocation of data stored at a storage device of a storage system
US11868309B2 (en) 2018-09-06 2024-01-09 Pure Storage, Inc. Queue management for data relocation
US11899582B2 (en) 2019-04-12 2024-02-13 Pure Storage, Inc. Efficient memory dump
US11822807B2 (en) 2019-06-24 2023-11-21 Pure Storage, Inc. Data replication in a storage system
US11281394B2 (en) 2019-06-24 2022-03-22 Pure Storage, Inc. Replication across partitioning schemes in a distributed storage system
US11893126B2 (en) 2019-10-14 2024-02-06 Pure Storage, Inc. Data deletion for a multi-tenant environment
US11704192B2 (en) 2019-12-12 2023-07-18 Pure Storage, Inc. Budgeting open blocks based on power loss protection
US11847331B2 (en) 2019-12-12 2023-12-19 Pure Storage, Inc. Budgeting open blocks of a storage unit based on power loss prevention
US11416144B2 (en) 2019-12-12 2022-08-16 Pure Storage, Inc. Dynamic use of segment or zone power loss protection in a flash device
US11947795B2 (en) 2019-12-12 2024-04-02 Pure Storage, Inc. Power loss protection based on write requirements
US11656961B2 (en) 2020-02-28 2023-05-23 Pure Storage, Inc. Deallocation within a storage system
US11775491B2 (en) 2020-04-24 2023-10-03 Pure Storage, Inc. Machine learning model for storage system
US11789626B2 (en) 2020-12-17 2023-10-17 Pure Storage, Inc. Optimizing block allocation in a data storage system
US11614880B2 (en) 2020-12-31 2023-03-28 Pure Storage, Inc. Storage system with selectable write paths
US11847324B2 (en) 2020-12-31 2023-12-19 Pure Storage, Inc. Optimizing resiliency groups for data regions of a storage system
US11507597B2 (en) 2021-03-31 2022-11-22 Pure Storage, Inc. Data replication to meet a recovery point objective
US11955187B2 (en) 2022-02-28 2024-04-09 Pure Storage, Inc. Refresh of differing capacity NAND

Also Published As

Publication number Publication date
US10852958B2 (en) 2020-12-01
US8671072B1 (en) 2014-03-11
US9858001B2 (en) 2018-01-02
US20180165026A1 (en) 2018-06-14
US20160139843A1 (en) 2016-05-19
US20140136805A1 (en) 2014-05-15

Similar Documents

Publication Publication Date Title
US10852958B2 (en) System and method for hijacking inodes based on replication operations received in an arbitrary order
US11880343B2 (en) Unordered idempotent logical replication operations
US9280288B2 (en) Using logical block addresses with generation numbers as data fingerprints for network deduplication
US11494088B2 (en) Push-based piggyback system for source-driven logical replication in a storage environment
US9372794B2 (en) Using logical block addresses with generation numbers as data fingerprints to provide cache coherency
US8321380B1 (en) Unordered idempotent replication operations
US8099571B1 (en) Logical block replication with deduplication
US8484164B1 (en) Method and system for providing substantially constant-time execution of a copy operation
US7809693B2 (en) System and method for restoring data on demand for instant volume restoration
US20110016085A1 (en) Method and system for maintaining multiple inode containers in a storage server
US7593973B2 (en) Method and apparatus for transferring snapshot data
US8296260B2 (en) System and method for managing data deduplication of storage systems utilizing persistent consistency point images
US8126847B1 (en) Single file restore from image backup by using an independent block list for each file
US8538924B2 (en) Computer system and data access control method for recalling the stubbed file on snapshot
US20100125598A1 (en) Architecture for supporting sparse volumes
US9832260B2 (en) Data migration preserving storage efficiency
EP1882223B1 (en) System and method for restoring data on demand for instant volume restoration
US9519590B1 (en) Managing global caches in data storage systems

Legal Events

Date Code Title Description
STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8