WO2005022389A2 - Method and system of providing cascaded replication - Google Patents

Method and system of providing cascaded replication

Info

Publication number
WO2005022389A2
WO2005022389A2 (PCT/US2004/027933, US2004027933W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
node
data volume
written
replicating
Prior art date
Application number
PCT/US2004/027933
Other languages
French (fr)
Other versions
WO2005022389A3 (en)
Inventor
Anand A. Kekre
Original Assignee
Veritas Operating Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Veritas Operating Corporation filed Critical Veritas Operating Corporation
Publication of WO2005022389A2 publication Critical patent/WO2005022389A2/en
Publication of WO2005022389A3 publication Critical patent/WO2005022389A3/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2058Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring using more than 2 mirrored copies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2071Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring using a plurality of controllers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2071Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring using a plurality of controllers
    • G06F11/2074Asynchronous techniques


Abstract

A method and system of providing cascaded replication is disclosed. According to one embodiment, a method is provided in which data to be written to a data volume of a first node is replicated to a data volume of a second node, and data to be written to the data volume of the second node is replicated to a data volume of a third node, where at least one of the replication operations includes asynchronous data replication.

Description

METHOD AND SYSTEM OF PROVIDING CASCADED REPLICATION Anand A. Kekre
BACKGROUND
Technical Field The present invention relates to data storage and retrieval generally and more particularly to a method and system of providing cascaded replication.
Description of the Related Art Information drives business. Companies today rely to an unprecedented extent on online, frequently accessed, constantly changing data to run their businesses. Unplanned events that inhibit the availability of this data can seriously damage business operations. Additionally, any permanent data loss, from natural disaster or any other source, will likely have serious negative consequences for the continued viability of a business. Therefore, when disaster strikes, companies must be prepared to eliminate or minimize data loss, and recover quickly with usable data. Replication is one technique utilized to minimize data loss and improve the availability of data, in which a replicated copy of data is distributed and stored at one or more remote sites or nodes. In the event of a site migration, or the failure of one or more physical disks storing data or of a node or host data processing system associated with such a disk, the remote replicated data copy may be utilized, ensuring data integrity and availability. Replication is frequently coupled with other high-availability techniques such as clustering to provide an extremely robust data storage solution. Metrics typically used to assess or design a particular replication system include recovery point or recovery point objective (RPO) and recovery time or recovery time objective (RTO) performance metrics as well as a total cost of ownership (TCO) metric.
The RPO metric is used to indicate the point (e.g., in time) to which data (e.g., application data, system state, and the like) must be recovered by a replication system. In other words, RPO may be used to indicate how much data loss can be tolerated by applications associated with the replication system. The RTO metric is used to indicate the time within which systems, applications, and/or functions associated with the replication system must be recovered. Optimally, a replication system would provide for instantaneous and complete recovery of data from one or more remote sites at a great distance from the data-generating primary node. However, the high costs associated with the high-speed link(s) required by such optimal replication systems have discouraged their implementation in all but a small number of application environments. Replication systems in which either high-frequency data replication is performed over short, high-speed links alone or low-frequency data replication is performed over longer, low-speed links alone similarly suffer from a number of drawbacks (e.g., a poor RPO metric, high write operation/application latency, high cost, or replication and/or recovery failure where an event negatively impacts both a primary node and one or more nodes including replicated data due to geographic proximity). Consequently, a number of replication systems have been implemented in which such short-distance, high-speed/frequency replication (e.g., real-time or synchronous replication) is coupled (e.g., cascaded) with long-distance, low-speed/frequency replication.
Fig. 1 illustrates a cascaded replication system according to the prior art. In the illustrated cascaded replication system, synchronous replication is performed between a primary node 100 and an intermediary node 102 while periodic replication is performed between intermediary node 102 and a secondary node 104. While a single intermediary node 102 has been illustrated in the system of Fig. 1, it should be understood that additional intermediary nodes may be provided serially or in parallel between primary node 100 and secondary node 104. Primary node 100 of the illustrated system includes an application 106 (e.g., a database application) coupled to a data volume 108 or other storage area via a replication facility 110.
Primary node 100 additionally includes a storage replicator log (SRL) 112 used to effect replication (e.g., synchronous replication). In a typical cascaded replication system such as that illustrated in Fig. 1, SRL 112 is used to store or "journal" data to be written by one or more write operations requested by an application such as application 106 during primary node 100's operation.
It is assumed for purposes of this description that the data volumes of primary node 100, intermediary node 102, and secondary node 104 are initially synchronized. Intermediary node 102 of the illustrated prior art replication system includes a replication facility 114, a data volume 116, and a snapshot data volume 118 as shown. In synchronous replication, when application 106 requests that a write operation be performed on its behalf to data volume 108, replication facility 110 intercepts the write. Replication facility 110 then writes the data to be written by the requested write operation to storage replicator log (SRL) 112. It is not required that such data be written to a storage replicator log, although a storage replicator log is valuable in assisting with recovery upon node failure. The data may be written directly to data volume 108 or into a memory buffer that is later copied to data volume 108. Replication facility 110 then replicates the data to be written to data volume 116 within intermediary node 102. In one prior art replication system, such replication is performed by copying the data to be written and transferring the generated copy to data volume 116. Replication facility 110 then asynchronously issues a write operation to write the data to be written locally to data volume 108. In a conventional replication system implementing synchronous replication, writing the data to SRL 112, writing the data to local data volume 108, and transferring a copy of the data to be written to the intermediary node may start and/or complete in any order or may be performed in parallel. The data is then written to data volume 108. Because the updated data resulting from the write operation is sent to a node that is updated synchronously, replication facility 110 waits until an acknowledgement is received from replication facility 114 before notifying application 106 that the write operation is complete. The described data transfer between primary node 100 and intermediary node 102 is performed over a communication link (e.g., a communications network and storage area network (SAN)) between the nodes. Upon receiving replicated data, replication facility 114 on intermediary node 102 issues a write command to write the data directly to data volume 116.
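A minimal Python sketch of this synchronous write path may help fix the ordering of the steps; the srl, local_volume, and intermediary objects and their methods are hypothetical stand-ins rather than the interface of any actual replication product:

    import threading

    def synchronous_write(offset, data, srl, local_volume, intermediary):
        # Journal the write first so it can be replayed after a node failure.
        srl.append(offset, data)

        # The local write may start in parallel with the transfer to the
        # intermediary node; ordering between the two is not constrained.
        local = threading.Thread(target=local_volume.write, args=(offset, data))
        local.start()

        # Ship a copy of the data to the intermediary node and block until it
        # acknowledges; this wait is what makes the replication synchronous
        # and what adds latency to the application's write.
        intermediary.replicate(offset, data)   # returns only after the acknowledgement
        local.join()

        # Only now is the application told that the write is complete.
        return True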
An acknowledgement is then transmitted from intermediary node 102 to primary node 100 indicating that the write operation or "update" has been received. Upon receiving the described acknowledgement, replication facility 110 on node 100 notifies application 106 that the write operation is complete. Primary node 100, intermediary node 102, and secondary node 104 may include more or fewer components in alternative prior art embodiments. For example, primary node 100 may include additional data volumes beyond data volume 108 and/or a data volume or storage area manager used to coordinate the storage of data within any associated data volume. In the periodic replication of the illustrated replication system, data volume 124 within secondary node 104 is periodically updated with changes resulting from write operations on data volume 116 over a period of time. At the beginning of an initial time period a snapshot data volume 118 is created corresponding to data volume 116.
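The periodic leg of the cascade might be sketched roughly as follows; this is an assumption-laden illustration (the snapshot and changed-region methods are hypothetical), not the actual product mechanism:

    def periodic_replication_cycle(intermediary_volume, secondary_volume):
        # A snapshot taken at the start of the period captures a consistent
        # point-in-time image; writes arriving later belong to the next cycle.
        snapshot = intermediary_volume.create_snapshot()

        # Transfer only the regions that changed since the previous cycle's
        # snapshot rather than re-sending the whole volume.
        for offset, length in snapshot.changed_regions_since_previous():
            secondary_volume.write(offset, snapshot.read(offset, length))

        # The new snapshot becomes the baseline against which the next
        # period's changes will be computed.
        intermediary_volume.retire_previous_snapshot(snapshot)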
Fig. 2 illustrates a one-to-many replication system used in place of a cascaded replication system according to the prior art. In the illustrated replication system, data is synchronously replicated between a data volume 208 within a primary node 200 and a data volume 216 within a first secondary node 202, while data is periodically replicated between the data volume 208 within the primary node 200 and a data volume 224 within a second secondary node 204, as described in more detail herein. A significant shortcoming of the illustrated one-to-many replication system is that substantial resources of primary node 200 and its associated replication facility 210 are required to perform the multiple illustrated replication operations.
SUMMARY OF THE INVENTION Disclosed is a method and system of providing cascaded replication. According to one embodiment of the present invention, data is asynchronously replicated between data volumes of two or more nodes within a cascaded replication system. Data may be asynchronously replicated between a primary data volume and an intermediary data volume or alternatively between an intermediary data volume and one or more secondary data volumes.
Embodiments of the present invention may be used to quickly and reliably replicate data to one or more secondary nodes while reducing replication costs and write operation latency. By providing asynchronous replication within a cascaded replication system, data may be replicated initially over a relatively shorter and higher cost/bandwidth link and subsequently over a comparatively longer and lower cost/bandwidth link, while write operation latency for applications and primary node loading are reduced as compared with conventional synchronous replication-based cascaded replication systems.
The foregoing is a summary and thus contains, by necessity, simplifications, generalizations and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.
BRIEF DESCRIPTION OF THE DRAWINGS The present invention may be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings in which:
Fig. 1 illustrates a cascaded replication system according to the prior art; Fig. 2 illustrates a one-to-many replication system used in place of a cascaded replication system according to the prior art;
Fig. 3 illustrates a cascaded replication system according to a first embodiment of the present invention;
Fig. 4 illustrates a cascaded replication system according to a second embodiment of the present invention;
Fig. 5 illustrates a cascaded replication system including a replication multiplexer according to an embodiment of the present invention;
Fig. 6 illustrates a cascaded replication process according to an embodiment of the present invention; and Fig. 7 illustrates an exemplary data processing system useable with one or more embodiments of the present invention.
The use of the same reference symbols in different drawings indicates similar or identical items.
DETAILED DESCRIPTION Although the present invention has been described in connection with one embodiment, the invention is not intended to be limited to the specific forms set forth herein. On the contrary, it is intended to cover such alternatives, modifications, and equivalents as can be reasonably included within the scope of the invention as defined by the appended claims.
In the following detailed description, numerous specific details such as specific method orders, structures, elements, and connections have been set forth. It is to be understood however that these and other specific details need not be utilized to practice embodiments of the present invention.
In other circumstances, well-known structures, elements, or connections have been omitted, or have not been described in particular detail in order to avoid unnecessarily obscuring this description.
References within the specification to "one embodiment" or "an embodiment" are intended to indicate that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Appearances of the phrase "in one embodiment" in various places within the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments.
Fig. 3 illustrates a cascaded replication system according to a first embodiment of the present invention. In the illustrated cascaded replication system, asynchronous replication is performed between a primary node 300 and an intermediary node 302 thus reducing application write operation latency and cost while meeting desired recovery point objectives. Asynchronous replication utilizes a log area (e.g., a storage replicator log) to stage write operations such that the write operation can return as soon as data associated with the write operation (e.g., the data to be written, metadata, and the like) has been logged (i.e., stored) to this log area. Asynchronous replication requires write ordering (e.g., at a secondary node) to ensure that each replicated data volume is consistent. According to one embodiment of the present invention, writes are ordered by tagging each write with a globally increasing sequence number. In a distributed environment (e.g., SAN Volume Manager or Cluster
Volume Manager provided by VERITAS Software Corporation of Mountain View, California), this sequence number may be used to maintain the write order across various nodes (hosts, switches, appliances, etc.). According to still other embodiments of the present invention in such a distributed environment, the log or "journal" may alternately be shared or exclusive to each node of a group of nodes.
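A small sketch of such sequence-number-based write ordering follows; the class and method names are illustrative assumptions, not part of any actual product:

    import heapq
    import itertools

    class WriteSequencer:
        """Tags each outgoing write with a globally increasing sequence number."""
        def __init__(self):
            self._counter = itertools.count(1)

        def tag(self, offset, data):
            return (next(self._counter), offset, data)

    class InOrderApplier:
        """Applies tagged writes to a replica strictly in sequence order,
        buffering any writes that arrive early over the network."""
        def __init__(self, volume):
            self.volume = volume
            self.next_seq = 1
            self.pending = []                     # min-heap keyed on sequence number

        def receive(self, tagged_write):
            heapq.heappush(self.pending, tagged_write)
            while self.pending and self.pending[0][0] == self.next_seq:
                _, offset, data = heapq.heappop(self.pending)
                self.volume.write(offset, data)   # replica stays write-order consistent
                self.next_seq += 1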
Replication between intermediary node 302 and secondary node 304 may then in turn be performed using one of several replication techniques (e.g., asynchronous and/or periodic replication) according to alternative embodiments of the present invention. In periodic replication, a site or node (e.g., a secondary node) is periodically updated with changes that have been written (e.g., to an intermediary node) over a period of time.
While a single intermediary node 302 has been illustrated in the system of Fig. 3, it should be understood that additional intermediary nodes may be provided serially or in parallel between primary node 300 and secondary node 304. Primary node 300 of the illustrated system includes an application 306 (e.g., a database application) coupled to a data volume 308 or other storage area via a replication facility 310 such as the Volume Replicator product provided by VERITAS Software Corporation of Mountain View, California. According to one embodiment, cascaded data replication is performed on more than two levels (e.g., intermediary node and secondary node) as illustrated in Fig. 3. In the described embodiment, replication frequency (e.g., periodic replication frequency) may be reduced as the data is replicated from one node to another, for example, where available bandwidth decreases as the number of hops increases.
Primary node 300 additionally includes a storage replicator log (SRL) 312 used to effect the described asynchronous replication. In a typical cascaded replication system such as that illustrated in Fig. 3, SRL 312 is used to store or "journal" data to be written by one or more write operations requested by an application such as application 306 during primary node 300's operation. In another embodiment, asynchronous replication may be effected by tracking data changes using a bitmap or an extent map rather than a log such as SRL 312 (one such change-tracking structure is sketched below). It is assumed for purposes of this description that the data volumes of primary node 300, intermediary node 302, and secondary node 304 are initially synchronized. Intermediary node 302 of the illustrated replication system includes a replication facility 314 and a data volume 316 as shown.
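As a rough illustration of the bitmap alternative mentioned above, the following sketch tracks dirty regions of a volume; the region size, class, and method names are assumptions for illustration only:

    class DirtyRegionMap:
        """Tracks which fixed-size regions of a volume have changed and therefore
        still need to be replicated; a space-efficient alternative to journaling
        every write in an SRL."""
        def __init__(self, volume_size, region_size=64 * 1024):
            self.region_size = region_size
            self.bits = bytearray((volume_size + region_size - 1) // region_size)

        def mark(self, offset, length):
            first = offset // self.region_size
            last = (offset + length - 1) // self.region_size
            for region in range(first, last + 1):
                self.bits[region] = 1             # region must be re-replicated

        def dirty_regions(self):
            return [i for i, bit in enumerate(self.bits) if bit]

        def clear(self, region):
            self.bits[region] = 0                 # region is back in sync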
In asynchronous replication, when application 306 requests that a write operation be performed on its behalf to data volume 308, replication facility 310 intercepts the write. Replication facility 310 then writes the data to be written by the requested write operation to storage replicator log (SRL) 312. It is not required that such data be written to a storage replicator log, although a storage replicator log is valuable in assisting with recovery upon node failure. For example, in one alternate embodiment such data can be tracked in a bitmap or extent map to facilitate recovery. The data may be written directly to data volume 308 or into a memory buffer that is later copied to data volume 308. As soon as the data has been written to the SRL or tracked in a bitmap/extent map, replication facility 310 on node 300 may notify application 306 that the write operation is complete.
Thereafter, the data is written to data volume 308 and replicated to data volume 316 within intermediary node 302 by replication facility 310. In one embodiment, such replication is performed by copying the data to be written and transferring the generated copy to data volume 316. As part of the described replication, confirmation of the intermediary node's receipt of the data to be written, as well as of the actual write operation of such data to data volume 316, may be transmitted to the primary node. Additionally, the data to be written may be logged (e.g., using an SRL, not shown) within the intermediary node as part of such replication according to one embodiment of the present invention. In the illustrated embodiment, the described data transfer between primary node 300 and intermediary node 302 is performed over a communication link (e.g., a communications network and SAN) between the nodes. In alternative embodiments, primary node 300, intermediary node 302, and secondary node 304 may include more or fewer components. For example, primary node 300 may include additional data volumes beyond data volume 308 and/or a data volume or storage area manager used to coordinate the storage of data within any associated data volume.
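The asynchronous write path just described might look roughly like the following Python sketch; the SRL, volume, and intermediary objects are hypothetical stand-ins, and the background drain is only one way the staged writes could be applied:

    import queue
    import threading

    class AsyncReplicator:
        def __init__(self, srl, local_volume, intermediary):
            self.srl = srl
            self.local_volume = local_volume
            self.intermediary = intermediary
            self.backlog = queue.Queue()
            threading.Thread(target=self._drain, daemon=True).start()

        def write(self, offset, data):
            # Stage the write in the log (or mark it in a bitmap/extent map);
            # the application is acknowledged as soon as this returns, which
            # is what keeps application write latency low.
            self.srl.append(offset, data)
            self.backlog.put((offset, data))
            return True                           # notify application: write complete

        def _drain(self):
            # In the background, apply each staged write to the local volume
            # and ship a copy to the intermediary node in staging order.
            while True:
                offset, data = self.backlog.get()
                self.local_volume.write(offset, data)
                self.intermediary.replicate(offset, data)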
Fig. 4 illustrates a cascaded replication system according to a second embodiment of the present invention. In the illustrated cascaded replication system, asynchronous replication is performed between an intermediary node 402 and a secondary node 404 while replication between a primary node 400 and intermediary node 402 is performed using one of several replication techniques (e.g., synchronous, asynchronous, and/or periodic replication) according to alternative embodiments of the present invention. Asynchronous replication in the illustrated embodiment is performed as described with respect to Fig. 3 herein.
According to one or more embodiments of the present invention, replication repeaters and/or multiplexers may be provided. A replication node acting as a repeater or multiplexer according to one embodiment of the present invention includes limited or specialized data volume replication functionality. Exemplary repeaters and multiplexers include, but are not limited to, local multiplexers used to relieve a primary node from performing n-way replication (i.e., replication to "n" secondary nodes, where n is an integer) over a local area network (LAN); remote multiplexers used to perform such n-way replication across greater physical distance (e.g., using a Wide Area Network (WAN)); and repeaters. A repeater may be considered a specialized local or remote multiplexer having a single target node (i.e., where n is equal to one).
A replication repeater is provided according to one embodiment of the present invention utilizing two or more space-saving volumes (e.g., V1 and V2), such as are described, for example, in United States Patent Application Number 10/436,354, entitled "Method and System of Providing Periodic Replication," stored on a space-saving construct (e.g., cache structured storage and/or log structured storage) that can be used to alternately store incremental transfers or updates. In the described embodiment, one of the space-saving volumes may be used to accept data from a first node (e.g., a primary node) while the other space-saving volume is used to transfer data received during a prior incremental transfer to a second node (e.g., a secondary node). Once this initial transfer is complete, the space-saving volumes' roles may be reversed, with the described process being repeated to effectively double data replication throughput (this alternating arrangement is sketched below). In yet another embodiment, multiple sets of such space-saving volumes may be employed to perform replication multiplexing as described herein with similar effect.
Fig. 5 illustrates a cascaded replication system including a replication multiplexer according to an embodiment of the present invention. In the illustrated embodiment, data is first replicated from a primary node 500 to an intermediary node 502, which in turn is used as a replication multiplexer to replicate data to multiple target secondary nodes 504A and 504B. While a single intermediary node 502 and two secondary nodes 504 have been illustrated in the system of Fig. 5, it should be understood that additional intermediary nodes and/or secondary nodes may be provided in alternative embodiments of the present invention.
Fig. 6 illustrates a cascaded replication process according to an embodiment of the present invention. In the illustrated process embodiment, a first data volume of a primary node, a second data volume of an intermediary node, and a third data volume of a secondary node are initially synchronized (process block 602). Such initial synchronization may be implemented according to various embodiments of the present invention using data transfer from one node data processing system to another across a network, tape or other persistent backup and restore capabilities, or one or more snapshots or portions thereof.
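Returning briefly to the two-volume repeater described above, a minimal sketch of the alternating ("ping-pong") arrangement follows; the node and volume objects and their receive/forward methods are hypothetical stand-ins rather than any actual product interface:

    import threading

    def repeater_cycle(source_node, target_node, accepting, forwarding):
        # One space-saving volume accepts the current incremental update from
        # the source while, concurrently, the other forwards the previously
        # received update toward the target node.
        recv = threading.Thread(target=accepting.receive_update_from, args=(source_node,))
        send = threading.Thread(target=forwarding.forward_update_to, args=(target_node,))
        recv.start(); send.start()
        recv.join(); send.join()

        # Swap roles for the next cycle; this double-buffering overlaps
        # receiving and forwarding, improving replication throughput.
        return forwarding, accepting

    # Usage sketch (hypothetical objects):
    # v1, v2 = SpaceSavingVolume(), SpaceSavingVolume()
    # while True:
    #     v1, v2 = repeater_cycle(primary_node, secondary_node, v1, v2)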
Following the initial synchronization of the described nodes and associated data volumes, a request to perform a write operation on the first data volume is intercepted (e.g., using a replication facility) (process block 604). Thereafter, the data to be written by the intercepted write operation request is stored within a storage replicator log (SRL) at the primary node (process block 606) before the original requesting application is notified of the successful completion of the write (process block 608). In process block 606, the data can alternately be marked in a bitmap or extent map before indicating successful completion of the write operation to the application. Thereafter, the data to be written by the requested write operation is replicated to the second data volume at the intermediary node (process block 610).
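Pulling the process blocks together, the Fig. 6 flow might be sketched end to end as follows, assuming the volumes have already been synchronized (block 602) and a write has been intercepted (block 604); the sketch also shows process block 612 and the cascaded step discussed in the next paragraph, and every object and method name is a hypothetical stand-in:

    def cascaded_replication_write(primary, intermediary, secondary, offset, data):
        # Block 606: journal the intercepted write at the primary node
        # (alternatively, mark the corresponding bitmap or extent-map entry).
        primary.srl.append(offset, data)

        # Block 608: the requesting application can now be told the write
        # completed successfully.
        primary.application.notify_write_complete(offset)

        # Block 610: replicate the data to the second data volume at the
        # intermediary node; the intermediary in turn cascades the update to
        # the third data volume at the secondary node (asynchronously or
        # periodically, depending on the embodiment).
        intermediary.volume.write(offset, data)
        secondary.volume.write(offset, data)

        # Block 612: store the data in the primary's own data volume; blocks
        # 608 through 612 may also run in parallel rather than strictly in
        # this order.
        primary.volume.write(offset, data)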
According to one embodiment of the present invention, the described replication includes a cascaded replication operation from the second data volume to the third data volume of the secondary node (not illustrated). In alternative embodiments, the described cascaded replication may be implemented as asynchronous replication and/or periodic replication. After the data to be written has been replicated as described, it is then stored within the first data volume at the primary node (process block 612). While operations of the process embodiment of Fig. 6 have been illustrated as being performed serially for clarity, it should be appreciated that one or more of such operations (e.g., the operations depicted by process blocks 608 through 612) may be performed in parallel in alternative embodiments of the present invention.
Fig. 7 illustrates an exemplary data processing system useable with one or more embodiments of the present invention. Data processing system 710 includes a bus 712 which interconnects major subsystems of data processing system 710, such as a central processor 714, a system memory 717 (typically RAM, but which may also include ROM, flash RAM, or the like), an input/output controller 718, an external audio device, such as a speaker system 720 via an audio output interface 722, an external device, such as a display screen 724 via display adapter 726, serial ports 728 and 730, a keyboard 732 (interfaced with a keyboard controller 733), a storage interface 734, a floppy disk drive 737 operative to receive a floppy disk 738, a host bus adapter (HBA) interface card 735A operative to connect with a fibre channel network 790, a host bus adapter (HBA) interface card 735B operative to connect to a SCSI bus 739, and an optical disk drive 740 operative to receive an optical disk 742. Also included are a mouse 746 (or other point-and-click device, coupled to bus 712 via serial port 728), a modem 747 (coupled to bus 712 via serial port 730), and a network interface 748 (coupled directly to bus 712).
Bus 712 allows data communication between central processor 714 and system memory 717, which may include read-only memory (ROM) or flash memory (neither shown), and random access memory (RAM) (not shown), as previously noted. The RAM is generally the main memory into which the operating system and application programs are loaded and typically affords at least 64 megabytes of memory space. The ROM or flash memory may contain, among other code, the Basic Input-Output system (BIOS) which controls basic hardware operation such as the interaction with peripheral components. Applications resident with data processing system 710 are generally stored on and accessed via a computer readable medium, such as a hard disk drive (e.g., fixed disk 744), an optical drive (e.g., optical drive 740), floppy disk unit 737 or other storage medium. Additionally, applications may be in the form of electronic signals modulated in accordance with the application and data communication technology when accessed via network modem 747 or interface 748.
Storage interface 734, as with the other storage interfaces of data processing system 710, may connect to a standard computer readable medium for storage and/or retrieval of information, such as a fixed disk drive 744. Fixed disk drive 744 may be a part of data processing system 710 or may be separate and accessed through other interface systems. Modem 747 may provide a direct connection to a remote server via a telephone link or to the Internet via an internet service provider (ISP). Network interface 748 may provide a direct connection to a remote server via a direct network link to the Internet via a POP (point of presence). Network interface 748 may provide such connection using wireless techniques, including digital cellular telephone connection, Cellular Digital Packet Data (CDPD) connection, digital satellite data connection or the like.
Many other devices or subsystems (not shown) may be connected in a similar manner (e.g., bar code readers, document scanners, digital cameras and so on). Conversely, it is not necessary for all of the devices shown in Fig. 7 to be present to practice the present invention. The devices and subsystems may be interconnected in different ways from that shown in Fig. 7. The operation of a computer system such as that shown in Fig. 7 is readily known in the art and is not discussed in detail in this application. Code to implement the present invention may be stored in computer-readable storage media such as one or more of system memory 717, fixed disk 744, optical disk 742, or floppy disk 738. Additionally, data processing system 710 may be any kind of computing device, and so includes personal data assistants (PDAs), network appliances, X-window terminals or other such computing devices. The operating system provided on data processing system 710 may be MS-DOS®, MS-WINDOWS®, OS/2®, UNIX®, Linux®, or another known operating system. Data processing system 710 also supports a number of Internet access tools, including, for example, an HTTP-compliant web browser having a JavaScript interpreter, such as Netscape Navigator®, Microsoft Explorer®, and the like.
While particular embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that, based upon the teachings herein, changes and modifications may be made without departing from this invention and its broader aspects and, therefore, the appended claims are to encompass within their scope all such changes and modifications as are within the true spirit and scope of this invention. Furthermore, it is to be understood that the invention is solely defined by the appended claims.
The present invention is well adapted to attain the advantages mentioned as well as others inherent therein. While the present invention has been depicted, described, and is defined by reference to particular embodiments of the invention, such references do not imply a limitation on the invention, and no such limitation is to be inferred. The invention is capable of considerable modification, alteration, and equivalents in form and function, as will occur to those ordinarily skilled in the pertinent arts. The depicted and described embodiments are examples only, and are not exhaustive of the scope of the invention.
The foregoing detailed description has set forth various embodiments of the present invention via the use of block diagrams, flowcharts, and examples. It will be understood by those within the art that each block diagram component, flowchart step, operation and/or component illustrated by the use of examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or any combination thereof.
The present invention has been described in the context of fully functional data processing systems or computer systems; however, those skilled in the art will appreciate that the present invention is capable of being distributed as a program product in a variety of forms, and that the present invention applies equally regardless of the particular type of signal bearing media used to actually carry out the distribution. Exemplary data processing systems may include one or more hosts, network switches, appliances and/or storage arrays and may implement in-band and/or out-of-band storage or data volume virtualization. Examples of such signal bearing media include recordable media such as floppy disks and CD-ROM, transmission type media such as digital and analog communications links, as well as media storage and distribution systems developed in the future. Additionally, it should be understood that embodiments of the present invention are not limited to a particular type of data processing or computer system. Rather, embodiments of the present invention may be implemented in a wide variety of data processing systems (e.g., host computer systems, network switches, network appliances, and/or disk arrays).
The above-discussed embodiments may be implemented using software modules which perform certain tasks. The software modules discussed herein may include script, batch, or other executable files. The software modules may be stored on a machine-readable or computer-readable storage medium such as a disk drive. Storage devices used for storing software modules in accordance with an embodiment of the invention may be magnetic floppy disks, hard disks, or optical discs such as CD-ROMs or CD-Rs, for example. A storage device used for storing firmware or hardware modules in accordance with an embodiment of the invention may also include a semiconductor-based memory, which may be permanently, removably or remotely coupled to a microprocessor/memory system. Thus, the modules may be stored within a computer system memory to configure the computer system to perform the functions of the module. Other new and various types of computer-readable storage media may be used to store the modules discussed herein.
The above description is intended to be illustrative of the invention and should not be taken to be limiting. Other embodiments within the scope of the present invention are possible. Those skilled in the art will readily implement the steps necessary to provide the structures and the methods disclosed herein, and will understand that the process parameters and sequence of steps are given by way of example only and can be varied to achieve the desired structure as well as modifications that are within the scope of the invention. Variations and modifications of the embodiments disclosed herein can be made based on the description set forth herein, without departing from the scope of the invention. Consequently, the invention is intended to be limited only by the scope of the appended claims, giving full cognizance to equivalents in all respects.

Claims

WHAT IS CLAIMED IS:
1. A method of performing cascaded replication comprising:
replicating data to be written to a data volume of a first node to a data volume of a second node; and
replicating data to be written to said data volume of said second node to a data volume of a third node, wherein,
at least one of said replicating data to be written to said data volume of said first node to said data volume of said second node and said replicating data to be written to said data volume of said second node to said data volume of said third node comprises asynchronously replicating data.
2. The method of claim 1, wherein said replicating data to be written to said data volume of said first node comprises asynchronously replicating said data to be written to said data volume of said first node to said data volume of said second node.
3. The method of claim 2, wherein said replicating data to be written to said data volume of said second node comprises asynchronously replicating said data to be written to said data volume of said second node to said data volume of said third node.
4. The method of claim 2, wherein said replicating data to be written to said data volume of said second node comprises periodically replicating said data to be written to said data volume of said second node to said data volume of said third node.
5. The method of claim 2, wherein,
said replicating data to be written to said data volume of said first node comprises,
replicating data to be written to a data volume of a primary node to a data volume of an intermediate node; and
said replicating data to be written to said data volume of said second node comprises,
replicating data to be written to said data volume of said intermediate node to a data volume of a secondary node.
6. The method of claim 5, wherein said replicating data to be written to said data volume of said intermediate node comprises replicating data to be written to said data volume of said intermediate node to a data volume of each of a plurality of secondary nodes.
7. The method of claim 2, wherein,
said replicating data to be written to said data volume of said first node comprises replicating data to be written to said data volume of said first node to said data volume of said second node using a first data link coupled between said first node and said second node;
said replicating data to be written to said data volume of said second node comprises replicating data to be written to said data volume of said second node to said data volume of said third node using a second data link coupled between said second node and said third node; and
said first data link has a higher bandwidth than said second data link.
8. An apparatus configured to perform cascaded replication comprising:
means for replicating data to be written to a data volume of a first node to a data volume of a second node; and
means for replicating data to be written to said data volume of said second node to a data volume of a third node, wherein,
at least one of said means for replicating data to be written to said data volume of said first node to said data volume of said second node and said means for replicating data to be written to said data volume of said second node to said data volume of said third node comprises means for asynchronously replicating data.
9. The apparatus of claim 8, wherein said means for replicating data to be written to a data volume of a first node comprises means for asynchronously replicating said data to be written to said data volume of said first node to said data volume of said second node.
10. The apparatus of claim 9, wherein said means for replicating data to be written to said data volume of said second node comprises means for asynchronously replicating said data to be written to said data volume of said second node to said data volume of said third node.
11. The apparatus of claim 9, wherein said means for replicating data to be written to said data volume of said second node comprises means for periodically replicating said data to be written to said data volume of said second node to said data volume of said third node.
12. The apparatus of claim 9, wherein,
said means for replicating data to be written to said data volume of said first node comprises,
means for replicating data to be written to a data volume of a primary node to a data volume of an intermediate node; and
said means for replicating data to be written to said data volume of said second node comprises,
means for replicating data to be written to said data volume of said intermediate node to a data volume of a secondary node.
13. The apparatus of claim 12, wherein said means for replicating data to be written to said data volume of said intermediate node comprises means for replicating data to be written to said data volume of said intermediate node to a data volume of each of a plurality of secondary nodes.
14. The apparatus of claim 9, wherein,
said means for replicating data to be written to said data volume of said first node comprises means for replicating data to be written to said data volume of said first node to said data volume of said second node using a first data link coupled between said first node and said second node;
said means for replicating data to be written to said data volume of said second node comprises means for replicating data to be written to said data volume of said second node to said data volume of said third node using a second data link coupled between said second node and said third node; and
said first data link has a higher bandwidth than said second data link.
15. A machine-readable medium having a plurality of instructions executable by a machine embodied therein, wherein said plurality of instructions when executed cause said machine to perform a method comprising:
replicating data to be written to a data volume of a first node to a data volume of a second node; and
replicating data to be written to said data volume of said second node to a data volume of a third node, wherein,
at least one of said replicating data to be written to said data volume of said first node to said data volume of said second node and said replicating data to be written to said data volume of said second node to said data volume of said third node comprises asynchronously replicating data.
16. The machine-readable medium of claim 15, wherein said replicating data to be written to a data volume of a first node comprises asynchronously replicating said data to be written to said data volume of said first node to said data volume of said second node.
17. The machine-readable medium of claim 16, wherein said replicating data to be written to said data volume of said second node comprises asynchronously replicating said data to be written to said data volume of said second node to said data volume of said third node.
18. The machine-readable medium of claim 16, wherein said replicating data to be written to said data volume of said second node comprises periodically replicating said data to be written to said data volume of said second node to said data volume of said third node.
19. The machine-readable medium of claim 16, wherein,
said replicating data to be written to said data volume of said first node comprises,
replicating data to be written to a data volume of a primary node to a data volume of an intermediate node; and
said replicating data to be written to said data volume of said second node comprises,
replicating data to be written to said data volume of said intermediate node to a data volume of a secondary node.
20. The machine-readable medium of claim 19, wherein said replicating data to be written to said data volume of said intermediate node comprises replicating data to be written to said data volume of said intermediate node to a data volume of each of a plurality of secondary nodes.
21. The machine-readable medium of claim 16, wherein,
said replicating data to be written to said data volume of said first node comprises replicating data to be written to said data volume of said first node to said data volume of said second node using a first data link coupled between said first node and said second node;
said replicating data to be written to said data volume of said second node comprises replicating data to be written to said data volume of said second node to said data volume of said third node using a second data link coupled between said second node and said third node; and
said first data link has a higher bandwidth than said second data link.
22. A data processing system comprising:
a log to store data to be written to at least one of a data volume of a first node and a data volume of a second node; and
a replication facility configured to replicate data to be written to said data volume of said first node to said data volume of said second node and to replicate data to be written to said data volume of said second node to a data volume of a third node using said log, wherein,
said replication facility comprises a replication facility configured to asynchronously replicate at least one of said data to be written to said data volume of said first node and said data to be written to said data volume of said second node.
23. The data processing system of claim 22, wherein said replication facility further comprises, a replication facility configured to asynchronously replicate said data to be written to said data volume of said first node to said data volume of said second node.
24. The data processing system of claim 23, wherein said replication facility further comprises, a replication facility configured to asynchronously replicate said data to be written to said data volume of said second node to said data volume of said third node.
25. The data processing system of claim 23, wherein said replication facility further comprises, a replication facility configured to periodically replicate said data to be written to said data volume of said second node to said data volume of said third node.
PCT/US2004/027933 2003-08-29 2004-08-27 Method and system of providing cascaded replication WO2005022389A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/652,326 2003-08-29
US10/652,326 US20050050115A1 (en) 2003-08-29 2003-08-29 Method and system of providing cascaded replication

Publications (2)

Publication Number Publication Date
WO2005022389A2 true WO2005022389A2 (en) 2005-03-10
WO2005022389A3 WO2005022389A3 (en) 2005-06-16

Family

ID=34217613

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/027933 WO2005022389A2 (en) 2003-08-29 2004-08-27 Method and system of providing cascaded replication

Country Status (2)

Country Link
US (1) US20050050115A1 (en)
WO (1) WO2005022389A2 (en)

Families Citing this family (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7130975B2 (en) * 2003-06-27 2006-10-31 Hitachi, Ltd. Data processing system
JP2005309550A (en) * 2004-04-19 2005-11-04 Hitachi Ltd Remote copying method and system
JP4124348B2 (en) 2003-06-27 2008-07-23 株式会社日立製作所 Storage system
JP4374953B2 (en) 2003-09-09 2009-12-02 株式会社日立製作所 Data processing system
US20050083960A1 (en) * 2003-10-03 2005-04-21 Nortel Networks Limited Method and apparatus for transporting parcels of data using network elements with network element storage
US7181647B2 (en) * 2003-10-15 2007-02-20 International Business Machines Corporation Error tracking method and system
US7398350B1 (en) * 2003-10-29 2008-07-08 Symantec Operating Corporation Distribution of data volume virtualization
JP4412989B2 (en) 2003-12-15 2010-02-10 株式会社日立製作所 Data processing system having a plurality of storage systems
JP4282464B2 (en) * 2003-12-17 2009-06-24 株式会社日立製作所 Remote copy system
JP4477370B2 (en) 2004-01-30 2010-06-09 株式会社日立製作所 Data processing system
US7610319B1 (en) * 2004-03-01 2009-10-27 Symantec Operating Corporation Efficient operations using assistance from secondary site
JP4519563B2 (en) * 2004-08-04 2010-08-04 株式会社日立製作所 Storage system and data processing system
US8078813B2 (en) * 2004-09-30 2011-12-13 Emc Corporation Triangular asynchronous replication
JP2006127028A (en) * 2004-10-27 2006-05-18 Hitachi Ltd Memory system and storage controller
US7657578B1 (en) 2004-12-20 2010-02-02 Symantec Operating Corporation System and method for volume replication in a storage environment employing distributed block virtualization
EP1708095A1 (en) * 2005-03-31 2006-10-04 Ubs Ag Computer network system for constructing, synchronizing and/or managing a second database from/with a first database, and methods therefore
US7685385B1 (en) 2005-06-30 2010-03-23 Symantec Operating Corporation System and method for satisfying I/O requests before a replica has been fully synchronized
JP4887893B2 (en) * 2006-04-26 2012-02-29 株式会社日立製作所 Computer system and computer system control method
JP2007310448A (en) * 2006-05-16 2007-11-29 Hitachi Ltd Computer system, management computer, and storage system management method
JP4842720B2 (en) * 2006-06-29 2011-12-21 株式会社日立製作所 Storage system and data replication method
US8150800B2 (en) * 2007-03-28 2012-04-03 Netapp, Inc. Advanced clock synchronization technique
US8015427B2 (en) * 2007-04-23 2011-09-06 Netapp, Inc. System and method for prioritization of clock rates in a multi-core processor
US7979652B1 (en) 2007-12-20 2011-07-12 Amazon Technologies, Inc. System and method for M-synchronous replication
US8099571B1 (en) 2008-08-06 2012-01-17 Netapp, Inc. Logical block replication with deduplication
US8196203B2 (en) * 2008-09-25 2012-06-05 Symantec Corporation Method and apparatus for determining software trustworthiness
US9158579B1 (en) 2008-11-10 2015-10-13 Netapp, Inc. System having operation queues corresponding to operation execution time
US8655848B1 (en) 2009-04-30 2014-02-18 Netapp, Inc. Unordered idempotent logical replication operations
US8321380B1 (en) 2009-04-30 2012-11-27 Netapp, Inc. Unordered idempotent replication operations
US8275743B1 (en) 2009-08-10 2012-09-25 Symantec Corporation Method and apparatus for securing data volumes to a remote computer using journal volumes
US8671072B1 (en) 2009-09-14 2014-03-11 Netapp, Inc. System and method for hijacking inodes based on replication operations received in an arbitrary order
US8799367B1 (en) 2009-10-30 2014-08-05 Netapp, Inc. Using logical block addresses with generation numbers as data fingerprints for network deduplication
US8473690B1 (en) 2009-10-30 2013-06-25 Netapp, Inc. Using logical block addresses with generation numbers as data fingerprints to provide cache coherency
WO2011125127A1 (en) 2010-04-07 2011-10-13 株式会社日立製作所 Asynchronous remote copy system and storage control method
US8886609B2 (en) * 2010-12-17 2014-11-11 Microsoft Corporation Backup and restore of data from any cluster node
JP5776339B2 (en) * 2011-06-03 2015-09-09 富士通株式会社 File distribution method, file distribution system, master server, and file distribution program
US9292588B1 (en) * 2011-07-20 2016-03-22 Jpmorgan Chase Bank, N.A. Safe storing data for disaster recovery
EP2801024A4 (en) * 2012-01-06 2016-08-03 Intel Corp Reducing the number of read/write operations performed by a cpu to duplicate source data to enable parallel processing on the source data
US9037818B1 (en) * 2012-03-29 2015-05-19 Emc Corporation Active replication switch
US9229829B2 (en) * 2012-07-25 2016-01-05 GlobalFoundries, Inc. Synchronous mode replication to multiple clusters
JP6056408B2 (en) * 2012-11-21 2017-01-11 日本電気株式会社 Fault tolerant system
US9213497B2 (en) * 2012-12-13 2015-12-15 Hitachi, Ltd. Storage apparatus and storage apparatus migration method
US9563655B2 (en) * 2013-03-08 2017-02-07 Oracle International Corporation Zero and near-zero data loss database backup and recovery
US10303782B1 (en) 2014-12-29 2019-05-28 Veritas Technologies Llc Method to allow multi-read access for exclusive access of virtual disks by using a virtualized copy of the disk
US10929431B2 (en) 2015-08-28 2021-02-23 Hewlett Packard Enterprise Development Lp Collision handling during an asynchronous replication
WO2017039577A1 (en) * 2015-08-28 2017-03-09 Hewlett Packard Enterprise Development Lp Managing sets of transactions for replication
US10976937B1 (en) * 2016-09-28 2021-04-13 EMC IP Holding Company LLC Disparate local and remote replication technologies configured for the same device
US11360688B2 (en) * 2018-05-04 2022-06-14 EMC IP Holding Company LLC Cascading snapshot creation in a native replication 3-site configuration
US20190377642A1 (en) * 2018-06-08 2019-12-12 EMC IP Holding Company LLC Decoupled backup solution for distributed databases across a failover cluster
US10705754B2 (en) * 2018-06-22 2020-07-07 International Business Machines Corporation Zero-data loss recovery for active-active sites configurations
CN112749123A (en) * 2019-10-30 2021-05-04 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for managing a file system
US11327679B2 (en) * 2020-01-31 2022-05-10 EMC IP Holding Company LLC Method and system for bitmap-based synchronous replication
US11681677B2 (en) * 2020-02-27 2023-06-20 EMC IP Holding Company LLC Geographically diverse data storage system employing a replication tree
US20240012718A1 (en) * 2022-07-07 2024-01-11 Dell Products L.P. Recovery aware data migration in distributed systems

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5155845A (en) * 1990-06-15 1992-10-13 Storage Technology Corporation Data storage system for providing redundant copies of data on different disk drives
WO1994000816A1 (en) * 1992-06-18 1994-01-06 Andor Systems, Inc. Remote dual copy of data in computer systems
US20030014523A1 (en) * 2001-07-13 2003-01-16 John Teloh Storage network data replicator
US20030163553A1 (en) * 2002-02-26 2003-08-28 Hitachi, Ltd. Storage system and method of copying data

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5937414A (en) * 1997-02-28 1999-08-10 Oracle Corporation Method and apparatus for providing database system replication in a mixed propagation environment
US6889228B1 (en) * 2001-03-29 2005-05-03 Network Appliance, Inc. Cascading support for mirrored volumes
US7143307B1 (en) * 2002-03-15 2006-11-28 Network Appliance, Inc. Remote disaster recovery and data migration using virtual appliance migration
US6889231B1 (en) * 2002-08-01 2005-05-03 Oracle International Corporation Asynchronous information sharing system
US7149919B2 (en) * 2003-05-15 2006-12-12 Hewlett-Packard Development Company, L.P. Disaster recovery system with cascaded resynchronization

Also Published As

Publication number Publication date
WO2005022389A3 (en) 2005-06-16
US20050050115A1 (en) 2005-03-03

Similar Documents

Publication Publication Date Title
US20050050115A1 (en) Method and system of providing cascaded replication
US9535907B1 (en) System and method for managing backup operations of virtual machines
US7308545B1 (en) Method and system of providing replication
EP1700215B1 (en) Coordinated storage management operations in replication environment
US7383407B1 (en) Synchronous replication for system and data security
JP4477950B2 (en) Remote copy system and storage device system
US7191299B1 (en) Method and system of providing periodic replication
US7406487B1 (en) Method and system for performing periodic replication using a log
US7617369B1 (en) Fast failover with multiple secondary nodes
US8930309B2 (en) Interval-controlled replication
US8209282B2 (en) Method, system, and article of manufacture for mirroring data at storage locations
US7668876B1 (en) Snapshot-based replication infrastructure for efficient logging with minimal performance effect
US7921273B2 (en) Method, system, and article of manufacture for remote copying of data
JP3968207B2 (en) Data multiplexing method and data multiplexing system
US7457830B1 (en) Method and system of replicating data using a recovery data change log
US7685385B1 (en) System and method for satisfying I/O requests before a replica has been fully synchronized
US7831550B1 (en) Propagating results of a volume-changing operation to replicated nodes
US20040260899A1 (en) Method, system, and program for handling a failover to a remote storage location
JP6136629B2 (en) Storage control device, storage system, and control program
CN108351821A (en) Data reconstruction method and storage device
US8689043B1 (en) Fast failover with multiple secondary nodes
US7707372B1 (en) Updating a change track map based on a mirror recovery map
US8799211B1 (en) Cascaded replication system with remote site resynchronization after intermediate site failure
US11461018B2 (en) Direct snapshot to external storage
US11354268B2 (en) Optimizing snapshot creation

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase