US20040202013A1 - System and method for collaborative caching in a multinode system - Google Patents
- Publication number
- US20040202013A1
- US10/251,645
- Authority
- US
- United States
- Prior art keywords
- node
- lock
- operating system
- storage
- block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0709—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/526—Mutual exclusion algorithms
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99938—Concurrency, e.g. lock management in shared database
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99951—File or database maintenance
- Y10S707/99952—Coherency, e.g. same view to multiple users
- Y10S707/99953—Recoverability
Definitions
- the present invention relates generally to computer systems. More specifically, a system and method for collaborative caching in a multi-node file system is disclosed.
- multiple nodes may be set up to share data storage.
- a lock may be used.
- FIG. 1 is a block diagram of a system for accessing data according to an embodiment of the present invention.
- FIG. 2 is another block diagram of a system according to an embodiment of the present invention.
- FIG. 3 is a block diagram of software components inside a node according to an embodiment of the present invention.
- FIGS. 4A-4B show a flow diagram for a method according to an embodiment of the present invention for accessing data.
- FIGS. 5A-5E show another flow diagram of a method according to an embodiment of the present invention for accessing data.
- FIG. 6 is another block diagram of the software components of server 300 according to an embodiment of the present invention.
- the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, or a computer readable medium such as a computer readable storage medium or a computer network wherein program instructions are sent over optical or electronic communication links. It should be noted that the order of the steps of disclosed processes may be altered within the scope of the invention.
- FIG. 1 is a block diagram of a system for accessing data according to an embodiment of the present invention.
- FIG. 3 is a block diagram of a system for a multi-node environment according to an embodiment of the present invention.
- servers 300A-300D are coupled via network interconnects 302.
- the network interconnects 302 can represent any network infrastructure such as an Ethernet, InfiniBand network or Fibre Channel network capable of host-to-host communication.
- the servers 300A-300D are also coupled to the data storage interconnect 304, which in turn is coupled to shared storage 306A-306D.
- the data storage interconnect 304 can be any interconnect that can allow access to the shared storage 306A-306D by servers 300A-300D.
- an example of the data storage interconnect 304 is a Fibre Channel switch, such as a Brocade 3200 Fibre Channel switch.
- the data storage network might be an iSCSI or other IP storage network, InfiniBand network, or another kind of host-to-storage network.
- the network interconnects 302 and the data storage interconnect 304 may be embodied in a single interconnect.
- Servers 300A-300D can be any computer, preferably an off-the-shelf computer or server or any equivalent thereof. Servers 300A-300D can each run operating systems that are independent of each other. Accordingly, each server 300A-300D can, but does not need to, run a different operating system. For example, server 300A may run Microsoft Windows, while server 300B runs Linux, and server 300C can simultaneously run a Unix operating system.
- An advantage of running independent operating systems for the servers 300A-300D is that the entire multi-node system can be dynamic. For example, one of the servers 300A-300D can fail while the other servers 300A-300D continue to operate.
- the shared storage 306A-306D can be any storage device, such as hard drive disks, compact disks, tape, and random access memory.
- a filesystem is a logical entity built on the shared storage.
- Although the shared storage 306A-306D is typically considered a physical device while the filesystem is typically considered a logical structure overlaid on part of the storage, the filesystem is sometimes referred to herein as shared storage for simplicity.
- shared storage can mean the physical storage device, a portion of a filesystem, a filesystem, filesystems, or any combination thereof.
- FIG. 2 is another block diagram of a system according to an embodiment of the present invention.
- the system preferably has no single point of failure.
- servers 300A′-300D′ are coupled with multiple network interconnects 302A-302D.
- the servers 300A′-300D′ are also shown to be coupled with multiple storage interconnects 304A-304B.
- the storage interconnects 304A-304B are each coupled to a plurality of data storage 306A′-306D′.
- the number of servers 300A′-300D′, the number of storage interconnects 304A-304B, and the number of data storage 306A′-306D′ can be as many as the customer requires and is not physically limited by the system.
- the operating systems used by servers 300A′-300D′ can also be as many independent operating systems as the customer requires.
- FIG. 3 is a block diagram of software components inside a node 300.
- node 300 is shown to include a buffer cache 350, processes 352, a distributed lock manager (DLM) 354, and a lock caching layer (LCL) 356.
- a block is kept in the node's cache (in local storage) after node 300 changes the block rather than writing it immediately into the shared storage. In this manner, it is faster if that node 300 can find the latest document in its own buffer cache 350 rather than taking the time to access the shared storage.
- the distributed lock manager communicates with other DLMs in other nodes and also communicates with the lock caching layer 356.
- the lock caching layer 356 calls requested tasks before a lock is downgraded or released.
- a process 352, such as an application or a file system, can obtain a lock on a block via the lock caching layer 356, use it, then eventually relinquish the lock on the block.
- the block is then stored in buffer cache 350.
- a search can be performed in the buffer cache 350 to find that block. If the block is not found in the buffer cache, then it can be retrieved from the shared storage.
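- The lookup order described above (local buffer cache first, shared storage only on a miss) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `BufferCache` class, its method names, and the use of a plain dictionary as shared storage are all assumptions made for the example.

```python
class BufferCache:
    """Hypothetical per-node buffer cache; a dict stands in for shared storage."""

    def __init__(self, shared_storage):
        self.shared_storage = shared_storage
        self.blocks = {}          # block id -> locally cached content
        self.storage_reads = 0    # counts the slow path, for illustration only

    def read(self, block_id):
        # Search the local buffer cache first.
        if block_id in self.blocks:
            return self.blocks[block_id]
        # Miss: retrieve the block from shared storage and keep a local copy.
        self.storage_reads += 1
        data = self.shared_storage[block_id]
        self.blocks[block_id] = data
        return data

    def write(self, block_id, data):
        # Keep the changed block locally instead of writing it out immediately.
        self.blocks[block_id] = data


storage = {7: b"old"}
cache = BufferCache(storage)
first = cache.read(7)    # miss: goes to shared storage
second = cache.read(7)   # hit: served from the local cache
cache.write(7, b"new")   # the modified copy stays in the cache
```

- In this sketch the second read never touches shared storage, and the modified block remains local until something (for example, a lock downgrade) forces it to be written out.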
- FIGS. 4A-4B show a flow diagram for a method according to an embodiment of the present invention for accessing data.
- a process within a particular node requests a write lock for a document from the lock caching layer (LCL) (400).
- the LCL obtains a distributed lock manager (DLM) lock for that document (402).
- the LCL grants the LCL lock to the process for that document (404).
- the LCL caches the DLM lock (406).
- 400-406 occur within a single node.
- Another node requests a read lock on the document and the request is received by this node's DLM (408).
- the DLM asks the LCL to downgrade the DLM lock (450 of FIG. 4B).
- the LCL determines that there are no local processes using the lock and writes the document to shared storage (452).
- the LCL informs the DLM that it is downgrading the lock from write to read (454).
- the DLM then passes the lock as well as the latest version of the document to the requesting node (456).
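- The sequence 400-456 can be sketched in a few lines. This is an illustrative model only: the `Dlm` and `Lcl` classes and their methods are hypothetical names, a dictionary stands in for shared storage, and the behavior when a local process is still using the lock (the sketch simply refuses the downgrade) is an assumption, since the steps above only cover the case where no local process holds it.

```python
class Dlm:
    """Stand-in for the distributed lock manager (hypothetical interface)."""

    def __init__(self):
        self.modes = {}  # document -> "write" or "read"

    def acquire(self, doc, mode):
        self.modes[doc] = mode
        return mode


class Lcl:
    """Stand-in for the lock caching layer (hypothetical interface)."""

    def __init__(self, dlm, shared_storage):
        self.dlm = dlm
        self.shared_storage = shared_storage
        self.cached_locks = {}  # document -> mode of the cached DLM lock
        self.users = {}         # document -> number of local processes using it
        self.dirty = {}         # document -> locally modified content

    def lock_for_write(self, doc):
        if self.cached_locks.get(doc) != "write":
            # 402 and 406: obtain the DLM lock and cache it.
            self.cached_locks[doc] = self.dlm.acquire(doc, "write")
        # 404: grant the LCL lock to the local process.
        self.users[doc] = self.users.get(doc, 0) + 1

    def write(self, doc, content):
        self.dirty[doc] = content

    def unlock(self, doc):
        # The process relinquishes the LCL lock; the DLM lock stays cached.
        self.users[doc] -= 1

    def downgrade(self, doc):
        # 450: the DLM asks for a downgrade on behalf of another node.
        if self.users.get(doc, 0) > 0:
            return None  # assumption: refuse while a local process uses the lock
        if doc in self.dirty:
            # 452: write the document to shared storage.
            self.shared_storage[doc] = self.dirty.pop(doc)
        # 454: downgrade from write to read.
        self.cached_locks[doc] = "read"
        self.dlm.modes[doc] = "read"
        # 456: the latest version travels with the lock to the requester.
        return self.shared_storage.get(doc)


storage = {}
dlm = Dlm()
lcl = Lcl(dlm, storage)
lcl.lock_for_write("doc")
lcl.write("doc", "v2")
refused = lcl.downgrade("doc")  # still in use locally
lcl.unlock("doc")
sent = lcl.downgrade("doc")     # flushed, downgraded, content returned
```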
- FIGS. 5A-5E show another flow diagram of a method according to an embodiment of the present invention for accessing data.
- in the example shown in FIGS. 5A-5C, a requesting node requests a shared lock. Variations of this example can be used to accommodate other types of locks, such as an exclusive lock or a lock with a different level of exclusion.
- the requesting node asks its DLM for a shared lock (500). It is determined whether the requesting node is the home node (502).
- a lock home node is the server that is responsible for granting or denying lock requests for a given DLM lock when there is no cached lock reference available on the requesting node. In this embodiment, there is one lock home node per lock. The home node does not necessarily hold the lock locked, but if other nodes hold the lock locked or cached, then the home node has a description of the lock, since the other nodes that hold the lock locked or cached communicated with the home node in order to get it locked or cached.
- the DLM of the requesting node requests a shared lock from the home node (504). It is also determined whether a lock is held by a node other than the requesting node (506). If a lock is not held by a node other than the requesting node, the home node then gives the requesting node the lock in shared mode (508). The requesting node then reads the content from shared storage (510).
- If the requesting node is the home node (502), then it is determined whether a lock is held by another node (550). If a lock is not held by another node, then the requesting node obtains the lock and reads from shared storage (562). If, however, there is a lock held by another node, then it is also determined whether the other node holds a shared lock (552). If the other node holds a shared lock, then the requesting node grants itself a shared lock (563) and sends a request for content to the owner of the shared lock (564).
- If the owner does not have the content in the local cache, it sends the downgrade message to the requesting node (592). The requesting node then grants itself a shared lock and reads the content from shared storage (594).
- a lock is held by a node other than the requesting node (506 of FIG. 5A)
- If the home node does not hold the lock (602), it then sends the content request to the lock holder (612). The content is sent from the lock holder to the home node (614). The home node sends the lock as well as the content to the requester (616).
- If the lock held by another node is not a shared lock (600), for example, an exclusive lock, then it is determined whether the home node holds the lock (650 of FIG. 5E). If the home node holds the lock, it then writes the content to the shared storage (654). The home node downgrades the exclusive lock to shared and sends the shared lock to the requester along with content if known (656).
- If the home node does not hold the lock (650), it then sends the request for downgrade and content to the owner of the lock (660). The owner of the lock writes the content to shared storage (662). The owner of the lock then sends the content and a message that it is downgrading from exclusive lock to shared lock to the home node (664). The home node sends the lock and the content to the requester (666).
- the home node sends the content to the requester if the home node has the content in its cache. If, however, the home node does not have the content in its cache, it then notifies the requester that it does not have the content in the cache and the requester retrieves the content from the shared storage.
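- The home node's choices when another node already holds the lock can be summarized as a small decision function. This is a sketch for orientation only: the function name, its parameters, and the action strings are hypothetical, and mapping the cache-hit and cache-miss behavior described just above to the case where the home node itself holds a shared lock is an assumption.

```python
def home_node_actions(holder_mode, home_holds_lock, home_has_content=False):
    """Return the home node's steps for a shared-lock request from another node.

    holder_mode is "shared" or "exclusive" (the mode of the existing lock).
    Reference numerals appear only where the description gives them.
    """
    if holder_mode == "shared":
        if not home_holds_lock:
            return [
                "send content request to the lock holder (612)",
                "receive content from the lock holder (614)",
                "send lock and content to the requester (616)",
            ]
        if home_has_content:
            return ["send content to the requester"]
        return [
            "notify the requester that the content is not cached",
            "requester retrieves the content from shared storage",
        ]
    # Any non-shared mode is treated like the exclusive-lock example.
    if home_holds_lock:
        return [
            "write content to shared storage (654)",
            "downgrade to shared; send lock and content if known (656)",
        ]
    return [
        "send downgrade and content request to the lock owner (660)",
        "owner writes content to shared storage (662)",
        "owner sends content and downgrade message to the home node (664)",
        "send lock and content to the requester (666)",
    ]


remote_shared = home_node_actions("shared", home_holds_lock=False)
remote_exclusive = home_node_actions("exclusive", home_holds_lock=False)
local_exclusive = home_node_actions("exclusive", home_holds_lock=True)
local_miss = home_node_actions("shared", home_holds_lock=True)
```

- In the embodiment where nodes exchange content directly without regularly writing to shared storage, the write steps (654 and 662) would simply be dropped from the returned lists.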
- the nodes can access information directly amongst each other, without regularly writing to the shared storage. Accordingly, FIGS. 5A-5E still apply to this embodiment, except that they would be modified to delete 558 of FIG. 5C, 654 of FIG. 5E, and 662 of FIG. 5E.
- FIG. 6 is another block diagram of the software components of server 300 according to an embodiment of the present invention.
- each server 300 A- 300 D of FIG. 1 includes these software components.
- the Distributed Lock Manager (DLM) 1500 manages matrix-wide locks for the filesystem image 306a-306d, including the management of lock state during crash recovery.
- the Matrix Filesystem 1504 uses DLM 1500-managed locks to implement matrix-wide mutual exclusion and matrix-wide filesystem 306a-306d metadata and data cache consistency.
- the DLM 1500 is a distributed symmetric lock manager. Preferably, there is an instance of the DLM 1500 resident on every server in the matrix. Every instance is a peer to every other instance; there is no master/slave relationship among the instances.
- the lock-caching layer (“LCL”) 1502 is a component internal to the operating system kernel that interfaces between the Matrix Filesystem 1504 and the application-level DLM 1500 .
- the purposes of the LCL 1502 include the following:
- It caches DLM 1500 locks (that is, it may hold on to DLM 1500 locks after clients have released all references to them), sometimes obviating the need for kernel components to communicate with an application-level process (the DLM 1500) to obtain matrix-wide locks.
- It allows clients to define callouts for different types of locks when certain events related to locks occur, particularly the acquisition and surrender of DLM 1500-level locks. This ability is a requirement for cache-coherency, which depends on callouts to flush modified cached data to permanent storage when corresponding DLM 1500 write locks are downgraded or released, and to purge cached data when DLM 1500 read locks are released.
- the LCL 1502 is the only kernel component that makes lock requests from the user-level DLM 1500. It partitions DLM 1500 locks among kernel clients, so that a single DLM 1500 lock has at most one kernel client on each node, namely, the LCL 1502 itself. Each DLM 1500 lock is the product of an LCL 1502 request, which was induced by a client's request of an LCL 1502 lock, and each LCL 1502 lock is backed by a DLM 1500 lock.
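- The callout mechanism described above can be illustrated with a small registry that maps a lock type and an event to client-supplied functions. The `LockCallouts` class, the event names, and the dictionaries standing in for the cache and permanent storage are all hypothetical; the sketch only shows the flush-on-write-downgrade and purge-on-read-release pattern, not the kernel interface itself.

```python
class LockCallouts:
    """Hypothetical registry of per-lock-type callouts."""

    def __init__(self):
        self.callouts = {}  # (lock_type, event) -> list of callables

    def register(self, lock_type, event, fn):
        self.callouts.setdefault((lock_type, event), []).append(fn)

    def fire(self, lock_type, event, lock_name):
        # Run every callout registered for this lock type and event.
        for fn in self.callouts.get((lock_type, event), []):
            fn(lock_name)


cache = {"inode-12": "modified data"}  # stand-in for cached data
storage = {}                           # stand-in for permanent storage


def flush(name):
    # Write modified cached data out before the write lock is surrendered.
    if name in cache:
        storage[name] = cache[name]


def purge(name):
    # Drop cached data once the read lock no longer protects it.
    cache.pop(name, None)


callouts = LockCallouts()
callouts.register("write", "downgrade", flush)
callouts.register("write", "release", flush)
callouts.register("read", "release", purge)

callouts.fire("write", "downgrade", "inode-12")
after_downgrade = dict(cache)
callouts.fire("read", "release", "inode-12")
```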
- the Matrix Filesystem 1504 is the shared filesystem component of The Matrix Server.
- the Matrix Filesystem 1504 allows multiple servers to simultaneously mount, in read/write mode, filesystems living on physically shared storage devices 306a-306d.
- the Matrix Filesystem 1504 is a distributed symmetric matrixed filesystem; there is no single server that filesystem activity must pass through to perform filesystem activities.
- the Matrix Filesystem 1504 provides normal local filesystem semantics and interfaces for clients of the filesystem.
- SAN (Storage Area Network) Membership Service 1506 provides the group membership services infrastructure for the Matrix Filesystem 1504, including managing filesystem membership, health monitoring, coordinating mounts and unmounts of shared filesystems 306a-306d, and coordinating crash recovery.
- Matrix Membership Service 1508 provides the Local, matrix-style matrix membership support, including virtual host management, service monitoring, notification services, data replication, etc.
- the Matrix Filesystem 1504 does not interface directly with the MMS 1508 , but the Matrix Filesystem 1504 does interface with the SAN Membership Service 1506 , which interfaces with the MMS 1508 in order to provide the filesystem 1504 with the matrix group services infrastructure.
- the Shared Disk Monitor Probe 1510 maintains and monitors the membership of the various shared storage devices in the matrix. It acquires and maintains leases on the various shared storage devices in the matrix as a protection against rogue server “split-brain” conditions. It communicates with the SMS 1506 to coordinate recovery activities on occurrence of a device membership transition.
- Filesystem monitors 1512 are used by the SAN Membership Service 1508 to initiate Matrix Filesystem 1504 mounts and unmounts, according to the matrix configuration put in place by the Matrix Server user interface.
- the Service Monitor 1514 tracks the state (health & availability) of various services on each server in the matrix so that the matrix server may take automatic remedial action when the state of any monitored service transitions.
- Services monitored include HTTP, FTP, Telnet, SMTP, etc.
- the remedial actions include service restart on the same server or service fail-over and restart on another server.
- the Device Monitor 1516 tracks the state (health & availability) of various storage-related devices in the matrix so that the matrix server may take automatic remedial action when the state of any monitored device transitions.
- Devices monitored may include data storage devices 306a-306d (such as storage device drives, solid state storage devices, RAM storage devices, JBODs, RAID arrays, etc.) and storage network devices 304′ (such as Fibre Channel switches, InfiniBand switches, iSCSI switches, etc.).
- the remedial actions include initiation of Matrix Filesystem 1504 recovery, storage network path failover, and device reset.
- the Application Monitor 1518 tracks the state (health & availability) of various applications on each server in the matrix so that the matrix server may take automatic remedial action when the state of any monitored application transitions.
- Applications monitored may include databases, mail routers, CRM apps, etc.
- the remedial actions include application restart on the same server or application fail-over and restart on another server.
- the Notifier Agent 1520 tracks events associated with specified objects in the matrix and executes supplied scripts of commands on occurrence of any tracked event.
- the Replicator Agent 1522 monitors the content of any filesystem subtree and periodically replicates any data which has not yet been replicated from a source tree to a destination tree.
- the Matrix Communication Service 1524 provides the network communication infrastructure for the DLM 1500 , Matrix Membership Service 1508 , and SAN Membership Service 1506 .
- the Matrix Filesystem 1504 does not use the MCS 1524 directly, but it does use it indirectly through these other components.
- the Storage Control Layer (SCL) 1526 provides matrix-wide device identification, used to identify the Matrix Filesystems 1504 at mount time.
- the SCL 1526 also manages storage fabric configuration and low level I/O device fencing of rogue servers from the shared storage devices 306a-306d containing the Matrix Filesystems 1504. It also provides the ability for a server in the matrix to voluntarily intercede during normal device operations to fence itself when communication with the rest of the matrix has been lost.
- the Storage Control Layer 1526 is the Matrix Server module responsible for managing shared storage devices 306a-306d. Management in this context consists of two primary functions. The first is to enforce I/O fencing at the hardware SAN level by enabling/disabling host access to the set of shared storage devices 306a-306d. The second is to generate global (matrix-wide) unique device names (or “labels”) for all matrix storage devices 306a-306d and ensure that all hosts in the matrix have access to those global device names.
- the SCL module also includes utilities and library routines needed to provide device information to the UI.
- the Pseudo Storage Driver 1528 is a layered driver that “hides” a target storage device 306a-306d so that all references to the underlying target device must pass through the PSD layered driver.
- the PSD provides the ability to “fence” a device, blocking all I/O from the host server to the underlying target device until it is unfenced again.
- the PSD also provides an application-level interface to lock a storage partition across the matrix. It also has the ability to provide common matrix-wide ‘handles’, or paths, to devices such that all servers accessing shared storage in the Matrix Server can use the same path to access a given shared device.
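- The fencing behavior of a layered driver can be sketched as a wrapper through which every read and write must pass. This is a user-space illustration under stated assumptions, not driver code: the `PseudoStorageDriver` class, the `FencedError` exception, and the dictionary standing in for the target device are hypothetical.

```python
class FencedError(OSError):
    """Raised when I/O is attempted on a fenced device (hypothetical)."""


class PseudoStorageDriver:
    """Hypothetical layered driver that hides a target device."""

    def __init__(self, target):
        self._target = target  # dict standing in for the underlying device
        self.fenced = False

    def fence(self):
        self.fenced = True

    def unfence(self):
        self.fenced = False

    def _check(self):
        # All references to the target pass through here, so one flag blocks all I/O.
        if self.fenced:
            raise FencedError("device is fenced")

    def read(self, block):
        self._check()
        return self._target.get(block)

    def write(self, block, data):
        self._check()
        self._target[block] = data


device = {}
psd = PseudoStorageDriver(device)
psd.write(0, b"a")
psd.fence()
try:
    psd.write(0, b"b")
    blocked = False
except FencedError:
    blocked = True
psd.unfence()
value = psd.read(0)
```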
Abstract
A system and method are disclosed for accessing data in a multi-node system comprising providing a first node associated with a first operating system; providing a second node associated with a second operating system, wherein the second operating system is independent of the first operating system; providing a storage, wherein the first node directly accesses the storage and the second node directly accesses the storage; requesting a lock for a block by the first node to the second node; obtaining the lock from the second node; and obtaining the block from the second node.
Description
- This application claims priority to U.S. Provisional Patent Application No. 60/324,196 (Attorney Docket No. POLYP001+) entitled SHARED STORAGE LOCK: A NEW SOFTWARE SYNCHRONIZATION MECHANISM FOR ENFORCING MUTUAL EXCLUSION AMONG MULTIPLE NEGOTIATORS filed Sep. 21, 2001, which is incorporated herein by reference for all purposes.
- This application claims priority to U.S. Provisional Patent Application No. 60/324,226 (Attorney Docket No. POLYP002+) entitled JOUNALING MECHANISM WITH EFFICIENT, SELECTIVE RECOVERY FOR MULTI-NODE ENVIRONMENTS filed Sep. 21, 2001, which is incorporated herein by reference for all purposes.
- This application claims priority to U.S. Provisional Patent Application No. 60/324,224 (Attorney Docket No. POLYP003+) entitled COLLABORATIVE CACHING IN A MULTI-NODE FILESYSTEM filed Sep. 21, 2001, which is incorporated herein by reference for all purposes.
- This application claims priority to U.S. Provisional Patent Application No. 60/324,242 (Attorney Docket No. POLYP005+) entitled DISTRIBUTED MANAGEMENT OF A STORAGE AREA NETWORK filed Sep. 21, 2001, which is incorporated herein by reference for all purposes.
- This application claims priority to U.S. Provisional Patent Application No. 60/324,195 (Attorney Docket No. POLYP006+) entitled METHOD FOR IMPLEMENTING JOURNALING AND DISTRIBUTED LOCK MANAGEMENT filed Sep. 21, 2001, which is incorporated herein by reference for all purposes.
- This application claims priority to U.S. Provisional Patent Application No. 60/324,243 (Attorney Docket No. POLYP007+) entitled MATRIX SERVER: A HIGHLY AVAILABLE MATRIX PROCESSING SYSTEM WITH COHERENT SHARED FILE STORAGE filed Sep. 21, 2001, which is incorporated herein by reference for all purposes.
- This application claims priority to U.S. Provisional Patent Application No. 60/324,787 (Attorney Docket No. POLYP008+) entitled A METHOD FOR EFFICIENT ON-LINE LOCK RECOVERY IN A HIGHLY AVAILABLE MATRIX PROCESSING SYSTEM filed Sep. 24, 2001, which is incorporated herein by reference for all purposes.
- This application claims priority to U.S. Provisional Patent Application No. 60/327,191 (Attorney Docket No. POLYP009+) entitled FAST LOCK RECOVERY: A METHOD FOR EFFICIENT ON-LINE LOCK RECOVERY IN A HIGHLY AVAILABLE MATRIX PROCESSING SYSTEM filed Oct. 1, 2001, which is incorporated herein by reference for all purposes.
- This application is related to co-pending U.S. patent application Ser. No. ______(Attorney Docket No.POLYP001) entitled A SYSTEM AND METHOD FOR SYNCHRONIZATION FOR ENFORCING MUTUAL EXCLUSION AMONG MULTIPLE NEGOTIATORS filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. patent application Ser. No. ______ (Attorney Docket No. POLYP002) entitled SYSTEM AND METHOD FOR JOURNAL RECOVERY FOR MULTINODE ENVIRONMENTS filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. patent application Ser. No. ______(Attorney Docket No. POLYP005) entitled A SYSTEM AND METHOD FOR MANAGEMENT OF A STORAGE AREA NETWORK filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. patent application Ser. No. ______(Attorney Docket No. POLYP006) entitled SYSTEM AND METHOD FOR IMPLEMENTING JOURNALING IN A MULTI-NODE ENVIRONMENT filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. patent application Ser. No. ______(Attorney Docket No. POLYP007) entitled A SYSTEM AND METHOD FOR A MULTI-NODE ENVIRONMENT WITH SHARED STORAGE filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. patent application Ser. No. ______(Attorney Docket No. POLYP009) entitled A SYSTEM AND METHOD FOR EFFICIENT LOCK RECOVERY filed concurrently herewith, which is incorporated herein by reference for all purposes.
- The present invention relates generally to computer systems. More specifically, a system and method for collaborative caching in a multi-node file system is disclosed.
- In today's complex network systems, multiple nodes may be set up to share data storage. Preferably, in order to share storage only one node or application is allowed to alter data at any given time. In order to accomplish this synchronization, a lock may be used.
- Typically, it can be slow for a node to read or write to a particular block in a shared storage system due to the time it can take to coordinate the locking mechanism and retrieval time of the document from shared storage.
- It would be desirable to speed up the time required to obtain access to a shared document. The present invention addresses such a need.
- The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements, and in which:
- FIG. 1 is a block diagram of a system for accessing data according to an embodiment of the present invention.
- FIG. 2 is another block diagram of a system according to an embodiment of the present invention.
- FIG. 3 is a block diagram of software components inside a node according to an embodiment of the present invention.
- FIGS. 4A-4B show a flow diagram for a method according to an embodiment of the present invention for accessing data.
- FIGS. 5A-5E show another flow diagram of a method according to an embodiment of the present invention for accessing data.
- FIG. 6 is another block diagram of the software components of server 300 according to an embodiment of the present invention.
- It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, or a computer readable medium such as a computer readable storage medium or a computer network wherein program instructions are sent over optical or electronic communication links. It should be noted that the order of the steps of disclosed processes may be altered within the scope of the invention.
- A detailed description of one or more preferred embodiments of the invention is provided below along with accompanying figures that illustrate by way of example the principles of the invention. While the invention is described in connection with such embodiments, it should be understood that the invention is not limited to any embodiment. On the contrary, the scope of the invention is limited only by the appended claims and the invention encompasses numerous alternatives, modifications and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention. The present invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the present invention is not unnecessarily obscured.
- FIG. 1 is a block diagram of a system for accessing data according to an embodiment of the present invention. FIG. 3 is a block diagram of a system for a multi-node environment according to an embodiment of the present invention. In this example, servers 300A-300D are coupled via network interconnects 302. The network interconnects 302 can represent any network infrastructure such as an Ethernet, InfiniBand network or Fibre Channel network capable of host-to-host communication. The servers 300A-300D are also coupled to the data storage interconnect 304, which in turn is coupled to shared storage 306A-306D. The data storage interconnect 304 can be any interconnect that can allow access to the shared storage 306A-306D by servers 300A-300D. An example of the data storage interconnect 304 is a Fibre Channel switch, such as a Brocade 3200 Fibre Channel switch. Alternately, the data storage network might be an iSCSI or other IP storage network, InfiniBand network, or another kind of host-to-storage network. In addition, the network interconnects 302 and the data storage interconnect 304 may be embodied in a single interconnect.
- Servers 300A-300D can be any computer, preferably an off-the-shelf computer or server or any equivalent thereof. Servers 300A-300D can each run operating systems that are independent of each other. Accordingly, each server 300A-300D can, but does not need to, run a different operating system. For example, server 300A may run Microsoft Windows, while server 300B runs Linux, and server 300C can simultaneously run a Unix operating system. An advantage of running independent operating systems for the servers 300A-300D is that the entire multi-node system can be dynamic. For example, one of the servers 300A-300D can fail while the other servers 300A-300D continue to operate.
- The shared storage 306A-306D can be any storage device, such as hard drive disks, compact disks, tape, and random access memory. A filesystem is a logical entity built on the shared storage. Although the shared storage 306A-306D is typically considered a physical device while the filesystem is typically considered a logical structure overlaid on part of the storage, the filesystem is sometimes referred to herein as shared storage for simplicity. For example, when it is stated that shared storage fails, it can be a failure of a part of a filesystem, one or more filesystems, or the physical storage device on which the filesystem is overlaid. Accordingly, shared storage, as used herein, can mean the physical storage device, a portion of a filesystem, a filesystem, filesystems, or any combination thereof.
- FIG. 2 is another block diagram of a system according to an embodiment of the present invention. In this example, the system preferably has no single point of failure. Accordingly, servers 300A′-300D′ are coupled with multiple network interconnects 302A-302D. The servers 300A′-300D′ are also shown to be coupled with multiple storage interconnects 304A-304B. The storage interconnects 304A-304B are each coupled to a plurality of data storage 306A′-306D′.
- In this manner, there are redundancies in the system such that if any of the components or connections fail, the entire system can continue to operate.
- In the example shown in FIG. 2, as well as the example shown in FIG. 1, the number of servers 300A′-300D′, the number of storage interconnects 304A-304B, and the number of data storage 306A′-306D′ can be as many as the customer requires and is not physically limited by the system. Likewise, the operating systems used by servers 300A′-300D′ can also be as many independent operating systems as the customer requires.
- FIG. 3 is a block diagram of software components inside a node 300. In this example, node 300 is shown to include a buffer cache 350, processes 352, a distributed lock manager (DLM) 354, and a lock caching layer (LCL) 356. According to an embodiment of the present invention, a block is kept in the node's cache (in local storage) after node 300 changes the block rather than writing it immediately into the shared storage. In this manner, it is faster if that node 300 can find the latest document in its own buffer cache 350 rather than taking the time to access the shared storage.
- There are various ways to keep a node from using a stale copy of a block. One way is to invalidate the cached copy of the block associated with a lock when the lock is released. Another way is to invalidate or refresh the cached copy of the block associated with a new lock when a new lock is obtained.
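- The two ways of avoiding a stale copy described above can be illustrated with a minimal sketch. It is not the implementation of any embodiment: the class `BufferCache`, the hooks `on_lock_released` and `on_lock_obtained`, and the use of a Python dictionary to stand in for shared storage are all assumptions made for illustration.

```python
class BufferCache:
    """Hypothetical per-node buffer cache; a dict stands in for shared storage."""

    def __init__(self, shared_storage):
        self.shared_storage = shared_storage
        self.blocks = {}     # block_id -> locally cached content
        self.dirty = set()   # block_ids changed locally but not yet written back

    def read(self, block_id):
        # Serve from the local cache when possible, else read shared storage.
        if block_id not in self.blocks:
            self.blocks[block_id] = self.shared_storage[block_id]
        return self.blocks[block_id]

    def write(self, block_id, data):
        # Keep the changed block locally instead of writing it through at once.
        self.blocks[block_id] = data
        self.dirty.add(block_id)

    def invalidate(self, block_id):
        # Write back a modified block first so no change is lost, then drop it.
        if block_id in self.dirty:
            self.shared_storage[block_id] = self.blocks[block_id]
            self.dirty.discard(block_id)
        self.blocks.pop(block_id, None)


def on_lock_released(cache, block_id):
    # First way: invalidate the cached copy when its lock is released.
    cache.invalidate(block_id)


def on_lock_obtained(cache, block_id):
    # Second way: invalidate the cached copy when a new lock is obtained,
    # so the next read refreshes it from shared storage.
    cache.invalidate(block_id)
```

- In this sketch either hook is sufficient on its own; a node would use one strategy or the other, not necessarily both.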
- The distributed lock manager communicates with other DLMs in other nodes and also communicates with the lock caching layer 356. The lock caching layer 356 calls requested tasks before a lock is downgraded or released.
- A process 352, such as an application or a file system, can obtain a lock on a block via the lock caching layer 356, use it, then eventually relinquish the lock on the block. The block is then stored in buffer cache 350. The next time a process 352 requests that block, a search can be performed in the buffer cache 350 to find that block. If the block is not found in the buffer cache, then it can be retrieved from the shared storage.
- FIGS. 4A-4B show a flow diagram for a method according to an embodiment of the present invention for accessing data. In this example, a process within a particular node asks the lock caching layer (LCL) for a write lock for a document (400). The LCL obtains a distributed lock manager (DLM) lock for that document (402). The LCL grants the LCL lock to the process for that document (404). When the process is finished and relinquishes the LCL lock, the LCL caches the DLM lock (406). In this example, 400-406 occur within a single node. Another node then requests a read lock on the document and the request is received by this node's DLM (408). The DLM asks the LCL to downgrade the DLM lock (450 of FIG. 4B). The LCL then determines that there are no local processes using the lock and writes the document to shared storage (452). The LCL informs the DLM that it is downgrading the lock from write to read (454). The DLM then passes the lock as well as the latest version of the document to the requesting node (456).
- Accordingly, by sending the requested document directly from one node to the other, access to this data is more efficient than having to retrieve it from the shared storage.
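- The flow of FIGS. 4A-4B can be sketched on a single node as follows. This is an illustrative sketch only: the class `LockCachingLayer`, its method names, the dictionary standing in for shared storage, and the choice to simply refuse a downgrade while a local process still holds the lock are assumptions, not details taken from the figures. The DLM round trip is reduced to setting a dictionary entry.

```python
class LockCachingLayer:
    """Hypothetical single-node model of the LCL steps 400-456."""

    def __init__(self, shared_storage):
        self.shared_storage = shared_storage
        self.dlm_locks = {}   # doc_id -> "write" or "read" (held or cached DLM locks)
        self.users = {}       # doc_id -> number of local processes holding an LCL lock
        self.cache = {}       # doc_id -> latest local content

    def acquire_write(self, doc_id):
        # 400-404: obtain a DLM write lock unless one is already cached,
        # then grant the LCL lock to the calling process.
        if self.dlm_locks.get(doc_id) != "write":
            self.dlm_locks[doc_id] = "write"   # stands in for a DLM request
        self.users[doc_id] = self.users.get(doc_id, 0) + 1

    def write(self, doc_id, content):
        if self.users.get(doc_id, 0) == 0:
            raise RuntimeError("caller must hold the LCL lock")
        self.cache[doc_id] = content

    def release(self, doc_id):
        # 406: the process relinquishes the LCL lock; the DLM lock stays cached.
        self.users[doc_id] -= 1

    def downgrade_for_remote_reader(self, doc_id):
        # 450-456: invoked by the DLM when another node requests a read lock.
        if self.users.get(doc_id, 0) > 0:
            return None                              # assumption: refuse while in use
        content = self.cache.get(doc_id)
        if content is not None:
            self.shared_storage[doc_id] = content    # 452: write to shared storage
        self.dlm_locks[doc_id] = "read"              # 454: downgrade write to read
        return "read", content                       # 456: lock and document go out
```

- The return value of `downgrade_for_remote_reader` models the lock and the latest version of the document being passed directly to the requesting node, which is what spares that node a read from shared storage.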
- FIGS. 5A-5E show another flow diagram of a method according to an embodiment of the present invention for accessing data. The example shown in FIGS. 5A-5C is that of a requesting node requesting a shared lock. Variations of this example can be used to accommodate other types of locks, such as an exclusive lock or a lock with a different level of exclusion.
- In this example, the requesting node asks its DLM for a shared lock (500). It is determined whether the requesting node is the home node (502). A lock home node, as used herein, is the server that is responsible for granting or denying lock requests for a given DLM lock when there is no cached lock reference available on the requesting node. In this embodiment, there is one lock home node per lock. The home node does not necessarily hold the lock locked, but if other nodes hold the lock locked or cached, then the home node has a description of the lock, since the other nodes that hold the lock locked or cached communicated with the home node in order to get it locked or cached.
- If the requesting node is not the home node, then the DLM of the requesting node requests a shared lock from the home node (504). It is also determined whether a lock is held by a node other than the requesting node (506). If a lock is not held by a node other than the requesting node, the home node then gives the requesting node the lock in shared mode (508). The requesting node then reads the content from shared storage (510).
- If the requesting node is the home node (502), then it is determined whether a lock is held by another node (550). If a lock is not held by another node, then the requesting node obtains the lock and reads from shared storage (562). If, however, there is a lock held by another node, then it is also determined whether the other node holds a shared lock (552). If the other node holds a shared lock, then the requesting node grants itself a shared lock (563) and sends a request for content to the owner of the shared lock (564).
- It is then determined whether the owner has the content in its local cache (580). If yes, the owner of the shared lock sends the content to the requesting node (586), otherwise the owner tells the requesting node that it does not have the content (582) and the requesting node reads the content from shared storage (584). If the other node does not hold a shared lock (552), and instead holds an exclusive lock, then the requesting node sends a request for the downgrade of the lock and content to the owner of the exclusive lock (554).
- Then, it is determined whether the owner has the content in the local cache (590 of FIG. 5C). If the owner has the content in the local cache, the owner writes the content to shared storage (558). The owner then sends the message to the home node (the requestor) with the content and the downgrade request (560). The requesting node then grants itself a shared lock (596).
- If the owner does not have the content in the local cache, it sends the downgrade message to the requesting node (592). The requesting node then grants itself a shared lock and reads the content from shared storage (594).
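- The branch in which the requesting node is itself the home node (550-596) can be summarized in a short sketch. The names `Holder` and `home_requests_shared`, the dictionary standing in for shared storage, and the returned source label are assumptions made for illustration; messages between nodes are collapsed into direct attribute access.

```python
from dataclasses import dataclass, field


@dataclass
class Holder:
    """Hypothetical view of the other node that holds the lock."""
    mode: str                                   # "shared" or "exclusive"
    cache: dict = field(default_factory=dict)   # block_id -> cached content


def home_requests_shared(holder, shared_storage, block_id):
    """Return (content, source) for a shared-lock request made by the home node."""
    if holder is None:
        # 562: no other node holds the lock; obtain it and read shared storage.
        return shared_storage[block_id], "shared storage"
    if holder.mode == "shared":
        # 563-564: grant a shared lock locally and ask the owner for the content.
        if block_id in holder.cache:
            return holder.cache[block_id], "owner cache"       # 586
        return shared_storage[block_id], "shared storage"      # 582-584
    # 554: the owner holds an exclusive lock and is asked to downgrade it.
    if block_id in holder.cache:
        content = holder.cache[block_id]
        shared_storage[block_id] = content                     # 558: owner writes back
        holder.mode = "shared"                                 # 560: downgrade
        return content, "owner cache"                          # 596
    holder.mode = "shared"                                     # 592: downgrade message
    return shared_storage[block_id], "shared storage"          # 594
```

- Shared storage is read only when no other node can supply the content from its local cache.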
- If it is determined that a lock is held by a node other than the requesting node (506 of FIG. 5A), then it is also determined whether the held lock is a shared lock (600 of FIG. 5D). If it is a shared lock, then it is also determined whether the home node holds the lock (602). If the home node holds the lock, then it sends the lock as well as the content to the requester (608).
- If the home node does not hold the lock (602), it then sends the content request to the lock holder (612). The content is sent from the lock holder to the home node (614). The home node sends the lock as well as the content to the requester (616).
- If the lock held by another node is not a shared lock (600), for example, it's an exclusive lock, then it is determined whether the home node holds the lock (650 of FIG. 5E). If the home node holds the lock, it then writes the content to the shared storage (654). The home node downgrades the exclusive lock to shared and sends the shared lock to the requester along with content if known (656).
- If the home node does not hold the lock (650), it then sends the request for downgrade and content to the owner of the lock (660). The owner of the lock writes the content to shared storage (662). The owner of the lock then sends the content and a message that it is downgrading from exclusive lock to shared lock to the home node (664). The home node sends the lock and the content to the requester (666).
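- The home node's handling of a shared-lock request from another node, when the lock is already held (600-666), can be sketched as below. The `Node` class, the function `home_grants_shared`, the message strings, and the dictionary standing in for shared storage are hypothetical; the sketch records which messages would be exchanged rather than sending anything over a network.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical node record: its lock mode and its local cache."""
    name: str
    mode: Optional[str] = None                  # None, "shared", or "exclusive"
    cache: dict = field(default_factory=dict)   # block_id -> cached content


def home_grants_shared(home, holder, shared_storage, block_id):
    """Return (granted_mode, content, messages); content is None if unknown."""
    messages = []
    remote = holder is not home        # 602 / 650: does the home node hold the lock?
    if remote:
        messages.append("home -> holder: request")         # 612 / 660
    content = holder.cache.get(block_id)
    if holder.mode == "exclusive":
        if content is not None:
            shared_storage[block_id] = content              # 654 / 662: write back
        holder.mode = "shared"                              # 656 / 664: downgrade
    if remote:
        messages.append("holder -> home: content")          # 614 / 664
    messages.append("home -> requester: lock + content")    # 608 / 616 / 656 / 666
    return "shared", content, messages
```

- When the home node is not the holder, two extra messages are needed before the lock and content reach the requester, which is the cost of routing the request through the home node.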
- It should be noted that in steps
- If the requesting node requests an exclusive lock in 500 of FIG. 5A, rather than a shared lock, then 566 of FIG. 5B would change to “owner of shared lock sends content to requesting node and also gives up the lock to the requesting node”. Likewise, 560 would also change from “downgrading its lock” to “giving up its lock”. 614 of FIG. 5C would add that “the owner of the lock gives up the lock to the requester”. And 664 of FIG. 5D would also change from “downgrade” to “give up its lock”.
- FIG. 6 is another block diagram of the software components of server 300 according to an embodiment of the present invention. In an embodiment of the present invention, each server 300A-300D of FIG. 1 includes these software components.
- In this embodiment, the following components are shown:
- The Distributed Lock Manager (DLM) 1500 manages matrix-wide locks for the filesystem image 306a-306d, including the management of lock state during crash recovery. The Matrix Filesystem 1504 uses DLM 1500-managed locks to implement matrix-wide mutual exclusion and matrix-wide filesystem 306a-306d metadata and data cache consistency. The DLM 1500 is a distributed symmetric lock manager. Preferably, there is an instance of the DLM 1500 resident on every server in the matrix. Every instance is a peer to every other instance; there is no master/slave relationship among the instances.
- The lock-caching layer (“LCL”) 1502 is a component internal to the operating system kernel that interfaces between the Matrix Filesystem 1504 and the application-level DLM 1500. The purposes of the LCL 1502 include the following:
- 1. It hides the details of the DLM 1500 from kernel-resident clients that need to obtain distributed locks.
- 2. It caches DLM 1500 locks (that is, it may hold on to DLM 1500 locks after clients have released all references to them), sometimes obviating the need for kernel components to communicate with an application-level process (the DLM 1500) to obtain matrix-wide locks.
- 3. It provides the ability to obtain locks in both process and server scopes (where a process lock ensures that the corresponding DLM 1500 lock is held, and also excludes local processes attempting to obtain the lock in conflicting modes, whereas a server lock only ensures that the DLM 1500 lock is held, without excluding other local processes).
- 4. It allows clients to define callouts for different types of locks when certain events related to locks occur, particularly the acquisition and surrender of DLM 1500-level locks. This ability is a requirement for cache-coherency, which depends on callouts to flush modified cached data to permanent storage when corresponding DLM 1500 write locks are downgraded or released, and to purge cached data when DLM 1500 read locks are released.
- The LCL 1502 is the only kernel component that makes lock requests from the user-level DLM 1500. It partitions DLM 1500 locks among kernel clients, so that a single DLM 1500 lock has at most one kernel client on each node, namely, the LCL 1502 itself. Each DLM 1500 lock is the product of an LCL 1502 request, which was induced by a client's request of an LCL 1502 lock, and each LCL 1502 lock is backed by a DLM 1500 lock.
- The Matrix Filesystem 1504 is the shared filesystem component of The Matrix Server. The Matrix Filesystem 1504 allows multiple servers to simultaneously mount, in read/write mode, filesystems living on physically shared storage devices 306a-306d. The Matrix Filesystem 1504 is a distributed symmetric matrixed filesystem; there is no single server that filesystem activity must pass through to perform filesystem activities. The Matrix Filesystem 1504 provides normal local filesystem semantics and interfaces for clients of the filesystem.
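- The callouts listed as the fourth purpose of the LCL 1502 can be illustrated with a small registry. This is a sketch under stated assumptions: the class `LockCallouts`, the event names "downgrade" and "release", the lock identifier "inode-7", and the dictionaries standing in for the cache and permanent storage are all hypothetical and do not come from the LCL interface itself.

```python
class LockCallouts:
    """Hypothetical registry of client callouts keyed by (lock mode, event)."""

    def __init__(self):
        self._callouts = {}

    def register(self, mode, event, fn):
        self._callouts.setdefault((mode, event), []).append(fn)

    def fire(self, mode, event, lock_id):
        # Run every registered callout before the underlying lock changes state.
        for fn in self._callouts.get((mode, event), []):
            fn(lock_id)


cache = {"inode-7": b"modified"}   # cached data protected by lock "inode-7"
dirty = {"inode-7"}                # entries modified since the last flush
permanent = {}                     # dict standing in for permanent storage


def flush(lock_id):
    # Write modified cached data to permanent storage.
    if lock_id in dirty:
        permanent[lock_id] = cache[lock_id]
        dirty.discard(lock_id)


def purge(lock_id):
    # Discard cached data that could become stale once the lock is surrendered.
    cache.pop(lock_id, None)


callouts = LockCallouts()
callouts.register("write", "downgrade", flush)   # flush when a write lock is downgraded
callouts.register("write", "release", flush)     # flush when a write lock is released
callouts.register("read", "release", purge)      # purge when a read lock is released
```

- For example, calling `callouts.fire("write", "downgrade", "inode-7")` writes the modified data to `permanent` while leaving the cached copy readable under the remaining read lock; a later `callouts.fire("read", "release", "inode-7")` drops it from the cache.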
- SAN (Storage Area Network) Membership Service 1506 provides the group membership services infrastructure for the Matrix Filesystem 1504, including managing filesystem membership, health monitoring, coordinating mounts and unmounts of shared filesystems 306a-306d, and coordinating crash recovery.
- Matrix Membership Service 1508 provides the local, matrix-style matrix membership support, including virtual host management, service monitoring, notification services, data replication, etc. The Matrix Filesystem 1504 does not interface directly with the MMS 1508, but the Matrix Filesystem 1504 does interface with the SAN Membership Service 1506, which interfaces with the MMS 1508 in order to provide the filesystem 1504 with the matrix group services infrastructure.
- The Shared Disk Monitor Probe 1510 maintains and monitors the membership of the various shared storage devices in the matrix. It acquires and maintains leases on the various shared storage devices in the matrix as a protection against rogue server “split-brain” conditions. It communicates with the SMS 1506 to coordinate recovery activities on occurrence of a device membership transition.
- Filesystem monitors 1512 are used by the SAN Membership Service 1508 to initiate Matrix Filesystem 1504 mounts and unmounts, according to the matrix configuration put in place by the Matrix Server user interface.
- The Service Monitor 1514 tracks the state (health & availability) of various services on each server in the matrix so that the matrix server may take automatic remedial action when the state of any monitored service transitions. Services monitored include HTTP, FTP, Telnet, SMTP, etc. The remedial actions include service restart on the same server or service fail-over and restart on another server.
- The Device Monitor 1516 tracks the state (health & availability) of various storage-related devices in the matrix so that the matrix server may take automatic remedial action when the state of any monitored device transitions. Devices monitored may include data storage devices 306a-306d (such as storage device drives, solid state storage devices, RAM storage devices, JBODs, RAID arrays, etc.) and storage network devices 304′ (such as Fibre Channel switches, InfiniBand switches, iSCSI switches, etc.). The remedial actions include initiation of Matrix Filesystem 1504 recovery, storage network path failover, and device reset.
- The Application Monitor 1518 tracks the state (health & availability) of various applications on each server in the matrix so that the matrix server may take automatic remedial action when the state of any monitored application transitions. Applications monitored may include databases, mail routers, CRM apps, etc. The remedial actions include application restart on the same server or application fail-over and restart on another server.
- The Notifier Agent 1520 tracks events associated with specified objects in the matrix and executes supplied scripts of commands on occurrence of any tracked event.
- The Replicator Agent 1522 monitors the content of any filesystem subtree and periodically replicates any data which has not yet been replicated from a source tree to a destination tree.
- The Matrix Communication Service 1524 provides the network communication infrastructure for the DLM 1500, Matrix Membership Service 1508, and SAN Membership Service 1506. The Matrix Filesystem 1504 does not use the MCS 1524 directly, but it does use it indirectly through these other components.
- The Storage Control Layer (SCL) 1526 provides matrix-wide device identification, used to identify the Matrix Filesystems 1504 at mount time. The SCL 1526 also manages storage fabric configuration and low level I/O device fencing of rogue servers from the shared storage devices 306a-306d containing the Matrix Filesystems 1504. It also provides the ability for a server in the matrix to voluntarily intercede during normal device operations to fence itself when communication with the rest of the matrix has been lost.
- The Storage Control Layer 1526 is the Matrix Server module responsible for managing shared storage devices 306a-306d. Management in this context consists of two primary functions. The first is to enforce I/O fencing at the hardware SAN level by enabling/disabling host access to the set of shared storage devices 306a-306d. And the second is to generate global (matrix-wide) unique device names (or “labels”) for all matrix storage devices 306a-306d and ensure that all hosts in the matrix have access to those global device names. The SCL module also includes utilities and library routines needed to provide device information to the UI.
- The Pseudo Storage Driver 1528 is a layered driver that “hides” a target storage device 306a-306d so that all references to the underlying target device must pass through the PSD layered driver. Thus, the PSD provides the ability to “fence” a device, blocking all I/O from the host server to the underlying target device until it is unfenced again. The PSD also provides an application-level interface to lock a storage partition across the matrix. It also has the ability to provide common matrix-wide ‘handles’, or paths, to devices such that all servers accessing shared storage in the Matrix Server can use the same path to access a given shared device.
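- The fencing behavior attributed to the Pseudo Storage Driver 1528 can be illustrated in user space with a wrapper through which all I/O to a target must pass. This is only an analogy to a layered kernel driver: the class `PseudoStorageDevice`, the exception `FencedError`, and the use of a `bytearray` as the target device are assumptions made for illustration.

```python
class FencedError(OSError):
    """Raised when I/O is attempted on a fenced device (hypothetical name)."""


class PseudoStorageDevice:
    """Hypothetical layered wrapper; every access to the target goes through it."""

    def __init__(self, target: bytearray):
        self._target = target    # the hidden underlying device
        self._fenced = False

    def fence(self):
        # Block all I/O from this host to the underlying target.
        self._fenced = True

    def unfence(self):
        # Allow I/O to reach the underlying target again.
        self._fenced = False

    def _check(self):
        if self._fenced:
            raise FencedError("device is fenced")

    def read(self, offset, length):
        self._check()
        return bytes(self._target[offset:offset + length])

    def write(self, offset, data):
        self._check()
        self._target[offset:offset + len(data)] = data
```

- Because callers hold only the wrapper and never the target itself, fencing a single object is enough to stop all of this host's I/O to the device until it is unfenced.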
- Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. It should be noted that there are many alternative ways of implementing both the process and apparatus of the present invention. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
Claims (12)
1. A method of accessing data in a multi-node system comprising:
providing a first node associated with a first operating system;
providing a second node associated with a second operating system, wherein the second operating system is independent of the first operating system;
providing a storage, wherein the first node directly accesses the storage and the second node directly accesses the storage;
requesting a lock for a block by the first node to the second node;
obtaining the lock from the second node; and
obtaining the block from the second node.
2. The method of claim 1, further comprising caching the block.
3. The method of claim 1, wherein the second node is a home node.
4. The method of claim 1, further comprising writing the block to the storage.
5. The method of claim 1, wherein the first node includes a first lock manager and the second node includes a second lock manager.
6. The method of claim 1, wherein the second node is a home node.
7. A method of accessing data in a node configured for a multi-node environment comprising:
providing a first operating system wherein the first operating system is independent of a second operating system, wherein the second operating system is associated with a second node;
providing a lock manager;
requesting a lock for a block from the second node;
obtaining the lock from the second node; and
obtaining the block from the second node.
8. (not entered)
9. (not entered)
10. A method of accessing data by a first node configured for a multi-node environment comprising:
obtaining a lock for a block from a second node, wherein the first node includes a first operating system and the second node includes a second operating system independent of the first operating system;
altering the block;
writing the block to shared storage;
relinquishing the lock;
caching the block in a local storage.
11. A system of accessing data comprising:
a first node configured to request a lock for a block, wherein the first node includes a first operating system;
a second node configured to receive the request, send the lock and the block to the first node, wherein the second node includes a second operating system independent of the first operating system; and
a storage configured to be accessible by the first and second nodes.
12. A computer program product for accessing data, the computer program product being embodied in a computer readable medium and comprising computer instructions for:
providing a lock manager, wherein the lock manager is configured to work in an environment associated with a first operating system, wherein the first operating system is independent of a second operating system, and wherein the second operating system is associated with a second node;
requesting a lock for a block from the second node;
obtaining the lock from the second node; and
obtaining the block from the second node.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2002/030084 WO2003025802A1 (en) | 2001-09-21 | 2002-09-20 | A system and method for collaborative caching in a multinode system |
US10/251,645 US20040202013A1 (en) | 2001-09-21 | 2002-09-20 | System and method for collaborative caching in a multinode system |
PCT/US2002/029721 WO2003054711A1 (en) | 2001-09-21 | 2002-09-20 | A system and method for management of a storage area network |
AU2002336620A AU2002336620A1 (en) | 2001-09-21 | 2002-09-20 | A system and method for management of a storage area network |
Applications Claiming Priority (9)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US32419501P | 2001-09-21 | 2001-09-21 | |
US32422401P | 2001-09-21 | 2001-09-21 | |
US32424301P | 2001-09-21 | 2001-09-21 | |
US32424201P | 2001-09-21 | 2001-09-21 | |
US32422601P | 2001-09-21 | 2001-09-21 | |
US32419601P | 2001-09-21 | 2001-09-21 | |
US32478701P | 2001-09-24 | 2001-09-24 | |
US32719101P | 2001-10-01 | 2001-10-01 | |
US10/251,645 US20040202013A1 (en) | 2001-09-21 | 2002-09-20 | System and method for collaborative caching in a multinode system |
Publications (1)
Publication Number | Publication Date |
---|---|
US20040202013A1 (en) | 2004-10-14 |
Family
ID=27575390
Family Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/251,894 Expired - Lifetime US7240057B2 (en) | 2001-09-21 | 2002-09-20 | System and method for implementing journaling in a multi-node environment |
US10/251,893 Active 2024-05-31 US7266722B2 (en) | 2001-09-21 | 2002-09-20 | System and method for efficient lock recovery |
US10/251,645 Abandoned US20040202013A1 (en) | 2001-09-21 | 2002-09-20 | System and method for collaborative caching in a multinode system |
US10/251,690 Active 2024-05-26 US7496646B2 (en) | 2001-09-21 | 2002-09-20 | System and method for management of a storage area network |
US10/251,689 Expired - Fee Related US7149853B2 (en) | 2001-09-21 | 2002-09-20 | System and method for synchronization for enforcing mutual exclusion among multiple negotiators |
US10/251,626 Active 2024-05-03 US7111197B2 (en) | 2001-09-21 | 2002-09-20 | System and method for journal recovery for multinode environments |
US10/251,895 Active 2024-11-11 US7437386B2 (en) | 2001-09-21 | 2002-09-20 | System and method for a multi-node environment with shared storage |
US11/499,907 Expired - Lifetime US7467330B2 (en) | 2001-09-21 | 2006-08-04 | System and method for journal recovery for multinode environments |
Family Applications Before (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/251,894 Expired - Lifetime US7240057B2 (en) | 2001-09-21 | 2002-09-20 | System and method for implementing journaling in a multi-node environment |
US10/251,893 Active 2024-05-31 US7266722B2 (en) | 2001-09-21 | 2002-09-20 | System and method for efficient lock recovery |
Family Applications After (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/251,690 Active 2024-05-26 US7496646B2 (en) | 2001-09-21 | 2002-09-20 | System and method for management of a storage area network |
US10/251,689 Expired - Fee Related US7149853B2 (en) | 2001-09-21 | 2002-09-20 | System and method for synchronization for enforcing mutual exclusion among multiple negotiators |
US10/251,626 Active 2024-05-03 US7111197B2 (en) | 2001-09-21 | 2002-09-20 | System and method for journal recovery for multinode environments |
US10/251,895 Active 2024-11-11 US7437386B2 (en) | 2001-09-21 | 2002-09-20 | System and method for a multi-node environment with shared storage |
US11/499,907 Expired - Lifetime US7467330B2 (en) | 2001-09-21 | 2006-08-04 | System and method for journal recovery for multinode environments |
Country Status (7)
Country | Link |
---|---|
US (8) | US7240057B2 (en) |
EP (2) | EP1428151A4 (en) |
JP (2) | JP2005504369A (en) |
CN (2) | CN1302419C (en) |
AU (1) | AU2002341784A1 (en) |
CA (2) | CA2460833C (en) |
WO (5) | WO2003025801A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040172494A1 (en) * | 2003-01-21 | 2004-09-02 | Nextio Inc. | Method and apparatus for shared I/O in a load/store fabric |
US20050286377A1 (en) * | 2002-11-07 | 2005-12-29 | Koninkleijke Philips Electronics, N.V. | Record carrier having a main file system area and a virtual file system area |
US10140194B2 (en) | 2014-03-20 | 2018-11-27 | Hewlett Packard Enterprise Development Lp | Storage system transactions |
US10496538B2 (en) * | 2015-06-30 | 2019-12-03 | Veritas Technologies Llc | System, method and mechanism to efficiently coordinate cache sharing between cluster nodes operating on the same regions of a file or the file system blocks shared among multiple files |
US10725915B1 (en) | 2017-03-31 | 2020-07-28 | Veritas Technologies Llc | Methods and systems for maintaining cache coherency between caches of nodes in a clustered environment |
US20220391374A1 (en) * | 2021-06-08 | 2022-12-08 | International Business Machines Corporation | Identifying resource lock ownership across a clustered computing environment |
Families Citing this family (185)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7412462B2 (en) * | 2000-02-18 | 2008-08-12 | Burnside Acquisition, Llc | Data repository and method for promoting network storage of data |
US6890968B2 (en) * | 2001-05-16 | 2005-05-10 | Kerr Corporation | Prepolymerized filler in dental restorative composite |
US8010558B2 (en) | 2001-06-05 | 2011-08-30 | Silicon Graphics International | Relocation of metadata server with outstanding DMAPI requests |
US7640582B2 (en) | 2003-04-16 | 2009-12-29 | Silicon Graphics International | Clustered filesystem for mix of trusted and untrusted nodes |
US7617292B2 (en) | 2001-06-05 | 2009-11-10 | Silicon Graphics International | Multi-class heterogeneous clients in a clustered filesystem |
US20040139125A1 (en) | 2001-06-05 | 2004-07-15 | Roger Strassburg | Snapshot copy of data volume during data access |
US7702791B2 (en) | 2001-07-16 | 2010-04-20 | Bea Systems, Inc. | Hardware load-balancing apparatus for session replication |
US7409420B2 (en) * | 2001-07-16 | 2008-08-05 | Bea Systems, Inc. | Method and apparatus for session replication and failover |
US7571215B2 (en) * | 2001-07-16 | 2009-08-04 | Bea Systems, Inc. | Data replication protocol |
US7113980B2 (en) | 2001-09-06 | 2006-09-26 | Bea Systems, Inc. | Exactly once JMS communication |
US6826601B2 (en) | 2001-09-06 | 2004-11-30 | Bea Systems, Inc. | Exactly one cache framework |
US7240057B2 (en) * | 2001-09-21 | 2007-07-03 | Kingsbury Brent A | System and method for implementing journaling in a multi-node environment |
US7403996B2 (en) | 2002-02-21 | 2008-07-22 | Bea Systems, Inc. | Systems and methods for migratable services |
US7178050B2 (en) * | 2002-02-22 | 2007-02-13 | Bea Systems, Inc. | System for highly available transaction recovery for transaction processing systems |
US7096213B2 (en) * | 2002-04-08 | 2006-08-22 | Oracle International Corporation | Persistent key-value repository with a pluggable architecture to abstract physical storage |
AU2003214624A1 (en) * | 2002-04-25 | 2003-11-10 | Kashya Israel Ltd. | An apparatus for continuous compression of large volumes of data |
US20030220943A1 (en) * | 2002-05-23 | 2003-11-27 | International Business Machines Corporation | Recovery of a single metadata controller failure in a storage area network environment |
US7774325B2 (en) * | 2002-10-17 | 2010-08-10 | Intel Corporation | Distributed network attached storage system |
US7613797B2 (en) * | 2003-03-19 | 2009-11-03 | Unisys Corporation | Remote discovery and system architecture |
GB0308923D0 (en) * | 2003-04-17 | 2003-05-28 | Ibm | Low-overhead storage cluster configuration locking |
US7409389B2 (en) | 2003-04-29 | 2008-08-05 | International Business Machines Corporation | Managing access to objects of a computing environment |
US7376744B2 (en) * | 2003-05-09 | 2008-05-20 | Oracle International Corporation | Using local locks for global synchronization in multi-node systems |
US20040230896A1 (en) * | 2003-05-16 | 2004-11-18 | Dethe Elza | Method and system for enabling collaborative authoring of hierarchical documents with unique node identifications |
CA2429375A1 (en) * | 2003-05-22 | 2004-11-22 | Cognos Incorporated | Model action logging |
WO2005008434A2 (en) * | 2003-07-11 | 2005-01-27 | Computer Associates Think, Inc. | A distributed locking method and system for networked device management |
US7739541B1 (en) | 2003-07-25 | 2010-06-15 | Symantec Operating Corporation | System and method for resolving cluster partitions in out-of-band storage virtualization environments |
US7356531B1 (en) * | 2003-07-25 | 2008-04-08 | Symantec Operating Corporation | Network file system record lock recovery in a highly available environment |
US8234517B2 (en) * | 2003-08-01 | 2012-07-31 | Oracle International Corporation | Parallel recovery by non-failed nodes |
US7584454B1 (en) * | 2003-09-10 | 2009-09-01 | Nextaxiom Technology, Inc. | Semantic-based transactional support and recovery for nested composite software services |
US20050091215A1 (en) * | 2003-09-29 | 2005-04-28 | Chandra Tushar D. | Technique for provisioning storage for servers in an on-demand environment |
US7234073B1 (en) * | 2003-09-30 | 2007-06-19 | Emc Corporation | System and methods for failover management of manageable entity agents |
US7581205B1 (en) | 2003-09-30 | 2009-08-25 | Nextaxiom Technology, Inc. | System and method of implementing a customizable software platform |
US8225282B1 (en) | 2003-11-25 | 2012-07-17 | Nextaxiom Technology, Inc. | Semantic-based, service-oriented system and method of developing, programming and managing software modules and software solutions |
US20050138154A1 (en) * | 2003-12-18 | 2005-06-23 | Intel Corporation | Enclosure management device |
US7155546B2 (en) * | 2003-12-18 | 2006-12-26 | Intel Corporation | Multiple physical interfaces in a slot of a storage enclosure to support different storage interconnect architectures |
US7376147B2 (en) * | 2003-12-18 | 2008-05-20 | Intel Corporation | Adaptor supporting different protocols |
US8543781B2 (en) | 2004-02-06 | 2013-09-24 | Vmware, Inc. | Hybrid locking using network and on-disk based schemes |
US8560747B1 (en) | 2007-02-16 | 2013-10-15 | Vmware, Inc. | Associating heartbeat data with access to shared resources of a computer system |
US20110179082A1 (en) * | 2004-02-06 | 2011-07-21 | Vmware, Inc. | Managing concurrent file system accesses by multiple servers using locks |
US10776206B1 (en) * | 2004-02-06 | 2020-09-15 | Vmware, Inc. | Distributed transaction system |
US8700585B2 (en) * | 2004-02-06 | 2014-04-15 | Vmware, Inc. | Optimistic locking method and system for committing transactions on a file system |
US7849098B1 (en) * | 2004-02-06 | 2010-12-07 | Vmware, Inc. | Providing multiple concurrent access to a file system |
JP4485256B2 (en) * | 2004-05-20 | 2010-06-16 | 株式会社日立製作所 | Storage area management method and management system |
US7962449B2 (en) * | 2004-06-25 | 2011-06-14 | Apple Inc. | Trusted index structure in a network environment |
US7730012B2 (en) | 2004-06-25 | 2010-06-01 | Apple Inc. | Methods and systems for managing data |
US8131674B2 (en) | 2004-06-25 | 2012-03-06 | Apple Inc. | Methods and systems for managing data |
US7386752B1 (en) * | 2004-06-30 | 2008-06-10 | Symantec Operating Corporation | Using asset dependencies to identify the recovery set and optionally automate and/or optimize the recovery |
US7769734B2 (en) * | 2004-07-26 | 2010-08-03 | International Business Machines Corporation | Managing long-lived resource locks in a multi-system mail infrastructure |
WO2006015536A1 (en) * | 2004-08-08 | 2006-02-16 | Huawei Technologies Co. Ltd. | A method for realizing notification log operation |
US20060041559A1 (en) * | 2004-08-17 | 2006-02-23 | International Business Machines Corporation | Innovation for managing virtual storage area networks |
US20060059269A1 (en) * | 2004-09-13 | 2006-03-16 | Chien Chen | Transparent recovery of switch device |
US7310711B2 (en) * | 2004-10-29 | 2007-12-18 | Hitachi Global Storage Technologies Netherlands B.V. | Hard disk drive with support for atomic transactions |
US7496701B2 (en) * | 2004-11-18 | 2009-02-24 | International Business Machines Corporation | Managing virtual server control of computer support systems with heartbeat message |
JP4462024B2 (en) | 2004-12-09 | 2010-05-12 | 株式会社日立製作所 | Failover method by disk takeover |
US8495266B2 (en) * | 2004-12-10 | 2013-07-23 | Hewlett-Packard Development Company, L.P. | Distributed lock |
US7506204B2 (en) * | 2005-04-25 | 2009-03-17 | Microsoft Corporation | Dedicated connection to a database server for alternative failure recovery |
US20060242453A1 (en) * | 2005-04-25 | 2006-10-26 | Dell Products L.P. | System and method for managing hung cluster nodes |
JP4648751B2 (en) * | 2005-05-02 | 2011-03-09 | 株式会社日立製作所 | Storage control system and storage control method |
US7631016B2 (en) * | 2005-05-04 | 2009-12-08 | Oracle International Corporation | Providing the latest version of a data item from an N-replica set |
US7356653B2 (en) * | 2005-06-03 | 2008-04-08 | International Business Machines Corporation | Reader-initiated shared memory synchronization |
US7437426B2 (en) * | 2005-09-27 | 2008-10-14 | Oracle International Corporation | Detecting and correcting node misconfiguration of information about the location of shared storage resources |
US8060713B1 (en) | 2005-12-21 | 2011-11-15 | Emc (Benelux) B.V., S.A.R.L. | Consolidating snapshots in a continuous data protection system using journaling |
US7774565B2 (en) * | 2005-12-21 | 2010-08-10 | Emc Israel Development Center, Ltd. | Methods and apparatus for point in time data access and recovery |
US7849361B2 (en) * | 2005-12-22 | 2010-12-07 | Emc Corporation | Methods and apparatus for multiple point in time data access |
US7836033B1 (en) * | 2006-01-24 | 2010-11-16 | Network Appliance, Inc. | Method and apparatus for parallel updates to global state in a multi-processor system |
US20070180287A1 (en) * | 2006-01-31 | 2007-08-02 | Dell Products L. P. | System and method for managing node resets in a cluster |
US7577867B2 (en) * | 2006-02-17 | 2009-08-18 | Emc Corporation | Cross tagging to data for consistent recovery |
US7552148B2 (en) * | 2006-02-28 | 2009-06-23 | Microsoft Corporation | Shutdown recovery |
US7899780B1 (en) * | 2006-03-30 | 2011-03-01 | Emc Corporation | Methods and apparatus for structured partitioning of management information |
CN100383750C (en) * | 2006-06-07 | 2008-04-23 | 中国科学院计算技术研究所 | High-reliable journal system realizing method facing to large-scale computing system |
US7734960B2 (en) * | 2006-08-14 | 2010-06-08 | Hewlett-Packard Development Company, L.P. | Method of managing nodes in computer cluster |
US7886034B1 (en) * | 2006-09-27 | 2011-02-08 | Symantec Corporation | Adaptive liveness management for robust and efficient peer-to-peer storage |
US7627687B2 (en) * | 2006-09-28 | 2009-12-01 | Emc Israel Development Center, Ltd. | Methods and apparatus for managing data flow in a continuous data replication system having journaling |
US7627612B2 (en) * | 2006-09-28 | 2009-12-01 | Emc Israel Development Center, Ltd. | Methods and apparatus for optimal journaling for continuous data replication |
US20080082533A1 (en) * | 2006-09-28 | 2008-04-03 | Tak Fung Wang | Persistent locks/resources for concurrency control |
US8024521B2 (en) * | 2007-03-13 | 2011-09-20 | Sony Computer Entertainment Inc. | Atomic operation on non-standard sized data using external cache |
US7778986B2 (en) * | 2007-08-29 | 2010-08-17 | International Business Machines Corporation | Securing transfer of ownership of a storage object from an unavailable owner node to another node |
US7921272B2 (en) * | 2007-10-05 | 2011-04-05 | International Business Machines Corporation | Monitoring patterns of processes accessing addresses in a storage device to determine access parameters to apply |
US7856536B2 (en) * | 2007-10-05 | 2010-12-21 | International Business Machines Corporation | Providing a process exclusive access to a page including a memory address to which a lock is granted to the process |
US7770064B2 (en) * | 2007-10-05 | 2010-08-03 | International Business Machines Corporation | Recovery of application faults in a mirrored application environment |
US8055855B2 (en) * | 2007-10-05 | 2011-11-08 | International Business Machines Corporation | Varying access parameters for processes to access memory addresses in response to detecting a condition related to a pattern of processes access to memory addresses |
US8041940B1 (en) | 2007-12-26 | 2011-10-18 | Emc Corporation | Offloading encryption processing in a storage area network |
US7958372B1 (en) | 2007-12-26 | 2011-06-07 | Emc (Benelux) B.V., S.A.R.L. | Method and apparatus to convert a logical unit from a first encryption state to a second encryption state using a journal in a continuous data protection environment |
US7840536B1 (en) | 2007-12-26 | 2010-11-23 | Emc (Benelux) B.V., S.A.R.L. | Methods and apparatus for dynamic journal expansion |
US7860836B1 (en) | 2007-12-26 | 2010-12-28 | Emc (Benelux) B.V., S.A.R.L. | Method and apparatus to recover data in a continuous data protection environment using a journal |
US9178785B1 (en) | 2008-01-24 | 2015-11-03 | NextAxiom Technology, Inc. | Accounting for usage and usage-based pricing of runtime engine |
US9501542B1 (en) | 2008-03-11 | 2016-11-22 | Emc Corporation | Methods and apparatus for volume synchronization |
US7719443B1 (en) | 2008-06-27 | 2010-05-18 | Emc Corporation | Compressing data in a continuous data protection environment |
US7840730B2 (en) | 2008-06-27 | 2010-11-23 | Microsoft Corporation | Cluster shared volumes |
US8108634B1 (en) | 2008-06-27 | 2012-01-31 | Emc B.V., S.A.R.L. | Replicating a thin logical unit |
US8719473B2 (en) * | 2008-09-19 | 2014-05-06 | Microsoft Corporation | Resource arbitration for shared-write access via persistent reservation |
US8060714B1 (en) | 2008-09-26 | 2011-11-15 | Emc (Benelux) B.V., S.A.R.L. | Initializing volumes in a replication system |
US7882286B1 (en) | 2008-09-26 | 2011-02-01 | EMC (Benelux) B.V., S.A.R.L. | Synchronizing volumes for replication |
WO2010041515A1 (en) * | 2008-10-06 | 2010-04-15 | インターナショナル・ビジネス・マシーンズ・コーポレーション | System accessing shared data by a plurality of application servers |
US8972515B2 (en) * | 2009-03-30 | 2015-03-03 | The Boeing Company | Computer architectures using shared storage |
US8296358B2 (en) * | 2009-05-14 | 2012-10-23 | Hewlett-Packard Development Company, L.P. | Method and system for journaling data updates in a distributed file system |
US8055615B2 (en) * | 2009-08-25 | 2011-11-08 | Yahoo! Inc. | Method for efficient storage node replacement |
US20110055494A1 (en) * | 2009-08-25 | 2011-03-03 | Yahoo! Inc. | Method for distributed direct object access storage |
US9311319B2 (en) * | 2009-08-27 | 2016-04-12 | Hewlett Packard Enterprise Development Lp | Method and system for administration of storage objects |
US20110093745A1 (en) * | 2009-10-20 | 2011-04-21 | Aviad Zlotnick | Systems and methods for implementing test applications for systems using locks |
US8510334B2 (en) | 2009-11-05 | 2013-08-13 | Oracle International Corporation | Lock manager on disk |
US8392680B1 (en) | 2010-03-30 | 2013-03-05 | Emc International Company | Accessing a volume in a distributed environment |
US8103937B1 (en) * | 2010-03-31 | 2012-01-24 | Emc Corporation | Cas command network replication |
US8381014B2 (en) | 2010-05-06 | 2013-02-19 | International Business Machines Corporation | Node controller first failure error management for a distributed system |
US20110276728A1 (en) * | 2010-05-06 | 2011-11-10 | Hitachi, Ltd. | Method and apparatus for storage i/o path configuration |
US8332687B1 (en) | 2010-06-23 | 2012-12-11 | Emc Corporation | Splitter used in a continuous data protection environment |
US9098462B1 (en) | 2010-09-14 | 2015-08-04 | The Boeing Company | Communications via shared memory |
US8478955B1 (en) | 2010-09-27 | 2013-07-02 | Emc International Company | Virtualized consistency group using more than one data protection appliance |
US8433869B1 (en) | 2010-09-27 | 2013-04-30 | Emc International Company | Virtualized consistency group using an enhanced splitter |
US8335771B1 (en) | 2010-09-29 | 2012-12-18 | Emc Corporation | Storage array snapshots for logged access replication in a continuous data protection system |
US8694700B1 (en) | 2010-09-29 | 2014-04-08 | Emc Corporation | Using I/O track information for continuous push with splitter for storage device |
US8589732B2 (en) | 2010-10-25 | 2013-11-19 | Microsoft Corporation | Consistent messaging with replication |
US8335761B1 (en) | 2010-12-02 | 2012-12-18 | Emc International Company | Replicating in a multi-copy environment |
US8812916B2 (en) | 2011-06-02 | 2014-08-19 | International Business Machines Corporation | Failure data management for a distributed computer system |
US9256605B1 (en) | 2011-08-03 | 2016-02-09 | Emc Corporation | Reading and writing to an unexposed device |
US8973018B2 (en) | 2011-08-23 | 2015-03-03 | International Business Machines Corporation | Configuring and relaying events from a storage controller to a host server |
US8694724B1 (en) * | 2011-09-06 | 2014-04-08 | Emc Corporation | Managing data storage by provisioning cache as a virtual device |
US8898112B1 (en) | 2011-09-07 | 2014-11-25 | Emc Corporation | Write signature command |
US8560662B2 (en) * | 2011-09-12 | 2013-10-15 | Microsoft Corporation | Locking system for cluster updates |
US9170852B2 (en) | 2012-02-02 | 2015-10-27 | Microsoft Technology Licensing, Llc | Self-updating functionality in a distributed system |
US20130290385A1 (en) * | 2012-04-30 | 2013-10-31 | Charles B. Morrey, III | Durably recording events for performing file system operations |
US9223659B1 (en) | 2012-06-28 | 2015-12-29 | Emc International Company | Generating and accessing a virtual volume snapshot in a continuous data protection system |
US9218295B2 (en) * | 2012-07-13 | 2015-12-22 | Ca, Inc. | Methods and systems for implementing time-locks |
US9336094B1 (en) | 2012-09-13 | 2016-05-10 | Emc International Company | Scaleout replication of an application |
US10235145B1 (en) | 2012-09-13 | 2019-03-19 | Emc International Company | Distributed scale-out replication |
US9081840B2 (en) * | 2012-09-21 | 2015-07-14 | Citigroup Technology, Inc. | Methods and systems for modeling a replication topology |
US9696939B1 (en) | 2013-03-14 | 2017-07-04 | EMC IP Holding Company LLC | Replicating data using deduplication-based arrays using network-based replication |
US9110914B1 (en) | 2013-03-14 | 2015-08-18 | Emc Corporation | Continuous data protection using deduplication-based storage |
US9383937B1 (en) | 2013-03-14 | 2016-07-05 | Emc Corporation | Journal tiering in a continuous data protection system using deduplication-based storage |
US8996460B1 (en) | 2013-03-14 | 2015-03-31 | Emc Corporation | Accessing an image in a continuous data protection using deduplication-based storage |
US9244997B1 (en) | 2013-03-15 | 2016-01-26 | Emc Corporation | Asymmetric active-active access of asynchronously-protected data storage |
US9081842B1 (en) | 2013-03-15 | 2015-07-14 | Emc Corporation | Synchronous and asymmetric asynchronous active-active-active data access |
US9152339B1 (en) | 2013-03-15 | 2015-10-06 | Emc Corporation | Synchronization of asymmetric active-active, asynchronously-protected storage |
US9069709B1 (en) | 2013-06-24 | 2015-06-30 | Emc International Company | Dynamic granularity in data replication |
US9087112B1 (en) | 2013-06-24 | 2015-07-21 | Emc International Company | Consistency across snapshot shipping and continuous replication |
US9146878B1 (en) | 2013-06-25 | 2015-09-29 | Emc Corporation | Storage recovery from total cache loss using journal-based replication |
US9454485B2 (en) | 2013-08-01 | 2016-09-27 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd. | Sharing local cache from a failover node |
KR102165775B1 (en) * | 2013-10-25 | 2020-10-14 | 어드밴스드 마이크로 디바이시즈, 인코포레이티드 | Method and apparatus for performing a bus lock and translation lookaside buffer invalidation |
US9367260B1 (en) | 2013-12-13 | 2016-06-14 | Emc Corporation | Dynamic replication system |
US9405765B1 (en) | 2013-12-17 | 2016-08-02 | Emc Corporation | Replication of virtual machines |
US9158630B1 (en) | 2013-12-19 | 2015-10-13 | Emc Corporation | Testing integrity of replicated storage |
US9372752B2 (en) * | 2013-12-27 | 2016-06-21 | Intel Corporation | Assisted coherent shared memory |
US9189339B1 (en) | 2014-03-28 | 2015-11-17 | Emc Corporation | Replication of a virtual distributed volume with virtual machine granualarity |
US9686206B2 (en) * | 2014-04-29 | 2017-06-20 | Silicon Graphics International Corp. | Temporal based collaborative mutual exclusion control of a shared resource |
US9497140B2 (en) | 2014-05-14 | 2016-11-15 | International Business Machines Corporation | Autonomous multi-node network configuration and self-awareness through establishment of a switch port group |
US9274718B1 (en) | 2014-06-20 | 2016-03-01 | Emc Corporation | Migration in replication system |
US10082980B1 (en) | 2014-06-20 | 2018-09-25 | EMC IP Holding Company LLC | Migration of snapshot in replication system using a log |
US9619543B1 (en) | 2014-06-23 | 2017-04-11 | EMC IP Holding Company LLC | Replicating in virtual desktop infrastructure |
US10237342B2 (en) * | 2014-09-17 | 2019-03-19 | Dh2I Company | Coordinated and high availability storage access |
US10101943B1 (en) | 2014-09-25 | 2018-10-16 | EMC IP Holding Company LLC | Realigning data in replication system |
US10437783B1 (en) | 2014-09-25 | 2019-10-08 | EMC IP Holding Company LLC | Recover storage array using remote deduplication device |
US10324798B1 (en) | 2014-09-25 | 2019-06-18 | EMC IP Holding Company LLC | Restoring active areas of a logical unit |
US9529885B1 (en) | 2014-09-29 | 2016-12-27 | EMC IP Holding Company LLC | Maintaining consistent point-in-time in asynchronous replication during virtual machine relocation |
US9910621B1 (en) | 2014-09-29 | 2018-03-06 | EMC IP Holding Company LLC | Backlogging I/O metadata utilizing counters to monitor write acknowledgements and no acknowledgements |
US10496487B1 (en) | 2014-12-03 | 2019-12-03 | EMC IP Holding Company LLC | Storing snapshot changes with snapshots |
US9600377B1 (en) | 2014-12-03 | 2017-03-21 | EMC IP Holding Company LLC | Providing data protection using point-in-time images from multiple types of storage devices |
US9405481B1 (en) | 2014-12-17 | 2016-08-02 | Emc Corporation | Replicating using volume multiplexing with consistency group file |
US9632881B1 (en) | 2015-03-24 | 2017-04-25 | EMC IP Holding Company LLC | Replication of a virtual distributed volume |
US10296419B1 (en) | 2015-03-27 | 2019-05-21 | EMC IP Holding Company LLC | Accessing a virtual device using a kernel |
US9411535B1 (en) | 2015-03-27 | 2016-08-09 | Emc Corporation | Accessing multiple virtual devices |
US9678680B1 (en) | 2015-03-30 | 2017-06-13 | EMC IP Holding Company LLC | Forming a protection domain in a storage architecture |
US10853181B1 (en) | 2015-06-29 | 2020-12-01 | EMC IP Holding Company LLC | Backing up volumes using fragment files |
US10360236B2 (en) * | 2015-09-25 | 2019-07-23 | International Business Machines Corporation | Replicating structured query language (SQL) in a heterogeneous replication environment |
US10320703B2 (en) | 2015-09-30 | 2019-06-11 | Veritas Technologies Llc | Preventing data corruption due to pre-existing split brain |
US9684576B1 (en) | 2015-12-21 | 2017-06-20 | EMC IP Holding Company LLC | Replication using a virtual distributed volume |
US10235196B1 (en) | 2015-12-28 | 2019-03-19 | EMC IP Holding Company LLC | Virtual machine joining or separating |
US10133874B1 (en) | 2015-12-28 | 2018-11-20 | EMC IP Holding Company LLC | Performing snapshot replication on a storage system not configured to support snapshot replication |
US10067837B1 (en) | 2015-12-28 | 2018-09-04 | EMC IP Holding Company LLC | Continuous data protection with cloud resources |
US10152267B1 (en) | 2016-03-30 | 2018-12-11 | Emc Corporation | Replication data pull |
US10579282B1 (en) | 2016-03-30 | 2020-03-03 | EMC IP Holding Company LLC | Distributed copy in multi-copy replication where offset and size of I/O requests to replication site is half offset and size of I/O request to production volume |
US10235087B1 (en) | 2016-03-30 | 2019-03-19 | EMC IP Holding Company LLC | Distributing journal data over multiple journals |
US10235060B1 (en) | 2016-04-14 | 2019-03-19 | EMC IP Holding Company, LLC | Multilevel snapshot replication for hot and cold regions of a storage system |
CN106055417B (en) * | 2016-06-02 | 2018-09-11 | 北京百度网讯科技有限公司 | Method for message transmission and device for robot operating system |
US10019194B1 (en) | 2016-09-23 | 2018-07-10 | EMC IP Holding Company LLC | Eventually consistent synchronous data replication in a storage system |
US10146961B1 (en) | 2016-09-23 | 2018-12-04 | EMC IP Holding Company LLC | Encrypting replication journals in a storage system |
US10666569B1 (en) * | 2016-09-23 | 2020-05-26 | Amazon Technologies, Inc. | Journal service with named clients |
US10235091B1 (en) | 2016-09-23 | 2019-03-19 | EMC IP Holding Company LLC | Full sweep disk synchronization in a storage system |
US10210073B1 (en) | 2016-09-23 | 2019-02-19 | EMC IP Holding Company, LLC | Real time debugging of production replicated data with data obfuscation in a storage system |
US10346366B1 (en) | 2016-09-23 | 2019-07-09 | Amazon Technologies, Inc. | Management of a data processing pipeline |
US10805238B1 (en) | 2016-09-23 | 2020-10-13 | Amazon Technologies, Inc. | Management of alternative resources |
US10423459B1 (en) | 2016-09-23 | 2019-09-24 | Amazon Technologies, Inc. | Resource manager |
US10235090B1 (en) | 2016-09-23 | 2019-03-19 | EMC IP Holding Company LLC | Validating replication copy consistency using a hash function in a storage system |
US10459810B2 (en) | 2017-07-06 | 2019-10-29 | Oracle International Corporation | Technique for higher availability in a multi-node system using replicated lock information to determine a set of data blocks for recovery |
US11144493B1 (en) | 2018-05-02 | 2021-10-12 | Ecosense Lighting Inc. | Composite interface circuit |
CN109376014B (en) * | 2018-10-19 | 2021-07-02 | 郑州云海信息技术有限公司 | Distributed lock manager implementation method and system |
Citations (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5276872A (en) * | 1991-06-25 | 1994-01-04 | Digital Equipment Corporation | Concurrency and recovery for index trees with nodal updates using multiple atomic actions by which the trees integrity is preserved during undesired system interruptions |
US5438464A (en) * | 1993-04-23 | 1995-08-01 | Quantum Corporation | Synchronization of multiple disk drive spindles |
US5678026A (en) * | 1995-12-28 | 1997-10-14 | Unisys Corporation | Multi-processor data processing system with control for granting multiple storage locks in parallel and parallel lock priority and second level cache priority queues |
US5751992A (en) * | 1994-09-23 | 1998-05-12 | International Business Machines Corporation | Computer program product for continuous destaging of changed data from a shared cache in a multisystem shared disk environment wherein castout interest is established in a hierarchical fashion |
US5813016A (en) * | 1995-03-01 | 1998-09-22 | Fujitsu Limited | Device/system for processing shared data accessed by a plurality of data processing devices/systems |
US5850507A (en) * | 1996-03-19 | 1998-12-15 | Oracle Corporation | Method and apparatus for improved transaction recovery |
US5909540A (en) * | 1996-11-22 | 1999-06-01 | Mangosoft Corporation | System and method for providing highly available data storage using globally addressable memory |
US5913227A (en) * | 1997-03-24 | 1999-06-15 | Emc Corporation | Agent-implemented locking mechanism |
US5920872A (en) * | 1996-06-25 | 1999-07-06 | Oracle Corporation | Resource management using resource domains |
US5953719A (en) * | 1997-09-15 | 1999-09-14 | International Business Machines Corporation | Heterogeneous database system with dynamic commit procedure control |
US5960446A (en) * | 1997-07-11 | 1999-09-28 | International Business Machines Corporation | Parallel file system and method with allocation map |
US5987506A (en) * | 1996-11-22 | 1999-11-16 | Mangosoft Corporation | Remote access and geographically distributed computers in a globally addressable storage environment |
US6009466A (en) * | 1997-10-31 | 1999-12-28 | International Business Machines Corporation | Network management system for enabling a user to configure a network of storage devices via a graphical user interface |
US6021508A (en) * | 1997-07-11 | 2000-02-01 | International Business Machines Corporation | Parallel file system and method for independent metadata loggin |
US6026474A (en) * | 1996-11-22 | 2000-02-15 | Mangosoft Corporation | Shared client-side web caching using globally addressable memory |
US6044367A (en) * | 1996-08-02 | 2000-03-28 | Hewlett-Packard Company | Distributed I/O store |
US6108654A (en) * | 1997-10-31 | 2000-08-22 | Oracle Corporation | Method and system for locking resources in a computer system |
US6112281A (en) * | 1997-10-07 | 2000-08-29 | Oracle Corporation | I/O forwarding in a cache coherent shared disk computer system |
US6154512A (en) * | 1998-11-19 | 2000-11-28 | Nortel Networks Corporation | Digital phase lock loop with control for enabling and disabling synchronization |
US6163855A (en) * | 1998-04-17 | 2000-12-19 | Microsoft Corporation | Method and system for replicated and consistent modifications in a server cluster |
US6226717B1 (en) * | 1999-02-04 | 2001-05-01 | Compaq Computer Corporation | System and method for exclusive access to shared storage |
US6256740B1 (en) * | 1998-02-06 | 2001-07-03 | Ncr Corporation | Name service for multinode system segmented into I/O and compute nodes, generating guid at I/O node and exporting guid to compute nodes via interconnect fabric |
US6269410B1 (en) * | 1999-02-12 | 2001-07-31 | Hewlett-Packard Co | Method and apparatus for using system traces to characterize workloads in a data storage system |
US6272491B1 (en) * | 1998-08-24 | 2001-08-07 | Oracle Corporation | Method and system for mastering locks in a multiple server database system |
US6370625B1 (en) * | 1999-12-29 | 2002-04-09 | Intel Corporation | Method and apparatus for lock synchronization in a microprocessor system |
US6421723B1 (en) * | 1999-06-11 | 2002-07-16 | Dell Products L.P. | Method and system for establishing a storage area network configuration |
US20020184216A1 (en) * | 2001-05-31 | 2002-12-05 | Sashikanth Chandrasekaran | Method and apparatus for reducing latency and message traffic during data and lock transfer in a multi-node system |
Family Cites Families (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0833857B2 (en) * | 1987-02-18 | 1996-03-29 | 株式会社日立製作所 | System database sharing system system system |
JP2667039B2 (en) | 1990-05-18 | 1997-10-22 | 株式会社東芝 | Data management system and data management method |
JPH0827755B2 (en) * | 1991-02-15 | 1996-03-21 | インターナショナル・ビジネス・マシーンズ・コーポレイション | How to access data units at high speed |
JP3023441B2 (en) * | 1993-11-16 | 2000-03-21 | 株式会社日立製作所 | Database division management method and parallel database system |
DE4341877A1 (en) * | 1993-12-08 | 1995-06-14 | Siemens Ag | Coordination to access multiple processors to common resource |
US5454108A (en) * | 1994-01-26 | 1995-09-26 | International Business Machines Corporation | Distributed lock manager using a passive, state-full control-server |
US5699500A (en) * | 1995-06-01 | 1997-12-16 | Ncr Corporation | Reliable datagram service provider for fast messaging in a clustered environment |
US5594863A (en) * | 1995-06-26 | 1997-01-14 | Novell, Inc. | Method and apparatus for network file recovery |
US6356740B1 (en) * | 1995-06-30 | 2002-03-12 | Hughes Electronics Corporation | Method and system of frequency stabilization in a mobile satellite communication system |
JPH09114721A (en) | 1995-10-19 | 1997-05-02 | Nec Corp | Device sharing method and device sharing system in local area network |
US6016505A (en) * | 1996-04-30 | 2000-01-18 | International Business Machines Corporation | Program product to effect barrier synchronization in a distributed computing environment |
US6026426A (en) | 1996-04-30 | 2000-02-15 | International Business Machines Corporation | Application programming interface unifying multiple mechanisms |
US5875469A (en) * | 1996-08-26 | 1999-02-23 | International Business Machines Corporation | Apparatus and method of snooping processors and look-aside caches |
US5974250A (en) * | 1996-12-13 | 1999-10-26 | Compaq Computer Corp. | System and method for secure information transmission over a network |
US6108757A (en) * | 1997-02-28 | 2000-08-22 | Lucent Technologies Inc. | Method for locking a shared resource in multiprocessor system |
FR2762418B1 (en) * | 1997-04-17 | 1999-06-11 | Alsthom Cge Alcatel | METHOD FOR MANAGING A SHARED MEMORY |
US6237001B1 (en) * | 1997-04-23 | 2001-05-22 | Oracle Corporation | Managing access to data in a distributed database environment |
JPH11143843A (en) | 1997-11-06 | 1999-05-28 | Hitachi Ltd | Operation condition management method for plural nodes configuration system |
US6199105B1 (en) * | 1997-12-09 | 2001-03-06 | Nec Corporation | Recovery system for system coupling apparatuses, and recording medium recording recovery program |
US6173293B1 (en) * | 1998-03-13 | 2001-01-09 | Digital Equipment Corporation | Scalable distributed file system |
US6438582B1 (en) * | 1998-07-21 | 2002-08-20 | International Business Machines Corporation | Method and system for efficiently coordinating commit processing in a parallel or distributed database system |
US6178519B1 (en) * | 1998-12-10 | 2001-01-23 | Mci Worldcom, Inc. | Cluster-wide database system |
US6757277B1 (en) * | 1999-01-26 | 2004-06-29 | Siemens Information And Communication Networks, Inc. | System and method for coding algorithm policy adjustment in telephony-over-LAN networks |
US6725392B1 (en) * | 1999-03-03 | 2004-04-20 | Adaptec, Inc. | Controller fault recovery system for a distributed file system |
WO2000062502A2 (en) * | 1999-04-12 | 2000-10-19 | Rainfinity, Inc. | Distributed server cluster for controlling network traffic |
KR20010074733A (en) * | 1999-05-20 | 2001-08-09 | 황 이반 충슝 | A method and apparatus for implementing a workgroup server array |
JP4057201B2 (en) | 1999-09-16 | 2008-03-05 | 富士通株式会社 | High-speed data exchange method between different computers and extent extraction / conversion program recording medium |
US6598058B2 (en) * | 1999-09-22 | 2003-07-22 | International Business Machines Corporation | Method and apparatus for cross-node sharing of cached dynamic SQL in a multiple relational database management system environment |
US6865549B1 (en) * | 1999-11-15 | 2005-03-08 | Sun Microsystems, Inc. | Method and apparatus for concurrency control in a policy-based management system |
US6473819B1 (en) * | 1999-12-17 | 2002-10-29 | International Business Machines Corporation | Scalable interruptible queue locks for shared-memory multiprocessor |
US6618819B1 (en) * | 1999-12-23 | 2003-09-09 | Nortel Networks Limited | Sparing system and method to accommodate equipment failures in critical systems |
US7062648B2 (en) | 2000-02-18 | 2006-06-13 | Avamar Technologies, Inc. | System and method for redundant array network storage |
US6643748B1 (en) * | 2000-04-20 | 2003-11-04 | Microsoft Corporation | Programmatic masking of storage units |
US20030041138A1 (en) * | 2000-05-02 | 2003-02-27 | Sun Microsystems, Inc. | Cluster membership monitor |
US6530004B1 (en) * | 2000-06-20 | 2003-03-04 | International Business Machines Corporation | Efficient fault-tolerant preservation of data integrity during dynamic RAID data migration |
US7844513B2 (en) | 2000-07-17 | 2010-11-30 | Galactic Computing Corporation Bvi/Bc | Method and system for operating a commissioned e-commerce service prover |
WO2002015449A2 (en) | 2000-08-17 | 2002-02-21 | Broadcom Corporation | Method and system for transmitting isochronous voice in a wireless network |
US6665814B2 (en) * | 2000-11-29 | 2003-12-16 | International Business Machines Corporation | Method and apparatus for providing serialization support for a computer system |
US6976060B2 (en) * | 2000-12-05 | 2005-12-13 | Agami Systems, Inc. | Symmetric shared file storage system |
US8219662B2 (en) | 2000-12-06 | 2012-07-10 | International Business Machines Corporation | Redirecting data generated by network devices |
US20040213239A1 (en) * | 2000-12-15 | 2004-10-28 | Lin Xinming A. | Implementation of IP multicast on ATM network with EMCON links |
US6804794B1 (en) * | 2001-02-28 | 2004-10-12 | Emc Corporation | Error condition handling |
US7130316B2 (en) | 2001-04-11 | 2006-10-31 | Ati Technologies, Inc. | System for frame based audio synchronization and method thereof |
US6708175B2 (en) * | 2001-06-06 | 2004-03-16 | International Business Machines Corporation | Program support for disk fencing in a shared disk parallel file system across storage area network |
US7240057B2 (en) * | 2001-09-21 | 2007-07-03 | Kingsbury Brent A | System and method for implementing journaling in a multi-node environment |
US6871268B2 (en) * | 2002-03-07 | 2005-03-22 | International Business Machines Corporation | Methods and systems for distributed caching in presence of updates and in accordance with holding times |
US6862666B2 (en) * | 2002-05-16 | 2005-03-01 | Sun Microsystems, Inc. | Hardware assisted lease-based access to memory |
-
2002
- 2002-09-20 US US10/251,894 patent/US7240057B2/en not_active Expired - Lifetime
- 2002-09-20 JP JP2003529357A patent/JP2005504369A/en active Pending
- 2002-09-20 CN CNB028230981A patent/CN1302419C/en not_active Expired - Lifetime
- 2002-09-20 CA CA2460833A patent/CA2460833C/en not_active Expired - Lifetime
- 2002-09-20 AU AU2002341784A patent/AU2002341784A1/en not_active Abandoned
- 2002-09-20 CN CNB028232313A patent/CN1320483C/en not_active Expired - Lifetime
- 2002-09-20 US US10/251,893 patent/US7266722B2/en active Active
- 2002-09-20 US US10/251,645 patent/US20040202013A1/en not_active Abandoned
- 2002-09-20 US US10/251,690 patent/US7496646B2/en active Active
- 2002-09-20 WO PCT/US2002/030082 patent/WO2003025801A1/en active Application Filing
- 2002-09-20 WO PCT/US2002/030083 patent/WO2003025780A1/en not_active Application Discontinuation
- 2002-09-20 WO PCT/US2002/029859 patent/WO2003027903A1/en active Application Filing
- 2002-09-20 WO PCT/US2002/029857 patent/WO2003027853A1/en not_active Application Discontinuation
- 2002-09-20 JP JP2003531367A patent/JP4249622B2/en not_active Expired - Lifetime
- 2002-09-20 US US10/251,689 patent/US7149853B2/en not_active Expired - Fee Related
- 2002-09-20 US US10/251,626 patent/US7111197B2/en active Active
- 2002-09-20 EP EP02775933A patent/EP1428151A4/en not_active Withdrawn
- 2002-09-20 WO PCT/US2002/030085 patent/WO2003025751A1/en not_active Application Discontinuation
- 2002-09-20 CA CA002461015A patent/CA2461015A1/en not_active Abandoned
- 2002-09-20 US US10/251,895 patent/US7437386B2/en active Active
- 2002-09-20 EP EP02768873A patent/EP1428149B1/en not_active Expired - Lifetime
2006
- 2006-08-04 US US11/499,907 patent/US7467330B2/en not_active Expired - Lifetime
Patent Citations (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5276872A (en) * | 1991-06-25 | 1994-01-04 | Digital Equipment Corporation | Concurrency and recovery for index trees with nodal updates using multiple atomic actions by which the trees integrity is preserved during undesired system interruptions |
US5438464A (en) * | 1993-04-23 | 1995-08-01 | Quantum Corporation | Synchronization of multiple disk drive spindles |
US5751992A (en) * | 1994-09-23 | 1998-05-12 | International Business Machines Corporation | Computer program product for continuous destaging of changed data from a shared cache in a multisystem shared disk environment wherein castout interest is established in a hierarchical fashion |
US5813016A (en) * | 1995-03-01 | 1998-09-22 | Fujitsu Limited | Device/system for processing shared data accessed by a plurality of data processing devices/systems |
US5678026A (en) * | 1995-12-28 | 1997-10-14 | Unisys Corporation | Multi-processor data processing system with control for granting multiple storage locks in parallel and parallel lock priority and second level cache priority queues |
US5850507A (en) * | 1996-03-19 | 1998-12-15 | Oracle Corporation | Method and apparatus for improved transaction recovery |
US5920872A (en) * | 1996-06-25 | 1999-07-06 | Oracle Corporation | Resource management using resource domains |
US6044367A (en) * | 1996-08-02 | 2000-03-28 | Hewlett-Packard Company | Distributed I/O store |
US5909540A (en) * | 1996-11-22 | 1999-06-01 | Mangosoft Corporation | System and method for providing highly available data storage using globally addressable memory |
US5987506A (en) * | 1996-11-22 | 1999-11-16 | Mangosoft Corporation | Remote access and geographically distributed computers in a globally addressable storage environment |
US6026474A (en) * | 1996-11-22 | 2000-02-15 | Mangosoft Corporation | Shared client-side web caching using globally addressable memory |
US5913227A (en) * | 1997-03-24 | 1999-06-15 | Emc Corporation | Agent-implemented locking mechanism |
US5960446A (en) * | 1997-07-11 | 1999-09-28 | International Business Machines Corporation | Parallel file system and method with allocation map |
US6021508A (en) * | 1997-07-11 | 2000-02-01 | International Business Machines Corporation | Parallel file system and method for independent metadata logging |
US5953719A (en) * | 1997-09-15 | 1999-09-14 | International Business Machines Corporation | Heterogeneous database system with dynamic commit procedure control |
US6112281A (en) * | 1997-10-07 | 2000-08-29 | Oracle Corporation | I/O forwarding in a cache coherent shared disk computer system |
US6009466A (en) * | 1997-10-31 | 1999-12-28 | International Business Machines Corporation | Network management system for enabling a user to configure a network of storage devices via a graphical user interface |
US6108654A (en) * | 1997-10-31 | 2000-08-22 | Oracle Corporation | Method and system for locking resources in a computer system |
US6256740B1 (en) * | 1998-02-06 | 2001-07-03 | Ncr Corporation | Name service for multinode system segmented into I/O and compute nodes, generating guid at I/O node and exporting guid to compute nodes via interconnect fabric |
US6163855A (en) * | 1998-04-17 | 2000-12-19 | Microsoft Corporation | Method and system for replicated and consistent modifications in a server cluster |
US6272491B1 (en) * | 1998-08-24 | 2001-08-07 | Oracle Corporation | Method and system for mastering locks in a multiple server database system |
US6154512A (en) * | 1998-11-19 | 2000-11-28 | Nortel Networks Corporation | Digital phase lock loop with control for enabling and disabling synchronization |
US6226717B1 (en) * | 1999-02-04 | 2001-05-01 | Compaq Computer Corporation | System and method for exclusive access to shared storage |
US6269410B1 (en) * | 1999-02-12 | 2001-07-31 | Hewlett-Packard Co | Method and apparatus for using system traces to characterize workloads in a data storage system |
US6421723B1 (en) * | 1999-06-11 | 2002-07-16 | Dell Products L.P. | Method and system for establishing a storage area network configuration |
US6370625B1 (en) * | 1999-12-29 | 2002-04-09 | Intel Corporation | Method and apparatus for lock synchronization in a microprocessor system |
US20020184216A1 (en) * | 2001-05-31 | 2002-12-05 | Sashikanth Chandrasekaran | Method and apparatus for reducing latency and message traffic during data and lock transfer in a multi-node system |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050286377A1 (en) * | 2002-11-07 | 2005-12-29 | Koninklijke Philips Electronics, N.V. | Record carrier having a main file system area and a virtual file system area |
US20040172494A1 (en) * | 2003-01-21 | 2004-09-02 | Nextio Inc. | Method and apparatus for shared I/O in a load/store fabric |
US7457906B2 (en) * | 2003-01-21 | 2008-11-25 | Nextio, Inc. | Method and apparatus for shared I/O in a load/store fabric |
US10140194B2 (en) | 2014-03-20 | 2018-11-27 | Hewlett Packard Enterprise Development Lp | Storage system transactions |
US10496538B2 (en) * | 2015-06-30 | 2019-12-03 | Veritas Technologies Llc | System, method and mechanism to efficiently coordinate cache sharing between cluster nodes operating on the same regions of a file or the file system blocks shared among multiple files |
US10725915B1 (en) | 2017-03-31 | 2020-07-28 | Veritas Technologies Llc | Methods and systems for maintaining cache coherency between caches of nodes in a clustered environment |
US11500773B2 (en) | 2017-03-31 | 2022-11-15 | Veritas Technologies Llc | Methods and systems for maintaining cache coherency between nodes in a clustered environment by performing a bitmap lookup in response to a read request from one of the nodes |
US20220391374A1 (en) * | 2021-06-08 | 2022-12-08 | International Business Machines Corporation | Identifying resource lock ownership across a clustered computing environment |
US11880350B2 (en) * | 2021-06-08 | 2024-01-23 | International Business Machines Corporation | Identifying resource lock ownership across a clustered computing environment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20040202013A1 (en) | System and method for collaborative caching in a multinode system | |
US7076510B2 (en) | Software raid methods and apparatuses including server usage based write delegation | |
US10534681B2 (en) | Clustered filesystems for mix of trusted and untrusted nodes | |
US9442952B2 (en) | Metadata structures and related locking techniques to improve performance and scalability in a cluster file system | |
US8495131B2 (en) | Method, system, and program for managing locks enabling access to a shared resource | |
US6850955B2 (en) | Storage system and control method | |
US7721144B2 (en) | Methods and systems for implementing shared disk array management functions | |
US6986015B2 (en) | Fast path caching | |
US7013379B1 (en) | I/O primitives | |
US8001222B2 (en) | Clustered filesystem with membership version support | |
US6915391B2 (en) | Support for single-node quorum in a two-node nodeset for a shared disk parallel file system | |
US20030028514A1 (en) | Extended attribute caching in clustered filesystem | |
US20070016754A1 (en) | Fast path for performing data operations | |
US20080215839A1 (en) | Providing Storage Control in a Network of Storage Controllers | |
WO2003025802A1 (en) | A system and method for collaborative caching in a multinode system | |
Lee et al. | A comparison of two distributed disk systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: POLYSERVE, INC., OREGON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: DOVE, KENNETH F.; KINGSBURY, BRENT A.; REVITCH, SAM; AND OTHERS; REEL/FRAME: 013543/0739. Effective date: 20021025 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |