WO2003054711A1 - A system and method for management of a storage area network - Google Patents

A system and method for management of a storage area network

Info

Publication number
WO2003054711A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage
node
change
area network
server
Prior art date
Application number
PCT/US2002/029721
Other languages
French (fr)
Other versions
WO2003054711A9 (en)
Inventor
Corene Casper
Kenneth F. Dove
Original Assignee
Polyserve, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Polyserve, Inc. filed Critical Polyserve, Inc.
Priority to AU2002336620A priority Critical patent/AU2002336620A1/en
Priority claimed from US10/251,645 external-priority patent/US20040202013A1/en
Publication of WO2003054711A1 publication Critical patent/WO2003054711A1/en
Publication of WO2003054711A9 publication Critical patent/WO2003054711A9/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • G06F16/1824Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/1827Management specifically adapted to NAS

Abstract

A system and method for managing a storage area network is disclosed. In one embodiment, the method comprises providing a plurality of nodes (102A, 102B, 102C, 102D); providing a plurality of storage (106A, 106B, 106C, 106D), wherein the plurality of storage (106A, 106B, 106C, 106D) is shared by the plurality of nodes (102A, 102B, 102C, 102D); determining if a change in the storage area network has occurred; and dynamically adjusting to the change if the change has occurred. In another embodiment, the system comprises a processor configured to communicate with a second node and at least one storage, wherein the storage is shared by the processor and the second node; the processor also being configured to determine if a change in the storage area network has occurred and to dynamically adjust to the change if the change has occurred; and a memory coupled with the processor, the memory configured to provide instructions to the processor.

Description

A SYSTEM AND METHOD FOR MANAGEMENT OF A STORAGE AREA NETWORK
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims priority to U.S. Provisional Patent Application No.
60/324,196 (Attorney Docket No. POLYP001+) entitled SHARED STORAGE LOCK: A NEW SOFTWARE SYNCHRONIZATION MECHANISM FOR ENFORCING MUTUAL EXCLUSION AMONG MULTIPLE NEGOTIATORS filed September 21, 2001, which is incorporated herein by reference for all purposes.
This application claims priority to U.S. Provisional Patent Application No.
60/324,226 (Attorney Docket No. POLYP002+) entitled JOURNALING MECHANISM WITH EFFICIENT, SELECTIVE RECOVERY FOR MULTI-NODE ENVIRONMENTS filed September 21, 2001, which is incorporated herein by reference for all purposes.
This application claims priority to U.S. Provisional Patent Application No.
60/324,224 (Attorney Docket No. POLYP003+) entitled COLLABORATIVE CACHING IN A MULTI-NODE FILESYSTEM filed September 21, 2001, which is incorporated herein by reference for all purposes. This application claims priority to U.S. Provisional Patent Application No 60/324,242 (Attorney Docket No. POLYP005+) entitled DISTRIBUTED MANAGEMENT OF A STORAGE AREA NETWORK filed September 21, 2001, which is incorporated herein by reference for all purposes.
This application claims priority to U.S. Provisional Patent Application No.
60/324,195 (Attorney Docket No. POLYP006+) entitled METHOD FOR IMPLEMENTING JOURNALING AND DISTRIBUTED LOCK MANAGEMENT filed September 21, 2001, which is incorporated herein by reference for all purposes.
This application claims priority to U.S. Provisional Patent Application No. 60/324,243 (Attorney Docket No. POLYP007+) entitled MATRIX SERVER: A HIGHLY AVAILABLE MATRIX PROCESSING SYSTEM WITH COHERENT SHARED FILE STORAGE filed September 21, 2001, which is incorporated herein by reference for all purposes.
This application claims priority to U.S. Provisional Patent Application No. 60/324,787 (Attorney Docket No. POLYP008+) entitled A METHOD FOR
EFFICIENT ON-LINE LOCK RECOVERY IN A HIGHLY AVAILABLE MATRIX PROCESSING SYSTEM filed September 24, 2001, which is incorporated herein by reference for all purposes.
This application claims priority to U.S. Provisional Patent Application No. 60/327,191 (Attorney Docket No. POLYP009+) entitled FAST LOCK RECOVERY: A METHOD FOR EFFICIENT ON-LINE LOCK RECOVERY IN A HIGHLY AVAILABLE MATRIX PROCESSING SYSTEM filed October 1, 2001, which is incorporated herein by reference for all purposes. This application is related to co-pending U.S. Patent Application No. (Attorney Docket No. POLYP001) entitled A SYSTEM AND
METHOD FOR SYNCHRONIZATION FOR ENFORCING MUTUAL EXCLUSION AMONG MULTIPLE NEGOTIATORS filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. Patent
Application No. (Attorney Docket No. POLYP002) entitled SYSTEM
AND METHOD FOR JOURNAL RECOVERY FOR MULTINODE ENVIRONMENTS filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. Patent Application No. (Attorney Docket No. POLYP003) entitled A SYSTEM AND
METHOD FOR COLLABORATIVE CACHING IN A MULTINODE SYSTEM filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. Patent Application No. (Attorney
Docket No. POLYP006) entitled SYSTEM AND METHOD FOR IMPLEMENTING JOURNALING IN A MULTI-NODE ENVIRONMENT filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. Patent
Application No. (Attorney Docket No. POLYP007) entitled A
SYSTEM AND METHOD FOR A MULTI-NODE ENVIRONMENT WITH SHARED STORAGE filed concurrently herewith, which is incorporated herein by reference for all purposes; and co-pending U.S. Patent Application No. (Attorney Docket No. POLYP009) entitled A SYSTEM AND
METHOD FOR EFFICIENT LOCK RECOVERY filed concurrently herewith, which is incorporated herein by reference for all purposes.
FIELD OF THE INVENTION
The present invention relates generally to computer systems. In particular, the present invention relates to computer systems that share resources such as storage.
BACKGROUND OF THE INVENTION
Servers are typically used for big applications and work loads such as those used in conjunction with large web services and manufacturing. Often, a single server does not have enough power to perform the required application. Several servers may be used in conjunction with several storage devices in a storage area network (SAN) to accommodate heavy traffic. As systems get larger, applications often need to be highly available to avoid interruptions in service.
A typical server management system uses a single management control station that manages the servers and the shared storage. A potential problem of such a system is that it may have a single point of failure, which can cause a shut-down of the entire storage area network to perform maintenance. Another potential problem is that there is typically no dynamic cooperation between the servers in case a change to the system occurs. Often in such a system all servers need to be shut down to perform a simple reconfiguration of the shared storage. This type of interruption is typically unacceptable for mission critical applications.
What is needed is a system and method for management of a storage area network that allows dynamic cooperation among the servers and does not have a single point of failure. The present invention addresses such needs.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements, and in which:
Fig. 1 is a block diagram of a shared storage system suitable for facilitating an embodiment of the present invention.
Fig. 2 is another block diagram of a system according to an embodiment of the present invention.
Fig. 3 is a block diagram of the software components of a server according to an embodiment of the present invention.
Figs. 4A-4B are flow diagrams of a method according to an embodiment of the present invention for adding a node.
Figs. 5A-5C are flow diagrams of a method according to the present invention for handling a server failure.
Fig. 6 is a flow diagram of a method according to an embodiment of the present invention for adding or removing shared storage.
Fig. 7 is a flow diagram of a method according to an embodiment of the present invention for managing a storage area network.
Fig. 8 is a flow diagram of a method of managing a storage area network in a first node according to an embodiment of the present invention.
DETAILED DESCRIPTION
It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, or a computer readable medium such as a computer readable storage medium or a computer network wherein program instructions are sent over optical or electronic communication links. It should be noted that the order of the steps of disclosed processes may be altered within the scope of the invention.
A detailed description of one or more preferred embodiments of the invention are provided below along with accompanying figures that illustrate by way of example the principles of the invention. While the invention is described in connection with such embodiments, it should be understood that the invention is not limited to any embodiment. On the contrary, the scope of the invention is limited only by the appended claims and the invention encompasses numerous alternatives, modifications and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention. The present invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the present invention is not unnecessarily obscured.
Fig. 1 is a block diagram of a shared storage system suitable for facilitating the management of a storage area network according to an embodiment of the present invention. In this example, nodes 102A-102D are coupled together through a network switch 100. The network switch 100 can represent any network infrastructure such as an Ethernet. Additionally, the nodes 102A-102D are also shown to be coupled to a data storage interconnect 104. An example of the data storage interconnect 104 is a fiber channel switch, such as a Brocade fiber channel switch. Examples of nodes 102A-102D include but are not limited to computers, servers, and any other processing units or applications that can share storage or data. For exemplary purposes, nodes 102A-102D will sometimes be referred to as servers. The data interconnect 104 is shown to be coupled to shared storage 106A-106D. Examples of shared storage 106A-106D include any form of storage such as hard drive disks, compact disks, tape, and random access memory.
Although the system shown in Fig. 1 is a multiple node system, the present invention can also be used with a single computer system for synchronizing various applications as they share data on a shared storage.
Shared storage can be any storage device, such as hard drive disks, compact disks, tape, and random access memory. A filesystem is a logical entity built on the shared storage. Although the shared storage is typically considered a physical device while the filesystem is typically considered a logical structure overlaid on part of the storage, the filesystem is sometimes referred to herein as shared storage for simplicity. For example, when it is stated that shared storage fails, it can be a failure of a part of a filesystem, one or more filesystems, or the physical storage device on which the filesystem is overlaid. Accordingly, shared storage, as used herein, can mean the physical storage device, a portion of a filesystem, a filesystem, filesystems, or any combination thereof.
Figure 2 is another block diagram of a system according to an embodiment of the present invention. In this example, the system preferably has no single point of failure. Accordingly, servers 102A'-102D' are coupled with multiple network interconnects 100A-100D. The servers 102A'-102D' are also shown to be coupled with multiple storage interconnects 104A-104B. The storage interconnects 104A-104B are each coupled to a plurality of data storage 106A'-106D'.
In this manner, there is redundancy in the system such that if any of the components or connections fail, the entire system can continue to operate.
In the example shown in Figure 2, as well as the example shown in Figure 1, the number of servers 102A'-102D', the number of storage interconnects 104A-104B, and the number of data storage 106A'-106D' can be as many as the customer requires and is not physically limited by the system. Likewise, the operating systems used by servers 102A'-102D' can also be as many independent operating systems as the customer requires.
Fig. 3 is a block diagram of the software components of a server according to an embodiment of the present invention. In this embodiment, the following components are shown:
The Distributed Lock Manager (DLM) 500 manages matrix-wide locks for the filesystem image 306a-306d, including the management of lock state during crash recovery. The Matrix Filesystem 504 uses DLM 500-managed locks to implement matrix-wide mutual exclusion and matrix-wide filesystem 306a-306d metadata and data cache consistency. The DLM 500 is a distributed symmetric lock manager. Preferably, there is an instance of the DLM 500 resident on every server in the matrix. Every instance is a peer to every other instance; there is no master/slave relationship among the instances.
The lock-caching layer ("LCL") 502 is a component internal to the operating system kernel that interfaces between the Matrix Filesystem 504 and the application- level DLM 500. The purposes of the LCL 502 include the following:
1. It hides the details of the DLM 500 from kernel-resident clients that need to obtain distributed locks.
2. It caches DLM 500 locks (that is, it may hold on to DLM 500 locks after clients have released all references to them), sometimes obviating the need for kernel components to communicate with an application-level process (the DLM 500) to obtain matrix-wide locks.
3. It provides the ability to obtain locks in both process and server scopes (where a process lock ensures that the corresponding DLM (500) lock is held, and also excludes local processes attempting to obtain the lock in conflicting modes, whereas a server lock only ensures that the DLM (500) lock is held, without excluding other local processes).
4. It allows clients to define callouts for different types of locks when certain events related to locks occur, particularly the acquisition and surrender of DLM 500- level locks. This ability is a requirement for cache-coherency, which depends on callouts to flush modified cached data to permanent storage when corresponding DLM 500 write locks are downgraded or released, and to purge cached data when DLM 500 read locks are released. The LCL 502 is the only kernel component that makes lock requests from the user-level DLM 500. It partitions DLM 500 locks among kernel clients, so that a single DLM 500 lock has at most one kernel client on each node, namely, the LCL 502 itself. Each DLM 500 lock is the product of an LCL 502 request, which was induced by a client's request of an LCL 502 lock, and each LCL 502 lock is backed by a DLM 500 lock.
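The lock-caching behavior described above can be illustrated with a short sketch. The example below is hypothetical Python (class and method names such as LockCachingLayer and StubDLM are invented for illustration and are not the Matrix Server code); it shows only the core idea: locks obtained from the DLM are retained after the last local release, so a later re-acquire on the same node avoids another round trip to the user-level DLM process.
```python
# Hypothetical sketch of a lock-caching layer (LCL): it holds on to distributed
# locks after local clients release them, so a re-acquire on the same node can
# be satisfied without another call to the DLM process. The DLM is stubbed out.

class StubDLM:
    """Stand-in for the application-level distributed lock manager."""
    def acquire(self, name, mode):
        print(f"DLM call: acquire {name} ({mode})")
        return (name, mode)

    def release(self, handle):
        print(f"DLM call: release {handle[0]}")


class LockCachingLayer:
    def __init__(self, dlm):
        self.dlm = dlm
        self.cache = {}      # lock name -> (DLM handle, mode)
        self.refcount = {}   # lock name -> number of local references

    def acquire(self, name, mode="exclusive"):
        cached = self.cache.get(name)
        if cached and cached[1] == mode:
            # Cache hit: no call to the user-level DLM is needed.
            self.refcount[name] += 1
            return cached[0]
        handle = self.dlm.acquire(name, mode)
        self.cache[name] = (handle, mode)
        self.refcount[name] = 1
        return handle

    def release(self, name):
        # Drop the local reference but keep the DLM lock cached; a later
        # callout (e.g. a conflicting request elsewhere) would surrender it.
        self.refcount[name] -= 1

    def surrender(self, name):
        # Invoked when the lock must really be given back matrix-wide,
        # e.g. after flushing any modified cached data it protected.
        handle, _ = self.cache.pop(name)
        self.refcount.pop(name, None)
        self.dlm.release(handle)


if __name__ == "__main__":
    lcl = LockCachingLayer(StubDLM())
    lcl.acquire("inode-42")
    lcl.release("inode-42")
    lcl.acquire("inode-42")   # served from the cache, no second DLM call
    lcl.surrender("inode-42")
```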
The Matrix Filesystem 504 is the shared filesystem component of The Matrix Server. The Matrix Filesystem 504 allows multiple servers to simultaneously mount, in read/write mode, filesystems living on physically shared storage devices 306a- 306d. The Matrix Filesystem 504 is a distributed symmetric matrixed filesystem; there is no single server that filesystem activity must pass through to perform filesystem activities. The Matrix Filesystem 504 provides normal local filesystem semantics and interfaces for clients of the filesystem.
SAN (Storage Area Network) Membership Service 506 provides the group membership services infrastructure for the Matrix Filesystem 504, including managing filesystem membership, health monitoring, coordinating mounts and unmounts of shared filesystems 306a-306d, and coordinating crash recovery.
Matrix Membership Service 508 provides the local, matrix-style matrix membership support, including virtual host management, service monitoring, notification services, data replication, etc. The Matrix Filesystem 504 does not interface directly with the MMS 508, but the Matrix Filesystem 504 does interface with the SAN Membership Service 506, which interfaces with the MMS 508 in order to provide the filesystem 504 with the matrix group services infrastructure.
The Shared Disk Monitor Probe 510 maintains and monitors the membership of the various shared storage devices in the matrix. It acquires and maintains leases on the various shared storage devices in the matrix as a protection against rogue server "split-brain" conditions. It communicates with the SMS 506 to coordinate recovery activities on occurrence of a device membership transition.
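The lease mechanism used against split-brain conditions can be sketched as follows. This is a simplified, hypothetical illustration (the DeviceLease class and the time-based renewal are assumptions, not the patented implementation): a server may only issue I/O to a shared device while its lease is current, and it must stop once it can no longer renew.
```python
# Hypothetical lease sketch: a server keeps renewing a time-limited lease on a
# shared device; if it cannot renew (e.g. it has been isolated), it must stop
# issuing I/O before the lease expires, which guards against rogue writes.
import time

LEASE_SECONDS = 10.0

class DeviceLease:
    def __init__(self, device):
        self.device = device
        self.expires = 0.0

    def renew(self, can_reach_device):
        # In a real system the renewal would be a write to a reserved area of
        # the shared device; here reachability is just a boolean flag.
        if can_reach_device:
            self.expires = time.monotonic() + LEASE_SECONDS
            return True
        return False

    def io_allowed(self):
        # I/O is only permitted while the lease is unexpired.
        return time.monotonic() < self.expires


if __name__ == "__main__":
    lease = DeviceLease("shared-disk-306a")
    lease.renew(can_reach_device=True)
    print("I/O allowed:", lease.io_allowed())   # True while the lease is fresh
    lease.expires = 0.0                          # simulate missed renewals
    print("I/O allowed:", lease.io_allowed())   # False -> server fences itself
```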
Filesystem monitors 512 are used by the SAN Membership Service 506 to initiate Matrix Filesystem 504 mounts and unmounts, according to the matrix configuration put in place by the Matrix Server user interface.
The Service Monitor 514 tracks the state (health & availability) of various services on each server in the matrix so that the matrix server may take automatic remedial action when the state of any monitored service transitions. Services monitored include HTTP, FTP, Telnet, SMTP, etc. The remedial actions include service restart on the same server or service fail-over and restart on another server.
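The restart-then-failover policy described above is illustrated by the following sketch. It is a hypothetical example (the remediate function and its callbacks are invented for illustration), not the Matrix Server monitor itself.
```python
# Illustrative sketch of the remedial policy the monitors apply: when a
# monitored service transitions to an unhealthy state, first try restarting it
# in place, then fail it over to another server.

def remediate(service, healthy, restart, failover, max_restarts=3):
    """Return a short log of the actions taken for one unhealthy transition."""
    actions = []
    for attempt in range(1, max_restarts + 1):
        restart(service)
        actions.append(f"restart #{attempt} on local server")
        if healthy(service):
            return actions
    failover(service)
    actions.append("fail-over and restart on another server")
    return actions


if __name__ == "__main__":
    state = {"httpd": "down"}
    log = remediate(
        "httpd",
        healthy=lambda s: state[s] == "up",
        restart=lambda s: None,                       # pretend restarts fail
        failover=lambda s: state.update({s: "up"}),   # fail-over succeeds
    )
    print(log)
```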
The Device Monitor 516 tracks the state (health & availability) of various storage-related devices in the matrix so that the matrix server may take automatic remedial action when the state of any monitored device transitions. Devices monitored may include data storage devices 306a-306d (such as storage device drives, solid state storage devices, RAM storage devices, JBODs, RAID arrays, etc.) and storage network devices 304' (such as Fibre Channel switches, InfiniBand switches, iSCSI switches, etc.). The remedial actions include initiation of Matrix Filesystem 504 recovery, storage network path failover, and device reset.
The Application Monitor 518 tracks the state (health & availability) of various applications on each server in the matrix so that the matrix server may take automatic remedial action when the state of any monitored application transitions. Applications monitored may include databases, mail routers, CRM apps, etc. The remedial actions include application restart on the same server or application fail-over and restart on another server.
The Notifier Agent 520 tracks events associated with specified objects in the matrix and executes supplied scripts of commands on occurrence of any tracked event.
The Replicator Agent 522 monitors the content of any filesystem subtree and periodically replicates any data which has not yet been replicated from a source tree to a destination tree. The Replicator Agent 522 is preferably used to duplicate private files between servers that are not accessed using Shared Data Storage (306).
The Matrix Communication Service 524 provides the network communication infrastructure for the DLM 500, Matrix Membership Service 508, and SAN Membership Service 506. The Matrix Filesystem 504 does not use the MCS 524 directly, but it does use it indirectly through these other components.
The Storage Control Layer (SCL) 526 provides matrix-wide device identification, used to identify the Matrix Filesystems 504 at mount time. The SCL 526 also manages storage fabric configuration and low level I/O device fencing of rogue servers from the shared storage devices 306a-306d containing the Matrix Filesystems 504. It also provides the ability for a server in the matrix to voluntarily intercede during normal device operations to fence itself when communication with the rest of the matrix has been lost. The Storage Control Layer 526 is the Matrix Server module responsible for managing shared storage devices 306a-306d. Management in this context consists of two primary functions. The first is to enforce I/O fencing at the hardware SAN level by enabling/disabling host access to the set of shared storage devices 306a-306d. And the second is to generate global (matrix-wide) unique device names (or "labels") for all matrix storage devices 306a-306d and ensure that all hosts in the matrix have access to those global device names. The SCL module also includes utilities and library routines needed to provide device information to the UI.
The Pseudo Storage Driver 528 is a layered driver that "hides" a target storage device 306a-306d so that all references to the underlying target device must pass through the PSD layered driver. Thus, the PSD provides the ability to "fence" a device, blocking all I/O from the host server to the underlying target device until it is unfenced again. The PSD also provides an application-level interface to lock a storage partition across the matrix. It also has the ability to provide common matrix-wide 'handles', or paths, to devices such that all servers accessing shared storage in the Matrix Server can use the same path to access a given shared device.
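A minimal sketch of the fencing behavior of such a layered driver is shown below. It is hypothetical Python (PseudoStorageDevice and its methods are illustrative names, not the PSD interface): all reads and writes pass through the wrapper, and I/O is refused while the host is fenced from the underlying device.
```python
# Hypothetical fencing sketch: every I/O to the underlying target device is
# routed through a wrapper that refuses the request while the device is fenced,
# and all servers address the device through the same matrix-wide handle.

class FencedError(IOError):
    pass

class PseudoStorageDevice:
    def __init__(self, handle, target):
        self.handle = handle      # common matrix-wide path, e.g. "/dev/psd/vol0"
        self.target = target      # underlying storage, modelled as a dict
        self.fenced = False

    def fence(self):
        self.fenced = True        # block all I/O from this host to the target

    def unfence(self):
        self.fenced = False

    def write(self, block, data):
        if self.fenced:
            raise FencedError(f"{self.handle}: host is fenced from device")
        self.target[block] = data

    def read(self, block):
        if self.fenced:
            raise FencedError(f"{self.handle}: host is fenced from device")
        return self.target.get(block)


if __name__ == "__main__":
    psd = PseudoStorageDevice("/dev/psd/vol0", target={})
    psd.write(0, b"hello")
    psd.fence()
    try:
        psd.write(1, b"blocked")
    except FencedError as e:
        print(e)
    psd.unfence()
    print(psd.read(0))
```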
Figs. 4A-4B are flow diagrams of a method according to an embodiment of the present invention for adding a node to a cluster of servers sharing storage such as a disk.
In this example, it is determined whether there is an administrator (ADM) in the cluster (400). The cluster includes the set of servers that cooperate to share a shared resource such as the shared storage. One of the servers in the cluster is dynamically elected to act as an administrator to manage the shared storage in the cluster. If there is no administrator in the cluster, then it is determined whether this server can try to become the administrator (408). If this server can try to become the administrator, then the server begins an election process shown in Figs. 5B-5C, and successful completion of this process results in the election of this server as the administrator.
If, however, the server cannot become the administrator, the group coordinator then selects a server to try to become the new administrator (704 Fig. 5A). An example of how this server cannot become the administrator (408) is if another server became the administrator during the time this server established that there was no administrator and then tried to become the administrator, or if it had faulty connectivity to the storage network. In this case a partial failure recovery is started in step 704 of Fig. 5A. If there is an existing administrator in the cluster (400), the existing administrator is then asked to import the new server into the cluster (402). It is then determined whether it is permissible for this server to be imported into the cluster (404). If it is not permissible, then the process of adding this server to the cluster has failed (412). Examples of reasons why adding the server would fail include this server not being healthy or having a storage area network generation number mismatch with the generation number used by the administrator.
If this server can be imported (404), then it receives device names from the administrator (406). Examples of device names include cluster wide names of shared storage.
The administrator grants physical storage area network access to this server (410 of Fig. 4B). The administrator then commands the physical hardware to allow this server storage area network (SAN) access (412). This server now has access to the SAN (414).
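The join flow of Figs. 4A-4B can be summarized in a short sketch. The example below is a simplified, hypothetical rendering (the cluster dictionary and the health/generation checks are stand-ins for the real protocol), keyed to the step numbers in the figures.
```python
# Condensed, hypothetical walk-through of the join flow in Figs. 4A-4B.
# The cluster state and checks are simplified stand-ins for the real protocol.

def add_node(cluster, server):
    if cluster.get("administrator") is None:
        # No administrator yet: this server may try to become one (408),
        # which starts the election sketched later for Figs. 5B-5C.
        return "start administrator election"
    # Ask the existing administrator to import the new server (402/404).
    healthy = server["healthy"]
    generations_match = server["san_generation"] == cluster["san_generation"]
    if not (healthy and generations_match):
        return "join failed (412)"
    # Import succeeds: hand out cluster-wide device names and grant SAN access.
    server["device_names"] = list(cluster["device_names"])   # step 406
    cluster["members"].append(server["name"])                # steps 410/412
    return "joined; SAN access granted (414)"


if __name__ == "__main__":
    cluster = {"administrator": "node-a", "san_generation": 7,
               "device_names": ["vol0", "vol1"], "members": ["node-a"]}
    new_server = {"name": "node-b", "healthy": True, "san_generation": 7}
    print(add_node(cluster, new_server))
```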
Figs. 5A-5C are flow diagrams of a method according to the present invention for handling a server failure, a software component failure, or a SAN generation number mismatch. In this example, it is determined that a server or communication with a server has failed (700). It is then determined whether there is still an administrator (702). For example, the server that failed may have been the administrator. If there is still an administrator, then the failed server is physically isolated (708). An example of physically isolating the failed server is to disable the fiber channel switch port associated with the failed server.
The storage area network generation number is then updated and stored to the database (710). Thereafter, normal operation continues (712).
If there is no longer an administrator (702), then a server is selected to try and become the new administrator (704). There are several ways to select a server to try to become the new administrator. One example is a random selection of one of the servers. The elected server is then told to try to become the new administrator (706). One example of how the server is selected and told to become the new administrator is through the use of a group coordinator.
In one embodiment, the group coordinator is elected during the formation of a process communication group using an algorithm that can uniquely identify the coordinator of the group with no communication with any server or node except that required to agree on the membership of the group. For example, the server with the lowest-numbered Internet Protocol (IP) address of the members can be selected. The coordinator can then make global decisions for the group of servers, such as the selection of a possible administrator. The server selected as administrator is preferably one which has a high probability of success of actually becoming the administrator. The group coordinator attempts to place the administrator on a node which might be able to connect to the SAN hardware and has not recently failed in an attempt to become the SAN administrator.
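The lowest-IP rule mentioned above is easy to illustrate: given an agreed membership list, every node can compute the same coordinator with no further messages. The sketch below is an assumption-laden example, not the patented algorithm itself.
```python
# Sketch of the coordinator rule: once the members agree on the group
# membership, each node independently picks the same coordinator, e.g. the
# member with the lowest-numbered IP address.
import ipaddress

def elect_coordinator(member_ips):
    """Every node evaluates this over the agreed membership and gets the same answer."""
    return min(member_ips, key=lambda ip: int(ipaddress.ip_address(ip)))

if __name__ == "__main__":
    members = ["10.0.0.12", "10.0.0.3", "10.0.0.7"]
    print(elect_coordinator(members))   # -> 10.0.0.3 on every node
```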
The selected server then attempts to acquire the storage area network locks (720). If it cannot acquire the SAN locks, then it has failed to become the administrator (724). If it succeeds in acquiring the SAN locks (720), then it attempts to read the SAN generation number from the membership database (722). The database can be maintained in one of the membership partitions on a shared storage and can be co-resident with the SAN locks. A server may fail to acquire the SAN locks for several reasons including but not limited to physical storage isolation, ownership of the SAN locks by an existing administrator in the cluster or ownership by another cluster on the same storage fabric.
If the server fails to read the SAN generation number from the database (722), then it drops the SAN locks (726), and it has failed to become the administrator (724). Once the server has failed to become the administrator (724), the group coordinator selects a different server to try to become the new administrator (704 Fig. 5A).
If the server can read the SAN generation number from the database, then it increments the SAN generation number and stores it back into the database (728). It also informs the group coordinator that this server is now the administrator (730). The group coordinator receives the administrator update (732). It is then determined if it is permissible for this server to be the new administrator (750). If it is not okay, then a message to undo the administrator status is sent to the current server trying to become the administrator (752). Thereafter, the group coordinator selects a server to try to become the new administrator (704 of Fig. 5A).
If it is okay for this server to be the new administrator, the administrator is told to commit (754), and the administrator is committed (756). The coordinator then informs the other servers in the cluster about the new administrator (758).
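The take-over sequence of Figs. 5B-5C can be sketched as follows. This is a hypothetical simplification (the SanLocks class and the dictionary-backed membership database are illustrative): the candidate must hold the SAN locks and successfully read and increment the SAN generation number before reporting itself to the group coordinator; any failure drops the locks so the coordinator can pick another candidate.
```python
# Hypothetical sketch of the administrator take-over: acquire the SAN locks,
# read and bump the SAN generation number in the membership database, then
# report to the group coordinator. Failures release the locks (step 726) so the
# coordinator can select a different candidate (704).

def try_become_administrator(server, san_locks, database):
    if not san_locks.try_acquire(server):               # step 720
        return "failed: could not acquire SAN locks (724)"
    generation = database.get("san_generation")         # step 722
    if generation is None:
        san_locks.release(server)                        # step 726
        return "failed: could not read SAN generation (724)"
    database["san_generation"] = generation + 1          # step 728
    return f"administrator candidate; new generation {generation + 1} (730)"


class SanLocks:
    def __init__(self):
        self.owner = None
    def try_acquire(self, server):
        if self.owner in (None, server):
            self.owner = server
            return True
        return False
    def release(self, server):
        if self.owner == server:
            self.owner = None


if __name__ == "__main__":
    locks, db = SanLocks(), {"san_generation": 41}
    print(try_become_administrator("node-c", locks, db))
    print(try_become_administrator("node-d", locks, db))   # locks held elsewhere
```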
Fig. 6 is a flow diagram of a method according to an embodiment of the present invention for adding or removing shared storage. In this example, a request is sent from a server to the administrator to add or remove a shared storage (600), such as a disk. The disk is then added to or removed from the naming database (602). The naming database is preferably maintained on the shared storage accessible by all servers, and its location is known by all servers before they join the cluster. Servers with no knowledge of the location of a naming database are preferably not eligible to become a SAN administrator but may join a cluster with a valid administrator.
The SAN generation number is then incremented (604). Each server in the cluster is then informed of the SAN generation number and the addition or deletion of the new disk (606). When all the servers in the cluster acknowledge, the new SAN generation number is written to the database (608). The requesting server is then notified that the addition/removal of the disk is complete (610).
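The Fig. 6 flow can be condensed into the following sketch, again under simplifying assumptions (the data structures and acknowledgement model are invented for illustration): the naming database is updated, the SAN generation number is incremented, every member is informed, and the new generation is persisted only after all members acknowledge.
```python
# Simplified sketch of the Fig. 6 flow: update the naming database, bump the
# SAN generation number, tell every member, and only persist the new generation
# once all members acknowledge.

def add_shared_disk(admin_state, members, disk_name):
    admin_state["naming_db"].add(disk_name)                        # step 602
    new_generation = admin_state["san_generation"] + 1             # step 604
    acks = [m.apply(disk_name, new_generation) for m in members]   # step 606
    if all(acks):
        admin_state["san_generation"] = new_generation             # step 608
        return "addition complete (610)"
    return "waiting for acknowledgements"


class Member:
    def __init__(self, name):
        self.name, self.generation, self.disks = name, 0, set()
    def apply(self, disk_name, generation):
        self.disks.add(disk_name)
        self.generation = generation
        return True                                                # acknowledge


if __name__ == "__main__":
    admin = {"naming_db": set(), "san_generation": 3}
    cluster = [Member("node-a"), Member("node-b")]
    print(add_shared_disk(admin, cluster, "lun-17"))
    print(admin["san_generation"], sorted(admin["naming_db"]))
```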
Fig. 7 is a flow diagram of a method according to an embodiment of the present invention for managing a storage area network. In this example, a plurality of nodes is provided (800). A plurality of storage is also provided, wherein the plurality of storage is shared by the plurality of nodes (802). It is determined whether a change in the storage area network has occurred (804). Examples of a change include structural changes such as adding a server, deleting a server, adding a storage, deleting a storage, connecting or disconnecting an interface. If a change has occurred, then the system dynamically adjusts to the change (806).
Fig. 8 is a flow diagram of a method of managing a storage area network in a first node according to an embodiment of the present invention. In this example, the first node communicates with a second node (900), and communicates with at least one storage (902), wherein the storage is shared by the first node and the second node. It is determined if a change in the storage area network has occurred (904). If a change has occurred, then the first node adjusts dynamically to the change (906).
Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. It should be noted that there are many alternative ways of implementing both the process and apparatus of the present invention. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
WHAT IS CLAIMED IS:

Claims

1. A method of managing a storage area network comprising: providing a plurality of nodes; providing a plurality of storage, wherein the plurality of storage is shared by the plurality of nodes; determining if a change in the storage area network has occurred; and dynamically adjusting to the change if the change has occurred.
2. The method of claim 1, wherein the change is adding a node to the plurality of nodes.
3. The method of claim 2, further comprising determining if there is an administrator associated with the plurality of nodes.
4. The method of claim 2, further comprising determining if it is permissible for the node to be imported into the plurality of nodes.
5. The method of claim 2, further comprising sending device names to the node being added to the plurality of nodes.
6. The method of claim 1, wherein the change is deleting a node from the plurality of nodes.
7. The method of claim 6, further comprising isolating the node from the plurality of nodes.
8. The method of claim 6, further comprising updating a generation number.
9. The method of claim 6, further comprising selecting a second node for a new administrator if the first node was the administrator.
10. The method of claim 9, further comprising acquiring locks by the second node.
11. The method of claim 9, further comprising incrementing a generation number by the second node.
12. The method of claim 1, wherein the change is adding a storage to the plurality of storage.
13. The method of claim 12, further comprising adding the storage to a database.
14. The method of claim 12, further comprising incrementing a generation number.
15. The method of claim 1, wherein the change is deleting a storage from the plurality of storage.
16. The method of claim 1, wherein a node of the plurality of nodes is dynamically selected as an administrator.
17. A method of managing a storage area network in a first node, comprising: communicating with a second node; communicating with at least one storage, wherein the storage is shared by the first node and the second node; determining if a change in the storage area network has occurred; and dynamically adjusting to the change if the change has occurred.
18. A system of managing a storage area network comprising: a processor configured to communicate with a second node and at least one storage, wherein the storage is shared by the processor and the second node; the processor also being configured to determine if a change in the storage area network has occurred; and dynamically adjusting to the change if the change has occurred; and a memory coupled with the processor, the memory configured to provide instructions to the processor.
19. A system of managing a storage area network comprising: a plurality of nodes, wherein the plurality of nodes are configured to determine if a change in the storage area network has occurred, and also configured to dynamically adjust to the change if the change has occurred; and a plurality of storage, wherein the plurality of storage is shared by the plurality of nodes.
20. A computer program product for managing a storage area network in a first node, the computer program product being embodied in a computer readable medium and comprising computer instructions for: communicating with a second node; communicating with at least one storage, wherein the storage is shared by the first node and the second node; determining if a change in the storage area network has occurred; and dynamically adjusting to the change if the change has occurred.
PCT/US2002/029721 2001-09-21 2002-09-20 A system and method for management of a storage area network WO2003054711A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002336620A AU2002336620A1 (en) 2001-09-21 2002-09-20 A system and method for management of a storage area network

Applications Claiming Priority (30)

Application Number Priority Date Filing Date Title
US32419601P 2001-09-21 2001-09-21
US32419501P 2001-09-21 2001-09-21
US32424301P 2001-09-21 2001-09-21
US32422401P 2001-09-21 2001-09-21
US32424201P 2001-09-21 2001-09-21
US32422601P 2001-09-21 2001-09-21
US60/324,226 2001-09-21
US60/324,243 2001-09-21
US60/324,196 2001-09-21
US60/324,242 2001-09-21
US60/324,195 2001-09-21
US60/324,224 2001-09-21
US32478701P 2001-09-24 2001-09-24
US60/324,787 2001-09-24
US32719101P 2001-10-01 2001-10-01
US60/327,191 2001-10-01
US10/251,690 2002-09-20
US10/251,894 2002-09-20
US10/251,645 US20040202013A1 (en) 2001-09-21 2002-09-20 System and method for collaborative caching in a multinode system
US10/251,893 US7266722B2 (en) 2001-09-21 2002-09-20 System and method for efficient lock recovery
US10/251,626 US7111197B2 (en) 2001-09-21 2002-09-20 System and method for journal recovery for multinode environments
US10/251,689 US7149853B2 (en) 2001-09-21 2002-09-20 System and method for synchronization for enforcing mutual exclusion among multiple negotiators
US10/251,689 2002-09-20
US10/251,626 2002-09-20
US10/251,645 2002-09-20
US10/251,895 2002-09-20
US10/251,895 US7437386B2 (en) 2001-09-21 2002-09-20 System and method for a multi-node environment with shared storage
US10/251,893 2002-09-20
US10/251,690 US7496646B2 (en) 2001-09-21 2002-09-20 System and method for management of a storage area network
US10/251,894 US7240057B2 (en) 2001-09-21 2002-09-20 System and method for implementing journaling in a multi-node environment

Publications (2)

Publication Number Publication Date
WO2003054711A1 (en) 2003-07-03
WO2003054711A9 (en) 2004-05-13

Family ID=27585545

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/US2002/029721 WO2003054711A1 (en) 2001-09-21 2002-09-20 A system and method for management of a storage area network
PCT/US2002/030084 WO2003025802A1 (en) 2001-09-21 2002-09-20 A system and method for collaborative caching in a multinode system

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/US2002/030084 WO2003025802A1 (en) 2001-09-21 2002-09-20 A system and method for collaborative caching in a multinode system

Country Status (2)

Country Link
AU (1) AU2002336620A1 (en)
WO (2) WO2003054711A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8131939B2 (en) 2004-11-15 2012-03-06 International Business Machines Corporation Distributed shared I/O cache subsystem
CN104035522A (en) * 2014-06-16 2014-09-10 南京云创存储科技有限公司 Large database appliance

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3516362B2 (en) * 1995-03-01 2004-04-05 富士通株式会社 Shared data processing device and shared data processing system
US6044367A (en) * 1996-08-02 2000-03-28 Hewlett-Packard Company Distributed I/O store

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6009466A (en) * 1997-10-31 1999-12-28 International Business Machines Corporation Network management system for enabling a user to configure a network of storage devices via a graphical user interface
US6269410B1 (en) * 1999-02-12 2001-07-31 Hewlett-Packard Co Method and apparatus for using system traces to characterize workloads in a data storage system
US6421723B1 (en) * 1999-06-11 2002-07-16 Dell Products L.P. Method and system for establishing a storage area network configuration
US20010042221A1 (en) * 2000-02-18 2001-11-15 Moulton Gregory Hagan System and method for redundant array network storage
US20020091854A1 (en) * 2000-07-17 2002-07-11 Smith Philip S. Method and system for operating a commissioned e-commerce service prover
US20020069340A1 (en) * 2000-12-06 2002-06-06 Glen Tindal System and method for redirecting data generated by network devices

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7546333B2 (en) 2002-10-23 2009-06-09 Netapp, Inc. Methods and systems for predictive change management for access paths in networks
US7617320B2 (en) 2002-10-23 2009-11-10 Netapp, Inc. Method and system for validating logical end-to-end access paths in storage area networks
US7961594B2 (en) 2002-10-23 2011-06-14 Onaro, Inc. Methods and systems for history analysis for access paths in networks
US8112510B2 (en) 2002-10-23 2012-02-07 Netapp, Inc. Methods and systems for predictive change management for access paths in networks
GB2409306A (en) * 2003-12-20 2005-06-22 Autodesk Canada Inc Data processing network with switchable storage
US7702667B2 (en) 2005-09-27 2010-04-20 Netapp, Inc. Methods and systems for validating accessibility and currency of replicated data
US8775387B2 (en) 2005-09-27 2014-07-08 Netapp, Inc. Methods and systems for validating accessibility and currency of replicated data
US8826032B1 (en) 2006-12-27 2014-09-02 Netapp, Inc. Systems and methods for network change discovery and host name resolution in storage network environments
US8332860B1 (en) 2006-12-30 2012-12-11 Netapp, Inc. Systems and methods for path-based tier-aware dynamic capacity management in storage network environments
US9042263B1 (en) 2007-04-06 2015-05-26 Netapp, Inc. Systems and methods for comparative load analysis in storage networks
US9246752B2 (en) 2013-06-18 2016-01-26 International Business Machines Corporation Ensuring health and compliance of devices
US9456005B2 (en) 2013-06-18 2016-09-27 International Business Machines Corporation Ensuring health and compliance of devices
US9626123B2 (en) 2013-06-18 2017-04-18 International Business Machines Corporation Ensuring health and compliance of devices

Also Published As

Publication number Publication date
WO2003025802A1 (en) 2003-03-27
AU2002336620A1 (en) 2003-07-09
WO2003054711A9 (en) 2004-05-13
AU2002336620A8 (en) 2003-07-09

Similar Documents

Publication Publication Date Title
US7496646B2 (en) System and method for management of a storage area network
CA2284376C (en) Method and apparatus for managing clustered computer systems
EP1024428B1 (en) Managing a clustered computer system
US7870230B2 (en) Policy-based cluster quorum determination
US7406473B1 (en) Distributed file system using disk servers, lock servers and file servers
US8856091B2 (en) Method and apparatus for sequencing transactions globally in distributed database cluster
US20070061379A1 (en) Method and apparatus for sequencing transactions globally in a distributed database cluster
US20080320113A1 (en) Highly Scalable and Highly Available Cluster System Management Scheme
EP1117210A2 (en) Method to dynamically change cluster or distributed system configuration
US7702757B2 (en) Method, apparatus and program storage device for providing control to a networked storage architecture
US8316110B1 (en) System and method for clustering standalone server applications and extending cluster functionality
US7516181B1 (en) Technique for project partitioning in a cluster of servers
US7246261B2 (en) Join protocol for a primary-backup group with backup resources in clustered computer system
WO2003054711A1 (en) A system and method for management of a storage area network
WO2007028249A1 (en) Method and apparatus for sequencing transactions globally in a distributed database cluster with collision monitoring
Kishida et al. SSDLM: architecture of a distributed lock manager with high degree of locality for clustered file systems
Node et al. Windows Server Failover Clustering
Austin et al. Oracle® Clusterware and Oracle RAC Administration and Deployment Guide, 10g Release 2 (10.2) B14197-07
Austin et al. Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide, 10g Release 2 (10.2) B14197-10

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
COP Corrected version of pamphlet

Free format text: PAGES 1/11-11/11, DRAWINGS, REPLACED BY NEW PAGES 1/11-11/11; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP