US20070198685A1 - Method and system for use of storage caching with a distributed file system - Google Patents


Info

Publication number
US20070198685A1
Authority
US
United States
Prior art keywords
file
data file
cache
data
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/496,032
Inventor
Shirish Phatak
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CA Inc
Tacit Networks Inc
Original Assignee
Tacit Networks Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tacit Networks Inc
Priority to US11/496,032
Publication of US20070198685A1
Assigned to TACIT NETWORKS, INC.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: PHATAK, SHIRISH HEMANT
Assigned to SYMANTEC CORPORATION: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BLUE COAT SYSTEMS, INC.
Assigned to CA, INC.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SYMANTEC CORPORATION
Legal status: Abandoned (current)


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/10 File systems; File servers
              • G06F16/17 Details of further file system functions
                • G06F16/172 Caching, prefetching or hoarding of files
              • G06F16/18 File system types
                • G06F16/182 Distributed file systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
        • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
          • Y10S707/00 Data processing: database and file management or data structures
            • Y10S707/99931 Database or file accessing
              • Y10S707/99938 Concurrency, e.g. lock management in shared database
            • Y10S707/99951 File or database maintenance
              • Y10S707/99952 Coherency, e.g. same view to multiple users

Definitions

  • the present invention relates generally to managing shared access to data files and, more particularly, to a storage caching protocol which provides authorized computer workstations with shared access to real time data files while maintaining data file consistency and coherence.
  • a computer system that is a repository for data files is typically not the computer system on which processing of the data files is performed. Consequently, a user at a computer workstation associated with a remote site computer system, such as a laptop computer, networked computer or desktop computer, often will desire to access, i.e., view (read) or modify (write), a data file that is stored in an internal memory, on a disk or in network attached storage of a remotely located central data source computer system.
  • the data file is typically accessed through a communications channel, such as a data bus, a communications network or the Internet, which introduces a delay or latency in the presentation of the data file at the system accessing the data file.
  • the latency is based on the need to transmit data between the system accessing the data file and the system that produces or stores the data file.
  • the data file is usually accessed in portions or blocks rather than as a continuous stream, which exacerbates the latency because each block experiences the channel delay upon transmission.
  • in order to mitigate the effects of channel delays, most current computer systems that perform distributed file system applications, which provide for shared access to data files, implement some form of caching.
  • in caching, a local copy of all or a portion of a data file, which is stored at a central source computer system, is maintained in a cache established at a remote system, such as in the local memory of a workstation associated with the remote system.
  • the workstation can read or write to the cached data file, where the cached data file mirrors all or a portion of the data file stored at the central system.
  • the cache also stores data that tracks any changes made to the cached data file, which are entered by the workstation and ultimately are to be incorporated into the data file stored at the file server.
  • channel latency can be mitigated and a user of the workstation of the remote system is not aware that the data file is accessed from a local source rather than a remotely located central source system.
  • although caching may reduce latency in certain data file access circumstances, an attempt to access a data file that has not yet been stored as a copy (mirrored) in the cache, known as a cache miss, still incurs the latency of retrieving a copy of the data file from the file server.
  • a caching system therefore often implements a read-ahead technique, known as pre-populating the cache, in which data files that are expected to be required for access in the future are stored in the cache in advance.
  • Cache coherence is a guarantee that updates, and the order of the updates, to a cached data file are preserved and safe.
  • coherence requires that (i) a remote system does not delete the cached update data before the update data is used to update the corresponding data file stored at the file server, and (ii) no other system updates the data file in a manner that potentially can compromise the update of the data file until the data file at the server has been updated using the update data from the cache.
  • Cache consistency is a guarantee that the updates to an opened, cached data file made by a workstation are reflected in the cached data file in a timely fashion.
  • coherence additionally ensures that updates on any cache corresponding to a data file stored at the file server do not override updates by another cache corresponding to the same data file.
  • Cache consistency additionally ensures that updates to the cached data file made at any cache are, in a timely fashion, incorporated into the cached data file at any other cache which is accessing the same data file.
  • a caching system includes a write-through architecture, which provides that all updates to the cached data file are immediately transmitted to the central computer system. This immediate transmission results in an immediate update of the data file stored at the file server of the central system.
  • another caching architecture, known as write-back, evolved from the write-through architecture in an attempt to solve the latency problems of the latter.
  • in write-back caching, a cache stores the updates to the cached data file for a period of time before transmitting (flushing) the cached updates to the central system. Because the workstation does not wait for each update to reach the central system, the cached data file is updated without significant latency.
  • the simplest form of write-back is the write-behind architecture, in which updates to the cached data file are transmitted to the central source after some delay rather than immediately, in the same order in which the updates were stored in the cache.
  • because cached updates are not immediately available to either the central source or other remote systems in write-back caching architectures, such architectures are mostly useful only when a single remote system will be accessing the data file for reading or writing.
  • the write-back caching system often is enhanced with mechanisms that track updates performed at all of the caches and also at the central source system to ensure consistency of data files. These mechanisms typically substantially increase the complexity and cost of the cache, so as to make such caches impractical in many applications.
  • nevertheless, the performance benefits of write-back caching are significant, which makes such caches very attractive for high performance computing implementations, such as computer systems connected over computer networks.
  • a local area computer network remotely accesses data files over a distributed file system, such as NFS® (Network File System) for UNIX™ or CIFS® (Common Internet File System) for Microsoft Windows™ systems.
  • These file systems provide workstations associated with remote computer systems with a mechanism to access data files stored at a file server of a central computer system.
  • each remote system utilizes local caching to increase efficiency of access to data files.
  • the caching is performed at the granularity of pages of a data file, where a page usually is a four-kilobyte block of data.
  • the actual number of pages cached is a function of the memory available for caching in a workstation that is incorporated in or coupled to a remote system.
  • these file systems utilize some measure of write-back caching to achieve acceptable performance.
  • although cache consistency and cache coherence are important properties for a caching system, these properties are often very difficult to realize in a networked computer system having distributed file system capabilities, especially if the system uses write-back caching. Thus, many distributed file systems do not completely satisfy the guarantees of cache consistency and coherence. In practical implementations, a distributed file system relies on the crucial assumption that sharing of the same data file is rare and, therefore, trades correctness for performance when sharing of a data file does occur.
  • NFS currently is not particularly suitable for shared access because (i) it has weak consistency guarantees, namely, modifications to a cached data file for a first remote system may not be timely reflected at the central system and, thus, would not necessarily be mirrored at another remote system accessing the data file from the central system; and (ii) it has no coherence guarantees.
  • CIFS provides excellent consistency and coherence; however, shared access comes at low performance, because the consistency and coherence are achieved by using write-through any time that more than one remote system is accessing a given data file.
  • NFS and CIFS also provide locking mechanisms that allow a file sharing application to control coherence and consistency aspects.
  • NFS allows sharing applications to voluntarily cooperate with each other without any operating system control, which is commonly known as advisory byte range locking.
  • CIFS provides operating system controlled locking, known as mandatory byte range locking, as well as explicit file sharing modes, which, for example, permit an application to control the manner in which a file is accessed such that no other application can access the file.
  • the file sharing application can use such mechanisms to improve the coherence and consistency properties provided by such prior art distributed file systems. For example, an application can use byte range locking to provide coherence and consistency even if the underlying system, e.g., NFS, does not have these properties.
  • the Transarc Andrew File System (AFS) differs from NFS and CIFS, which use local memory of the remote system, such as memory of a computer workstation, for storing pages of files.
  • AFS uses an on-disk local file system as a cache for entire files.
  • in AFS, most operations occur on the local copy of the file, so there is no need to retrieve data from the file server when access to the data file is requested.
  • when the file is closed, the updates are transmitted (flushed) to the central system to update the corresponding data file at the file server, and the updated data file is then made available for access by other remote sites.
  • AFS provides flush-on-close consistency at file granularity; in other words, updates to a data file become available to other remote systems when the data file is closed, but not while it is being written.
  • AFS weakens the coherence and consistency guarantees considerably to make WAN operation feasible.
  • AFS lacks coherence because it allows multiple remote systems to simultaneously update respective cached data files, each of which corresponds to a single data file, and provides that the last remote system that closes the file is the remote system that controls the changes to the data file at the server of the central system. In other words, the modifications of such last closing remote system supersede the changes apparently being made to the data file by other remote systems.
  • the consistency of AFS is weak because modifications are transmitted to the central source only when a remote system closes the file.
  • although AFS is useful for campus-wide sharing applications, it has multiple disadvantages when implemented in a business enterprise environment. For example, AFS must be installed on all computers. In addition, AFS cannot be operated in conjunction with NFS and CIFS distributed file systems or other like systems which are conventional in the prior art. Furthermore, the lack of consistency and coherence of AFS makes it unsuitable for many enterprise applications that require multiple remote systems to have shared access to a real time version of a data file.
  • a storage caching protocol system interfaces with a distributed file system to provide that authorized computer workstations have shared access to real time data files stored at a file server.
  • a data file stored at the file server is automatically updated, in substantially real time, by a cache server to include file update data representative of data file modifications entered at a workstation and incorporated into a corresponding cached data file which is stored at a storage cache. Consequently, the cache server can respond to an access request for the data file from a workstation using a real time, updated version of the data file, where the real time data file includes all of the data file modifications which were entered by workstations that previously accessed the data file and incorporated into corresponding cached data files respectively stored at storage caches associated with the individual workstations.
  • file update data is transmitted as streaming data to update the data file stored at the file server or a cached data file stored at a storage cache and, most preferably, the file update data is transmitted in compressed form and optionally generated using data differencing techniques.
  • the storage caching protocol system includes at least one storage cache and at least one cache server which are communicatively interconnected over a communications medium.
  • the cache server is associated with a file server containing data files, and the storage cache is associated with at least one authorized computer workstation.
  • the cache server transmits a copy of a data file stored at the file server to the storage cache.
  • the storage cache stores the data file copy as a cached data file, and automatically transmits to the cache server file update data representative of modifications to the cached data file entered by a workstation associated with the storage cache and incorporated into the cached data file.
  • the cache server uses the file update data to update the data file stored at the file server, and responds to subsequent access requests for the data file, such as from the same or another storage cache or an authorized computer workstation not associated with a storage cache, utilizing the updated version of the data file stored at the file server.
  • the response to the access request includes server file update data for updating a corresponding cached data file stored at the requesting storage cache.
  • the inventive storage caching system preferably operates in accordance with a leasing protocol that manages requests for access to a data file to ensure consistency and coherence among all remote computer systems that share access to a data file through use of a distributed file system.
  • if a lease cannot be granted, the remote system can either prohibit the requested access or pass the request through to the file server without caching the data file, as updates to a cached data file are not allowed.
  • the workstation from which an access request originated only has a right to view and cannot cache the data file, i.e., has a reader right, as another storage cache continues to have a write lease to the data file. Every time that a workstation associated with the storage cache is granted a reader right, the corresponding cached data file is updated using the data file stored at the file server, and the cached data file cannot be modified by the workstation.
  • the cache server decides whether to grant or deny a request for a lease of a data file received from a first storage cache, based on (i) whether another storage cache already has a lease and the type of lease existing, which can be write or read, or (ii) whether the data file is already locked by some other mechanism, such as a mandatory or advisory lock associated with a prior art distributed file system protocol, such as CIFS and NFS.
  • the lease request is processed based on the following criteria: a write lease cannot be granted if a read lease already exists at a second storage cache or the file is already locked for reading by another mechanism; only a pass through reader right can be granted if a write lease already exists at a second storage cache; and an additional read lease can be granted if a read lease already exists at a second storage cache or the file is only locked for reading.
  • the cache server locks the data file to prevent another application from locking the data file in a conflicting fashion.
  • the cache server ensures that any lease that is granted is compatible with an existing lease or any existing lock on the data file already taken by another mechanism.
  • the first storage cache autonomously updates the cached data file, based on data file modifications entered by an associated workstation, without intervention from the cache server. Further, following the grant of a lease or a reader right, the cache server and the first storage cache first automatically bring the cached data file, if any, stored at the first storage cache up to date.
  • a storage cache responds to a request from an associated authorized workstation for access to a data file stored at the file server based on the strength of the lease, if any, that the cache server has previously granted to the storage cache. The lease is either a read lease or a write lease, and a write lease is the stronger of the two because it includes the file viewing rights associated with a read lease.
  • the access request, which can be read or write, is granted where its level is commensurate with that of the existing lease, if any, held by the storage cache.
  • if the storage cache does not have an existing lease of sufficient strength to satisfy the access request, it must first obtain one and therefore requests a lease for the data file from the cache server.
  • the lease request is granted if the cache server determines that the requested access does not conflict with an existing lease of another storage cache or with any existing lock on the data file.
  • once the lease is granted, the storage cache permits the cached data file to be opened at the workstation for read or write purposes, in accordance with the access request.
  • if only a reader right is granted, the storage cache interacts with the cache server to update the cached data file based on the version of the data file stored at the file server, and allows only read access.
  • the cached data file at the storage cache is automatically updated, as needed, based on interaction between the cache server and the storage cache.
  • a storage cache typically releases or drops the lease only when all workstations associated with the storage cache have closed the cached data file and all pending updates to the data file, which are reflected in the cached data file, are transmitted from the storage cache to the cache server.
  • FIG. 1 is a system diagram illustrating implementation of a storage caching protocol in a distributed file system in accordance with the present invention.
  • FIG. 2 is a block diagram of a storage cache in accordance with the present invention.
  • FIG. 3 is a block diagram of a cache server in accordance with the present invention.
  • FIG. 4A is a flow diagram of a method for updating a data file stored at a file server based on the transmission of file update data from a storage cache to a cache server in accordance with the present invention.
  • FIG. 4B is a flow diagram of a method for updating a cached data file stored at a storage cache based on server file update data transmitted by a cache server in accordance with the present invention.
  • FIG. 5 is a flow diagram of a method for responding to a request for a lease from a storage cache in accordance with the present invention.
  • FIG. 6 is a flow diagram of a method for responding to a request for access to a data file received at a storage cache in accordance with the present invention.
  • FIG. 7 is a flow diagram of a method for releasing a lease of a data file in accordance with the present invention.
  • FIG. 8 is a system diagram illustrating implementation of a storage caching protocol in a distributed file system having a plurality of file servers in accordance with the present invention.
  • FIG. 9 is a system diagram illustrating implementation of a storage caching protocol in a distributed file system to provide for data backup in accordance with the present invention.
  • FIG. 1 is a system diagram of an illustrative computer system network 10 which operates in accordance with the present invention of a storage caching protocol that provides multiple computer systems shared access to real time data files.
  • the network 10 includes a storage caching protocol system 12 that interfaces with a distributed file system application operating at a data center computer system, which is a repository for data files, and a remote site computer system, which normally is located remotely from a data center system and is associated with a computer workstation that desires to access, i.e., view only (read) or modify (write), data files stored at a file server of a data center system.
  • the inventive system 12 includes at least one storage cache, which is coupled to a workstation of an associated remote system, and at least one cache server, which is coupled to a file server of a data center system, where the storage cache and the cache server utilize a communications link, such as a link established over the Internet, to transfer (i) copies of data files that the associated workstation desires to access, (ii) file update data representative of any data file modifications entered by authorized workstations that access the data file, and (iii) data associated with the operating features of the storage caching protocol system 12 .
  • the system 12 interfaces with remote work group computer systems 16 A and 16 B and a central work group data center computer system 20 .
  • the remote system 16 A includes computer workstations 22 A and 22 B interconnected over a communications channel 24 A, such as an Ethernet or like medium.
  • the remote system 16 B includes computer workstations 22 C and 22 D interconnected over a communications channel 24 B.
  • Each of the workstations 22 is part of or constitutes, for example, a personal computer, a personal digital assistant, or other like electronic device including a processor and memory and having communications capabilities.
  • the workstations of a remote system, in combination with the Ethernet, form a local area network (“LAN”) and operate in accordance with a conventional prior art distributed file system, such as NFS or CIFS, which provides that a user of a workstation can access data files located remotely from the remote system in which the workstation is contained.
  • a communications gateway 26 couples the Ethernet 24 of each of the remote systems 16 to a communications network 28 .
  • the network 28 can be a wide area network (“WAN”), LAN, the Internet or any like means for providing data communications links between geographically disparate locations.
  • the gateway 26, for example, is a standard VPN Internet connection having standard DSL speeds.
  • the gateway 26 provides that data, such as data files accessible in accordance with a prior art distributed file system such as NFS or CIFS, can be transferred between a workstation and a remotely located file server. It is noted that although, in the network 10 of FIG. 1, the gateway 26 and the network 28 are part of the storage caching system 12 , these components, which constitute well known, prior art devices, do not constitute inventive features, although they are required for operation of the storage cache and cache server of the inventive system 12 , as described in further detail below.
  • the storage caching system 12 includes storage caches 30 A and 30 B which are associated with the remote systems 16 A and 16 B, respectively. Each storage cache 30 is coupled to the Ethernet 24 and the gateway 26 of the associated remote system 16 .
  • the storage caching system 12 includes a cache server 36 .
  • the cache server 36 is coupled to an associated gateway 26 C which is also coupled to the network 28 .
  • An Ethernet 24 C couples the gateway 26 C and the cache server 36 to a file server 38 and workstations 22 E and 22 F contained in the data center system 20 .
  • the file server 38 is a conventional file storage device, such as a NAS, which is a repository for data files and provides for distribution of stored data files to authorized workstations in accordance with operation of conventional distributed file systems, such as NFS or CIFS, which are implemented at the authorized workstations of the remote systems 16 and the data center 20 .
  • FIG. 2 is a preferred embodiment of the storage cache 30 in accordance with the present invention.
  • the storage cache 30 includes the modules of a cache manager 50 , a translator 52 , a leasing module 54 , and a local leased file storage 56 .
  • the cache manager 50 is coupled to the translator 52 and is for coupling to a cache server, such as the cache server 36 as shown in FIG. 1 , via gateways and a communications network.
  • the translator 52 is coupled to the leasing module 54 and the local storage 56 , and is for coupling to workstations of an associated remote system via an Ethernet connection.
  • the cache manager 50 controls routing of data files, file update data and data file leasing information to and from the cache server 36 .
  • the translator 52 stores copies of accessed data files at the storage 56 as a cached data file, makes the cached data file available for reading or writing purposes to an associated workstation that requested access to a data file corresponding to the cached data file, and updates the cached data file based on data file modifications entered by the workstation or update data supplied from the cache server.
  • the translator 52 preferably can generate a checksum representative of a first data file and determine the difference between another data file and the first data file based on the checksum using techniques that are well known in the art.
  • the leasing module 54 through interactions with the cache server 36 , determines whether to grant a request for access to a data file from an associated workstation, where the access request requires that the cached data file is made available to the associated workstation either for read or write purposes.
  • a storage cache is associated with every remote computer system that can access a data file stored at a file server of a data center system over the network 28 .
  • FIG. 3 is a preferred embodiment of the cache server 36 , in accordance with the present invention, that manages shared access to data files stored in the file server by multiple storage caches, such as the caches 30 A and 30 B, and also by workstations, such as the workstations 22 E and 22 F of the data center 20 , which are not associated with a storage cache.
  • the cache server is preferably a thin appliance having an architecture that makes it compatible and easily integrated with an existing distributed file system, such as NAS and SAN, implemented at a remote computer system and a data center computer system. See Ser. No. 09/766,526, filed Jan. 19, 2001, assigned to the assignee of this application and incorporated by reference herein.
  • the cache server 36 includes the modules of a server manager 60 , a translator 62 , a leasing module 64 , and a local file storage 66 .
  • the server manager 60 is coupled to the translator 62 , the leasing module 64 and the storage 66 and also is for coupling to storage caches, such as the storage caches 30 A and 30 B, via the gateway 26 C and the network 28 .
  • the translator 62 is coupled to the storage 66 and is for coupling to a file server of an associated data center computer system via an Ethernet connection.
  • the translator 62 temporarily stores at the storage 66 copies of data files stored at and obtained from the file server 38 , and performs processing using the stored data files and update data received from a storage cache to generate a replacement, updated data file.
  • the translator 62 also replaces a data file stored in the file server 38 with the replacement data file.
  • the translator 62 can supply to a workstation associated with the central system, such as the workstations 22 E and 22 F, a copy of a data file stored at the file server 38 only for viewing purposes in accordance with the inventive leasing protocol, described in further detail below.
  • the translator 62 like the translator 52 , can generate a checksum representative of a first data file and determine the difference between another data file and the first data file using the checksum.
  • the leasing module 64 through interactions with the storage caches included in the system 12 , determines whether a request for access to a data file from a workstation associated with a specific storage cache should be granted or denied.
  • each module of the storage cache 30 and of the cache server 36 that performs data processing operations in accordance with the present invention constitutes a software module or, alternatively, a hardware module or a combined hardware/software module.
  • each of the modules suitably contains a memory storage area, such as RAM, for storage of data and instructions for performing processing operations in accordance with the present invention.
  • instructions for performing processing operations can be stored in hardware in one or more of the modules.
  • the modules of each of the cache server 36 and the storage cache 30 can be combined, as suitable, into composite modules, and the cache server and storage cache can be combined into a single appliance which can provide both caching for a workstation and real time updating of the data files stored at a file server of a central data center computer system.
  • the storage caches and the cache server of the storage caching system 12 provide that a data file stored in a file server of a data center, and available for distribution to authorized workstations via a conventional prior art distributed file system, can be accessed for read or write purposes by the workstations, that the workstations experience a minimum of latency when accessing the file, and that the cached data file supplied to a workstation in response to an access request corresponds to a real time version of the data file.
  • a storage cache of the system 12 stores in the storage 56 only a current version of the cached data file corresponding to the data file that was the subject of an access request, where the single cached data file incorporates all of the data file modifications entered by a workstation associated with the storage cache while the file was accessed by the workstation.
  • File update data associated with the cached data file is automatically, and preferably at predetermined intervals, generated and then transmitted (flushed) to the cache server. Most preferably, the file update data is flushed with sufficient frequency to provide that a real time, updated version of the data file is stored at the file server and can be used by the cache server to respond to an access request from another storage cache or a workstation not associated with a storage cache.
  • the local storage 56 of the storage cache includes only cached data files corresponding to recently accessed data files.
  • FIG. 4A is a high level flow process 100 illustrating data processing operations performed at a storage cache and a cache server, in accordance with the present invention, for updating a data file at a file server.
  • in describing FIGS. 4B, 5, 6 and 7, reference is made to the network 10 and to operations that the components of the storage caching system 12 would perform in connection with requests for access to a data file from the remote system 16 A or 16 B, where the data file is stored at the file server 38 of the data center system 20.
  • the storage module 56 of the storage cache 30 A does not initially contain a cached data file corresponding to a data file that the workstation 16 A seeks to access for write purposes.
  • in step 102, the translator 62 communicates with the file server 38 and generates a copy of the data file that the workstation 16 A desires to access.
  • the server manager 60 then transmits a copy of the data file to the storage cache 30 A via the gateway 26 C, the network 28 and the gateway 26 A.
  • the cache manager 50 receives the transmitted copy of the data file from the gateway 26 A and stores the file in the storage 56 as a cached data file.
  • the translator 52 interacts with the distributed file system of the workstation 16 A so that the workstation 16 A can open the cached data file and enter data file modifications to it (write).
  • when the user of the workstation is presented with the cached data file, in other words, is permitted to open the cached data file following a request for access to the corresponding data file, the user is not aware of the location in the network 10 from which the file was obtained.
  • the user does not know whether he is working on a local copy of the data file, such as stored at a memory of the local remote system or at the storage cache 30 A, or a copy of a data file retrieved from a remote storage location, such as the remotely located data center computer system 20 .
  • the translator 52 monitors the modifications and incorporates these modifications into the cached data file at the storage 56 .
  • only a current version of the cached data file, which includes all modifications to the cached data file previously made by any workstation within the remote system 16 A, is stored in the storage 56.
  • Steps 106, 108, 110, 112 and 114 set forth file update operations that the storage cache 30 A and the cache server 36 automatically perform to update the version of the data file stored at the file server 38, based on the modifications made to the corresponding cached data file stored at the storage cache 30 A.
  • the cache server can transmit a real time, updated version of the data file in response to a request for access to the data file received subsequently from an authorized workstation other than the workstation 16 A, where the workstation may or may not be associated with the storage cache 30 A or another storage cache that is part of the system 12.
  • the components of the system 12 implement the well known prior art technique of differencing as part of the inventive automatic updating of a data file to minimize potential latencies.
  • the cache manager 50 of the storage cache 30 A transmits a data file transfer request to the cache server 36 .
  • the server manager 60, based on receipt of this request, causes the translator 62 to generate a checksum for the data file currently stored at the file server 38 using techniques well known in the art.
  • the translator 62 generates the checksum by retrieving a copy of the data file from the file server 38 and storing data needed for checksum processing, such as the data file copy, in the storage 66 , as necessary.
  • in step 108, the server manager 60 transmits the checksum to the storage cache 30 A.
  • in step 110, the cache manager 50 retrieves the cached data file from the storage 56 and the translator 52 uses the checksum to compute file update data, which is in the form of difference data.
  • the difference data represents differences between the cached data file and the version of the data file currently stored at the file server and represented by the checksum.
  • in step 112, the cache manager 50 transmits the difference data to the cache server 36.
  • in step 114, the translator 62 uses the difference data to generate an updated, replacement version of the data file.
  • the translator 62 retrieves a copy of the current version of the data file, which preferably is stored in the local file storage 66 at step 108 , and then processes the stored current version of the data file using the difference data to generate an updated data file.
  • the translator 62 then replaces the data file currently stored at the file server 38 with the replacement, updated data file.
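The differencing exchange in steps 106 through 114 can be illustrated with a minimal, rsync-style sketch. It assumes fixed-size blocks, SHA-256 block hashes and a simple ("copy", index) / ("data", bytes) operation format; none of these choices come from the source, which only refers to well known checksum and differencing techniques. For brevity the sketch rehashes every window instead of using a rolling weak checksum, as a production implementation typically would.

```python
import hashlib
from typing import Dict, List, Tuple, Union

# Assumption: a tiny block size so the example is easy to trace.
BLOCK_SIZE = 4

# Assumed op format: ("copy", block_index) or ("data", literal_bytes).
Op = Tuple[str, Union[int, bytes]]


def block_checksums(server_copy: bytes, block_size: int = BLOCK_SIZE) -> Dict[str, int]:
    """Cache server side (checksum of the file server's version): hash every full block."""
    sums: Dict[str, int] = {}
    for index in range(len(server_copy) // block_size):
        block = server_copy[index * block_size:(index + 1) * block_size]
        sums.setdefault(hashlib.sha256(block).hexdigest(), index)
    return sums


def compute_difference(cached: bytes, sums: Dict[str, int],
                       block_size: int = BLOCK_SIZE) -> List[Op]:
    """Storage cache side (step 110): describe the cached data file as references
    to unchanged server blocks plus literal bytes that are new."""
    ops: List[Op] = []
    literal = bytearray()
    pos = 0
    while pos < len(cached):
        window = cached[pos:pos + block_size]
        index = None
        if len(window) == block_size:
            index = sums.get(hashlib.sha256(window).hexdigest())
        if index is None:
            literal.append(cached[pos])  # no matching block here: send this byte as-is
            pos += 1
            continue
        if literal:
            ops.append(("data", bytes(literal)))
            literal.clear()
        ops.append(("copy", index))  # unchanged block: send only its index
        pos += block_size
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_difference(server_copy: bytes, ops: List[Op],
                     block_size: int = BLOCK_SIZE) -> bytes:
    """Cache server side (step 114): rebuild the replacement, updated data file."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            start = int(value) * block_size
            out += server_copy[start:start + block_size]
        else:
            out += bytes(value)
    return bytes(out)


server_copy = b"hello world, storage caching"
cached_copy = b"hello brave world, storage caching!"
ops = compute_difference(cached_copy, block_checksums(server_copy))
assert apply_difference(server_copy, ops) == cached_copy
```

Only the 11 new bytes travel as literals; every unchanged 4-byte block is replaced by its index, which is why differencing reduces the data sent over the network 28.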
  • when the cache server 36 subsequently receives a request for access to the data file transmitted from another storage cache, such as the storage cache 30 B, or from one of the workstations 22 E or 22 F in the data center system 20, the cache server 36 uses the updated data file to respond to the request. Consequently, the subsequent requestor effectively is presented with a real time version of the data file, which incorporates previous changes to the data file based on entries made at the workstation 16 A.
  • the cache manager 50 transmits the file update data as streaming data to the cache server 36 .
  • the file update data is compressed before transmission to the cache server as streaming data to minimize the amount of data transferred over the network 28 , thereby reducing potential latency.
  • the cache server 36 continues to update a data file stored in the storage 66 based on file update data transmitted from a storage cache and, once transmission of all of the file update data is completed and the cache server has received all such transmitted data, the cache server then replaces the data file stored at the file server 38 with the updated data file.
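The compressed streaming and deferred replacement described above can be sketched with Python's `zlib` streaming objects. This is a minimal illustration under stated assumptions: zlib is only one possible compressor (the source does not name one), the file update data is treated as an opaque byte string, the chunk size is arbitrary, and a plain dictionary stands in for the file server.

```python
import zlib
from typing import Dict, Iterator


def stream_compressed(update_data: bytes, chunk_size: int = 1024) -> Iterator[bytes]:
    """Storage cache side: compress file update data incrementally and yield
    the compressed stream chunk by chunk."""
    compressor = zlib.compressobj()
    for start in range(0, len(update_data), chunk_size):
        piece = compressor.compress(update_data[start:start + chunk_size])
        if piece:  # zlib may buffer input and return nothing yet
            yield piece
    yield compressor.flush()  # remaining buffered output plus the stream trailer


class StagedUpdate:
    """Cache server side: decompress incoming chunks into a staging buffer
    (standing in for the local file storage 66) and touch the file server
    only after the complete stream has arrived."""

    def __init__(self) -> None:
        self._decompressor = zlib.decompressobj()
        self._staged = bytearray()

    def receive(self, chunk: bytes) -> None:
        self._staged += self._decompressor.decompress(chunk)

    def finish(self, file_server: Dict[str, bytes], path: str) -> None:
        self._staged += self._decompressor.flush()
        if not self._decompressor.eof:
            raise ValueError("update stream ended before all data arrived")
        file_server[path] = bytes(self._staged)  # single replacement at the end
```

Checking `eof` before writing is what keeps a partially received update from ever replacing the data file at the file server.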
  • FIG. 4B is a high level flow process 120 illustrating data processing operations that a storage cache and cache server perform, in accordance with the present invention, for updating a cached data file at a storage cache using the corresponding data file stored at the file server.
  • in this example, the storage cache 30 A has received a request for access to a data file from the workstation 22 A, a cached data file corresponding to the data file is stored at the storage module 56, and the workstation 22 A or 22 B previously accessed the data file for either read or write purposes.
  • any updates made to the data file since the workstation 22 A previously accessed the data file are incorporated into the cached data file.
  • the workstation 22 C may have previously written to a cached data file at the storage cache 30 B, which corresponds to the data file, and file update data representative of the modifications made to such cached data file may have been used to update the data file at the file server 38, as explained above in connection with the process 100, such that the data file at the file server 38 is different from the corresponding cached data file presently stored at the cache 30 A.
  • in step 122, the cache manager 50, following receipt of the access request from the workstation 22 A, and assuming for simplicity that such access request would not impact coherence for the data file in the network 10, automatically transmits to the cache server 36 a data file transfer request.
  • the translator 62 retrieves the data file from the file server 38 and the server manager 60 stores the data file in the storage 66.
  • in step 124, the translator 52 generates a checksum for the corresponding cached data file and the cache manager 50 transmits the checksum to the cache server 36.
  • the translator 52 retrieves the cached data file from the storage module 56 and performs well known, prior art checksum processing on the cached data file.
  • in step 126, the translator 62 generates server file update data using the checksum.
  • the server file update data preferably represents differences between the data file currently stored in the file server 38, a copy of which was stored in the storage 66 in step 122, and the current version of the cached data file stored at the storage cache 30 A and represented by the checksum.
  • in step 128, the server manager 60 transmits the server file update data to the storage cache 30 A.
  • in step 130, the translator 52 uses the server file update data to generate an updated cached data file, which replaces the cached data file stored in the storage module 56.
  • the translator 52 uses the cached data file, which has been updated based on any other data file modifications made by other workstations associated with a storage cache of the system 12 , to respond to the access request from the workstation 22 A.
  • user desired updates to an accessed data file are stored in the form of a single, current version cached data file at the storage 56 of a storage cache.
  • the server file update data is preferably transmitted as streaming data to the storage cache and, in addition, the server file update data is most preferably compressed before transmission as streaming data to the storage cache.
  • the process 120 is automatically performed for a storage cache at predetermined intervals to provide that a cached data file is updated before a time that a workstation associated with the storage cache is expected to request access to the data file.
  • for example, the process 120 can be automatically performed by a storage cache early in the morning, before employees arrive at work and request access to data files from their workstations.
  • the process 120 is automatically performed to update the corresponding cached data files at the storage cache to minimize latency.
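Running the process 120 ahead of the working day amounts to computing the next refresh slot. The sketch below assumes a single daily refresh at a hypothetical 05:00 local time and naive (timezone-free) datetimes; the source only says "early in the morning" and "at predetermined intervals".

```python
from datetime import datetime, time, timedelta

# Assumption: hypothetical daily refresh time, before employees arrive at work.
REFRESH_AT = time(hour=5, minute=0)


def next_refresh(now: datetime, refresh_at: time = REFRESH_AT) -> datetime:
    """Return the next moment at which the storage cache should run the
    process 120 for its cached data files."""
    candidate = datetime.combine(now.date(), refresh_at)
    if candidate <= now:  # today's slot has already passed, so use tomorrow's
        candidate += timedelta(days=1)
    return candidate


print(next_refresh(datetime(2006, 7, 31, 23, 0)))  # 2006-08-01 05:00:00
```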
  • all data files that workstations of a remote system would seek to access are initially stored at the storage cache associated with the remote system.
  • the inventive storage caching protocol system constitutes an invisible interface between a remote system and a data center system which manages shared access to real time data files.
  • the changes that a workstation desires to make to a data file are not backed up at a storage cache.
  • the desired changes are represented in the cached data file, and file update data, which is derived from the cached data file, is constantly transmitted to the cache server.
  • the cache server uses the file update data to update the data file stored at the file server of a data center system. Therefore, the remote system or a storage cache does not require a large amount of memory for local storage of files.
  • the installation of the inventive cache server in association with a central data center system provides memory saving benefits throughout the computer network 10 with a minimum of administrative overhead, as each of the remote systems associated with a storage cache which operates in conjunction with the cache server has minimal local memory storage requirements.
  • the inventive storage caching system has low memory requirements, is interoperable with existing distributed file system technology and, as discussed in detail below, also provides for network-wide coherence of shared data files when accessed by workstations.
  • the inventive storage caching protocol performs read and write shared access operations on an entire data file, which is markedly different from prior art distributed file systems, such as AFS, NFS and CIFS, each of which primarily performs read and write operations using portions (data blocks) of a data file.
  • the storage caching system 12 can be implemented in connection with an existing, prior art distributed file system, such as NFS or CIFS, without adding to or modifying software at appliances already existing at the remote systems or the data center systems and without impacting the existing software architecture.
  • the system 12 can appear as a Windows file server to Windows users and as a Unix file server to Unix users.
  • the storage cache and cache server of the system 12 are easily initialized to interface with workstations and a file server using conventional network configuration information. Further, after initial configuration of a storage cache, the storage cache does not require further administration, backup or management of any kind, such as by a user of a workstation, and can be completely managed, monitored, provisioned and replicated from the cache server or a remote control center.
  • the system 12 implements a leasing protocol that ensures coherency and consistency of the real time data files available for shared access by workstations of the network 10 which operate using an existing distributed file system.
  • the leasing protocol permits multiple read leases for a data file, where the first read lease for a data file locks the data file so that a write lease subsequently cannot be granted.
  • once a write lease for a data file has been granted, no other read leases can be granted until the write lease is closed.
  • a reader right to a data file provides that a workstation, which may or may not be associated with a storage cache, can view the data file as a copy, such as obtained directly from the file server, or in the form of a cached data file which is stored at a storage cache.
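The basic leasing rules above can be expressed as a small compatibility check. This is a minimal sketch of those rules only: the `Lease` enum and function names are hypothetical, and the sketch deliberately ignores the revocation of read leases and the distributed file system lock check that the process 150 adds. A request that fails the check still leaves the workstation with a reader right.

```python
from enum import Enum
from typing import Iterable


class Lease(Enum):
    READ = "read"
    WRITE = "write"


def compatible(requested: Lease, held_elsewhere: Iterable[Lease]) -> bool:
    """Basic rules for one data file: read leases may coexist, an existing
    read lease blocks a write lease, and an existing write lease blocks
    every other lease."""
    held = list(held_elsewhere)
    if requested is Lease.READ:
        return Lease.WRITE not in held
    return not held


def decide(requested: Lease, held_elsewhere: Iterable[Lease]) -> str:
    """A workstation that cannot obtain a lease still receives a reader right."""
    if compatible(requested, held_elsewhere):
        return f"grant {requested.value} lease"
    return "reader right only"
```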
  • FIG. 5 is a high level flow process 150 illustrating data processing operations performed by a cache server and a storage cache, in accordance with the present invention, for determining whether to grant a storage cache's request for a lease of a data file.
  • for purposes of illustration, a first storage cache, namely, the storage cache 30 A, requests a lease for a data file, and a second storage cache, namely, the storage cache 30 B, may already hold a lease for the same data file.
  • the leasing process 150 is also applicable where the system 12 includes more than two storage caches, in which case the leasing process 150 would be performed in connection with each of the storage caches holding a lease for the data file at issue.
  • in step 152, the leasing module 54 causes the cache manager 50 of the storage cache 30 A to transmit a data file lease request to the cache server 36.
  • in step 154, the leasing module 64 determines if the storage cache 30 B already has a lease for the data file. If the determination in step 154 is yes, in step 156 the leasing module 64 determines if the lease held by the cache 30 B conflicts with the requested lease. Based on the leasing protocol criteria described above, a conflict does not exist if the cache 30 A lease request is read and the cache 30 B holds a read lease. In this circumstance, the leasing module performs step 158 to determine whether the file is already locked for read or write access based on distributed file system operations, such as CIFS or NFS operations, that control shared access to the file.
  • in step 160, the leasing module 64 determines if the lock conflicts with the requested lease. A conflict would exist if (i) the lease request is a write lease and the existing lock is a read or write lock, or (ii) the lease request is a read lease and the existing lock is a write lock.
  • if a conflict exists, in step 162 the leasing module 64 denies the lease request and provides a reader right to the workstation seeking access to the data file.
  • the storage cache associated with the workstation performs the process 120 to update the cached data file, if any, corresponding to the data file that was the subject of the lease request transmitted by the storage cache 30 A.
  • if no conflict exists, in step 164 the leasing module 64 grants the request, records in its memory that the storage cache 30 A has a lease and the type of lease, and locks the file so that no workstation attached to the storage cache 30 B can have write access to the data file.
  • in step 154, if the determination is no, the leasing module 64 proceeds to step 158.
  • in step 166, the leasing module 64 determines if the requested lease is read. If yes, in step 168 the server manager 60 updates the data file at the file server 38 based on the cached data file stored at the cache 30 B, preferably performing steps similar to the steps 108, 110, 112 and 114 of the process 100. In this case, the cache 30 B holds a write lease for the data file that is the subject of the lease request.
  • in step 170, the server manager 60 transmits a response to the cache manager 50 of the storage cache 30 A that the lease request was denied and that the workstation can have reader rights to the data file.
  • the server manager 60 transmits a copy of the data file to the storage cache 30 A, or interacts with the storage cache 30 A to update a corresponding cached data file stored at the storage cache 30 A, preferably performing steps similar to the steps 124, 126, 128 and 130 of the process 120.
  • the translator 52 supplies the cached data file, only with reader rights, to the workstation requesting access to the data file.
  • if the requested lease is write, in step 172 the leasing module 64 determines whether the lease for the data file held by the storage cache 30 B is read. If yes, the leasing module 64 in step 174 revokes the read lease of the storage cache 30 B, stores such information in its memory for future use in making a leasing decision and transmits data representative of this action to the storage cache 30 B so that its leasing module can update its memory and take appropriate action. Based on the revocation of the read lease, the storage cache 30 B can provide only a reader right to an associated workstation that seeks to access the data file.
  • Step 158 is performed following step 174, as described above.
  • if the lease held by the storage cache 30 B is not a read lease, step 162 is performed as described above. In this outcome, the requested lease was for write access and the storage cache 30 B holds a write lease.
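The decision flow of the process 150 can be sketched as a single method on a hypothetical per-file lease record kept by the leasing module 64. The class, field and string names are illustrative assumptions; the flush and revocation side effects are only logged rather than performed. Note that, following steps 172 and 174, a write request revokes read leases held by other storage caches instead of being refused outright.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Mode(Enum):
    READ = "read"
    WRITE = "write"


DENIED = "denied: reader right only"


@dataclass
class FileLeases:
    """Hypothetical per-file memory of the leasing module 64."""
    leases: Dict[str, Mode] = field(default_factory=dict)  # storage cache id -> lease
    dfs_lock: Optional[Mode] = None  # lock held through CIFS/NFS, if any
    actions: List[str] = field(default_factory=list)  # side effects, logged for illustration

    def request(self, cache: str, wanted: Mode) -> str:
        # Steps 154/156: compare with leases held by other storage caches.
        for other, held in list(self.leases.items()):
            if other == cache:
                continue
            if held is Mode.WRITE:
                if wanted is Mode.READ:
                    # Steps 166-170: update the file server from the writer's
                    # cached data file, then offer only a reader right.
                    self.actions.append(f"update file server from {other}")
                return DENIED  # step 170, or step 162 via step 172
            if wanted is Mode.WRITE:
                # Steps 172-174: revoke the other cache's read lease.
                del self.leases[other]
                self.actions.append(f"revoke read lease held by {other}")
        # Steps 158-160: a write lock blocks everything; a read lock blocks writes.
        if self.dfs_lock is Mode.WRITE or (
                wanted is Mode.WRITE and self.dfs_lock is Mode.READ):
            return DENIED  # step 162
        self.leases[cache] = wanted  # step 164: grant and record the lease type
        return f"granted: {wanted.value}"
```

Iterating over a copy of the lease items lets the loop revoke entries safely while it scans them.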
  • FIG. 6 is a high level flow process 180 illustrating data processing operations performed by a storage cache and a cache server, in accordance with the present invention, for determining whether or not to grant a request by a workstation associated with a storage cache for access to a data file, where the request is for read or write purposes.
  • the workstation 22 A is attempting to access a data file stored at the file server 38.
  • the cache manager 50 of the storage cache 30 A determines that the workstation 22 A has made a request for access to a data file which is stored at the file server 38.
  • the leasing module 54 determines if the storage cache 30 A already has a sufficiently strong lease for the data file.
  • Table 1 shows the relationship between a type of access request that has been made and the existing lease, if any, for a data file held by the storage cache.
  • the entries in Table 1 indicate whether, based on a particular access request, the existing lease, if any, for a data file held by the storage cache is sufficiently strong such that data file consistency and coherency are preserved among the remote systems associated with respective storage caches.
  • if the existing lease is sufficiently strong, in step 186 the translator 52 retrieves the cached data file from the storage 56 and transmits the cached data file to the workstation 22 A over the Ethernet 24 A. Consequently, a user at the workstation 22 A can open the cached data file for read or write purposes, depending on the nature of the access request. For example, if the access request was write, the user can enter data file modifications for the cached data file, and the translator 52 would monitor the modifications and, automatically and on an ongoing basis, update the cached data file stored in the storage module 56 to incorporate such modifications.
  • if the existing lease is not sufficiently strong, in step 188 the leasing module 54 causes the cache manager 50 to transmit a new lease request to the cache server 36.
  • an existing lease is not strong enough if the intersection of the access request and the existing lease is a NO, e.g., the access request is write and the existing lease for the data file at the storage cache is read.
  • based on the new lease request, the cache server 36 performs a process that is the same as or substantially similar to the process 150, as described above, to determine whether a lease can be granted. After the leasing module 64 determines whether and what type of lease can be granted, the server manager 60 transmits this information to the storage cache 30 A.
  • in step 190, the leasing module 54 receives and processes the response to the lease request transmitted by the cache server 36 to determine whether a lease has been granted. If yes, in step 192, the cache manager 50 and translator 52 of the storage cache 30 A perform a process, such as the process 120 described above, to update the corresponding cached data file in the storage module 56. The cache manager 50 then performs step 186.
  • if no lease was granted, in step 194 the leasing module 54 determines whether the access request was read. If yes, steps 192 and 186 are performed as described above, except that in step 186 only read access to the cached data file is provided.
  • otherwise, in step 196, the leasing module 54 prevents the cached data file from being accessed by the workstation 22 A. This outcome ensures data file coherence and consistency throughout the network 10.
  • Step 196 is performed where the access request was write and another read or write lease for the data file existed at another storage cache associated with the distributed file system, such as the storage cache 30 B.
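The storage cache side of the process 180 can be sketched as follows. Table 1 itself is not reproduced in this text, so the sufficiency rule below is an assumption inferred from the example given (a write request against an existing read lease is a NO) and from the leasing rules: a write lease covers read and write access, a read lease covers only read access, and no lease covers nothing. The callbacks `request_lease` and `refresh_cached_file` are hypothetical stand-ins for steps 188-190 and the process 120.

```python
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    READ = "read"
    WRITE = "write"


def lease_is_sufficient(access: Mode, lease: Optional[Mode]) -> bool:
    """Assumed reading of Table 1 (not reproduced in the source text)."""
    if lease is None:
        return False
    return lease is Mode.WRITE or access is Mode.READ


def handle_access(access: Mode, lease: Optional[Mode],
                  request_lease: Callable[[Mode], bool],
                  refresh_cached_file: Callable[[], None]) -> str:
    """Storage cache side of process 180 for one access request."""
    if lease_is_sufficient(access, lease):
        return f"open cached file for {access.value}"  # step 186
    if request_lease(access):  # steps 188-190: ask the cache server
        refresh_cached_file()  # step 192, e.g. the process 120
        return f"open cached file for {access.value}"  # step 186
    if access is Mode.READ:  # step 194
        refresh_cached_file()  # step 192
        return "open cached file for read"  # step 186, read access only
    return "access denied"  # step 196
```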
  • FIG. 7 is a high level flow process 200 illustrating data processing operations performed by a storage cache and cache server, in accordance with the present invention, for updating a data file stored at a file server after a storage cache that has obtained a lease for the data file no longer needs to maintain the lease active.
  • the workstation 22 A of the remote system 16 A previously obtained read or write access to the data file and the workstation 22 A closed the accessed cached data file, which it had been viewing or modifying on its operating system and which corresponds to the data file for which the storage cache 30 A holds a write lease or a read lease.
  • the cache manager 50 monitors data transmissions between the translator 52 and the workstation 22 A to determine when the workstation 22 A has closed the cached data file.
  • the translator 52 determines whether the workstation 22 A modified the cached data file. If yes, in step 206 the translator 52 and the cache manager 50 perform a file update process, preferably including differencing data processing similar to that described in the process 100, to update the data file stored at the file server which corresponds to the cached data file that was closed by the workstation 22 A.
  • the leasing module 54, which also received the transmission indicating that the cached data file was closed, in step 208 causes the cache manager 50 to transmit a release lease signal for the data file to the cache server 36. Further in step 208, at the cache server 36, the leasing module 64, upon receipt of the release lease signal, resets its memory concerning the data file. If a write lease was released, the reset provides that another storage cache, such as the storage cache 30 B, can obtain write lease access to the data file.
  • if the workstation 22 A did not modify the cached data file, in step 210 the leasing module 54 determines whether the storage cache 30 A holds a lease for the corresponding data file. If yes, which means that the cache 30 A had a read lease for the data file, the leasing module 54 performs step 208.
  • if the storage cache 30 A did not have a lease for the data file, no further action is taken, because the workstation 22 A that opened the file was a reader, i.e., could only read the file, and another storage cache, such as the storage cache 30 B, had obtained write access rights for an associated workstation.
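The close handling of the process 200 reduces to two branches on the storage cache and a reset on the cache server. In this minimal sketch, the action strings, the `LeaseMemory` class and its nested-dictionary layout are hypothetical stand-ins for the messages and memory described above.

```python
from typing import Dict, List


def on_close(modified: bool, holds_lease: bool) -> List[str]:
    """Storage cache side of process 200, run when the workstation closes the
    cached data file. Returns the actions taken, in order."""
    actions: List[str] = []
    if modified:
        actions.append("flush file update data to cache server")  # step 206
        actions.append("send release-lease signal")  # step 208
    elif holds_lease:  # step 210: unmodified file held under a read lease
        actions.append("send release-lease signal")  # step 208
    # Otherwise the workstation was only a reader, so nothing is done.
    return actions


class LeaseMemory:
    """Cache server side: hypothetical record kept by the leasing module 64."""

    def __init__(self) -> None:
        # path -> {storage cache id: "read" or "write"}
        self.holders: Dict[str, Dict[str, str]] = {}

    def release(self, path: str, cache_id: str) -> None:
        self.holders.get(path, {}).pop(cache_id, None)  # reset memory for this file

    def write_lease_available(self, path: str) -> bool:
        return not self.holders.get(path)
```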
  • the inventive storage caching system manages data files of a distributed file system to make them available for coherent and consistent shared real time access by multiple remote systems.
  • the data files can be accessed by users, who may be located at different remote locations, and are presented to the users in the form of a cached data file or a copy of the data file currently stored at the file server, each of which includes all previous modifications so as to constitute a real time, updated version of the data file.
  • the preferred transmission of file update data and data files between a storage cache and a cache server as compressed, streaming data provides that a user at a workstation experiences substantially LAN-speed access to a data file, although the data file may be physically stored at a file server located remotely from the workstation.
  • FIG. 8 is a system diagram of a network 310 including a preferred storage caching protocol system 312 which operates to manage access to shared real time data files which are stored at multiple file servers and to maintain data file coherence and consistency in the network 310 in accordance with the present invention.
  • the system 312 includes a plurality of cache servers 336 A, 336 B and 336 C, which are respectively coupled to associated data center systems 320 A, 320 B and 320 C, and also storage caches 30 A and 30 B, which are respectively coupled to the remote systems 16 A and 16 B in the same manner as described above for the network 10 .
  • each of the data centers 320 is constructed and functions in the same or substantially the same manner as the data center 20 in the network 10 .
  • the data center 320 A includes an Ethernet 324 A which couples workstations 322 A and 322 B and a file server 338 A to the cache server 336 A and a gateway 326 A, and the gateway 326 A is coupled to the communications network 28 .
  • each of the storage caches 30 can communicate with any of the cache servers 336 , which are likely located at different remote locations, and vice versa.
  • the cache servers 336 can communicate with respective associated file servers 338 for retrieving copies of and updating data files that are the subject of access requests from any of the storage caches 30 , in accordance with the inventive storage caching protocol.
  • the inventive cache server has a software infrastructure to act as a client for standard LAN file sharing protocols (NFS and CIFS), which makes it readily configurable to retrieve copies of a data file from or replace data files stored at any of the file servers 338 in the network 310 , where each of the file servers 338 can have any operating system format.
  • the cache server can also access files from and replace files on a local file system using standard filesystem APIs.
  • because the inventive storage caching system correctly multiplexes an access request to the appropriate cache server, the location from which a copy of the data file is presented to the user is unknown to the user at the workstation.
  • a user can access and operate on a sharable data file without knowing, being concerned with or ascertaining which data source system physically contains the data file.
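One way a storage cache might multiplex access requests to the appropriate cache server is a longest-prefix match on the requested path. The source does not describe how the multiplexing is done, so the namespace-to-server mapping and the path prefixes below are purely hypothetical.

```python
from typing import Dict, Optional, Tuple


def route(path: str, exports: Dict[str, str]) -> str:
    """Pick the cache server whose exported namespace is the longest prefix
    of the requested path (hypothetical routing scheme)."""
    best: Optional[Tuple[str, str]] = None
    for prefix, server in exports.items():
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, server)
    if best is None:
        raise KeyError(f"no cache server exports {path}")
    return best[1]


# Hypothetical mapping of namespaces to the cache servers 336 A, 336 B and 336 C.
exports = {"/eng": "336A", "/eng/specs": "336B", "/finance": "336C"}
print(route("/eng/specs/a.doc", exports))  # 336B
```

Matching on whole path components (the added "/") keeps a path such as "/engineering" from being routed by the "/eng" entry.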
  • each storage cache or cache server can be constructed to operate as both a storage cache and a cache server.
  • a single combination storage cache and cache server appliance can be associated with a remote computer system or a data center computer system. The user at a workstation of an associated remote system would not be aware that, in some circumstances, the storage cache communicates with a cache server that is within the same appliance.
  • the inventive storage caching protocol system provides considerable flexibility in the allocation and sharing of file server and memory resources, as storage caches and cache servers can serve as simple building blocks for implementing very sophisticated topologies, such as cliques where every cache/server combination is connected with every other cache/server combination in the network.
  • the inventive storage caching system, including the leasing protocol, uses the cached data file that is stored at a storage cache and being modified by the workstation, the version of the data file stored at the storage of the cache server, or the data file stored at the file server to update a data file or a cached data file and to maintain data file coherency and consistency in a network in the event of (i) a disconnection of a communication link established between a cache server and a storage cache, (ii) a failure of either the cache server or the storage cache, or (iii) an unexpected reboot of a workstation.
  • additional data for tracking file update status is not required.
  • FIG. 9 is a system diagram of a network 410 including a further preferred embodiment of a storage caching protocol system 412 which manages shared access to real time data files while maintaining data file coherence and consistency and also backing up data files in accordance with the present invention.
  • the system 412 includes cache servers 436A and 436B, which are respectively coupled to associated data center systems 420A and 420B, and a storage cache 30A which is coupled to the remote system 16A in the same manner as in the network 10.
  • each of the data center systems 420 is constructed and operates in substantially the same manner as the data center 20 of the network 10.
  • the backing-up of data files in accordance with the present invention is initiated when the cache manager 50 of the storage cache 30A detects, for example, a network communication failure at the gateway 26.
  • the cache manager 50 automatically and periodically attempts to reestablish a communications link to the cache server 436A.
  • the storage cache 30A also continues to operate without interruption, i.e., continues to monitor modifications to the cached data file entered by a workstation and stores only the current version of the cached data file, incorporating the modifications, in the storage 56.
  • the cache manager 50 simultaneously attempts to establish a communications link with a back-up data center, such as the data center 420B, via the cache server 436B, since the cache servers 436A and 436B have different and unique IP routing addresses. If this back-up link can be established, the storage cache 30A proceeds to perform the process 100 for updating a back-up copy of the data file stored at the file server of the data center 420B. In other words, assuming the data centers are mirrored, the storage cache 30A continues updating the data file via the cache server 436B from the point at which the disconnection from the cache server 436A occurred.
  • the storage cache 30A resumes the process for updating the data file by performing, for example, the steps 106, 108, 110, 112 and 114 of the process 100.
  • a checksum representing the version of the data file existing at the cache server 436A or the file server 438A at the time the disconnection occurred is used to compute the difference data in step 110. Therefore, the storage cache 30A effectively always maintains the file update data: only a current version of the cached data file is stored, and this current cached data file is used to update the version of the data file at the file server based on the checksum transmitted from the cache server.
  • the storage caching protocol system advantageously provides that the exact status of updating of the data file prior to the disconnection need not be tracked or known.
  • the storage cache can interact with multiple cache servers and can easily establish a communications link with the cache server of a back-up data center, should the communications link to the cache server of the primary data center fail.
  • the end user at a workstation does not experience or notice the disruption to the communications link when the primary data center fails. This holds while attempts are made to re-establish a link to the primary data center or to establish a new link to the back-up data center, and when the link to the primary data center is finally re-established.
  • the previous state of the data file is automatically restored from the memory in a storage cache or cache server to ensure that coherency is always maintained and pending write-back data is not lost in the case of reboots or system restarts.
  • a combination of streaming (for read-ahead), compression and differencing for better channel utilization is performed to make a cache hit extremely likely, enable substantial write behind and make a cache miss as efficient as possible.
  • the storage cache can attempt to establish communication links at multiple IP addresses for the same data center on different carriers when a network failure is experienced.
  • if a storage cache fails, it is simply replaced, and the new storage cache promptly establishes a connection with the cache server at the remote data center and immediately resumes caching and updating in accordance with the processes 100 and 120.
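The link-recovery behavior in the list above can be illustrated with a small model. This is a minimal sketch, not the patent's implementation. `StorageCacheLink`, `try_connect` and the example addresses are hypothetical. The sketch demonstrates two properties the list relies on. First, workstation writes continue to land in the local cached copy during an outage. Second, the storage cache walks an ordered list of cache-server addresses (primary first, then other carriers or a back-up data center) until one answers.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


@dataclass
class StorageCacheLink:
    """Hypothetical model of a storage cache re-selecting its cache server."""

    # Ordered cache-server addresses: the primary data center first (possibly
    # reachable over several carriers), then back-up data centers.
    endpoints: Sequence[str]
    # Hypothetical transport hook: returns True if a link can be established.
    try_connect: Callable[[str], bool]
    current: Optional[str] = None
    cached_file: bytearray = field(default_factory=bytearray)

    def write_local(self, offset: int, data: bytes) -> None:
        # Workstation writes always land in the local cached copy, whether or
        # not a cache server is reachable, so the user sees no interruption.
        end = offset + len(data)
        if end > len(self.cached_file):
            self.cached_file.extend(b"\0" * (end - len(self.cached_file)))
        self.cached_file[offset:end] = data

    def reconnect(self) -> Optional[str]:
        # Try each address in priority order; keep the first that answers.
        for addr in self.endpoints:
            if self.try_connect(addr):
                self.current = addr
                return addr
        # Still disconnected: the caller retries periodically.
        self.current = None
        return None
```

After a successful `reconnect()`, the update would resume by differencing the current cached file against the checksum supplied by the newly connected cache server. That step is omitted here.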

Abstract

A storage caching method and system manages shared access to real time data files while maintaining data file coherency and consistency in a computer network including a plurality of remote computer workstations and at least one file server. The storage caching system is implemented by storage caches, which are associated with workstations, and a cache server, which is associated with a file server, where the storage caches and the cache server interface with a distributed file system to provide shared access to real time data files by remote workstations.

Description

    CROSS REFERENCE TO RELATED APPLICATION
  • This application claims the benefit of U.S. Provisional Application No. 60/440,750 filed Jan. 17, 2003, assigned to the assignee of this application and incorporated by reference herein.
  • FIELD OF THE INVENTION
  • The present invention relates generally to managing shared access to data files and, more particularly, to a storage caching protocol which provides authorized computer workstations with shared access to real time data files while maintaining data file consistency and coherence.
  • BACKGROUND OF THE INVENTION
  • In modern computer system and networking architectures, a computer system that is a repository for data files is typically not the computer system on which processing of the data files is performed. Consequently, a user at a computer workstation associated with a remote site computer system, such as a laptop computer, networked computer or desktop computer, often will desire to access, i.e., view (read) or modify (write), a data file that is stored in an internal memory, on a disk or in network attached storage of a remotely located central data source computer system. Such remote access of data files is performed over a communications channel, such as a data bus, a communications network or the Internet, which typically introduces a delay or latency in the presentation of the data file at the system accessing the data file. The latency is based on the need to transmit data between the system accessing the data file and the system that produces or stores the data file. In addition, the data file is usually accessed in portions or blocks rather than as a continuous stream, which exacerbates the latency because each block experiences the channel delay upon transmission.
  • In order to mitigate the effects of channel delays, most current computer systems that perform distributed file system applications, which provide for shared access to data files, implement some form of caching. In caching, a local copy of all or a portion of a data file, which is stored at a central source computer system, is maintained in a cache established at a remote system, such as in the local memory of a workstation associated with the remote system. The workstation can read or write to the cached data file, where the cached data file mirrors all or a portion of the data file stored at the central system. The cache also stores data that tracks any changes made to the cached data file, which are entered by the workstation and ultimately are to be incorporated into the data file stored at the file server. Thus, with caching, channel latency can be mitigated and a user of the workstation of the remote system is not aware that the data file is accessed from a local source rather than a remotely located central source system.
  • Although caching may reduce latency in certain data file access circumstances, a cache miss can still occur. If access is attempted to a data file that has not yet been stored as a copy (mirrored) in the cache, the latency of retrieving a copy of the data file from the file server still exists. To avoid cache misses and consequently improve distributed file system performance, a caching system often implements a read-ahead technique, known as pre-populating the cache, in which data files that are expected to be accessed in the future are stored in the cache in advance.
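A block cache that fetches a few following blocks whenever a miss occurs illustrates both cache misses and read-ahead. This sketch assumes strictly sequential access. It also assumes a hypothetical `fetch_block` callable standing in for a round trip to the file server. For brevity, read-ahead is done synchronously here; a real cache would stream it in the background.

```python
from typing import Callable, Dict


class ReadAheadCache:
    """Block cache that pre-populates the next blocks on every miss."""

    def __init__(self, fetch_block: Callable[[int], bytes], read_ahead: int = 2) -> None:
        self.fetch_block = fetch_block  # hypothetical round trip to the file server
        self.read_ahead = read_ahead
        self.blocks: Dict[int, bytes] = {}
        self.hits = 0
        self.misses = 0

    def read(self, block_no: int) -> bytes:
        if block_no in self.blocks:
            self.hits += 1  # served locally: no channel latency
            return self.blocks[block_no]
        self.misses += 1  # cache miss: the full channel delay is paid here
        self.blocks[block_no] = self.fetch_block(block_no)
        # Read ahead: fetch the following blocks now, on the assumption that
        # access is sequential.
        for nxt in range(block_no + 1, block_no + 1 + self.read_ahead):
            if nxt not in self.blocks:
                self.blocks[nxt] = self.fetch_block(nxt)
        return self.blocks[block_no]
```

Reading blocks 0 through 5 in order with `read_ahead=2` incurs two misses (blocks 0 and 3) and four hits.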
  • In a distributed file system that provides for shared access to data files among a plurality of remote systems, the caching system that is implemented needs to maintain cache coherence and cache consistency to avoid different versions of a data file being accessed by different respective remote systems. Cache coherence is a guarantee that updates and the order of the updates to a cached data file are preserved and safe. Thus, in a coherent distributed file system, there is a guarantee that (i) a remote system does not delete the cached update data before the update data is used to update the corresponding data file stored at the file server, and (ii) no other system updates the data file in a manner that potentially can compromise the update of the data file until the data file at the server has been updated using the update data from the cache. Cache consistency is a guarantee that the updates to an opened, cached data file made by a workstation are reflected in the cached data file in a timely fashion.
  • The properties of cache coherence and cache consistency are equally important when multiple remote systems access the same data file. In this circumstance, coherence additionally ensures that updates on any cache corresponding to a data file stored at the file server do not override updates by another cache corresponding to the same data file. Cache consistency additionally ensures that updates to the cached data file made at any cache are, in a timely fashion, incorporated into the cached data file at any other cache which is accessing the same data file.
  • Cache consistency and cache coherence are easily maintained where a caching system uses a write-through architecture, in which all updates to the cached data file are immediately transmitted to the central computer system. This immediate transmission results in an immediate update of the data file stored at the file server of the central system. Such architectures improve performance when multiple caches perform read accesses of the data file from the central system. However, the latency associated with updating the data file on write accesses still exists. Hence, this architecture typically performs well only for a distributed file system in which data file updates are infrequent.
  • Another caching architecture, known as write-back, evolved from the write-through architecture in an attempt to solve the latency problems of the latter. In a write-back architecture, a cache stores the updates to the cached data file for a period of time before transmitting (flushing) the cached updates to the central system. Because flushing occurs only periodically, the workstation can update the cached data file without significant latency. The simplest form of write-back is the write-behind architecture. In write-behind, updates to the cached data file are transmitted to the central source after some delay rather than immediately, and in the same order in which they were stored in the cache. In write-back caching architectures, cached updates are not immediately available to either the central source or other remote systems. Such architectures are therefore mostly useful only when a single remote system will be accessing the data file for reading or writing.
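The difference between write-through and write-behind can be sketched with an in-memory stand-in for the central server that counts simulated round trips. The class names are hypothetical. Write-through pays one round trip per write. Write-behind completes writes locally and sends them later in their original order, here as a single batch.

```python
from collections import deque
from typing import Deque, List, Tuple

Update = Tuple[int, bytes]  # (offset, bytes written by the workstation)


class CentralServer:
    """In-memory stand-in for the central file server."""

    def __init__(self, size: int = 16) -> None:
        self.data = bytearray(size)
        self.round_trips = 0  # simulated channel crossings

    def apply(self, updates: List[Update]) -> None:
        # One call models one round trip carrying a batch of updates,
        # applied strictly in the order given.
        self.round_trips += 1
        for offset, payload in updates:
            self.data[offset:offset + len(payload)] = payload


class WriteThroughCache:
    def __init__(self, server: CentralServer) -> None:
        self.server = server

    def write(self, offset: int, payload: bytes) -> None:
        # Each write is sent immediately: the server is always current,
        # but every write waits for a round trip.
        self.server.apply([(offset, payload)])


class WriteBehindCache:
    def __init__(self, server: CentralServer) -> None:
        self.server = server
        self.pending: Deque[Update] = deque()

    def write(self, offset: int, payload: bytes) -> None:
        # The write completes locally; the server does not see it yet.
        self.pending.append((offset, payload))

    def flush(self) -> None:
        # Send queued updates in their original order, so a later write to
        # the same bytes still overrides an earlier one at the server.
        if self.pending:
            self.server.apply(list(self.pending))
            self.pending.clear()
```

Two writes to the same bytes cost two round trips under write-through. Under write-behind they cost one batched round trip, and the second write still wins at the server because order is preserved.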
  • If access to a data file by multiple remote systems is contemplated, the write-back caching system often is enhanced with mechanisms that track updates performed at all of the caches and also at the central source system to ensure consistency of data files. These mechanisms typically substantially increase the complexity and cost of the cache, so as to make such caches impractical in many applications. The performance benefits, however, are significant, which makes these caches very attractive for high performance computing implementations, such as computer systems connected over computer networks.
  • In a typical computer system architecture having file sharing capabilities, a local area computer network (“LAN”) remotely accesses data files over a distributed file system, such as NFS® (Network File System) for UNIX™ or CIFS® (Common Internet File System) for Microsoft Windows™ systems. These file systems provide workstations associated with remote computer systems with a mechanism to access data files stored at a file server of a central computer system. In addition, each remote system utilizes local caching to increase efficiency of access to data files. Typically, the caching is performed at the granularity of pages of a data file, which usually are four-kilobyte blocks of data. The actual number of pages cached is a function of the memory available for caching in a workstation that is incorporated in or coupled to a remote system. In addition, these file systems utilize some measure of write-back caching to achieve acceptable performance.
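With four-kilobyte page granularity, a cache must hold every page that a byte range touches, even partially. The following is a minimal sketch of that arithmetic, assuming a fixed 4096-byte page. The function names are illustrative.

```python
PAGE_SIZE = 4096  # four-kilobyte pages, as described above


def pages_for_range(offset: int, length: int, page_size: int = PAGE_SIZE) -> range:
    """Page numbers a byte range touches; each must be cached to serve it."""
    if length <= 0:
        return range(0)
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)


def max_cached_pages(cache_bytes: int, page_size: int = PAGE_SIZE) -> int:
    # The number of cacheable pages is bounded by the workstation's memory.
    return cache_bytes // page_size
```

For example, a 200-byte read starting at offset 4000 straddles pages 0 and 1. Similarly, 64 MiB of cache memory holds at most 16384 such pages.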
  • Although cache consistency and cache coherence are important properties for a caching system, these properties are often very difficult to realize in a networked computer system having distributed file system capabilities, especially if the system uses write-back caching. Thus, many distributed file systems do not completely satisfy the guarantees of cache consistency and coherence. In practical implementations, a distributed file system relies on a crucial assumption that sharing of the same data file is rare and, therefore, makes a trade-off between performance and correctness when sharing of a data file does occur. For example, NFS currently is not particularly suitable for shared access because (i) it has weak consistency guarantees, namely, modifications to a cached data file for a first remote system may not be timely reflected at the central system and, thus, would not necessarily be mirrored at another remote system accessing the data file from the central system; and (ii) it has no coherence guarantees. In addition, although CIFS provides excellent consistency and coherence, shared access performs poorly. This is because that consistency and coherence are achieved by using write-through whenever more than one remote system is accessing a given data file.
  • In addition to automatic measures for maintaining consistency and coherence, NFS and CIFS also provide locking mechanisms that allow a file sharing application to control coherence and consistency aspects. In particular, NFS allows sharing applications to cooperate voluntarily with each other without any operating system control, which is commonly known as advisory byte range locking. CIFS provides operating system controlled locking, known as mandatory byte range locking, as well as explicit file sharing modes. These modes, for example, permit an application to open a file in such a manner that no other application can access the file. The file sharing application can use such mechanisms to improve the coherence and consistency properties provided by such prior art distributed file systems. For example, an application can use byte range locking to provide coherence and consistency even if the underlying system, e.g., NFS, does not have these properties.
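A simplified lock table can model the distinction between advisory and mandatory byte range locking. This is not NFS or CIFS code. `LockTable`, `try_lock` and `check_io` are hypothetical names. In this model, shared locks coexist, while an exclusive lock conflicts with any overlapping lock held by another owner. The two modes differ only in whether ordinary I/O is checked against the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ByteRangeLock:
    owner: str
    start: int        # first byte covered
    end: int          # one past the last byte covered
    exclusive: bool   # exclusive (write) lock vs shared (read) lock


class LockTable:
    """Simplified per-file byte-range lock table (not NFS or CIFS code)."""

    def __init__(self, mandatory: bool) -> None:
        self.mandatory = mandatory
        self.locks: List[ByteRangeLock] = []

    def _conflicts(self, owner: str, start: int, end: int, exclusive: bool) -> bool:
        # Ranges conflict if they overlap, belong to different owners,
        # and at least one side needs exclusive access.
        return any(
            held.owner != owner
            and held.start < end and start < held.end
            and (held.exclusive or exclusive)
            for held in self.locks
        )

    def try_lock(self, owner: str, start: int, end: int, exclusive: bool) -> bool:
        if self._conflicts(owner, start, end, exclusive):
            return False
        self.locks.append(ByteRangeLock(owner, start, end, exclusive))
        return True

    def check_io(self, owner: str, start: int, end: int, is_write: bool) -> bool:
        # Advisory: only cooperating applications that call try_lock are
        # constrained; plain reads and writes are never refused.
        if not self.mandatory:
            return True
        # Mandatory: the operating system checks every read and write.
        return not self._conflicts(owner, start, end, is_write)
```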
  • Further, the performance issues faced by a networked system over a local area network, where typical latencies are well under a millisecond, are compounded when file sharing is performed over a wide area network (“WAN”). One prior art system, known as Transarc Andrew File System (AFS), was created to overcome the latency existing in WANs that are geographically small, such as a WAN of a university campus. In contrast to NFS and CIFS, which use local memory of the remote system, such as memory of a computer workstation, for storing pages of files, AFS uses an on-disk local file system as a cache for entire files. In AFS, most operations occur on the local copy of the file and there is no need to retrieve data from the file server when access to the data file is requested. As each cached data file is modified and closed, the updates are transmitted (flushed) to the central system to update the corresponding data file at the file server, and then such updated data file is made available for access by other remote sites.
  • Thus, AFS provides flush-on-close consistency at file granularity; in other words, updates to a data file become available to other remote systems when the data file is closed, but not while it is being written. AFS, however, weakens the coherence and consistency guarantees considerably to make WAN operation feasible. In particular, AFS lacks coherence because it allows multiple remote systems to simultaneously update respective cached data files, each of which corresponds to a single data file. The last remote system to close the file then controls the changes to the data file at the server of the central system. In other words, the modifications of the last remote system to close the file supersede the changes apparently being made to the data file by other remote systems. In addition, the consistency of AFS is weak because modifications are transmitted to the central source only when a remote system closes the file.
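The loss of coherence under flush-on-close can be shown with a toy whole-file model. In this model, each client edits a private copy, and the server simply replaces its own copy when a client closes the file. The class is hypothetical and ignores AFS callbacks and other details. It only demonstrates the last-closer-wins outcome described above.

```python
from typing import Dict


class FlushOnCloseServer:
    """Toy whole-file, flush-on-close model (AFS-style, greatly simplified)."""

    def __init__(self, contents: str) -> None:
        self.contents = contents
        self.open_copies: Dict[str, str] = {}

    def open(self, client: str) -> None:
        # The client fetches the entire file into its local cache.
        self.open_copies[client] = self.contents

    def write(self, client: str, new_contents: str) -> None:
        # Edits stay local and are invisible to the server and other clients.
        self.open_copies[client] = new_contents

    def close(self, client: str) -> None:
        # Flush on close: the server's copy is replaced outright. Nothing
        # merges concurrent edits, so the last client to close wins.
        self.contents = self.open_copies.pop(client)
```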
  • Consequently, although AFS is useful for a campus wide sharing application, it has multiple disadvantages when implemented in a business enterprise environment. For example, AFS must be installed on all computers. In addition, AFS cannot be operated in conjunction with NFS and CIFS distributed file systems or other like systems which are conventional in the prior art. Furthermore, the lack of consistency and coherence of AFS makes it unsuitable for many enterprise applications that require multiple remote systems to have shared access to a real time version of a data file.
  • Therefore, a need exists for a system and method for providing real time, shared access to data files through use of a distributed file system, and where the system and method exploit the benefits of caching while also providing data file coherence and consistency and ease of interoperability and interfacing with an existing distributed file system.
  • SUMMARY OF THE INVENTION
  • In accordance with the present invention, a storage caching protocol system interfaces with a distributed file system to provide that authorized computer workstations have shared access to real time data files stored at a file server. A data file stored at the file server is automatically updated, in substantially real time, by a cache server to include file update data representative of data file modifications entered at a workstation and incorporated into a corresponding cached data file which is stored at a storage cache. Consequently, the cache server can respond to an access request for the data file from a workstation using a real time, updated version of the data file, where the real time data file includes all of the data file modifications which were entered by workstations that previously accessed the data file and incorporated into corresponding cached data files respectively stored at storage caches associated with the individual workstations. In a preferred embodiment, file update data is transmitted as streaming data to update the data file stored at the file server or a cached data file stored at a storage cache and, most preferably, the file update data is transmitted in compressed form and optionally generated using data differencing techniques.
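Python's standard `zlib` streaming objects can illustrate streaming the file update data in compressed form. zlib is an illustrative choice, since the text does not name a compression algorithm, and the chunking is arbitrary. Compressed output is emitted as soon as the compressor produces it, so transmission can begin before all update data is available.

```python
import zlib
from typing import Iterable, Iterator


def compress_stream(chunks: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Sender side: compress update data incrementally as it is produced."""
    comp = zlib.compressobj(level)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:  # zlib may buffer small inputs internally
            yield out
    yield comp.flush()  # emit any buffered data and finish the stream


def decompress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Receiver side: reconstruct update data as compressed chunks arrive."""
    decomp = zlib.decompressobj()
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out
    tail = decomp.flush()
    if tail:
        yield tail
```

Highly repetitive update data, which is common for incremental edits, compresses well. The receiver can also apply decompressed data before the stream has finished arriving.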
  • In a preferred embodiment, the storage caching protocol system includes at least one storage cache and at least one cache server which are communicatively interconnected over a communications medium. The cache server is associated with a file server containing data files, and the storage cache is associated with at least one authorized computer workstation. The cache server transmits a copy of a data file stored at the file server to the storage cache. The storage cache stores the data file copy as a cached data file, and automatically transmits to the cache server file update data representative of modifications to the cached data file entered by a workstation associated with the storage cache and incorporated into the cached data file. The cache server uses the file update data to update the data file stored at the file server, and responds to subsequent access requests for the data file, such as from the same or another storage cache or an authorized computer workstation not associated with a storage cache, utilizing the updated version of the data file stored at the file server. In a preferred embodiment, the response to the access request includes server file update data for updating a corresponding cached data file stored at the requesting storage cache.
  • The inventive storage caching system preferably operates in accordance with a leasing protocol that manages requests for access to a data file to ensure consistency and coherence among all remote computer systems that share access to a data file through use of a distributed file system. Each time that a remote computer system associated with a storage cache desires to access, i.e., to view only (read) or to modify (write), a data file stored at the file server, the storage cache associated with the remote system determines if it has an appropriate lease for the data file and, if not, transmits a lease request to the cache server. The cache server grants the lease request if cache consistency and cache coherence with any other remote system including a storage cache that can access the data file can be preserved. If the cache server denies a lease request, the remote system can either prohibit the requested access or pass the request to the file server without caching the data file, as updates to a cached data file are not allowed. When the request is passed to the file server, the workstation from which an access request originated only has a right to view and cannot cache the data file, i.e., has a reader right, as another storage cache continues to have a write lease to the data file. Every time that a workstation associated with the storage cache is granted a reader right, the corresponding cached data file is updated using the data file stored at the file server, and the cached data file cannot be modified by the workstation.
  • In a further preferred embodiment, the cache server decides whether to grant or deny a request for a lease of a data file received from a first storage cache, based on (i) whether another storage cache already has a lease and the type of lease existing, which can be write or read, or (ii) whether the data file is already locked by some other mechanism, such as a mandatory or advisory lock associated with a prior art distributed file system protocol, such as CIFS and NFS. The lease request is processed based on the following criteria: a write lease cannot be granted if a read lease already exists at a second storage cache or the file is already locked for reading by another mechanism; only a pass through reader right can be granted if a write lease already exists at a second storage cache; and an additional read lease can be granted if a read lease already exists at a second storage cache or the file is only locked for reading. In addition, after a lease is granted, the cache server locks the data file to prevent another application from locking the data file in a conflicting fashion. Thus, the cache server ensures that any lease that is granted is compatible with an existing lease or any existing lock on the data file already taken by another mechanism. If a write lease is granted, the first storage cache autonomously updates the cached data file, based on data file modifications entered by an associated workstation, without intervention from the cache server. Further, following grant of a lease request or a reader right, the cache server and the first storage cache initially attend to automatically updating the cached data file, if any, stored at the first storage cache.
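The lease-granting criteria above can be expressed as a small decision function. This is a sketch under stated assumptions. `FileState`, `decide` and the enum names are hypothetical. The text spells out only the read-lock case for locks taken through another mechanism. Treating an external write lock like a write lease held elsewhere is therefore an assumption. Recording the lease in `state.leases` stands in for the cache server also locking the file.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Lease(Enum):
    READ = "read"
    WRITE = "write"


class Decision(Enum):
    GRANT_READ = "read lease granted"
    GRANT_WRITE = "write lease granted"
    READER_RIGHT = "pass-through reader right only"
    DENIED = "lease denied"


@dataclass
class FileState:
    # storage cache id -> lease currently held on this data file
    leases: Dict[str, Lease] = field(default_factory=dict)
    # Locks taken through another mechanism (e.g. CIFS or NFS locking).
    external_read_lock: bool = False
    external_write_lock: bool = False  # assumption: handled like a foreign write lease


def decide(state: FileState, requester: str, wanted: Lease) -> Decision:
    others = {cid: lease for cid, lease in state.leases.items() if cid != requester}
    if Lease.WRITE in others.values() or state.external_write_lock:
        # Another party may be writing: the requester may only read through
        # to the file server, without caching.
        return Decision.READER_RIGHT
    if wanted is Lease.WRITE:
        # A write lease is incompatible with read leases or read locks held
        # elsewhere.
        if others or state.external_read_lock:
            return Decision.DENIED
        state.leases[requester] = Lease.WRITE  # also stands in for locking the file
        return Decision.GRANT_WRITE
    # Additional read leases are compatible with existing read leases/locks.
    # Never downgrade a write lease the requester already holds.
    if state.leases.get(requester) is not Lease.WRITE:
        state.leases[requester] = Lease.READ
    return Decision.GRANT_READ
```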
  • In another preferred embodiment, a storage cache responds to a request from an associated authorized workstation for access to a data file stored at the file server based on the strength of the lease, i.e., read lease or write lease, where a write lease is stronger than or includes file viewing rights associated with a read lease, if any, that the cache server has previously provided to the storage cache. The access request is granted where the access request, which can be read or write, is of a level commensurate with that of the existing lease, if any, for the storage cache. In addition, where the storage cache does not have an existing lease of sufficient strength to satisfy the access request, it must first obtain a lease and therefore requests a lease for the data file from the cache server. The lease request is granted if the cache server determines that a lease can be granted or that the requested access does not conflict with an existing lease of another storage cache as well as any existing locks on the data file. Following a grant of the lease request, the storage cache permits the cached data file to be opened at the workstation for read or write purposes, in accordance with the access request. If the lease request is denied, the storage cache interacts with the cache server to update the cached data file based on the version of the data file stored at the file server and only allows read access. The cached data file at the storage cache is automatically updated, as needed, based on interaction between the cache server and the storage cache. A storage cache typically releases or drops the lease only when all workstations associated with the storage cache have closed the cached data file and all pending updates to the data file, which are reflected in the cached data file, are transmitted from the storage cache to the cache server.
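The storage cache's side of the protocol can be sketched as follows. The storage cache compares lease strength and requests a stronger lease when needed. On denial, it falls back to read-only access. It drops the lease only when every workstation has closed the file and no updates are pending. `CachedFileEntry`, `request_lease` and `refresh_from_server` are hypothetical names, and leases are plain strings for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set

# A write lease includes the viewing rights of a read lease.
STRENGTH = {None: 0, "read": 1, "write": 2}


@dataclass
class CachedFileEntry:
    lease: Optional[str] = None   # None, "read" or "write"
    open_by: Set[str] = field(default_factory=set)
    pending_updates: int = 0      # modifications not yet sent to the cache server


def handle_access(
    entry: CachedFileEntry,
    workstation: str,
    mode: str,
    request_lease: Callable[[str], bool],
    refresh_from_server: Callable[[], None],
) -> str:
    """Open the cached file and return the access actually allowed."""
    if STRENGTH[entry.lease] < STRENGTH[mode]:
        # The existing lease (if any) is too weak: ask the cache server.
        if request_lease(mode):
            entry.lease = mode
        else:
            # Denied: refresh the cached copy from the file server's version
            # and allow read access only.
            refresh_from_server()
            entry.open_by.add(workstation)
            return "read"
    entry.open_by.add(workstation)
    return mode


def close_and_maybe_release(entry: CachedFileEntry, workstation: str) -> bool:
    """Close for one workstation; drop the lease only when it is safe."""
    entry.open_by.discard(workstation)
    if entry.open_by or entry.pending_updates:
        return False  # still open elsewhere, or updates not yet flushed upstream
    entry.lease = None
    return True
```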
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Other objects and advantages of the present invention will be apparent from the following detailed description of the presently preferred embodiments, which description should be considered in conjunction with the accompanying drawings in which like references indicate similar elements and in which:
  • FIG. 1 is a system diagram illustrating implementation of a storage caching protocol in a distributed file system in accordance with the present invention.
  • FIG. 2 is a block diagram of a storage cache in accordance with the present invention.
  • FIG. 3 is a block diagram of a cache server in accordance with the present invention.
  • FIG. 4A is a flow diagram of a method for updating a data file stored at a file server based on the transmission of file update data from a storage cache to a cache server in accordance with the present invention.
  • FIG. 4B is a flow diagram of a method for updating a cached data file stored at a storage cache based on server file update data transmitted by a cache server in accordance with the present invention.
  • FIG. 5 is a flow diagram of a method for responding to a request for a lease from a storage cache in accordance with the present invention.
  • FIG. 6 is a flow diagram of a method for responding to a request for access to a data file received at a storage cache in accordance with the present invention.
  • FIG. 7 is a flow diagram of a method for releasing a lease of a data file in accordance with the present invention.
  • FIG. 8 is a system diagram illustrating implementation of a storage caching protocol in a distributed file system having a plurality of file servers in accordance with the present invention.
  • FIG. 9 is a system diagram illustrating implementation of a storage caching protocol in a distributed file system to provide for data backup in accordance with the present invention.
  • DETAILED DESCRIPTION OF THE INVENTION
  • FIG. 1 is a system diagram of an illustrative computer system network 10 which operates in accordance with the storage caching protocol of the present invention, which provides multiple computer systems shared access to real time data files. The network 10 includes a storage caching protocol system 12 that interfaces with a distributed file system application operating at a data center computer system, which is a repository for data files, and a remote site computer system, which normally is located remotely from a data center system and is associated with a computer workstation that desires to access, i.e., view only (read) or modify (write), data files stored at a file server of a data center system. The inventive system 12 includes at least one storage cache, which is coupled to a workstation of an associated remote system, and at least one cache server, which is coupled to a file server of a data center system. The storage cache and the cache server utilize a communications link, such as a link established over the Internet, to transfer (i) copies of data files that the associated workstation desires to access, (ii) file update data representative of any data file modifications entered by authorized workstations that access the data file, and (iii) data associated with the operating features of the storage caching protocol system 12.
  • In the implementation of the storage caching protocol system 12 in the illustrative network 10 shown in FIG. 1, the system 12 interfaces with remote work group computer systems 16A and 16B and a central work group data center computer system 20. The remote system 16A includes computer workstations 22A and 22B interconnected over a communications channel 24A, such as an Ethernet or like medium. Similarly, the remote system 16B includes computer workstations 22C and 22D interconnected over a communications channel 24B. Each of the workstations 22 is part of or constitutes, for example, a personal computer, a personal digital assistant, or other like electronic device including a processor and memory and having communications capabilities. In addition, the workstations of a remote system, in combination with the Ethernet, form a local area network (“LAN”) and operate in accordance with a conventional prior art distributed file system, such as NFS or CIFS. Such a file system provides that a user of a workstation can access data files located remotely from the remote system in which the workstation is contained.
  • A communications gateway 26 couples the Ethernet 24 of each of the remote systems 16 to a communications network 28. The network 28, for example, can be a wide area network (“WAN”), LAN, the Internet or any like means for providing data communications links between geographically disparate locations. The gateway 26, for example, is a standard VPN Internet connection having standard DSL speeds. As well known in the art, the gateway 26 provides that data, such as data files accessible in accordance with a prior art distributed file system such as NFS or CIFS, can be transferred between a workstation and a remotely located file server. It is noted that although the network 10 of FIG. 1 shows the gateway 26 and network 28 as being part of the storage caching system 12, these components, which constitute well known, prior art devices, do not constitute inventive features although they are required for operation of the storage cache and cache server of the inventive system 12, as described in further detail below.
  • Referring again to FIG. 1, the storage caching system 12 includes storage caches 30A and 30B which are associated with the remote systems 16A and 16B, respectively. Each storage cache 30 is coupled to the Ethernet 24 and the gateway 26 of the associated remote system 16. In addition, the storage caching system 12 includes a cache server 36. The cache server 36 is coupled to an associated gateway 26C which is also coupled to the network 28. An Ethernet 24C couples the gateway 26C and the cache server 36 to a file server 38 and workstations 22E and 22F contained in the data center system 20. The file server 38 is a conventional file storage device, such as a NAS, which is a repository for data files. It distributes stored data files to authorized workstations in accordance with conventional distributed file systems, such as NFS or CIFS, which are implemented at the authorized workstations of the remote systems 16 and the data center 20. For purposes of illustration, it is assumed that all of the workstations 22 in the remote systems 16 and in the data center 20 constitute authorized workstations and operate in accordance with a distributed file system compatible with that of the server 38.
  • FIG. 2 illustrates a preferred embodiment of the storage cache 30 in accordance with the present invention. Referring to FIG. 2, the storage cache 30 includes the modules of a cache manager 50, a translator 52, a leasing module 54, and a local leased file storage 56. The cache manager 50 is coupled to the translator 52 and is for coupling to a cache server, such as the cache server 36 as shown in FIG. 1, via gateways and a communications network. The translator 52 is coupled to the leasing module 54 and the local storage 56, and is for coupling to workstations of an associated remote system via an Ethernet connection. As explained in detail below, the cache manager 50 controls routing of data files, file update data and data file leasing information to and from the cache server 36. The translator 52 stores copies of accessed data files at the storage 56 as cached data files, makes a cached data file available for reading or writing purposes to an associated workstation that requested access to the corresponding data file, and updates the cached data file based on data file modifications entered by the workstation or update data supplied from the cache server. In addition, the translator 52 preferably can generate a checksum representative of a first data file and determine the difference between another data file and the first data file based on the checksum using techniques that are well known in the art. The leasing module 54, through interactions with the cache server 36, determines whether to grant a request for access to a data file from an associated workstation, where the access request requires that the cached data file be made available to the associated workstation either for read or write purposes. In a preferred embodiment, a storage cache is associated with every remote computer system that can access a data file stored at a file server of a data center system over the network 28.
  • FIG. 3 illustrates a preferred embodiment of the cache server 36, in accordance with the present invention, that manages shared access to data files stored in the file server by multiple storage caches, such as the caches 30A and 30B, and also by workstations, such as the workstations 22D and 22E of the data center 20, which are not associated with a storage cache. The cache server is preferably a thin appliance having an architecture that makes it compatible and easily integrated with an existing distributed file system and storage infrastructure, such as NAS and SAN, implemented at a remote computer system and a data center computer system. See Ser. No. 09/766,526, filed Jan. 19, 2001, assigned to the assignee of this application and incorporated by reference herein.
  • Referring to FIG. 3, the cache server 36 includes the modules of a server manager 60, a translator 62, a leasing module 64, and a local file storage 66. The server manager 60 is coupled to the translator 62, the leasing module 64 and the storage 66 and also is for coupling to storage caches, such as the storage caches 30A and 30B, via the gateway 26C and the network 28. The translator 62 is coupled to the storage 66 and is for coupling to a file server of an associated data center computer system via an Ethernet connection. The translator 62 temporarily stores at the storage 66 copies of data files stored at and obtained from the file server 38, and performs processing using the stored data files and update data received from a storage cache to generate a replacement, updated data file. The translator 62 also replaces a data file stored in the file server 38 with the replacement data file. In addition, the translator 62 can supply to a workstation associated with the data center system, such as the workstations 22D and 22E, a copy of a data file stored at the file server 38 only for viewing purposes in accordance with the inventive leasing protocol, described in further detail below. In a preferred embodiment, the translator 62, like the translator 52, can generate a checksum representative of a first data file and determine the difference between another data file and the first data file using the checksum. In addition, the leasing module 64, through interactions with the storage caches included in the system 12, determines whether a request for access to a data file from a workstation associated with a specific storage cache should be granted or denied.
  • It is to be understood that each of the modules of each of the storage cache 30 and the cache server 36, which perform data processing operations in accordance with the present invention, constitutes a software module or, alternatively, a hardware module or a combined hardware/software module. In addition, each of the modules suitably contains a memory storage area, such as RAM, for storage of data and instructions for performing processing operations in accordance with the present invention. Alternatively, instructions for performing processing operations can be stored in hardware in one or more of the modules. Further, it is to be understood that, in a preferred embodiment, the modules within each of the cache server 36 and the storage cache 30 can be combined, as suitable, into composite modules, and that the cache server and storage cache can be combined into a single appliance which can provide both caching for a workstation and real time updating of the data files stored at a file server of a central data center computer system.
  • In accordance with the present invention, the storage caches and the cache server of the storage caching system 12 provide that a data file stored in a file server of a data center, and available for distribution to authorized workstations via a conventional prior art distributed file system, can be accessed for read or write purposes by the workstations, that the workstations experience a minimum of latency when accessing the file, and that the cached data file supplied to a workstation in response to an access request corresponds to a real time version of the data file. A storage cache of the system 12 stores in the storage 56 only a current version of the cached data file corresponding to the data file that was the subject of an access request, where the single cached data file incorporates all of the data file modifications entered by a workstation associated with the storage cache while the file was accessed by the workstation. File update data associated with the cached data file is automatically, and preferably at predetermined intervals, generated and then transmitted (flushed) to the cache server. Most preferably, the file update data is flushed with sufficient frequency to provide that a real time, updated version of the data file is stored at the file server and can be used by the cache server to respond to an access request from another storage cache or a workstation not associated with a storage cache. In a preferred embodiment, the local storage 56 of the storage cache includes only cached data files corresponding to recently accessed data files.
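The periodic flushing described above could be scheduled along the lines of the following minimal Python sketch. It assumes a single-threaded timer tick and a fixed flush interval; the class `FlushScheduler` and its methods are hypothetical illustrations, not part of the described system. The sketch only decides which modified cached data files are due for a flush; generating and transmitting the file update data is left to the process described below with reference to FIG. 4A.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FlushScheduler:
    """Hypothetical helper: tracks which cached data files were modified since
    their last flush and reports those whose flush interval has elapsed."""

    interval: float  # assumed fixed flush interval, in seconds
    last_flush: Dict[str, float] = field(default_factory=dict)
    dirty: Set[str] = field(default_factory=set)

    def record_write(self, name: str, now: float) -> None:
        # A workstation modified the cached data file; mark it for flushing.
        # For a never-flushed file, the interval is measured from its first write.
        self.dirty.add(name)
        self.last_flush.setdefault(name, now)

    def due(self, now: float) -> List[str]:
        # Return modified files whose interval has elapsed, and restart their clocks.
        ready = sorted(n for n in self.dirty if now - self.last_flush[n] >= self.interval)
        for name in ready:
            self.dirty.discard(name)
            self.last_flush[name] = now
        return ready
```

For example, with `interval=30`, a file first written at t=0 is reported by `due(30)`, while a file first written at t=10 is not reported until `due(40)`.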
  • FIG. 4A is a high level flow process 100 illustrating data processing operations performed at a storage cache and a cache server, in accordance with the present invention, for updating a data file at a file server. For purposes of illustrating the process 100, and also the processes described below with reference to FIGS. 4B, 5, 6, and 7, reference is made to the network 10 and operations that the components of the storage caching system 12 would perform in connection with requests for access to a data file from the remote system 16A or 16B where the data file is stored at the file server 38 of the data center system 20. For highlighting the features of the process 100, it is assumed that the storage module 56 of the storage cache 30A does not initially contain a cached data file corresponding to a data file that the workstation 22A seeks to access for write purposes.
  • Referring to FIGS. 1, 2, 3 and 4A, in step 102, the translator 62 communicates with the file server 38 and generates a copy of the data file that the workstation 22A desires to access. The server manager 60 then transmits the copy of the data file to the storage cache 30A via the gateway 26C, the network 28 and the gateway 26A.
  • In step 104, the cache manager 50 receives the transmitted copy of the data file from the gateway 26A and stores the file in the storage 56 as a cached data file. In addition, the translator 52 interacts with the distributed file system of the workstation 22A so that the workstation 22A can open, and enter data file modifications to (write), the cached data file. When the user of the workstation is presented with the cached data file, that is, when the user is permitted to open the cached data file following a request for access to the corresponding data file, the user is not aware of the location in the network 10 from which the file was obtained. The user does not know whether he is working on a local copy of the data file, such as one stored at a memory of the local remote system or at the storage cache 30A, or on a copy of a data file retrieved from a remote storage location, such as the remotely located data center computer system 20. As the user enters data file modifications at the workstation 22A, the translator 52 monitors the modifications and incorporates them into the cached data file at the storage 56. In other words, only a current version of the cached data file, which includes all modifications to the cached data file previously made by any workstation within the remote system 16A, is stored in the storage 56.
  • Steps 106, 108, 110, 112 and 114 set forth file update operations that the storage cache 30A and the cache server 36 automatically perform to update the version of the data file stored at the file server 38, based on the modifications made to the corresponding cached data file stored at the storage cache 30A. Based on this automatic updating, the cache server can transmit a real time, updated version of the data file in response to a subsequent request for access to the data file from an authorized workstation other than the workstation 22A, where that workstation may or may not be associated with the storage cache 30A or another storage cache that is part of the system 12. In the preferred illustrated embodiment of the process 100, the components of the system 12 implement the well known prior art technique of differencing as part of the inventive automatic updating of a data file to minimize potential latencies.
  • Referring again to FIG. 4A, in step 106, the cache manager 50 of the storage cache 30A transmits a data file transfer request to the cache server 36. At the cache server 36, the server manager 60, based on receipt of this request, causes the translator 62 to generate a checksum for the data file currently stored at the file server 38 using techniques well known in the art. The translator 62 generates the checksum by retrieving a copy of the data file from the file server 38 and storing data needed for checksum processing, such as the data file copy, in the storage 66, as necessary.
  • In step 108, the server manager 60 transmits the checksum to the storage cache 30A. In step 110, the cache manager 50 retrieves the cached data file from the storage 56 and the translator 52 uses the checksum to compute file update data, which is in the form of difference data. The difference data represents differences between the cached data file and the version of the data file currently stored at the file server and represented by the checksum.
  • In step 112, the cache manager 50 transmits the difference data to the cache server 36. Then in step 114, the translator 62 uses the difference data to generate an updated, replacement version of the data file. In particular, the translator 62 retrieves a copy of the current version of the data file, which preferably was stored in the local file storage 66 at step 106, and then processes the stored current version of the data file using the difference data to generate an updated data file. The translator 62 then replaces the data file currently stored at the file server 38 with the replacement, updated data file. Thus, when the cache server 36 subsequently receives a request for access to the data file transmitted from another storage cache, such as the storage cache 30B, or from one of the workstations 22D or 22E in the data center system 20, the cache server 36 uses the updated data file to respond to the request. Consequently, the subsequent requestor effectively is presented with a real time version of the data file, which incorporates previous changes to the data file based on entries made at the workstation 22A.
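The differencing exchange of steps 106 through 114 can be sketched in Python as follows. This is a minimal, rsync-style illustration under stated assumptions, not the patented implementation: the SHA-256 block digests, the tiny 4-byte block size, and the names `block_checksums`, `compute_difference` and `apply_difference` are all hypothetical. For clarity it hashes every window directly instead of using the rolling weak checksum that rsync relies on for speed, and it trusts digest equality without a byte-level recheck.

```python
import hashlib
from typing import Dict, List, Tuple, Union

BLOCK = 4  # tiny block size for illustration; a real deployment would use far larger blocks

Op = Tuple[str, Union[int, bytes]]  # ("copy", offset) or ("literal", raw bytes)


def block_checksums(server_copy: bytes, block: int = BLOCK) -> Dict[str, int]:
    """Steps 106-108 (cache server): digest each full, aligned block of the
    file-server copy and remember where that block starts."""
    sums: Dict[str, int] = {}
    for off in range(0, len(server_copy) - block + 1, block):
        digest = hashlib.sha256(server_copy[off:off + block]).hexdigest()
        sums.setdefault(digest, off)
    return sums


def compute_difference(cached: bytes, sums: Dict[str, int], block: int = BLOCK) -> List[Op]:
    """Step 110 (storage cache): describe the cached data file as references to
    blocks the server already holds plus literal bytes the server lacks."""
    ops: List[Op] = []
    literal = bytearray()
    i = 0
    while i < len(cached):
        window = cached[i:i + block]
        off = sums.get(hashlib.sha256(window).hexdigest()) if len(window) == block else None
        if off is None:
            literal.append(cached[i])  # no match at this position: keep one byte, slide on
            i += 1
            continue
        if literal:
            ops.append(("literal", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", off))  # server already has these bytes
        i += block
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def apply_difference(server_copy: bytes, ops: List[Op], block: int = BLOCK) -> bytes:
    """Step 114 (cache server): rebuild the replacement, updated data file."""
    out = bytearray()
    for kind, arg in ops:
        out += server_copy[arg:arg + block] if kind == "copy" else arg
    return bytes(out)
```

Only bytes absent from the server's copy travel as literals: inserting three bytes into a 1,024-byte file, for instance, produces three literal bytes plus block references.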
  • In a preferred embodiment, in step 112 the cache manager 50 transmits the file update data as streaming data to the cache server 36. In an alternative preferred embodiment, the file update data is compressed before transmission to the cache server as streaming data to minimize the amount of data transferred over the network 28, thereby reducing potential latency.
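The patent does not name a compression scheme. As a minimal sketch, assuming DEFLATE through Python's standard `zlib` module, the file update data could be compressed chunk by chunk while it streams to the cache server and decompressed incrementally on arrival; `stream_compressed` and `receive_stream` are hypothetical names.

```python
import zlib
from typing import Iterable, Iterator


def stream_compressed(chunks: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Storage cache side: compress update data incrementally as it is streamed."""
    comp = zlib.compressobj(level)
    for chunk in chunks:
        frame = comp.compress(chunk)
        if frame:  # zlib may buffer small inputs and return b""
            yield frame
    yield comp.flush()  # emit whatever remains buffered and terminate the stream


def receive_stream(frames: Iterable[bytes]) -> bytes:
    """Cache server side: decompress each frame as it arrives."""
    decomp = zlib.decompressobj()
    out = bytearray()
    for frame in frames:
        out += decomp.decompress(frame)
    out += decomp.flush()
    return bytes(out)
```

Because both ends keep compression state across frames, the cache server can begin applying update data before the whole transfer has finished.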
  • In a preferred operation of the process 100, the cache server 36 continues to update a data file stored in the storage 66 based on file update data transmitted from a storage cache and, once transmission of all of the file update data is completed and the cache server has received all such transmitted data, the cache server then replaces the data file stored at the file server 38 with the updated data file.
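One way to realize this staged replacement is to write the updated data file to a staging file and swap it in only after all update data has arrived. The sketch below assumes, for illustration only, that the file server is reachable as a local or mounted path; `commit_updated_file` is a hypothetical name. On a POSIX file system, with the staging file on the same volume as the target, `os.replace` performs the swap as a single rename, so readers see either the old version or the complete new one, and an interrupted transfer leaves the original untouched.

```python
import os
import tempfile
from typing import Iterable


def commit_updated_file(target_path: str, updated_chunks: Iterable[bytes]) -> None:
    """Stage the updated data file next to the target, then swap it in."""
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, staging_path = tempfile.mkstemp(dir=directory, prefix=".staging-")
    try:
        with os.fdopen(fd, "wb") as staging:
            for chunk in updated_chunks:  # update data still arriving
                staging.write(chunk)
            staging.flush()
            os.fsync(staging.fileno())  # make sure the bytes reached the disk
        os.replace(staging_path, target_path)  # all data received: replace at once
    except BaseException:
        os.unlink(staging_path)  # transfer failed: discard the partial copy
        raise
```
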
  • FIG. 4B is a high level flow process 120 illustrating data processing operations that a storage cache and cache server perform, in accordance with the present invention, for updating a cached data file at a storage cache using the corresponding data file stored at the file server. For purposes of highlighting the features of the process 120, it is assumed that the storage cache 30A has received a request for access to a data file from the workstation 22A, that a cached data file corresponding to the data file is stored at the storage module 56, and that the workstation 22A or 22B previously accessed the data file for either read or write purposes. By updating the cached data file before it is presented to the workstation 22A in response to an access request, any updates made to the data file since the workstation 22A previously accessed the data file are incorporated into the cached data file. For example, the workstation 22C may have previously written to a cached data file at the storage cache 30B, which corresponds to the data file, and file update data representative of the modifications made to such cached data file may have been used to update the data file at the file server 38, as explained above in connection with the process 100, such that the data file at the file server 38 is different from the corresponding cached data file presently stored at the cache 30A.
  • Referring to FIG. 4B, in step 122 the cache manager 50, following receipt of the access request from the workstation 22A, and where it is assumed for simplicity that such access request would not impact coherence for the data file in the network 10, automatically transmits to the cache server 36 a data file transfer request. In response to the file transfer request, the translator 62 retrieves the data file from the file server 38 and the server manager 60 stores the data file in the storage 66.
  • In step 124, the translator 52 generates a checksum for the corresponding cached data file and the cache manager 50 transmits the checksum to the cache server 36. To compute the checksum, the translator 52 retrieves the cached data file from the storage module 56 and performs well known, prior art checksum processing on the cached data file.
  • In step 126, the translator 62 generates server file update data using the checksum. The server file update data preferably represents differences between the data file currently stored in the file server 38, a copy of which was stored in the storage 66 in step 122, and the current version of the cached data file stored at the storage cache 30A and represented by the checksum.
  • In step 128, the server manager 60 transmits the server file update data to the storage cache 30A. Then in step 130, the translator 52 uses the server file update data to generate an updated cached data file which replaces the cached data file stored in the storage module 56. Thereafter, the translator 52 uses the cached data file, which has been updated based on any data file modifications made by other workstations associated with a storage cache of the system 12, to respond to the access request from the workstation 22A. Thus, user-desired updates to an accessed data file are stored in the form of a single, current version of the cached data file at the storage 56 of a storage cache.
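Process 120 runs the differencing in the opposite direction: the storage cache supplies the checksum and the cache server returns the server file update data. To show a simpler differencing scheme than a sliding-window one, the Python sketch below compares fixed, aligned blocks by SHA-256 digest and returns only the blocks that differ, together with the file's new length. The block size and all function names are hypothetical; because an insertion shifts every later aligned block, a practical implementation would more likely use a sliding-window method.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK = 4096  # hypothetical block size

Update = Tuple[int, Dict[int, bytes]]  # (new file length, {block index: block bytes})


def block_digests(cached: bytes, block: int = BLOCK) -> List[bytes]:
    """Step 124 (storage cache): one digest per aligned block of the cached file."""
    return [hashlib.sha256(cached[i:i + block]).digest() for i in range(0, len(cached), block)]


def server_update_data(server_file: bytes, digests: List[bytes], block: int = BLOCK) -> Update:
    """Step 126 (cache server): collect every block the storage cache lacks."""
    changed: Dict[int, bytes] = {}
    for idx, i in enumerate(range(0, len(server_file), block)):
        chunk = server_file[i:i + block]
        if idx >= len(digests) or hashlib.sha256(chunk).digest() != digests[idx]:
            changed[idx] = chunk
    return len(server_file), changed


def apply_server_update(cached: bytes, update: Update, block: int = BLOCK) -> bytes:
    """Step 130 (storage cache): patch the cached file into the server's version."""
    new_len, changed = update
    out = bytearray(cached[:new_len])  # truncate if the file shrank
    out += bytes(new_len - len(out))  # pad if it grew; padding is overwritten below
    for idx, chunk in changed.items():
        out[idx * block: idx * block + len(chunk)] = chunk
    return bytes(out)
```
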
  • Similar to the process 100, the server file update data is preferably transmitted as streaming data to the storage cache and, in addition, the server file update data is most preferably compressed before transmission as streaming data to the storage cache.
  • In a preferred embodiment, the process 120 is automatically performed for a storage cache at predetermined intervals to provide that a cached data file is updated before a time that a workstation associated with the storage cache is expected to request access to the data file. For example, in an enterprise implementation of the inventive storage caching protocol system 12, the process 120 is automatically performed by a storage cache early in the morning, before employees would arrive at work and request access to data files from their workstations. In another preferred embodiment where none of the workstations of a remote system have accessed a particular data file for longer than a predetermined interval, the process 120 is automatically performed to update the corresponding cached data files at the storage cache to minimize latency. In a further preferred embodiment, all data files that workstations of a remote system would seek to access are initially stored at the storage cache associated with the remote system.
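The two scheduling policies above, a pre-arrival run early in the morning and a refresh of cached data files idle for longer than a predetermined interval, could be expressed as follows. The 5:00 a.m. run time, the idle threshold and the function names are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_cached_files(last_access: Dict[str, datetime], now: datetime,
                       max_idle: timedelta) -> List[str]:
    """Cached data files not accessed for longer than max_idle (hypothetical policy)."""
    return sorted(name for name, seen in last_access.items() if now - seen > max_idle)


def next_prefetch_run(now: datetime, hour: int = 5) -> datetime:
    """Next early-morning run of process 120; 5:00 is an assumed default."""
    run = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if run <= now:
        run += timedelta(days=1)  # today's slot has passed; schedule tomorrow's
    return run
```

Each file returned by `stale_cached_files`, and every cached file at the time returned by `next_prefetch_run`, would then be refreshed by running the steps 122 through 130.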
  • Thus, the inventive storage caching protocol system constitutes an invisible interface between a remote system and a data center system which manages shared access to real time data files. Advantageously, the changes that a workstation desires to make to a data file are not backed up at a storage cache. The desired changes are represented in the cached data file, and file update data, which is derived from the cached data file, is continually transmitted to the cache server. The cache server, in turn, uses the file update data to update the data file stored at the file server of a data center system. Therefore, neither the remote system nor the storage cache requires a large amount of memory for local storage of files. Consequently, the installation of the inventive cache server in association with a central data center system provides memory saving benefits throughout the computer network 10 with a minimum of administrative overhead, as each of the remote systems associated with a storage cache which operates in conjunction with the cache server has minimal local memory storage requirements. Unlike prior art file sharing systems, which are complete and separate systems, the inventive storage caching system has low memory requirements, is interoperable with existing distributed file system technology and, as discussed in detail below, also provides for network-wide coherence of shared data files when accessed by workstations. Further, the inventive storage caching protocol performs read and write shared access operations on an entire data file, which is markedly different from prior art distributed file systems, such as AFS, NFS and CIFS, each of which primarily performs read and write operations using portions (data blocks) of a data file.
  • Advantageously, the storage caching system 12 can be implemented in connection with an existing, prior art distributed file system, such as NFS or CIFS, without adding to or modifying software at appliances already existing at the remote systems or the data center systems and without impacting the existing software architecture. For example, the system 12 can appear as a Windows file server to Windows users and as a Unix file server to Unix users. In addition, in operation, the storage cache and cache server of the system 12 are easily initialized to interface with workstations and a file server using conventional network configuration information. Further, after initial configuration of a storage cache, the storage cache does not require further administration, backup or management of any kind, such as by a user of a workstation, and can be completely managed, monitored, provisioned and replicated from the cache server or a remote control center.
  • In accordance with a preferred aspect of the present invention, the system 12 implements a leasing protocol that ensures coherency and consistency of the real time data files available for shared access by workstations of the network 10 which operate using an existing distributed file system. The leasing protocol permits multiple read leases for a data file, where the first read lease for a data file locks the data file so that a write lease subsequently cannot be granted. In addition, following grant of a write lease for a data file, no other read leases can be granted until the write lease is closed. Further, where a write lease for a data file already exists, multiple reader rights to the data file can be provided. A reader right to a data file provides that a workstation, which may or may not be associated with a storage cache, can view the data file as a copy, such as one obtained directly from the file server, or in the form of a cached data file which is stored at a storage cache.
  • FIG. 5 is a high level flow process 150 illustrating data processing operations performed by a cache server and a storage cache, in accordance with the present invention, for determining whether to grant a storage cache's request for a lease of a data file. For purposes of highlighting the features of the leasing protocol set forth in the preferred process 150, it is assumed that a first storage cache, namely, the storage cache 30A, is initiating a lease request for a data file, which is stored at the file server 38, based on an access request received from the workstation 22A. In addition, for simplicity and clarity of description, it is also assumed that a second storage cache, namely, the storage cache 30B, is the only other storage cache in the network 10 that can be granted a lease for a data file. It is to be understood, however, that the leasing process 150 is also applicable where the system 12 includes more than two storage caches and that the leasing process 150 would be performed in connection with each of the storage caches holding a lease for the data file at issue.
  • Referring to FIG. 5, in step 152 the leasing module 54 causes the cache manager 50 of the storage cache 30A to transmit a data file lease request to the cache server 36. In step 154, the leasing module 64 determines if the storage cache 30B already has a lease for the data file. If the determination in step 154 is yes, in step 156 the leasing module 64 determines if the lease held by the cache 30B conflicts with the requested lease. Based on the leasing protocol criteria described above, a conflict does not exist if both the lease held by the cache 30B and the lease requested by the cache 30A are read leases. In this circumstance, the leasing module 64 performs step 158 to determine whether the data file is already locked by operations of the distributed file system, such as CIFS or NFS, that control shared access to the file. If the determination in step 158 is that the data file is already locked, then in step 160 the leasing module 64 determines if the lock conflicts with the requested lease. A conflict would exist if (i) the lease request is a write lease and the existing lock is a read or write lock, or (ii) the lease request is a read lease and the existing lock is a write lock.
  • If the determination in step 160 is that a conflict exists, in step 162 the leasing module 64 denies the lease request and provides a reader right to the workstation seeking access to the data file. When reader rights are provided, the storage cache associated with the workstation performs the process 120 to update the cached data file, if any, corresponding to the data file that was the subject of the lease request transmitted by the storage cache 30A.
  • Referring to steps 158 and 160, if the determination for either of these steps is no, then in step 164 the leasing module 64 grants the request, records in its memory that the storage cache 30A has a lease and the type of lease, and locks the file so that no other workstation, such as one attached to the storage cache 30B, can have write access to the data file.
  • Referring again to step 154, if the determination for this step is no, the leasing module 64 proceeds to step 158.
  • Referring again to step 156, if there is a conflict, then in step 166 the leasing module 64 determines if the requested lease is read. If yes, in step 168 the server manager 60 updates the data file at the file server 38 based on the cached data file stored at the cache 30B, preferably performing steps similar to the steps 108, 110, 112 and 114 of the process 100. Step 168 is reached only when the cache 30B holds a write lease for the data file that is the subject of the lease request.
  • Following step 168, in step 170 the server manager 60 transmits a response to the cache manager 50 of the storage cache 30A that the lease request was denied and that the workstation can have reader rights to the data file. As part of the response, the server manager 60 transmits a copy of the data file to the storage cache 30A, or interacts with the storage cache 30A to update a corresponding cached data file stored at the storage cache 30A, preferably performing steps similar to the steps 124, 126, 128 and 130 of the process 120. The translator 52, in turn, supplies the cached data file, only with reader rights, to the workstation requesting access to the data file.
  • Referring again to step 166, if the determination is that the requested lease is not read, in step 172 the leasing module 64 determines whether the lease for the data file held by the storage cache 30B is read. If yes, the leasing module 64 in step 174 revokes the read lease of the storage cache 30B, stores such information in its memory for future use in making a leasing decision and transmits data representative of this action to the storage cache 30B so that its leasing module can update its memory and take appropriate action. Based on the revocation of the read lease, the storage cache 30B only can provide a reader right to an associated workstation that seeks to access the data file. In the circumstance where the storage cache 30B already has a read lease for the data file and an associated workstation is reading the file based on the read lease, the read viewing continues for the workstation and the user does not realize the changed status from read lease to reader rights. Step 158 is performed following step 174, as described above.
  • Alternatively, if the determination in step 172 is that the storage cache 30B lease is not read, that is, the storage cache 30B holds a write lease, then step 162 is performed as described above. In this outcome, the requested lease was for write access.
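The decision flow of steps 152 through 174 can be condensed into the following Python sketch of the cache server's leasing module. It assumes in-memory lease records keyed by a storage cache identifier and models the CIFS or NFS lock consulted in steps 158 and 160 as a single hypothetical `dfs_lock` field; the class and field names are illustrative, and the flush of step 168 is only logged rather than performed. A return value of "reader" stands for denial of the lease with reader rights (steps 162 and 170).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LeaseServer:
    """Hypothetical model of the leasing module 64 for a single data file."""

    leases: Dict[str, str] = field(default_factory=dict)  # cache id -> "read" | "write"
    dfs_lock: Optional[str] = None  # lock held via CIFS/NFS: "read", "write" or None
    log: List[str] = field(default_factory=list)

    def request(self, cache: str, kind: str) -> str:
        """Steps 152-174: return "granted" or "reader" (denied, reader rights)."""
        for other, held in list(self.leases.items()):
            if other == cache:
                continue
            if kind == "read" and held == "read":
                continue  # step 156: two read leases do not conflict
            if kind == "read":
                # Steps 166-170: another cache holds a write lease. Bring the
                # file server copy up to date from it, then offer reader rights.
                self.log.append(f"flush {other}")
                return "reader"
            if held == "read":
                # Steps 172-174: revoke the read lease; its users continue
                # reading with reader rights.
                del self.leases[other]
                self.log.append(f"revoke {other}")
                continue
            return "reader"  # step 172 no, then step 162: another write lease exists
        # Steps 158-160: consult the distributed file system lock, if any.
        if self.dfs_lock is not None and (kind == "write" or self.dfs_lock == "write"):
            return "reader"  # step 162
        self.leases[cache] = kind  # step 164: grant and record the lease
        return "granted"
```

For example, after the storage caches 30A and 30B both hold read leases, a write request from 30A revokes the read lease of 30B and is granted; a later read request from 30B triggers a flush from 30A and yields reader rights.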
  • FIG. 6 is a high level flow process 180 illustrating data processing operations performed by a storage cache and a cache server, in accordance with the present invention, for determining whether or not to grant a request by a workstation associated with a storage cache for access to a data file, where the request is for read or write purposes. For purposes of illustration, the workstation 22A is attempting to access a data file stored at the file server 38. Referring to FIG. 6, in step 182, the cache manager 50 of the storage cache 30A determines that the workstation 22A has made a request for access to a data file which is stored at the file server 38. In response to the access request, in step 184 the leasing module 54 determines if the storage cache 30A already has a sufficiently strong lease for the data file. Table 1 shows the relationship between a type of access request that has been made and the existing lease, if any, for a data file held by the storage cache. The entries in Table 1 indicate whether, based on a particular access request, the existing lease, if any, for a data file held by the storage cache is sufficiently strong such that data file consistency and coherency are preserved among the remote systems associated with respective storage caches.
    TABLE 1
                        EXISTING LEASE
    ACCESS REQUEST      Read     Write    No Lease
    Read                Yes      Yes      No
    Write               No       Yes      No
  • If the existing lease is sufficiently strong in relation to the access request, in step 186 the translator 52 retrieves the cached data file from the storage 56 and transmits the cached data file to the workstation 22A over the Ethernet 24A. Consequently, a user at the workstation 22A can open the cached data file for read or write purposes, depending on the nature of the access request. For example, if the access request was write, the user can enter data file modifications for the cached data file, and the translator 52 would monitor the modifications, and automatically and on an ongoing basis, update the cached data file stored in the storage module 56 to incorporate such modifications.
  • If the determination in step 184 is that the existing lease is not strong enough in relation to the access request, in step 188 the leasing module 54 causes the cache manager 50 to transmit a new lease request to the cache server 36. Referring to Table 1, an existing lease is not strong enough if the intersection of the access request and the existing lease is a NO, e.g., the access request is write and the existing lease for the data file at the storage cache is read. Based on the new lease request, the cache server 36 performs a process that is the same or substantially similar to the process 150, as described above, to determine whether a lease can be granted. After the leasing module 64 determines whether and what type of lease can be granted, the server manager 60 transmits this information to the storage cache 30A.
  • In step 190, the leasing module 54 receives and processes the response to the lease request transmitted by the cache server 36 to determine whether a lease has been granted. If yes, in step 192, the cache manager 50 and translator 52 of the storage cache 30A perform a process, such as the process 120 described above, to update the corresponding cached data file in the storage module 56. The cache manager 50 then performs step 186.
  • If the determination in step 190 is that a lease has not been granted, in step 194 the leasing module 54 determines whether the access request was read. If yes, steps 192 and 186 are performed as described above, except that in step 186 read access to the cached data file is provided.
  • If the determination in step 194 is no, in step 196 the leasing module 54 prevents the cached data file from being accessed by the workstation 22A. This outcome ensures data file coherence and consistency throughout the network 10. Step 196 is performed where the access request was write and another read or write lease for the data file existed at another storage cache associated with the distributed file system, such as the storage cache 30B.
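Process 180 on the storage cache side, including the Table 1 lookup, reduces to a few branches. In this minimal sketch the callback `request_lease` is a hypothetical stand-in for steps 188 and 190 (the round trip to the cache server running process 150), and the refresh of step 192 is noted only in comments.

```python
from typing import Callable, Optional

# Table 1: is the existing lease strong enough for the access request?
STRONG_ENOUGH = {
    ("read", "read"): True,
    ("read", "write"): True,
    ("write", "read"): False,
    ("write", "write"): True,
}


def handle_access(request: str, existing: Optional[str],
                  request_lease: Callable[[str], bool]) -> str:
    """Return the access the workstation receives: "read", "write" or "denied"."""
    if existing is not None and STRONG_ENOUGH[(request, existing)]:
        return request  # step 186: serve the cached data file directly
    if request_lease(request):  # steps 188-190: ask the cache server
        return request  # step 192 (refresh via process 120), then step 186
    if request == "read":
        return "read"  # step 194 yes: reader rights, refresh, read-only access
    return "denied"  # step 196: write refused to preserve coherence
```
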
  • FIG. 7 is a high level flow process 200 illustrating data processing operations performed by a storage cache and cache server, in accordance with the present invention, for updating a data file stored at a file server after a storage cache that has obtained a lease for the data file no longer needs to keep the lease active. For purposes of highlighting the features of the process 200, it is assumed that only the workstation 22A of the remote system 16A previously obtained read or write access to the data file and the workstation 22A closed the accessed cached data file, which it had been viewing or modifying on its operating system and which corresponds to the data file for which the storage cache 30A holds a write lease or a read lease.
  • Referring to FIG. 7, in step 202, the cache manager 50 monitors data transmissions between the translator 52 and the workstation 22A to determine when the workstation 22A has closed the cached data file. After the cache manager 50 determines that the cached data file has been closed, in step 204 the translator 52 determines whether the workstation 22A modified the cached data file. If yes, in step 206 the translator 206 and the cache manager 50 perform a file update process, preferably including differencing data processing similar to that described in the process 100, to update the data file stored at the file server which corresponds the cached data file that was closed by the workstation 22A.
  • Following step 206, the leasing module 54, which also received the transmission indicating that the cached data file was closed, in step 208 causes the cache manager 50 to transmit a release lease signal for the data file to the cache server 36. Further in step 208, at the cache server 36, the leasing module 64, upon receipt of the release lease signal, resets its memory concerning the data file. If a write lease was released, the reset provides that another storage cache, such as the storage cache 30B, can obtain write lease access to the data file.
  • Referring again to step 204, if the determination is that the workstation 22A did not modify the cached data file while the workstation had access to the cached data file, then in step 210 the leasing module 54 determines whether the storage cache 30A holds a lease for the corresponding data file. If yes, which means that the cache 30A had a read lease for the data file, the leasing module 54 performs step 208. Alternatively, if the storage cache 30A did not have a lease for the data file, no further action is taken because the workstation 22A that opened the file was a reader, i.e., could only read the file, and another storage cache, such as the storage cache 30B, had obtained write access rights for an associated workstation.
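The close-and-release handling of FIG. 7 (steps 204 through 210) can be sketched as below. This is a minimal sketch. The classes `CacheServer` and `StorageCache`, their dictionaries, and the "dirty" set are hypothetical stand-ins for the leasing modules 54 and 64. For brevity, step 206 ships the whole cached file, whereas the text prefers difference data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class CacheServer:
    """Hypothetical stand-in for the cache server's leasing state (module 64)."""
    leases: Dict[str, Dict[str, str]] = field(default_factory=dict)  # path -> {cache_id: lease}
    updates: List[Tuple[str, str, bytes]] = field(default_factory=list)

    def push_update(self, cache_id: str, path: str, data: bytes) -> None:
        # Step 206 (simplified): the text prefers sending difference data instead.
        self.updates.append((cache_id, path, data))

    def release(self, cache_id: str, path: str) -> None:
        # Step 208: forget this cache's lease so another cache may obtain a write lease.
        self.leases.get(path, {}).pop(cache_id, None)


@dataclass
class StorageCache:
    cache_id: str
    server: CacheServer
    leases: Dict[str, str] = field(default_factory=dict)   # path -> "read" | "write"
    files: Dict[str, bytes] = field(default_factory=dict)  # cached data files
    dirty: Set[str] = field(default_factory=set)           # paths modified while open

    def on_close(self, path: str) -> None:
        """Steps 204-210, run once the workstation has closed `path` (step 202)."""
        if path in self.dirty:                        # step 204: modified?
            self.server.push_update(self.cache_id, path, self.files[path])
            self.dirty.discard(path)
            self._release(path)                       # step 208
        elif path in self.leases:                     # step 210: unmodified, lease held
            self._release(path)
        # Otherwise the workstation was a reader without a lease: nothing to do.

    def _release(self, path: str) -> None:
        self.leases.pop(path, None)
        self.server.release(self.cache_id, path)
```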
  • Thus, the inventive storage caching system manages data files of a distributed file system to make them available for coherent and consistent shared real time access by multiple remote systems. The data files can be accessed by users, who may be located at different remote locations, and are presented to the users in the form of a cached data file or a copy of the data file currently stored at the file server, each of which includes all previous modifications so as to constitute a real time, updated version of the data file. The preferred transmission of file update data and data files between a storage cache and a cache server as compressed, streaming data provides that a user at a workstation experiences substantially LAN-speed access to a data file, although the data file may be physically stored at a file server located remotely from the workstation.
  • In accordance with a preferred embodiment of the present inventive storage caching protocol system including a leasing protocol, a workstation associated with a storage cache can access data files stored at multiple file servers. FIG. 8 is a system diagram of a network 310 including a preferred storage caching protocol system 312 which operates to manage access to shared real time data files which are stored at multiple file servers and to maintain data file coherence and consistency in the network 310 in accordance with the present invention. Referring to FIG. 8, the system 312 includes a plurality of cache servers 336A, 336B and 336C, which are respectively coupled to associated data center systems 320A, 320B and 320C, and also storage caches 30A and 30B, which are respectively coupled to the remote systems 16A and 16B in the same manner as described above for the network 10. For purposes of illustration, each of the data centers 320 is constructed and functions in the same or substantially the same manner as the data center 20 in the network 10. For example, the data center 320A includes an Ethernet 324A which couples workstations 322A and 322B and a file server 338A to the cache server 336A and a gateway 326A, and the gateway 326A is coupled to the communications network 28.
  • Referring to FIG. 8, each of the storage caches 30 can communicate with any of the cache servers 336, which are likely located at different remote locations, and vice versa. In addition, the cache servers 336 can communicate with respective associated file servers 338 for retrieving copies of and updating data files that are the subject of access requests from any of the storage caches 30, in accordance with the inventive storage caching protocol. Advantageously, the inventive cache server has a software infrastructure to act as a client for standard LAN file sharing protocols (NFS and CIFS), which makes it readily configurable to retrieve copies of a data file from or replace data files stored at any of the file servers 338 in the network 310, where each of the file servers 338 can have any operating system format. In addition, the cache server can also access files from and replace files on a local file system using standard filesystem APIs.
  • In operation of the system 312, when a workstation desires to access a data file for read or write purposes and the inventive storage caching system correctly multiplexes the access request to the appropriate cache server, the location from which a copy of the data file is presented to the user is unknown to the user at the workstation. In other words, a user can access and operate on a sharable data file without knowing, being concerned with or ascertaining which data source system physically contains the data file.
  • In a further preferred embodiment, each storage cache or cache server can be constructed to operate as both a storage cache and a cache server. Thus, a single combination storage cache and cache server appliance can be associated with a remote computer system or a data center computer system. The user at a workstation of an associated remote system would not be aware that, in some circumstances, the storage cache communicates with a cache server that is within the same appliance.
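The text does not say how a storage cache decides which cache server 336 should receive a given access request. One simple possibility is a static namespace table with longest-prefix matching. The table contents, the server names and the `route` helper below are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical namespace: which cache server fronts which file-server export.
NAMESPACE: Dict[str, str] = {
    "/eng": "cache-server-336A",
    "/eng/builds": "cache-server-336B",
    "/finance": "cache-server-336C",
}


def route(path: str, table: Dict[str, str]) -> str:
    """Return the cache server for `path`, preferring the longest matching prefix."""
    best: Optional[Tuple[str, str]] = None
    for prefix, server in table.items():
        # Match whole path components only, so "/eng" does not capture "/engineering".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, server)
    if best is None:
        raise KeyError(f"no cache server exports {path!r}")
    return best[1]
```

Because the table lives in the storage cache, the workstation only ever sees one namespace. This matches the transparency described above.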
  • Further, the inventive storage caching protocol system provides tremendous flexibility in the allocation and sharing of file server and memory resources, as storage caches and cache servers can serve as simple building blocks for implementing very sophisticated topologies, such as cliques where every cache/server combination is connected with every other cache/server combination in the network.
  • In a further preferred embodiment, the inventive storage caching system including the leasing protocol uses the cached data file stored at a storage cache and being modified by entries made at the workstation, the version of the data file stored at the storage of the cache server or the data file stored at the file server to update a data file or a cached data file and maintain data file coherency and consistency in a network in the event of (i) a disconnection of a communication link established between a cache server and a storage cache, (ii) a failure of either the cache server or the storage cache, or (iii) an unexpected reboot of a workstation. Significantly, additional data for tracking file update status is not required. FIG. 9 is a system diagram of a network 410 including a further preferred embodiment of a storage caching protocol system 412 which manages shared access to real time data files while maintaining data file coherence and consistency and also backing up data files in accordance with the present invention. Referring to FIG. 9, the system 412 includes cache servers 436A and 436B, which are respectively coupled to associated data center systems 420A and 420B, and a storage cache 30A which is coupled to the remote system 16A in the same manner as in the network 10. For purposes of illustration, each of the data center systems 420 is constructed and operates in substantially the same manner as the data center 20 of the network 10. 
To highlight the back-up protocol features, it is assumed that a communications link between the storage cache 30A and the cache server 436A has been established for transmitting file update data to the cache server 436A based on modifications being made to a cached data file at the storage cache 30A, where the cached data file corresponds to a data file stored at the file server 438A, the file server 438A is the primary data file storage facility for the network 410, and the file server 438B is the back-up storage facility.
  • Referring to FIG. 9, the backing-up of data files in accordance with the present invention is initiated when the cache manager 50 of the storage cache 30A detects, for example, a network communication failure at the gateway 26. In turn, the cache manager 50 automatically and periodically attempts to reestablish a communications link to the cache server 436A. The storage cache 30A also continues to operate without interruption, i.e., continues to monitor modifications to the cached data file entered by a workstation and stores only the current version of the cached data file, incorporating the modifications, in the storage 56.
  • Further, the cache manager 50 simultaneously attempts to establish a communications link with a back-up data center, such as the data center 420B, via the cache server 436B, as the cache servers 436A and 436B have different and unique IP routing addresses. If this back-up link can be established, the storage cache 30A proceeds to perform the process 100 for updating a back-up copy of the data file stored at the file server 438B. In other words, assuming the data centers are mirrored, the storage cache 30A continues updating the data file via the cache server 436B from the point at which the disconnection from the cache server 436A occurred.
  • When a connection is re-established to the cache server 436A following a disconnection, the storage cache 30A resumes the process for updating the data file by performing, for example, the steps 106, 108, 110, 112 and 114 of the process 100. In other words, a checksum representing the version of the data file existing at the cache server 436A or the file server 438A at the time the disconnection occurred is used to compute the difference data in step 110. Therefore, the storage cache 30A effectively always maintains the file update data, because only a current version of the cached data file is stored and this current cached data file is used to update the version of the data file at the file server based on the checksum transmitted from the cache server. Thus, as the cached data file continues to be updated and is used to update the data file currently stored at the file server, the storage caching protocol system advantageously provides that the exact status of updating of the data file prior to the disconnection need not be tracked or known.
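The resumption described above depends only on the current cached file and a checksum supplied by the cache server. Nothing about the interrupted transfer needs to be remembered. Below is a minimal sketch of that idea. It assumes rsync-style fixed-size block checksums (SHA-256 from Python's `hashlib`). The block size and function names are hypothetical, and the patent's process 100 may compute its difference data differently.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK = 4096  # hypothetical block size


def block_checksums(data: bytes, block: int = BLOCK) -> List[str]:
    """Cache-server side: one SHA-256 digest per fixed-size block of its current version."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def diff_against(cached: bytes, server_sums: List[str],
                 block: int = BLOCK) -> Tuple[int, Dict[int, bytes]]:
    """Storage-cache side: return the new length and only the blocks that differ."""
    changed: Dict[int, bytes] = {}
    for idx, start in enumerate(range(0, len(cached), block)):
        chunk = cached[start:start + block]
        if idx >= len(server_sums) or hashlib.sha256(chunk).hexdigest() != server_sums[idx]:
            changed[idx] = chunk
    return len(cached), changed


def apply_diff(old: bytes, length: int, changed: Dict[int, bytes],
               block: int = BLOCK) -> bytes:
    """Cache-server side: rebuild the replacement version from old blocks plus changes."""
    nblocks = (length + block - 1) // block
    parts = [changed.get(idx, old[idx * block:(idx + 1) * block]) for idx in range(nblocks)]
    return b"".join(parts)[:length]
```

After a reconnection, the storage cache simply calls `diff_against` again with fresh checksums from the server. Any blocks that reached the server before the disconnection then compare equal and are not resent.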
  • Consequently, the storage cache can interact with multiple cache servers and can readily establish a communications link with the cache server of a back-up data center should the communications link to the cache server of the primary data center fail. The end user at a workstation, however, does not experience or notice the disruption to the communications link when the primary data center fails, while attempts are made to re-establish a link to the primary data center or to establish a new link to the back-up data center, or when the link to the primary data center is finally re-established. The previous state of the data file is automatically restored from the memory in a storage cache or cache server to ensure that coherency is always maintained and pending write-back data is not lost in the case of reboots or system restarts.
  • In a preferred embodiment of the inventive storage caching system including the leasing protocol, a combination of streaming (for read-ahead), compression and differencing for better channel utilization is performed to make a cache hit extremely likely, enable substantial write behind and make a cache miss as efficient as possible.
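The streaming and compression half of this combination can be illustrated with Python's standard `zlib` module. The patent does not name a compression algorithm, so the choice of zlib, the helper names and the framing into chunks are assumptions. The point of the sketch is that compressed frames are emitted while the input is still being read, rather than after the whole file has been compressed.

```python
import zlib
from typing import Iterable, Iterator


def compress_stream(chunks: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Compress file update data incrementally, yielding frames as soon as zlib emits them."""
    comp = zlib.compressobj(level)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:            # zlib may buffer small inputs and return b""
            yield out
    yield comp.flush()     # emit whatever is still buffered and end the stream


def decompress_stream(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Receiving side: inflate frames in arrival order."""
    decomp = zlib.decompressobj()
    for frame in frames:
        out = decomp.decompress(frame)
        if out:
            yield out
    tail = decomp.flush()
    if tail:
        yield tail
```

In practice the chunks fed to `compress_stream` would be the changed blocks produced by the differencing step, so differencing and compression stack.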
  • In a further preferred embodiment, the storage cache can attempt to establish communication links at multiple IP addresses for the same data center on different carriers when a network failure is experienced.
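The reconnection behaviour of the back-up embodiment and the multiple-carrier embodiment can be sketched together. The storage cache keeps trying the primary data center's addresses, for example one per carrier, and falls back to the back-up data center's cache server. This is a simplification: the text describes simultaneous attempts, whereas this sketch tries the addresses sequentially in a fixed preference order. The function name, the retry parameters and the injected `connect` callable are hypothetical.

```python
import time
from typing import Callable, Sequence, Tuple, TypeVar

Endpoint = Tuple[str, int]
Conn = TypeVar("Conn")


def connect_with_failover(primary: Sequence[Endpoint], backup: Sequence[Endpoint],
                          connect: Callable[[Endpoint], Conn],
                          rounds: int = 3, delay: float = 1.0) -> Tuple[Endpoint, Conn]:
    """Try every primary address, then the back-up data center, up to `rounds` times.
    Returns the first endpoint that answers together with its connection."""
    candidates = list(primary) + list(backup)
    for attempt in range(rounds):
        for endpoint in candidates:
            try:
                return endpoint, connect(endpoint)
            except OSError:
                continue  # unreachable: try the next address
        if attempt < rounds - 1:
            time.sleep(delay)  # periodic retry, as described in the text
    raise ConnectionError("no cache server reachable")
```

With real sockets, `connect` could be `lambda ep: socket.create_connection(ep, timeout=5)`. Its failures are raised as `OSError` subclasses and are therefore skipped.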
  • In another preferred embodiment, after a failure occurs at a storage cache, the failed storage cache is simply replaced and the new storage cache promptly establishes a connection with the cache server at the remote data center and immediately resumes caching and updating in accordance with the processes 100 and 120.
  • Although preferred embodiments of the present invention have been described and illustrated, it will be apparent to those skilled in the art that various modifications may be made without departing from the principles of the invention.

Claims (35)

1-22. (canceled)
23. A method for managing shared access to data files stored in a file server by a plurality of authorized computer workstations, the method comprising:
receiving, at a first storage cache, a copy of a data file retrieved from the file server by a cache server for reading or updating, wherein the first storage cache is operative to associate with a plurality of first authorized computer workstations and store the copy of the data file as a cached data file;
at the first storage cache, incorporating data file modifications entered by any of the first workstations into the cached data file as the modifications are entered, such that the cached data file is a current version;
automatically transmitting file update data from the first storage cache to the cache server, wherein the file update data is a function of the modifications incorporated into the cached data file which make the cached data file the current version.
24. The method of claim 23, wherein the file update data is transmitted as streaming data to the cache server.
25. The method of claim 23, further comprising:
compressing the file update data prior to transmission to the cache server.
26. The method of claim 23, wherein the file update data includes difference data, wherein the difference data represents the difference between the cached data file at the first storage cache and the version of the data file currently stored at the file server or the cache server.
27. The method of claim 23, wherein the cache server includes a plurality of cache servers and wherein a replacement version of the data file is generated at at least one of the cache servers.
28. The method of claim 27, wherein when a communications connection between a first of the cache servers and the first storage cache fails, the first storage cache automatically attempts to establish a communications connection with at least one of the first cache server and a second of the cache servers.
29. The method of claim 23, wherein the file update data is automatically transmitted to the cache server at predetermined intervals.
30. The method of claim 23 further comprising
receiving an access request for the data file from one of the first workstations; and
transmitting a message to the cache server operative to cause the cache server to retrieve the data file from the file server.
31. The method of claim 23 further comprising
transmitting to the cache server a file transfer request identifying the data file;
receiving at the storage cache a checksum corresponding to the data file stored in the file server; and
computing, using the checksum, the file update data representing one or more differences between the cached data file and the data file stored at the file server represented by the checksum.
32. Logic for managing shared access to data files stored in a file server by a plurality of authorized computer workstations, the logic encoded in one or more media for execution and when executed operable to:
receive, at a first storage cache, a copy of a data file retrieved from the file server by a cache server for reading or updating,
associate with a plurality of first authorized computer workstations,
store the copy of the data file, at the first storage cache, as a cached data file;
incorporate, at the first storage cache, data file modifications entered by any of the first workstations into the cached data file as the modifications are entered, such that the cached data file is a current version;
automatically transmit file update data from the first storage cache to the cache server, wherein the file update data is a function of the modifications incorporated into the cached data file which make the cached data file the current version.
33. The logic of claim 32, wherein the file update data is transmitted as streaming data to the cache server.
34. The logic of claim 32, wherein the logic is further operable to compress the file update data prior to transmission to the cache server.
35. The logic of claim 32, wherein the file update data includes difference data, wherein the difference data represents the difference between the cached data file at the first storage cache and the version of the data file currently stored at the file server or the cache server.
36. The logic of claim 32, wherein the file update data is automatically transmitted to the cache server at predetermined intervals.
37. The logic of claim 32, wherein the logic is further operable to
receive an access request for the data file from one of the first workstations; and
transmit a message to the cache server operative to cause the cache server to retrieve the data file from the file server.
38. The logic of claim 32, wherein the logic is further operable to
transmit to the cache server a file transfer request identifying the data file;
receive at the storage cache a checksum corresponding to the data file stored in the file server; and
compute, using the checksum, the file update data representing one or more differences between the cached data file and the data file stored at the file server represented by the checksum.
39. A method for managing shared access to data files stored in a file server by a plurality of authorized computer workstations, the method comprising:
receiving, at a first storage cache, file update data from a cache server in response to a workstation request for access to a data file which is stored at a file server associated with the cache server, wherein the first storage cache is operative to associate with a plurality of first authorized workstations, and wherein the file update data is a function of differences between the data file as currently stored at the file server or the cache server and a cached data file stored at the first storage cache and corresponding to the data file;
incorporating the file update data into the cached data file at the first storage cache such that the cached data file is updated to be the same as the data file currently stored at the file server or the cache server;
at the first storage cache, incorporating data file modifications entered by any of the first workstations into the cached data file as the modifications are entered, such that the cached data file is a current version; and
automatically transmitting file update data from the first storage cache to the cache server, wherein the file update data is a function of the modifications incorporated into the cached data file which make the cached data file the current version.
40. The method of claim 39, wherein the file update data is received as streaming data at the first storage cache.
41. The method of claim 39, wherein the file update data is compressed.
42. The method of claim 39, wherein the automatically transmitting step is performed at predetermined intervals.
43. The method of claim 39, further comprising
receiving an access request identifying a data file stored as a cached data file at the storage cache; and
transmitting a checksum representing the cached data file to the cache server;
wherein the file update data represents one or more differences between the data file stored at the file server and the cached data file represented by the checksum.
44. The method according to claim 39 further comprising
receiving, at a storage cache, an access request from a workstation identifying a data file stored at a file server;
conditionally transmitting a lease request from the storage cache to a cache server, if the storage cache does not have an appropriate existing lease to the data file;
if the lease request is granted, then performing the receiving step;
if the lease request is denied, then performing a distributed differencing process with the cache server to update the cached data file at the storage cache, and allowing the workstation access to the cached data file.
45. Logic for managing shared access to data files stored in a file server by a plurality of authorized computer workstations, the logic encoded in one or more media for execution and when executed operable to:
establish a communications connection with a cache server,
associate with a plurality of workstations and incorporate data file modifications entered by any of the corresponding associated workstations into a cached data file as the modifications are entered, such that the cached data file is a current version;
automatically transmit file update data to the cache server, wherein the file update data is a function of the modifications incorporated into the cached data file which make the cached data file the current version,
control whether a request for access to a data file from an associated workstation should be granted or denied, wherein the access request is a request to read or write a data file stored at the file server, and
wherein, following receipt of the request, the logic is operable to:
determine a lease condition for the data file existing at the storage cache, wherein the lease condition is one of read, write and no lease;
grant the request if the request is read and the existing lease is read or write, or if the request is write and the lease condition is write;
request a new lease from the cache server if the request is read and the lease condition is no lease, or if the request is write and the lease condition is read or no lease, and
automatically update the cached data file at the storage cache based on the current version of the data file stored at the file server, if a lease is granted or the request is a read.
46. The logic of claim 45, wherein the logic is operable to, when a cached data file corresponding to a data file stored at the file server is no longer opened at a workstation, determine whether the cached data file copy was modified based on entries made by the workstation while the cached data file was open;
if the data file copy was modified, automatically transmit file update data to the cache server; and
release any lease for the data file.
47. A method for managing shared access to data files stored in a file server by a plurality of authorized computer workstations, the method comprising:
supplying to a first storage cache a copy of a data file retrieved from the file server by a cache server for reading or updating, wherein the first storage cache is operative to associate with a plurality of first authorized computer workstations, store the copy of the data file as a cached data file, and incorporate data file modifications entered by any of the first workstations into the cached data file as the modifications are entered, such that the cached data file is a current version;
receiving file update data automatically transmitted from the first storage cache to the cache server, wherein the file update data is a function of the modifications incorporated into the cached data file which make the cached data file the current version; and
at the cache server, generating a replacement version of the data file stored at the file server based on the file update data.
48. The method of claim 47 further comprising:
at the cache server, if the file is accessed for updating by the first storage cache, protecting the data file stored at the file server from updates from other storage caches until all file update data from the first storage cache has been incorporated into the replacement version of the data file and the replacement version has replaced the data file stored at the file server.
49. The method of claim 48, wherein the protecting comprises protecting the data file stored at the file server from updates from other storage caches while the file update data from the first storage cache is transmitted to the cache server.
50. The method of claim 47 further comprising:
replacing the data file stored at the file server with the replacement version of the data file; and
responding to a request for access to the data file subsequently transmitted to the cache server from at least one of a second storage cache and an authorized computer workstation using the replacement version of the data file.
51. The method of claim 47, wherein the file update data is transmitted as streaming data to the cache server.
52. The method of claim 47, wherein the file update data transmitted to the cache server is compressed.
53. The method of claim 47, wherein the file update data includes difference data, wherein the difference data represents the difference between the cached data file at the first storage cache and the version of the data file currently stored at the file server or the cache server.
54. The method of claim 47, wherein the file update data is automatically transmitted to the cache server at predetermined intervals.
55. Logic for managing shared access to data files stored in a file server by a plurality of authorized computer workstations, the logic encoded in one or more media for execution and when executed operable to:
supply to a first storage cache a copy of a data file retrieved from the file server by a cache server for reading or updating, wherein the first storage cache is operative to associate with a plurality of first authorized computer workstations, store the copy of the data file as a cached data file, and incorporate data file modifications entered by any of the first workstations into the cached data file as the modifications are entered, such that the cached data file is a current version;
receive file update data automatically transmitted from the first storage cache to the cache server, wherein the file update data is a function of the modifications incorporated into the cached data file which make the cached data file the current version; and
at the cache server, generate a replacement version of the data file stored at the file server based on the file update data.
56. Logic for managing shared access to data files stored in a file server by a plurality of authorized computer workstations, the logic encoded in one or more media for execution and when executed operable to:
associate with a file server and one or more storage caches, wherein each of the storage caches is operative to associate with a plurality of workstations and incorporate data file modifications entered by any of the corresponding associated workstations into a cached data file as the modifications are entered, such that the cached data file is a current version;
receive file update data automatically transmitted by one or more storage caches, wherein the file update data is a function of the modifications incorporated into the cached data file at the storage cache which make the cached data file the current version; and
wherein the logic includes leasing logic, wherein the leasing logic is operable to decide whether to grant or deny a request for a lease for a data file received from a first of the storage caches based on whether, and what type of, lease already exists for the data file or whether the data file is already locked, wherein the decision is made in accordance with criteria that a write lease cannot be granted if a read lease already exists, only a reader right can be granted if a write lease already exists and an additional read lease can be granted if a read lease already exists; and
automatically update the cached data file at the first storage cache if a reader right or a read lease is granted.
US11/496,032 2003-01-17 2006-07-26 Method and system for use of storage caching with a distributed file system Abandoned US20070198685A1

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
US11/496,032 | US20070198685A1 | 2003-01-17 | 2006-07-26 | Method and system for use of storage caching with a distributed file system

Applications Claiming Priority (3)

Application Number | Publication | Priority Date | Filing Date | Title
US44075003P | | 2003-01-17 | 2003-01-17 |
US10/756,986 | US7103617B2 | 2003-01-17 | 2004-01-13 | Method and system for use of storage caching with a distributed file system
US11/496,032 | US20070198685A1 | 2003-01-17 | 2006-07-26 | Method and system for use of storage caching with a distributed file system

Related Parent Applications (1)

Application Number | Relation | Publication | Priority Date | Filing Date | Title
US10/756,986 | Continuation | US7103617B2 | 2003-01-17 | 2004-01-13 | Method and system for use of storage caching with a distributed file system

Publications (1)

Publication Number | Publication Date
US20070198685A1 | 2007-08-23

Family

ID=32825140

Family Applications (2)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US10/756,986 | Expired - Lifetime | US7103617B2 | 2003-01-17 | 2004-01-13 | Method and system for use of storage caching with a distributed file system
US11/496,032 | Abandoned | US20070198685A1 | 2003-01-17 | 2006-07-26 | Method and system for use of storage caching with a distributed file system

Family Applications Before (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US10/756,986 | Expired - Lifetime | US7103617B2 | 2003-01-17 | 2004-01-13 | Method and system for use of storage caching with a distributed file system

Country Status (7)

Country | Link
US (2) | US7103617B2
EP (1) | EP1584036A4
JP (1) | JP2006516341A
CN (1) | CN1754155A
AU (1) | AU2004207357A1
CA (1) | CA2513503A1
WO (1) | WO2004068469A2

Families Citing this family (132)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08263438A (en) * 1994-11-23 1996-10-11 Xerox Corp Distribution and use control system of digital work and access control method to digital work
US20020161860A1 (en) * 2001-02-28 2002-10-31 Benjamin Godlin Method and system for differential distributed data file storage, management and access
US8364815B2 (en) * 2005-03-18 2013-01-29 Riverbed Technology, Inc. Reliability and availability of distributed servers
JP2006516341A (en) * 2003-01-17 2006-06-29 タシット ネットワークス,インク. Method and system for storage caching with distributed file system
JP4131514B2 (en) * 2003-04-21 2008-08-13 インターナショナル・ビジネス・マシーンズ・コーポレーション Network system, server, data processing method and program
US7409389B2 (en) * 2003-04-29 2008-08-05 International Business Machines Corporation Managing access to objects of a computing environment
US7480699B2 (en) * 2004-01-20 2009-01-20 International Business Machines Corporation System and method for replacing an application on a server
US7702678B2 (en) * 2004-03-12 2010-04-20 Microsoft Corporation Search capture
US20050216837A1 (en) * 2004-03-12 2005-09-29 Onfolio, Inc. Unread-state management
US20050240489A1 (en) * 2004-03-12 2005-10-27 Onfolio, Inc. Retaining custom item order
US20050216886A1 (en) * 2004-03-12 2005-09-29 Onfolio, Inc. Editing multi-layer documents
US20050216825A1 (en) * 2004-03-12 2005-09-29 Onfolio, Inc. Local storage of script-containing content
US7434107B2 (en) * 2004-07-19 2008-10-07 Dell Products L.P. Cluster network having multiple server nodes
US8005710B2 (en) * 2004-09-28 2011-08-23 Microsoft Corporation Methods and systems for caching and synchronizing project data
CN100336343C (en) * 2004-10-10 2007-09-05 中兴通讯股份有限公司 Method for keeping multiple data copy consistency in distributed system
US8233594B2 (en) 2005-02-07 2012-07-31 Avaya Inc. Caching message information in an integrated communication system
US7808980B2 (en) 2005-02-07 2010-10-05 Avaya Inc. Integrated multi-media communication system
US8059793B2 (en) 2005-02-07 2011-11-15 Avaya Inc. System and method for voicemail privacy
US7330537B2 (en) 2005-02-07 2008-02-12 Adomo, Inc. Integrating messaging server directory service with a communication system voice mail message interface
US7724880B2 (en) 2005-02-07 2010-05-25 Avaya Inc. Networked voicemail
US8559605B2 (en) 2005-02-07 2013-10-15 Avaya Inc. Extensible diagnostic tool
US7321655B2 (en) 2005-02-07 2008-01-22 Adomo, Inc. Caching user information in an integrated communication system
US8175233B2 (en) * 2005-02-07 2012-05-08 Avaya Inc. Distributed cache system
JP4332126B2 (en) 2005-03-24 2009-09-16 富士通株式会社 Caching control program, caching control device, and caching control method
JP4756914B2 (en) * 2005-05-30 2011-08-24 キヤノン株式会社 Remote cooperative work support system and control method thereof
US20070100902A1 (en) * 2005-10-27 2007-05-03 Dinesh Sinha Two way incremental dynamic application data synchronization
KR100825721B1 (en) 2005-12-08 2008-04-29 한국전자통신연구원 System and method of time-based cache coherency maintenance in user file manager of object-based storage system
US8122070B1 (en) * 2005-12-29 2012-02-21 United States Automobile Association (USAA) Document management system user interfaces
US8577940B2 (en) * 2006-03-20 2013-11-05 Parallels IP Holdings GmbH Managing computer file system using file system trees
US8463843B2 (en) * 2006-05-26 2013-06-11 Riverbed Technology, Inc. Throttling of predictive ACKs in an accelerated network communication system
CN101127625B (en) * 2006-08-18 2013-11-06 华为技术有限公司 A system and method for authorizing access request
US9141627B2 (en) * 2006-09-26 2015-09-22 Sony Corporation Providing a user access to data files distributed in a plurality of different types of user devices
US9489456B1 (en) 2006-11-17 2016-11-08 Blue Coat Systems, Inc. Previewing file information over a network
JP4763587B2 (en) * 2006-12-11 2011-08-31 株式会社ソニー・コンピュータエンタテインメント Cache server, cache server control method, program, and information storage medium
GB0625330D0 (en) * 2006-12-20 2007-01-24 Ibm System,method and computer program product for managing data using a write-back cache unit
US20080177907A1 (en) * 2007-01-23 2008-07-24 Paul Boerger Method and system of a peripheral port of a server system
US8064576B2 (en) 2007-02-21 2011-11-22 Avaya Inc. Voicemail filtering and transcription
US8107598B2 (en) 2007-02-21 2012-01-31 Avaya Inc. Voicemail filtering and transcription
US8160212B2 (en) 2007-02-21 2012-04-17 Avaya Inc. Voicemail filtering and transcription
US20080243847A1 (en) * 2007-04-02 2008-10-02 Microsoft Corporation Separating central locking services from distributed data fulfillment services in a storage system
US8433693B2 (en) * 2007-04-02 2013-04-30 Microsoft Corporation Locking semantics for a storage system based on file types
US20080270480A1 (en) * 2007-04-26 2008-10-30 Hanes David H Method and system of deleting files from a remote server
US8005993B2 (en) 2007-04-30 2011-08-23 Hewlett-Packard Development Company, L.P. System and method of a storage expansion unit for a network attached storage device
US8488751B2 (en) 2007-05-11 2013-07-16 Avaya Inc. Unified messenging system and method
US20090037915A1 (en) * 2007-07-31 2009-02-05 Rothman Michael A Staging block-based transactions
CN101146127B (en) * 2007-10-30 2010-06-09 金蝶软件(中国)有限公司 A client buffer update method and device in distributed system
US8458127B1 (en) 2007-12-28 2013-06-04 Blue Coat Systems, Inc. Application data synchronization
CN101470645B (en) * 2007-12-29 2012-04-25 华为技术有限公司 High-speed cache data recovery method and apparatus
CN101499073B (en) * 2008-01-29 2011-10-12 国际商业机器公司 Continuous storage data storing and managing method and system based on access frequency
EP2111011A1 (en) 2008-04-16 2009-10-21 Thomson Telecom Belgium Device and method for sharing files
US8484162B2 (en) 2008-06-24 2013-07-09 Commvault Systems, Inc. De-duplication systems and methods for application-specific data
US8171111B1 (en) * 2008-08-07 2012-05-01 United Services Automobile Association (Usaa) Systems and methods for non-specific address routing
US8706878B1 (en) 2008-08-21 2014-04-22 United Services Automobile Association Preferential loading in data centers
US8185566B2 (en) 2009-01-15 2012-05-22 Microsoft Corporation Client-based caching of remote files
US8799409B2 (en) * 2009-01-15 2014-08-05 Ebay Inc. Server side data cache system
US8930306B1 (en) 2009-07-08 2015-01-06 Commvault Systems, Inc. Synchronized data deduplication
CN101997902B (en) * 2009-08-28 2015-07-22 云端容灾有限公司 Remote on-line backup system and method based on posthouse segmentation transmission
EP2363795A1 (en) * 2010-03-05 2011-09-07 Sven Dunker A method and a system for providing a user with a virtual external storage
US8589553B2 (en) * 2010-09-17 2013-11-19 Microsoft Corporation Directory leasing
US8364652B2 (en) 2010-09-30 2013-01-29 Commvault Systems, Inc. Content aligned block-based deduplication
US8572340B2 (en) 2010-09-30 2013-10-29 Commvault Systems, Inc. Systems and methods for retaining and using data block signatures in data protection operations
CN101980198B (en) * 2010-11-01 2012-06-06 福州星网视易信息系统有限公司 Method for carrying karaoke
US9020900B2 (en) 2010-12-14 2015-04-28 Commvault Systems, Inc. Distributed deduplicated storage system
US20120150818A1 (en) 2010-12-14 2012-06-14 Commvault Systems, Inc. Client-side repository in a networked deduplicated storage system
WO2012101855A1 (en) * 2011-01-28 2012-08-02 株式会社日立製作所 Communication system, communication device, and communication control method
CN102333108A (en) * 2011-03-18 2012-01-25 北京神州数码思特奇信息技术股份有限公司 Distributed cache synchronization system and method
CN102694828B (en) * 2011-03-23 2016-03-30 中兴通讯股份有限公司 A kind of method of distributed cache system data access and device
CN102317901B (en) * 2011-07-25 2013-09-11 华为技术有限公司 Methods for object adjustment and devices for remove control, node and storage system
CN102325169A (en) * 2011-08-22 2012-01-18 盛乐信息技术(上海)有限公司 Network file system supporting sharing and cooperation and method thereof
KR101175505B1 (en) * 2011-10-06 2012-08-20 한화에스앤씨주식회사 System for providing user data storage enviroment using network based file system in n-screen
US9122535B2 (en) * 2011-11-22 2015-09-01 Netapp, Inc. Optimizing distributed data analytics for shared storage
US9110807B2 (en) * 2012-05-23 2015-08-18 Sybase, Inc. Cache conflict detection
US20130339310A1 (en) 2012-06-13 2013-12-19 Commvault Systems, Inc. Restore using a client side signature repository in a networked storage system
GB2503266A (en) * 2012-06-21 2013-12-25 Ibm Sharing aggregated cache hit and miss data in a storage area network
AT513242B1 (en) * 2012-07-02 2018-07-15 Frequentis Ag Method for synchronizing data in a computer network
US10095663B2 (en) 2012-11-14 2018-10-09 Amazon Technologies, Inc. Delivery and display of page previews during page retrieval events
US9665591B2 (en) 2013-01-11 2017-05-30 Commvault Systems, Inc. High availability distributed deduplicated storage system
US9418072B2 (en) 2013-03-04 2016-08-16 Vmware, Inc. Cross-file differential content synchronization
WO2014137938A1 (en) * 2013-03-04 2014-09-12 Vmware, Inc. Cross-file differential content synchronization
US9355116B2 (en) 2013-03-04 2016-05-31 Vmware, Inc. Cross-file differential content synchronization using cached patches
CN103136080B (en) * 2013-03-12 2016-07-13 青岛中星微电子有限公司 The method of testing of a kind of cache lock function and device
US9514007B2 (en) * 2013-03-15 2016-12-06 Amazon Technologies, Inc. Database system with database engine and separate distributed storage service
US9460296B2 (en) 2013-07-19 2016-10-04 Appsense Limited Systems, methods and media for selective decryption of files containing sensitive data
US9398111B1 (en) * 2013-08-30 2016-07-19 hopTo Inc. File caching upon disconnection
TWI502372B (en) * 2013-09-27 2015-10-01 Acer Inc Network storage system and method for caching file
US20160253241A1 (en) * 2013-10-28 2016-09-01 Longsand Limited Instant streaming of the latest version of a file
JP6244916B2 (en) * 2014-01-06 2017-12-13 富士通株式会社 Arithmetic processing apparatus, control method for arithmetic processing apparatus, and information processing apparatus
US9633056B2 (en) 2014-03-17 2017-04-25 Commvault Systems, Inc. Maintaining a deduplication database
US10380072B2 (en) 2014-03-17 2019-08-13 Commvault Systems, Inc. Managing deletions from a deduplication database
US9563929B1 (en) 2014-05-22 2017-02-07 Amazon Technologies, Inc. Caching of content page layers
US9922007B1 (en) 2014-05-22 2018-03-20 Amazon Technologies, Inc. Split browser architecture capable of determining whether to combine or split content layers based on the encoding of content within each layer
US11169666B1 (en) 2014-05-22 2021-11-09 Amazon Technologies, Inc. Distributed content browsing system using transferred hardware-independent graphics commands
US9563928B1 (en) 2014-05-22 2017-02-07 Amazon Technlogies, Inc. Bandwidth reduction through delivery of hardware-independent graphics commands for portions of content pages
US9720888B1 (en) 2014-05-22 2017-08-01 Amazon Technologies, Inc. Distributed browsing architecture for the delivery of graphics commands to user devices for assembling a plurality of layers of a content page
US10042521B1 (en) 2014-05-22 2018-08-07 Amazon Technologies, Inc. Emulation of control resources for use with converted content pages
WO2015180070A1 (en) * 2014-05-28 2015-12-03 北京大学深圳研究生院 Data caching method and device for distributed storage system
US9454515B1 (en) 2014-06-17 2016-09-27 Amazon Technologies, Inc. Content browser system using graphics commands and native text intelligence
US9373003B2 (en) * 2014-06-27 2016-06-21 Appsense Limited Systems and methods for automatically handling multiple levels of encryption and decryption
US11249858B2 (en) 2014-08-06 2022-02-15 Commvault Systems, Inc. Point-in-time backups of a production application made accessible over fibre channel and/or ISCSI as data sources to a remote application by representing the backups as pseudo-disks operating apart from the production application and its host
US9852026B2 (en) 2014-08-06 2017-12-26 Commvault Systems, Inc. Efficient application recovery in an information management system based on a pseudo-storage-device driver
US9575673B2 (en) 2014-10-29 2017-02-21 Commvault Systems, Inc. Accessing a file system using tiered deduplication
US10313243B2 (en) 2015-02-24 2019-06-04 Commvault Systems, Inc. Intelligent local management of data stream throttling in secondary-copy operations
US9740635B2 (en) * 2015-03-12 2017-08-22 Intel Corporation Computing method and apparatus associated with context-aware management of a file cache
US10339106B2 (en) 2015-04-09 2019-07-02 Commvault Systems, Inc. Highly reusable deduplication database after disaster recovery
US20160350391A1 (en) 2015-05-26 2016-12-01 Commvault Systems, Inc. Replication using deduplicated secondary copy data
US10248314B2 (en) * 2015-06-04 2019-04-02 Quest Software Inc. Migrate nickname cache for email systems and devices
US9766825B2 (en) 2015-07-22 2017-09-19 Commvault Systems, Inc. Browse and restore for block-level backups
US10298680B1 (en) 2015-09-23 2019-05-21 Cohesity, Inc. Dynamic throughput ingestion of backup sources
US10310953B2 (en) 2015-12-30 2019-06-04 Commvault Systems, Inc. System for redirecting requests after a secondary storage computing device failure
US10296368B2 (en) 2016-03-09 2019-05-21 Commvault Systems, Inc. Hypervisor-independent block-level live browse for access to backed up virtual machine (VM) data and hypervisor-free file-level recovery (block-level pseudo-mount)
CN105955983A (en) * 2016-04-18 2016-09-21 国电南瑞科技股份有限公司 Historical sampled data caching method for super-large scale power grid regulation and control system
CN105975521A (en) * 2016-04-28 2016-09-28 乐视控股(北京)有限公司 Stream data uploading method and device
US10846024B2 (en) 2016-05-16 2020-11-24 Commvault Systems, Inc. Global de-duplication of virtual disks in a storage platform
US10795577B2 (en) 2016-05-16 2020-10-06 Commvault Systems, Inc. De-duplication of client-side data cache for virtual disks
CN106352247B (en) * 2016-08-31 2018-05-01 哈尔滨圣昌科技开发有限公司 A kind of pipe network monitoring control system and the monitoring and control method realized using the system
CN107885752A (en) * 2016-09-30 2018-04-06 阿里巴巴集团控股有限公司 Data processing and querying method and device
CN106352243B (en) * 2016-10-20 2018-06-26 山东科技大学 A kind of gas pipeline leak detection systems based on sonic method
CN106802950A (en) * 2017-01-16 2017-06-06 郑州云海信息技术有限公司 A kind of method of distributed file system small documents write buffer optimization
US10303401B2 (en) 2017-01-26 2019-05-28 International Business Machines Corporation Data caching for block storage systems
US10387383B2 (en) 2017-02-15 2019-08-20 Google Llc Systems and methods for providing access to a data file stored at a data storage system
US10740193B2 (en) 2017-02-27 2020-08-11 Commvault Systems, Inc. Hypervisor-independent reference copies of virtual machine payload data based on block-level pseudo-mount
JP6891603B2 (en) * 2017-03-31 2021-06-18 日本電気株式会社 Backup system, storage device, data transfer method and program
US10664352B2 (en) 2017-06-14 2020-05-26 Commvault Systems, Inc. Live browsing of backed up data residing on cloned disks
US10976966B2 (en) * 2018-06-29 2021-04-13 Weka.IO Ltd. Implementing coherency and page cache support in a distributed way for files
US11010258B2 (en) 2018-11-27 2021-05-18 Commvault Systems, Inc. Generating backup copies through interoperability between components of a data storage management system and appliances for data storage and deduplication
US11698727B2 (en) 2018-12-14 2023-07-11 Commvault Systems, Inc. Performing secondary copy operations based on deduplication performance
US10922018B2 (en) * 2019-03-04 2021-02-16 Verizon Media Inc. System and method for latency aware data access
US20200327017A1 (en) 2019-04-10 2020-10-15 Commvault Systems, Inc. Restore using deduplicated secondary copy data
US11677624B2 (en) 2019-04-12 2023-06-13 Red Hat, Inc. Configuration of a server in view of a number of clients connected to the server
US11463264B2 (en) 2019-05-08 2022-10-04 Commvault Systems, Inc. Use of data block signatures for monitoring in an information management system
US20210173811A1 (en) 2019-12-04 2021-06-10 Commvault Systems, Inc. Optimizing the restoration of deduplicated data stored in multi-node replicated file systems
US11687424B2 (en) 2020-05-28 2023-06-27 Commvault Systems, Inc. Automated media agent state management

Citations (22)

Publication number Priority date Publication date Assignee Title
US5452447A (en) * 1992-12-21 1995-09-19 Sun Microsystems, Inc. Method and apparatus for a caching file server
US5594863A (en) * 1995-06-26 1997-01-14 Novell, Inc. Method and apparatus for network file recovery
US5611049A (en) * 1992-06-03 1997-03-11 Pitts; William M. System for accessing distributed data cache channel at each network node to pass requests and data
US5634122A (en) * 1994-12-30 1997-05-27 International Business Machines Corporation System and method for multi-level token management for distributed file systems
US5689706A (en) * 1993-06-18 1997-11-18 Lucent Technologies Inc. Distributed systems with replicated files
US5706435A (en) * 1993-12-06 1998-01-06 Panasonic Technologies, Inc. System for maintaining data coherency in cache memory by periodically broadcasting a single invalidation report from server to clients
US5740370A (en) * 1996-03-27 1998-04-14 Clinton Battersby System for opening cache file associated with designated file of file server only if the file is not subject to being modified by different program
US5878218A (en) * 1997-03-17 1999-03-02 International Business Machines Corporation Method and system for creating and utilizing common caches for internetworks
US6012085A (en) * 1995-11-30 2000-01-04 Stampede Technolgies, Inc. Apparatus and method for increased data access in a network file object oriented caching system
US6049874A (en) * 1996-12-03 2000-04-11 Fairbanks Systems Group System and method for backing up computer files over a wide area computer network
US6085234A (en) * 1994-11-28 2000-07-04 Inca Technology, Inc. Remote file services network-infrastructure cache
US6119151A (en) * 1994-03-07 2000-09-12 International Business Machines Corp. System and method for efficient cache management in a distributed file system
US6122629A (en) * 1998-04-30 2000-09-19 Compaq Computer Corporation Filesystem data integrity in a single system image environment
US20010011300A1 (en) * 1992-06-03 2001-08-02 Pitts William Michael System for accessing distributed data cache channel at each network node to pass requests and data
US20010047482A1 (en) * 2000-01-20 2001-11-29 Harris Gordon J. Distributed storage resource management in a storage area network
US20010052058A1 (en) * 1999-02-23 2001-12-13 Richard S. Ohran Method and system for mirroring and archiving mass storage
US20020083111A1 (en) * 1989-09-08 2002-06-27 Auspex Systems, Inc. Parallel I/O network file server architecture
US6453404B1 (en) * 1999-05-27 2002-09-17 Microsoft Corporation Distributed data cache with memory allocation model
US6587921B2 (en) * 2001-05-07 2003-07-01 International Business Machines Corporation Method and apparatus for cache synchronization in a clustered environment
US20040260768A1 (en) * 2003-04-22 2004-12-23 Makio Mizuno Storage system
US6944676B1 (en) * 1997-06-24 2005-09-13 Transcore Link Logistics Corp. Information dissemination system and method with central and distributed caches
US7103617B2 (en) * 2003-01-17 2006-09-05 Tacit Networks, Inc. Method and system for use of storage caching with a distributed file system

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US5805809A (en) * 1995-04-26 1998-09-08 Shiva Corporation Installable performance accelerator for maintaining a local cache storing data residing on a server computer
US5864837A (en) * 1996-06-12 1999-01-26 Unisys Corporation Methods and apparatus for efficient caching in a distributed environment
US5717897A (en) * 1996-09-09 1998-02-10 Unisys Corporation System for coordinating coherency of cache memories of multiple host computers of a distributed information system
US6597956B1 (en) * 1999-08-23 2003-07-22 Terraspring, Inc. Method and apparatus for controlling an extensible computing system

Patent Citations (28)

Publication number Priority date Publication date Assignee Title
US20020083111A1 (en) * 1989-09-08 2002-06-27 Auspex Systems, Inc. Parallel I/O network file server architecture
US5611049A (en) * 1992-06-03 1997-03-11 Pitts; William M. System for accessing distributed data cache channel at each network node to pass requests and data
US20010016896A1 (en) * 1992-06-03 2001-08-23 Pitts William Michael System for accessing distributed data cache channel at each network node to pass requests and data-
US6366952B2 (en) * 1992-06-03 2002-04-02 Network Caching Technology, L.L.C. Cache channel at network nodes to pass request and data or pointer to shared buffer
US20010011300A1 (en) * 1992-06-03 2001-08-02 Pitts William Michael System for accessing distributed data cache channel at each network node to pass requests and data
US6505241B2 (en) * 1992-06-03 2003-01-07 Network Caching Technology, L.L.C. Network intermediate node cache serving as proxy to client node to request missing data from server
US5452447A (en) * 1992-12-21 1995-09-19 Sun Microsystems, Inc. Method and apparatus for a caching file server
US5689706A (en) * 1993-06-18 1997-11-18 Lucent Technologies Inc. Distributed systems with replicated files
US5706435A (en) * 1993-12-06 1998-01-06 Panasonic Technologies, Inc. System for maintaining data coherency in cache memory by periodically broadcasting a single invalidation report from server to clients
US6119151A (en) * 1994-03-07 2000-09-12 International Business Machines Corp. System and method for efficient cache management in a distributed file system
US6085234A (en) * 1994-11-28 2000-07-04 Inca Technology, Inc. Remote file services network-infrastructure cache
US5634122A (en) * 1994-12-30 1997-05-27 International Business Machines Corporation System and method for multi-level token management for distributed file systems
US5594863A (en) * 1995-06-26 1997-01-14 Novell, Inc. Method and apparatus for network file recovery
US6012085A (en) * 1995-11-30 2000-01-04 Stampede Technolgies, Inc. Apparatus and method for increased data access in a network file object oriented caching system
US5740370A (en) * 1996-03-27 1998-04-14 Clinton Battersby System for opening cache file associated with designated file of file server only if the file is not subject to being modified by different program
US6049874A (en) * 1996-12-03 2000-04-11 Fairbanks Systems Group System and method for backing up computer files over a wide area computer network
US5878218A (en) * 1997-03-17 1999-03-02 International Business Machines Corporation Method and system for creating and utilizing common caches for internetworks
US6944676B1 (en) * 1997-06-24 2005-09-13 Transcore Link Logistics Corp. Information dissemination system and method with central and distributed caches
US6122629A (en) * 1998-04-30 2000-09-19 Compaq Computer Corporation Filesystem data integrity in a single system image environment
US20010052058A1 (en) * 1999-02-23 2001-12-13 Richard S. Ohran Method and system for mirroring and archiving mass storage
US6397307B2 (en) * 1999-02-23 2002-05-28 Legato Systems, Inc. Method and system for mirroring and archiving mass storage
US20020144068A1 (en) * 1999-02-23 2002-10-03 Ohran Richard S. Method and system for mirroring and archiving mass storage
US6609183B2 (en) * 1999-02-23 2003-08-19 Legato Systems, Inc. Method and system for mirroring and archiving mass storage
US6453404B1 (en) * 1999-05-27 2002-09-17 Microsoft Corporation Distributed data cache with memory allocation model
US20010047482A1 (en) * 2000-01-20 2001-11-29 Harris Gordon J. Distributed storage resource management in a storage area network
US6587921B2 (en) * 2001-05-07 2003-07-01 International Business Machines Corporation Method and apparatus for cache synchronization in a clustered environment
US7103617B2 (en) * 2003-01-17 2006-09-05 Tacit Networks, Inc. Method and system for use of storage caching with a distributed file system
US20040260768A1 (en) * 2003-04-22 2004-12-23 Makio Mizuno Storage system

Cited By (23)

Publication number Priority date Publication date Assignee Title
US20070008940A1 (en) * 2005-06-21 2007-01-11 Gideon Eden Instrumentation network data system
US8458295B1 (en) * 2005-11-14 2013-06-04 Sprint Communications Company L.P. Web content distribution devices to stage network device software
US20070239791A1 (en) * 2006-03-28 2007-10-11 Sun Microsystems, Inc. Systems and methods for a distributed cache
US8117153B2 (en) * 2006-03-28 2012-02-14 Oracle America, Inc. Systems and methods for a distributed cache
US20070288835A1 (en) * 2006-06-07 2007-12-13 Fuji Xerox Co., Ltd. Apparatus, computer readable medium, data signal, and method for document management
US8001385B2 (en) * 2006-06-21 2011-08-16 Intel Corporation Method and apparatus for flash updates with secure flash
US20070300068A1 (en) * 2006-06-21 2007-12-27 Rudelic John C Method and apparatus for flash updates with secure flash
US8078688B2 (en) * 2006-12-29 2011-12-13 Prodea Systems, Inc. File sharing through multi-services gateway device at user premises
US20100241711A1 (en) * 2006-12-29 2010-09-23 Prodea Systems, Inc. File sharing through multi-services gateway device at user premises
US8849940B1 (en) * 2007-12-14 2014-09-30 Blue Coat Systems, Inc. Wide area network file system with low latency write command processing
US20110145189A1 (en) * 2008-01-15 2011-06-16 Cindy Zheng Reducing Latency of Access Requests in Distributed Storage Systems Having a Shared Data Set
US7921179B1 (en) * 2008-01-15 2011-04-05 Net App, Inc. Reducing latency of access requests in distributed storage systems having a shared data set
US8171100B2 (en) * 2008-01-15 2012-05-01 Netapp, Inc. Reducing latency of access requests in distributed storage systems having a shared data set
US8271623B2 (en) * 2009-01-15 2012-09-18 Microsoft Corporation Performing configuration in a multimachine environment
US20100180015A1 (en) * 2009-01-15 2010-07-15 Microsoft Corporation Performing configuration in a multimachine environment
US20110161754A1 (en) * 2009-12-29 2011-06-30 Cleversafe, Inc. Revision synchronization of a dispersed storage network
US9152489B2 (en) * 2009-12-29 2015-10-06 Cleversafe, Inc. Revision synchronization of a dispersed storage network
US20120303680A1 (en) * 2011-05-23 2012-11-29 Ilefay Technology Goup, LLC Method for the preemptive creation of binary delta information within a computer network
US20130185265A1 (en) * 2011-05-23 2013-07-18 Ilesfay Technology Group, LLC Method for horizontal scale delta encoding
US8996655B2 (en) * 2011-05-23 2015-03-31 Autodesk, Inc. Method for horizontal scale delta encoding
US8407315B2 (en) * 2011-05-23 2013-03-26 Ilesfay Technology Group, LLC Method for horizontal scale delta encoding
CN105338026A (en) * 2014-07-24 2016-02-17 阿里巴巴集团控股有限公司 Data resource acquisition method, device and system
CN110247937A (en) * 2018-03-07 2019-09-17 中移(苏州)软件技术有限公司 The management of elastic storage systems share files, access method and relevant device

Also Published As

Publication number Publication date
US7103617B2 (en) 2006-09-05
CN1754155A (en) 2006-03-29
AU2004207357A1 (en) 2004-08-12
CA2513503A1 (en) 2004-08-12
WO2004068469A3 (en) 2005-03-03
US20040186861A1 (en) 2004-09-23
JP2006516341A (en) 2006-06-29
WO2004068469A2 (en) 2004-08-12
EP1584036A2 (en) 2005-10-12
EP1584036A4 (en) 2008-06-18

Similar Documents

Publication Publication Date Title
US7103617B2 (en) Method and system for use of storage caching with a distributed file system
US10534681B2 (en) Clustered filesystems for mix of trusted and untrusted nodes
US7552223B1 (en) Apparatus and method for data consistency in a proxy cache
US8086581B2 (en) Method for managing lock resources in a distributed storage system
US9275058B2 (en) Relocation of metadata server with outstanding DMAPI requests
US7711788B2 (en) Double-proxy remote data access system
US6950833B2 (en) Clustered filesystem
US20030028514A1 (en) Extended attribute caching in clustered filesystem
US10042916B2 (en) System and method for storing data in clusters located remotely from each other
KR101150146B1 (en) System and method for managing cached objects using notification bonds
US7765329B2 (en) Messaging between heterogeneous clients of a storage area network
US20040210656A1 (en) Failsafe operation of storage area network
US20040139125A1 (en) Snapshot copy of data volume during data access
US20040143607A1 (en) Recovery and relocation of a distributed name service in a cluster filesystem
KR20070061120A (en) System and method of time-based cache coherency maintenance in user file manager of object-based storage system
US6687716B1 (en) File consistency protocols and methods for carrying out the protocols
US6633870B1 (en) Protocols for locking sharable files and methods for carrying out the protocols
US7734733B1 (en) WAFS disconnected-mode read-write access
Zhou Distributed File Systems
Coulouris et al. Distributed File Systems
Poornima et al. Achieving Coherent and Aggressive Caching in DFS, GlusterFS
Boyd et al. In-Flight Data Management for Distributed Storage Systems

Legal Events

Date Code Title Description
AS Assignment

Owner name: TACIT NETWORKS, INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:PHATAK, SHIRISH HEMANT;REEL/FRAME:021569/0422

Effective date: 20050506

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: SYMANTEC CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BLUE COAT SYSTEMS, INC.;REEL/FRAME:039851/0044

Effective date: 20160801

AS Assignment

Owner name: CA, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SYMANTEC CORPORATION;REEL/FRAME:052700/0638

Effective date: 20191104