WO2000004483A2 - Hierarchical data storage management - Google Patents

Hierarchical data storage management

Info

Publication number
WO2000004483A2
Authority
WO
WIPO (PCT)
Prior art keywords
file
files
media
store
dsm
Prior art date
Application number
PCT/US1999/016051
Other languages
French (fr)
Other versions
WO2000004483A3 (en)
Inventor
Larry R. Sitka
Original Assignee
Imation Corp.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Imation Corp. filed Critical Imation Corp.
Publication of WO2000004483A2 publication Critical patent/WO2000004483A2/en
Publication of WO2000004483A3 publication Critical patent/WO2000004483A3/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/16: File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F 16/162: Delete operations
    • G06F 16/11: File system administration, e.g. details of archiving or snapshots
    • G06F 16/122: File system administration using management policies
    • G06F 16/18: File system types
    • G06F 16/185: Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
    • G06F 16/50: Information retrieval of still image data
    • G06F 16/51: Indexing; Data structures therefor; Storage structures
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. ICT SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 30/00: ICT specially adapted for the handling or processing of medical images
    • G16H 30/20: ICT specially adapted for handling medical images, e.g. DICOM, HL7 or PACS
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 2003/0697: Device management, e.g. handlers, drivers, I/O schedulers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y10: TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S: TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S 707/00: Data processing: database and file management or data structures
    • Y10S 707/912: Applications of a database
    • Y10S 707/913: Multimedia
    • Y10S 707/915: Image
    • Y10S 707/941: Human sciences
    • Y10S 707/99941: Database schema or data structure
    • Y10S 707/99944: Object-oriented database structure
    • Y10S 707/99945: Object-oriented database structure processing
    • Y10S 707/99951: File or database maintenance
    • Y10S 707/99952: Coherency, e.g. same view to multiple users
    • Y10S 707/99953: Recoverability
    • Y10S 707/99956: File allocation

Definitions

  • the present invention relates to data storage and, more particularly, to systems and methods for hierarchical storage management.
  • Hierarchical storage management (HSM) systems allow the managed storage of data files among a variety of media such as magnetic hard disks, magneto-optic disks, and magnetic tape.
  • the various media differ in access time, capacity, and cost.
  • HSM systems typically are configured such that files that are accessed more frequently or created more recently are stored on "short-term" media having the shortest access time.
  • Short-term media often includes a group of magnetic disks, which may be arranged as a redundant array of independent disks (RAID).
  • Files that are accessed less frequently, created less recently, or have larger sizes are stored on "long-term" media having longer access times and larger storage capacities.
  • Long-term media in an HSM system may include rewritable optical disks and magnetic tape media, which can be arranged in a jukebox of magneto-optical disks or a tape library, respectively.
  • HSM systems typically allocate individual files across the hierarchy of storage media based on frequency of use, creation date, or file size, as discussed above. Accordingly, HSM systems generally seek to avoid excessive access delays in retrieving information that is likely to be accessed most often or most frequently by users. As new files are generated, the system stores the files on the short-term media using a "best-fit" approach. In this manner, the system distributes files across the short-term media in order to minimize wasted storage space. Thus, each file may be stored on a different medium in order to most efficiently manage storage space.
  • a central database maintains the storage location of each file within the HSM system. If users do not request a particular file for an extended period of time, the system automatically migrates the corresponding file to the longer-term storage media and updates the file location database. Again, the system distributes the relocated file across the long-term storage media in a manner calculated to minimize wasted storage space.
  • an HSM system may store a number of copies at different display resolutions and on different media to facilitate identification and retrieval by a user.
  • the system accesses the database to determine the current location of the file. If the desired files reside on longer-term storage media, the system automatically retrieves the files and moves them to the short-term media. If some of the media is not currently loaded into the longer-term storage device, the system generates a request for personnel to physically locate the media and load it into the storage device.
  • the present invention is directed to a system and method for managing the storage of files within an HSM system.
  • the system and method are especially useful in managing the storage of larger files that include graphic imagery.
  • the system and method may incorporate an architecture and methodology that facilitate the storage and retrieval of image files as part of an overall image processing workflow.
  • the system and method may find ready application in a workflow that involves the processing of groups of images associated with particular customers, projects, or transactions, and may act as a storage server for a client application that implements the workflow.
  • the system and method may be useful, for example, in handling the storage of images uploaded from scanned photographic film, or digital images submitted to a photo-processing shop by amateur or professional photographers.
  • the client application can be a photo-processing application that could provide for various media formats, sizes, and quantities of image reproductions for a consumer.
  • the system and method may be useful in handling the storage of medical diagnostic images associated with a particular medical patient or study.
  • the client application could be a picture archival communication system (PACS) that manages the archival of imagery for viewing by physicians.
  • the system and method may be useful in handling the storage of images associated with particular printing jobs, e.g., for publishers, advertising customers, and the like.
  • the client application could be a digital prepress workflow application.
  • the system and method may incorporate a number of architectural and functional features capable of improving overall economy of storage while maintaining workflow efficiency for the user, as described below. Such features may be used together or independently of one another according to the needs of the particular storage and workflow applications.
  • a fileset feature can be provided that permits grouping of images associated with a particular customer, project, or transaction.
  • the fileset feature allows a user to group files into a logical collection, and perform operations on the files as a group.
  • filesets can be configured to have their member files reside together on common media, so that operations on the fileset such as archival, migration, retrieval, and deletion can be performed significantly faster than operating on the individual files, which might otherwise be distributed across multiple media.
  • the system and method are capable of intelligently grouping images for storage together on common storage media to alleviate excessive access times in retrieving the images.
  • the fileset feature can be used to avoid the scattering of associated images across diverse storage media having different access times, and resulting delays in incorporating such images into the workflow of the client application.
  • a fileset can be configured such that activity on an individual file member triggers the same activity on the other file members of the fileset.
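The fileset behavior described above lends itself to a simple illustration. The sketch below is hypothetical (the Fileset and FileRef names and fields are not from the patent): an operation applied to the fileset fans out to every member, which is what lets archival, migration, retrieval, and deletion run as a single group action on common media.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration of a fileset: a logical group of files that
// is archived, migrated, retrieved, or deleted as a unit.
struct FileRef {
    std::string guid;     // globally unique file identifier
    std::string mediaId;  // volume the file currently resides on
};

class Fileset {
public:
    explicit Fileset(std::string name) : name_(std::move(name)) {}

    void addMember(FileRef f) { members_.push_back(std::move(f)); }

    // Apply one operation (archive, migrate, retrieve, delete, ...) to
    // every member. Because members are kept together on common media,
    // this touches far fewer volumes than operating on files scattered
    // across diverse media.
    void forEachMember(const std::function<void(FileRef&)>& op) {
        for (auto& m : members_) op(m);
    }

private:
    std::string name_;
    std::vector<FileRef> members_;
};
```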
  • a media independence feature can be incorporated whereby data can be stored on particular volumes without knowledge of the media type of the volume.
  • the system and method provide an application programming interface that is substantially identical for all media types, for both direct access and sequential access, and for removable and non-removable media.
  • the system and method may further include a feature that allows the storage volumes to support self-describing media.
  • An inventory of the contents of a volume is maintained as entries in a database.
  • This self-describing media feature provides the capability to reconstruct the database entries for the files stored on a volume when a volume is physically moved to another system that is not on the same network.
  • A volume may be physically moved, for example, when the database has been corrupted, when identifying unlabeled or mislabeled media from shelf storage, or when a volume is moved to a remote server.
  • the self-describing media feature can be implemented by storing volume metadata on each physical volume and by storing file metadata for each data file on the same physical volume on which the data file is stored.
  • each file may include two files: a blob file with the actual file contents and a metadata file with identifying information.
  • the metadata provides sufficient information to rebuild an inventory of the contents of a volume without access to the original file location database.
  • the metadata can be useful in tracking volumes and files, verification of the identities of loaded volumes, and database recovery.
  • the metadata can be specified by a client application and can be substantially transparent to the storage management server, providing information known to the client application such as the particular client applications and users associated with the volume or file.
  • the metadata may incorporate a global unique identifier (guid) that is unique on a single server, as well as across a multi-server system.
  • a guid can be generated for each storage device.
  • the guid can be used to track the location of a volume or file in the event the volume is moved across the system, or the file is moved across the various media volumes on a server or across the system.
  • the guid for a particular file can be generated by the storage management system, but preferably is generated by the client application along with other metadata useful in referencing the file.
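As a rough illustration of the self-describing media idea, the following sketch (all names assumed, not taken from the patent) writes each file as a blob plus a metadata sidecar on the volume, then rebuilds a volume inventory from the sidecars alone, without the original file-location database.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical sketch of self-describing media: each stored file becomes
// two files on the physical volume, a blob with the contents and a
// metadata sidecar with identifying information (guid, owner, and any
// client-supplied fields the server treats as opaque).
void storeSelfDescribing(const fs::path& volumeRoot, const std::string& guid,
                         const std::string& contents,
                         const std::string& clientMetadata) {
    std::ofstream(volumeRoot / (guid + ".blob")) << contents;
    std::ofstream(volumeRoot / (guid + ".meta")) << clientMetadata;
}

// Rebuild a volume inventory from the sidecars alone, with no access to
// the original file-location database (e.g., after database corruption
// or when a volume arrives from another system).
std::vector<std::string> rebuildInventory(const fs::path& volumeRoot) {
    std::vector<std::string> guids;
    for (const auto& entry : fs::directory_iterator(volumeRoot))
        if (entry.path().extension() == ".meta")
            guids.push_back(entry.path().stem().string());
    return guids;
}
```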
  • a client application programming interface also can be included to provide a number of features to a programmatic user of the system and method.
  • the client API may allow for file storage and retrieval that is either directed by a user to or from a particular collection of media, or is undirected.
  • the client API can facilitate file grouping by allowing fileset creation and modification to generate logical collections of files.
  • the client API also can be used to provide restricted, password-protected logons to distributed server systems, user and user group permissions on files, and a set of privileged, administrator-only functions.
  • the client API can provide additional levels of security.
  • The server system can be protected first by a firewall, second by an HSM system server logon and password, third by file system access control lists, such as the NTFS ACL provided by the Windows NT operating system, and finally by a database server logon and password.
  • the client API enables policy creation and modification to provide settings for determining default behavior of the storage management function and its background agents.
  • System configuration also can be managed via the client API, such that hardware and software setup is API-configurable.
  • Via the client API, the server systems can be local or remote.
  • The client API implementation can be made to communicate through industry-standard TCP/IP protocols to distributed server systems running on any machine that is IP-addressable by the machine on which an application runs.
  • policies can be defined to control the set of media to be used for storage of files associated with particular clients, the movement and replication of the files to other sets of media, peer-to-peer data movement between servers, the timing of data movement and replication, maximum file size allowed for various sets of media, retention intervals on various sets of media, versioning, fileset behaviors, media lifetime, tape retensioning, tape compaction, automated media replacement, and automated media upgrades.
  • the system and method may employ a unique algorithm for migration of data.
  • This algorithm migrates data based on four distinct watermark levels: the deletion high-watermark and low-watermark, and the copy high-watermark and low-watermark. Different combinations of values for these watermarks allow the single migration algorithm to function in various ways, based on customer requirements.
  • Three of the common scenarios that can be supported by the migration algorithm are (1) watermark migration with deletion; (2) copy without deletion; and (3) copy immediately.
  • In scenario (1), files are copied to another storage device when a high-watermark is reached, and deleted from one storage level soon after they are copied to another level. This keeps large areas of space available to handle burst activity, and allows migration to occur in batches and be performed at low-load intervals.
  • In scenario (2), files are copied to another storage device when a high-watermark is reached, but not deleted from a storage level until space on that level is low. This can improve caching by retaining files for a longer period of time, but may cause delays during large peak loads while files are being deleted. It also allows migration to occur in batches, and can be performed at low-load intervals.
  • In scenario (3), files are scheduled to be copied soon after they are stored, but not deleted from the storage until space on the storage is low.
  • In this case, migration can occur as a continuous, low-priority task.
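A minimal sketch of how the four watermarks might drive a single migration algorithm; the struct, thresholds, and hysteresis details below are assumptions for illustration, not the patent's implementation.

```cpp
// Hypothetical sketch of the four-watermark migration logic. Usage is a
// fraction of store capacity; the three scenarios in the text are just
// different settings of these four thresholds.
struct Watermarks {
    double copyHigh;    // start copying files to the next store
    double copyLow;     // stop copying once usage falls below this
    double deleteHigh;  // start deleting files already copied elsewhere
    double deleteLow;   // stop deleting once usage falls below this
};

enum class Action { None, Copy, Delete };

// 'copying' and 'deleting' say whether each activity is already running,
// giving the hysteresis between the high and low watermarks.
Action nextAction(double used, const Watermarks& w,
                  bool copying, bool deleting) {
    if (deleting ? used > w.deleteLow : used >= w.deleteHigh)
        return Action::Delete;
    if (copying ? used > w.copyLow : used >= w.copyHigh)
        return Action::Copy;
    return Action::None;
}

// Scenario (3), "copy immediately", would set copyHigh to 0 so files are
// scheduled for copying as soon as they are stored, while deleteHigh
// stays near capacity so files are removed only when space runs low.
```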
  • the system and method also can be configured to provide direct access to the storage function instead of indirect access, e.g., via a SQL server.
  • The system and method can be arranged to give privileged clients direct read-only access to the file-oriented storage media, with guarantees that the files they specify will remain on that media until they specify otherwise. This provides, for example, the fastest and most direct access to media files that need to be published to web sites, by allowing web servers to publish stored files without having to make a copy of the file elsewhere.
  • This direct access feature also can support specific user-specified file extensions so that applications that trigger on file extension, such as certain web browsers, can use this feature.
  • the system and method may employ an architecture that allows a variety of functions to be allocated to the same or different computer servers in a single storage management system.
  • A central controller, e.g., one per system, handles initial requests from client applications, communicates with library controllers to get volumes mounted, and sets up communication paths to data movers.
  • Library controllers, e.g., one or more per system, handle the movement of media in manual, fixed-media, and/or automated media libraries.
  • Data movers, e.g., one or more per DSM system, handle the actual movement of data to and from clients and other data movers.
  • A database, e.g., one per DSM system, provides database services to the central controller.
  • the system and method also can make use of peer-to-peer configurations in which multiple storage management systems can participate as peers in storage management policies. Files that are stored in one system can be migrated or copied to other systems either automatically through administrator-defined policies, or explicitly through client application control.
  • the system and method are configured to maintain file identity, even across multiple servers, by assigning each file a globally unique identifier (guid), as described above.
  • Automation servers: In order to link web pages to executable programs, the system and method can use an interface referred to as an "automation server."
  • the automation server is created as prescribed by applicable Microsoft specifications.
  • Automation servers having features unique to the system and method of the present invention are (1) a Storage Automation Server; (2) an Exe Automation Server; (3) an Image Tool Automation Server; and (4) an Event Log Automation Server.
  • The Storage Automation Server (1) allows a web interface to attach and utilize the client API described above. This interface is portable across any user which requires a web connection to the storage management system. Another unique component is the locking down of individual images on a "hot" cache to avoid migration. Finally, in storing the images, this automation server specifies a file extension to append to a GUID assigned to the file. This provides a great advantage when viewing the images from a browser which does not support any type of image file decoding. For example, when viewing an image, the Netscape Navigator web browser only examines the file name extension, such as .jpg, .gif, or .tif.
  • the Exe Automation Server (2) allows a user to kick off and get a return result from any executable program out of a web page.
  • the Image Tool Automation Server (3) allows the calling of image processing tools from a web page. Because web interfaces are page driven and stateless, debugging is extremely difficult.
  • the Event Log Automation Server (4) allows any web interface to log errors encountered from within a web page to the NT event logging facility on any NT server. In this manner, any event at the web interface is translated for programmatic delivery to the NT server, and can be logged for analysis.
  • the system and method also can be configured to facilitate web-based workflow within an application that invokes the storage management function.
  • an application can scan and/or acquire images from a high speed film scanner, photoscanner, flatbed scanner, PCMCIA cards (digital cameras), or any TWAIN compatible device. Those images then can be compressed into three image file types: (1) a thumbnail image, (2) a screen resolution image, and (3) a high resolution image. All three versions of each image are sent and "checked-in" to the storage management function with a file extension known to be processable by the web browser, e.g., .jpg, via the automation server.
  • the client application supporting the web-based workflow converts each acquired image to the supported format such that the user can have direct access to the images maintained by the storage management function within the web browser.
  • the screen resolution and thumbnail versions can be "locked" down on the short-term media, such as a RAID, and never allowed to migrate offline to tape.
  • the high resolution image may be allowed to migrate according to user-defined migration policies then in effect. In this manner, internet access to the images is as quick as possible.
  • the locked-down images are not stored by the database server, but rather by the HSM server for direct access by the user. By migrating the high resolution image, however, storage costs for RAID are dramatically reduced.
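A hypothetical sketch of the three-rendition check-in just described; the Rendition type, the lock flag, and the GUID suffixes are illustrative assumptions, not the patent's interface.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the three-rendition check-in described above.
struct Rendition {
    std::string guid;       // GUID assigned to the stored file
    std::string extension;  // appended so browsers can decode it, e.g. ".jpg"
    bool lockedOnCache;     // pinned to short-term RAID, never migrated
};

std::vector<Rendition> checkInImage(const std::string& baseGuid) {
    return {
        {baseGuid + "-thumb",  ".jpg", true},   // thumbnail: locked down
        {baseGuid + "-screen", ".jpg", true},   // screen resolution: locked down
        {baseGuid + "-high",   ".jpg", false},  // high resolution: may migrate
    };
}
```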
  • an application can be configured to store images as economically and efficiently as possible using the system and method of the present invention, with the potential for growth of storage capacity being unlimited and scalable.
  • the present invention may provide a "digital vault" function within a client application.
  • This function can be supported in part by the fileset feature.
  • each consumer has his/her own unique digital vault that is accessible as a web page and resembles an electronic safety deposit box.
  • This digital vault contains images that are categorized and stored in folders that are graphically represented on the web page as a set of drawers.
  • Each set consists of a single set or multiple sets of images that were acquired from one of the acquisition devices described above.
  • One set could be a single roll of film, another could be a scanned legal document or documents, and another set can be a VHS tape or tape library.
  • This vault can be password and login protected. All image transmissions can be done under SSL.
  • The vault image viewing is also secured through a virtual directory service, another sequence of logins into the storage management system, a Microsoft SQL Server, and the Windows NT NTFS file system itself. From the vault, the consumer can proof and create his/her own media order. That media order is placed into a "shopping basket" for the consumer, and totaled for payment and shipping. The consumer may also send those images to a third party via internet mail. Images stored within the vault are set on an aging algorithm whereby, after a predetermined number of days, the images are deleted from the system.
  • FIG. 1 is a functional block diagram illustrating the architecture of a hierarchical storage management system;
  • FIG. 2 is a functional block diagram of a server component in a system as shown in FIG. 1;
  • FIG. 3 is a diagram illustrating implementation of migration policy in a system as shown in FIG. 1;
  • FIG. 4 is a diagram further illustrating implementation of migration policy in a system as shown in FIG. 1;
  • FIG. 5 is a state diagram illustrating state transitions during execution of a migration policy in a system as shown in FIG. 1;
  • FIG. 6 is a diagram mapping migration states to watermarks in a system as shown in FIG. 1;
  • FIG. 7 is a functional block diagram illustrating the interaction between a client application and a server component in a system as shown in FIG. 1;
  • FIG. 8 is a functional block diagram illustrating the interaction between a web-based client application and a server component, via an automation server component that links the client and server, in a system as shown in FIG. 1.
  • FIG. 1 is a functional block diagram illustrating the architecture of a hierarchical storage management system 10, in accordance with an embodiment of the present invention.
  • System 10 will be described herein as a "directed storage management (DSM)" system inasmuch as it allows a system administrator to introduce, on a selective and reconfigurable basis, significant direction concerning storage management policies, including migration. That is, the client and an administrator have the ability to direct a file to a particular location. This differs from a simple hierarchical storage management (HSM) system that handles the selection internally. The difference is that DSM system 10 provides a much broader set of data-movement policies than a typical HSM, and gives a client ways to override those policies.
  • System 10 can be used to implement a method for hierarchical storage management in accordance with an embodiment of the present invention.
  • The following terms will be used herein to describe the structure and functionality of DSM system 10, and are generally defined as follows.
  • The term "media" refers to a physical piece of data storage media such as a tape, an MO cartridge, a CD, or a hard disk.
  • A "Store" is any collection of like media.
  • The media in a Store can exist in any number of locations, i.e., on shelves, in one or more robotic libraries, or in individual drives.
  • A "freeStore" is a collection of "empty" volumes with the same MediaType.
  • The media in a freeStore can exist in any number of locations, i.e., on shelves, in one or more robotic libraries, or in individual drives.
  • A "File" is a user-defined blob of information that can be stored in DSM system 10.
  • A DSM File will often correspond to a file in a file system.
  • A File may exist in more than one Store at any given time, and will be moved or copied between Stores according to the policies that an administrator has put in place and specific user requests.
  • A "policy" is a rule that governs the automatic movement, i.e., "migration," of files between Stores, and removal of files from Stores.
  • A "ticket" or "token" is a unique identifier returned to a client application when a File is stored by DSM system 10. The client will use the ticket for all subsequent references to the File.
  • DSM system 10 is implemented as a software system having a plurality of logical software components. Such components may include a DSM client library 12, a DSM server process 14, library server processes 16, volume server processes 18, DSM data mover processes 20, client data mover processes 21, DSM agent processes 22, DSM administrator processes 24, and a database server 26.
  • DSM client library 12 is linked with a client application to provide client-side data mover services.
  • DSM server process 14 operates to control the storage and retrieval of files from various data storage devices.
  • Library server processes 16 direct media handling in automated and manual libraries.
  • Volume server processes 18 handle mounted volumes and set up data mover processes 20, 21 to transfer data.
  • Data mover processes 20 move files between DSM clients and mounted DSM volumes.
  • DSM agent processes 22 perform routine management functions such as migration and compaction.
  • DSM administrator applications 24 provide a special type of client application for managing DSM system 10.
  • database server 26 provides robust, recoverable, database storage and services for the other processes.
  • the above component processes 12-26 can reside on a single host machine, or can be distributed across many hosts.
  • the term "process" in this discussion should be inte ⁇ reted broadly.
  • a process could be a separate application, or one or more threads within an application.
  • FIG. 1 illustrates how the component processes are related, with bold lines indicating networkable interfaces between component processes.
  • Each host machine may be based on any conventional general purpose single- or multi-chip microprocessor such as, for example, a Pentium® processor, a Pentium Pro® processor, an 8051 processor, a MIPS processor, a Power PC® processor, or an Alpha® processor.
  • the processor can be integrated within a personal computer, computer workstation, or network server to form the host machine, and may be configured to run on any number of operating system environments.
  • the Windows NT operating system platform may be particularly suitable to the storage management tasks carried out by DSM system 10, although the system can be readily adapted for other operating systems such as Unix.
  • DSM client library 12 may be configured to implement the DSM application programming interface (API).
  • User applications 28 are software applications for manipulation of files stored by DSM system 10.
  • the client API functions can be callable, for example, from C and C++ application programs, and from other languages that can link in C libraries.
  • the basic functions provided by the DSM client library are (1) establish secure connections to DSM server 14; (2) translate client API calls into networkable calls to DSM server 14; and (3) establish client-side data mover processes 21 that allow data to move efficiently between user data streams or files 30 and DSM devices and media.
  • DSM server process 14 is responsible for servicing multiple concurrent requests from its various clients, i.e., user applications 28, DSM agents 22, and DSM administrator applications 24.
  • the basic responsibilities of DSM server process 14 are (a) provide logon and file-access security for its clients; (b) handle concurrent client requests to a given file by providing simultaneous or sequential access depending on the nature of the requests; (c) translate requests for files into requests for data transfer to or from specific media volume locations; (d) sequence data transfer requests to maximize the utilization and throughput of available devices, while providing a good quality of service to clients; (e) direct the library server process 16 to mount the required volumes at the appropriate points in time; (f) direct the volume server process 18 to establish DSM data mover processes 20 for mounted volumes, and connect those processes to client data mover processes 21, or other DSM data mover processes, as required; and (g) issue commands to data mover processes 20, 21 to effect the requested data transfer.
  • DSM server process 14 also is responsible for communicating with the DSM administrator processes 24 to report errors and effect changes requested by administrators.
  • a DSM server 14 also can communicate with other DSM server processes 14 to allow clients to share remote files in a multi-server host environment.
  • a library server process 16 executes on each host that manages removable media for DSM system 10.
  • Library server process 16 issues media movement commands to automated libraries 32, and interfaces to operator consoles for manual media operations, i.e., shelf management.
  • Volume server process 18 handles mounted volumes, issuing drive-related commands, such as locking drives 34 and assigning drives to processes.
  • Volume server process 18 also sets up DSM data movers 20 that are configured to read and write DSM media volumes, and to communicate with other client data movers 21 and DSM data movers 20.
  • a data mover process 20, 21 is instantiated at each data endpoint.
  • DSM client library 12 sets up client data movers 21 to communicate with the user application or the user file system 30.
  • the volume server 18 sets up DSM data movers 20 to read and write DSM volumes.
  • the basic functions of the data movers 20, 21 are (1) receive instructions from the DSM server 14 to begin a data transfer, and return status; (2) read from or write to a data endpoint, e.g., a DSM device, or user file/user stream 30; and (3) transfer data to, or receive data from, another (possibly remote) data mover.
  • a DSM agent 22 is a privileged client application that is tightly integrated into DSM system 10.
  • DSM agents 22 help enforce policies that are set up by the DSM administrator 24. Examples of operations performed by DSM agents 22 are (a) migrate files from one DSM Store to another; for example, an agent may move files from a RAID disk to a magneto-optical (MO) drive and/or to a tape based on the time of their last reference; (b) delete files that have reached a certain age; (c) compress tape volumes by removing stale files (files that have been deleted by an application); (d) retension tapes; (e) copy and remove aged media; (f) copy or move files to remote DSM system(s); (g) import volumes from a foreign system; and (h) transfer a Store to a different (perhaps more modern) media type.
  • MO magneto-optical
  • DSM administrator applications 24 are privileged applications used to configure, monitor, and manage the DSM system 10. In particular, DSM administrator applications 24 allow setting of migration policy and security levels. Administrator applications 24 are written using the privileged DSM administrator API functions. Database server 26 stores information vital to the operation of DSM system 10. The database server 26 provides secure, robust, transaction-oriented storage and efficient search and retrieval mechanisms.
  • DSM client library 12 provides the client-side implementation of the DSM Client API.
  • the DSM Client API functions are presented to client applications as "C" functions. Internally, the functions are implemented in C++, and a C++ object interface could also be available to client applications.
  • The client library is thread-safe, allowing for concurrent multi-threaded user applications.
  • a user application thread can have multiple DSM connections, and there may be multiple DSM files open on each connection.
  • the library maintains state for each connection and each open file.
  • All Client API functions can be made synchronous. Certain functions, such as file reloads (DsmReloadFile), can either schedule a task, or wait for the task to complete. In either case, the function returns as soon as the operation it requests (the scheduling or the task) is complete. In a preferred embodiment, there are no asynchronous mechanisms such as callbacks, waits, or event notifications. For scheduled tasks, there are no specific API functions to find out if the task is completed, although the application can be configured to determine if the task is complete by examining the state of the File or Store involved.
  • Most of the Client API functions preferably can be translated directly into DSM server function calls and processed entirely in the user thread space.
  • The following functions can be provided to establish a data connection, and create a separate data mover thread to handle the movement of the data between the client file or stream 30 and the DSM device 31: (1) DsmCreateFile; (2) DsmOpenFile; (3) DsmStoreFile; (4) DsmRetrieveFile; and (5) DsmReplaceFile.
  • Data mover threads are used by DsmReadFile and DsmWriteFile functions. Threads created by DsmCreateFile and DsmOpenFile are destroyed by DsmCloseFile. Threads created by DsmStoreFile, DsmRetrieveFile, and DsmReplaceFile are destroyed at the completion of the operation.
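A hypothetical usage sketch of the client API functions named above. The patent does not give signatures, so the declarations below are assumed for illustration; a real program would link against the DSM client library.

```cpp
// Assumed signatures for two of the functions named in the text.
extern "C" {
int DsmStoreFile(const char* localPath, const char* storeId, char* ticketOut);
int DsmRetrieveFile(const char* ticket, const char* localPath);
}

int storeThenFetch() {
    char ticket[64];
    // DsmStoreFile establishes a data connection, spawns a client-side
    // data mover thread to stream the file to a DSM volume, and destroys
    // the thread when the operation completes.
    if (DsmStoreFile("C:/images/scan001.jpg", "raid-store", ticket) != 0)
        return -1;
    // All client API calls are synchronous: this returns once the
    // retrieval (or its scheduling) is complete.
    return DsmRetrieveFile(ticket, "C:/images/copy.jpg");
}
```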
  • FIG. 2 shows the major components of an exemplary DSM server 14 forming part of system 10 as shown in FIG. 1.
  • FIG. 2 illustrates the relationships of the DSM server components to other components of DSM system 10.
  • the bold boxes in FIG. 2 denote DSM server components.
  • the first DSM server component to be described is request manager 36.
  • Request manager 36 provides a network connection point for client and administrator applications.
  • the DSM server 14 preferably is multi-threaded, and request manager 36 dispatches incoming requests to appropriate threads so that it can handle other new requests.
  • the various components of DSM server 14 will be described in greater detail below.
  • Request manager 36 routes incoming requests based on the type of request. Requests that do not deal with specific files and filesets are handled by security manager 38. These requests include Connection API requests, e.g., Logon, and Security API requests. Requests that change the DSM configuration are handled by configuration manager 40. These requests include all requests that modify the physical configuration (Libraries and Media), and those that modify the logical configuration (Stores, Volumes, and Remote Servers). Fileset requests are handled by fileset manager 42. Each Fileset denotes a group of associated files. Fileset manager 42 decomposes Fileset requests into multiple File requests and passes them off to file manager 58. Requests that process individual files are handled by file manager 58. Requests for explicit file locks, i.e., a locking down of files onto a particular volume, are handled by lock manager 44. When an input/output operation completes, an IO completion function 50 is dispatched.
  • Security manager 38 interfaces to a DSM security database, and is responsible for establishing logon sessions and maintaining all session information.
  • the tasks handled by security manager 38 include (1) connection management; (2) user administration; (3) file permissions; and (4) store permissions.
  • the connection management task handles logon attempts, validates username and password, and issues a security token to the requesting application (via the client library). Also, the connection management task validates the security token on all other requests.
  • the user administration task handles all maintenance of Users, Groups, and passwords.
  • the file permissions task maintains and validates Users and Group permissions for accessing Files.
  • DSM system 10 may implement a security model having the following characteristics: (1) privileges; (2) DSM Users; and (3) Groups of Users. Privileges include Create, Read, Write (where Write implies overwrite and delete). For every DSM User, a Logon and a Password are assigned.
  • (a) Every User is a member of one or more Groups; (b) every User has a Default Group of which he is a member; (c) for each session, the User is assigned an Active Group for that session, and Files created during that session are associated with the creator's Active Group; (d) at logon, the Active Group for the User is set equal to the User's Default Group; (e) every Group has a DefaultStoreID; (f) the administrator creates Users and Groups, assigns Users to Groups, and sets the User's Default Group; (g) every file has one Owner and one Group; (h) every file has separate read and write permissions for the owner, for the group associated with the file, and for others; and (i) every Store has a set of Create, Read, and Write permissions for each group or user that can directly access the files in that Store.
  • The Active Group can be changed by the User through the API to any group of which the User is a member, and remains in effect for that session or until the User changes it again. If a User creates a new file without specifying a StoreID, the new file is created using the DefaultStoreID associated with the User's current Active Group. The User can change his Default Group, but cannot change the groups to which he belongs.
  • the Owner of a file is the user who creates the file and the Group is the owner's Active Group when the file is created.
  • By default, the owner has read and write permissions, the group has read permissions, and others have neither. This default can be changed by the administrator, while file permissions can be set and changed by the owner and administrators. Finally, store permissions are set by the administrator.
  • Collections of Stores may have the following characteristics: (i) every Store belongs to zero or more Collections; (ii) user and group permissions may be changed on a Collection basis; and (iii) Collections are strictly an administrative convenience: permissions are always stored on a Store basis, and never on a Collection basis.
  • When a user connects to DSM system 10, the DSM system verifies that the UserID and Password are known. When a user performs a function, DSM system 10 determines which Store, file, and operation (create, read, or write) is involved. There may be two files, and the files may be in the same or different Stores. Each file may have a different operation. For each file, DSM system 10 will: (1) verify that the user or some group the user belongs to has permission to perform that operation in the Store; and (2) verify that the user has permission to perform that operation.
  • Verification of user permission proceeds as follows: (i) if the user is the owner, compare the operation with the owner permissions; (ii) else if the user is a member of the file group, compare the operation with the group permissions; (iii) else compare the operation with the other users' permissions.
  • The Store permissions need only be checked against the Store specified in the user command. If the file is actually stored to or fetched from another Store, the permissions for the Store used are not checked. No security check is made on DSM system-initiated operations.
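The owner/group/other verification order can be stated compactly in code. This is a sketch under assumed types (Perms, FileSecurity); only the three-step ordering comes from the text.

```cpp
#include <string>
#include <unordered_set>

// Sketch of the verification order described above.
struct Perms { bool read; bool write; };

struct FileSecurity {
    std::string ownerId;
    std::string groupId;
    Perms owner, group, other;
};

bool mayAccess(const FileSecurity& f, const std::string& userId,
               const std::unordered_set<std::string>& userGroups,
               bool wantWrite) {
    const Perms* p = &f.other;              // (iii) everyone else
    if (userId == f.ownerId)
        p = &f.owner;                       // (i) the owner
    else if (userGroups.count(f.groupId))
        p = &f.group;                       // (ii) a member of the file group
    return wantWrite ? p->write : p->read;  // Write implies overwrite/delete
}
```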
  • Group Table:
      GroupName: the name of the group.
      DefaultStoreID: new files will be created in this Store if the user does not specify a StoreID.
  • GroupMember Table (primary key on UserID and GroupID fields):
      UserID: the user who is part of the group.
      GroupID: the group of which the user is a member.
  • File Table (only security-related attributes are listed):
      FileID: the GUID for this file.
      OwnerID: the UserID of the owner of the file.
      GroupID: the GroupID of the group the file is associated with.
  • Store Table (only security-related attributes are listed):
      StoreID: the GUID for this Store.
  • StoreAccess Table (indexed on StoreID and on AccessID):
      StoreID: the Store ID.
      AccessID: a GroupID or UserID that has access to this Store.
  • Collection membership table:
      CollectionID: a collection a Store belongs to.
      StoreID: the Store that is a member of the collection.
  • Store permissions are read from the GroupMember and StoreAccess tables and cached by DSM system 10 for the open connections. The number of Stores is typically small, but it could be relatively expensive to check whether the user is a member of a number of groups. Once this information is cached, checking Store permissions during the session is trivial. If permissions change while the user is active, the cached permissions become invalid and need to be regenerated.
  • the cached permissions list for the user contains any Stores for which the user has specific permissions, and Stores for which any group that the user belongs to has permission. User permissions will override all group permissions. Multiple group permissions will be cumulative. That is, if there is a permission record for the individual user for a Store in the StoreAccess table, then records for that Store for any groups to which the user belongs are ignored. If there is no record for the user, then his permissions in a given Store will be the sum of permissions for the groups to which he belongs within that Store.
  • File permission for a user can be checked as follows: (1) the logged-on user's ID is compared with the file's OwnerID, and if they match, the owner permissions apply; (2) otherwise, if the user is a member of the file's Group, the group permissions apply; and (3) otherwise, the permissions for other users apply.
  • the above security implementation provides a number of advantages. For example, it is easy to implement basic security, but possible to implement detailed security. Also, this security implementation requires very low overhead for sites that have relaxed security requirements, or implement security using native OS facilities.
  • Configuration Manager 40 interfaces to a Configuration Database, and handles all requests that modify the configuration. These include all requests that add, delete, or modify parameters of the following entities: (1) sites and DSM services; (2) racks and shelves; (3) libraries, drives, pickers, slots, and ports; and (4) Media and Volumes.
  • Configuration manager 40 interfaces with volume manager 52 and with IO Scheduler 54 for media-related and volume-related requests. For example, a request to format a volume will build an IO Request and add it to the IO Request Queue 56.
  • IO scheduler 54 will handle the request when a drive becomes available, and volume manager 52 will issue the appropriate requests to the library server 16 to mount the volume, and to the Volume Server 18 to format the volume.
  • Fileset Manager 42 handles requests that deal with all files in a Fileset. Fileset manager 42 translates the Fileset request into a set of File requests for the file manager, tracks the completion of the associated File requests, and notifies the requestor when the Fileset operation is complete.
  • File manager 58 translates client requests and fileset manager requests into FileRequest objects and places them on the FileRequest Queue 60. File manager 58 tracks the completion of File requests and notifies the requestor when a file operation completes.
  • File scheduler 62 sequences requests for a given file when necessary.
  • File scheduler 62 maintains a First-In-First-Out (FIFO) queue of requests for each file.
  • FIFO First-In-First-Out
  • A new request to read a file can be started if there are no write requests for that file ahead of it in the queue.
  • A write request for a file can be started only if there are no other requests ahead of it in the queue.
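The two scheduling rules above amount to a small predicate over the per-file FIFO queue; a sketch with assumed types follows.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical sketch of the per-file FIFO rule stated above: a read may
// start only if no write precedes it in the queue, and a write may start
// only at the head of the queue.
enum class Op { Read, Write };

bool canStart(const std::deque<Op>& fifo, std::size_t index) {
    if (fifo[index] == Op::Write)
        return index == 0;  // writes wait for everything queued before them
    for (std::size_t i = 0; i < index; ++i)
        if (fifo[i] == Op::Write)
            return false;   // a write ahead of us blocks this read
    return true;            // only reads ahead: reads can proceed together
}
```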
  • When File scheduler 62 determines that a file request can be started, it creates one or more IO Requests and places them on the IO Request Queue 56. The file request remains in the FileRequest Queue 60 until it is complete, so that file manager 58 can track its completion.
  • A DSM file may be split into multiple chunks on one or more physical volumes. Furthermore, if both the source and destination of a file operation are on DSM media, the source and destination may have an unequal number of chunks and the chunks may be of unequal size.
  • File Scheduler 62 converts a File Request into one or more segment IO requests, where each segment is of equal size on the source and destination volumes.
  • File Scheduler 62 generates an IO Request for each segment, which includes information about the volumes that are required to satisfy the request.
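A sketch of how equal-size segments might be derived from unequal source and destination chunk lists (the function name and representation are assumptions): each emitted segment ends at whichever chunk boundary comes first, so it is contiguous on both volumes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Given the chunk sizes of the same file on the source and destination
// volumes, emit segments that never cross a chunk boundary on either
// side, so each segment is of equal size on both volumes.
std::vector<std::int64_t> segmentSizes(const std::vector<std::int64_t>& src,
                                       const std::vector<std::int64_t>& dst) {
    std::vector<std::int64_t> segs;
    std::size_t i = 0, j = 0;
    std::int64_t srcLeft = src.empty() ? 0 : src[0];
    std::int64_t dstLeft = dst.empty() ? 0 : dst[0];
    while (i < src.size() && j < dst.size()) {
        // The next segment ends at whichever chunk boundary comes first.
        std::int64_t seg = std::min(srcLeft, dstLeft);
        segs.push_back(seg);
        srcLeft -= seg;
        dstLeft -= seg;
        if (srcLeft == 0 && ++i < src.size()) srcLeft = src[i];
        if (dstLeft == 0 && ++j < dst.size()) dstLeft = dst[j];
    }
    return segs;  // e.g. src {10,5}, dst {4,11} gives segments {4,6,5}
}
```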
  • The IO Requests are placed on the IO Request Queue 56 for the IO Scheduler 54.
  • For the destination of a write or copy request, only the destination Store may be known. Selecting the volume within the Store may be postponed until a drive is available to service the request. The task of selecting a volume is given to the IO Scheduler 54, whose role is to select the next IO Request to be processed.
  • IO Scheduler 54 selects requests from the IO Request Queue 56 in order to maximize the utilization and throughput of available devices, while guaranteeing some level of service to all clients. File conflicts are resolved by File Manager 58, so the IO Scheduler 54 has a great deal of freedom in reordering IO requests.
  • the algorithm for selecting the next request to process takes the following into account: (1) high-priority requests that can use the drive; (2) other requests that can use the volume that is in the drive; (3) requests that can use a different volume in the same library; and (4) for sequential devices, such as tapes, the file segment position on the media relative to the current media position.
  • a request may have originated as a high-priority request, or may have had its priority elevated based on the length of time it has been queued.
  • Other requests that can use the volume that is in the drive may include requests that do not specify a specific destination volume, and for which there is room on the volume in the drive.
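One plausible way to fold the four criteria into a selection routine is a weighted score; the weights and fields below are illustrative assumptions, not the patent's algorithm.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative weighted scoring of queued IO requests.
struct IoRequest {
    bool highPriority;       // (1) originally high priority, or elevated with age
    bool usesMountedVolume;  // (2) can use the volume already in the drive
    bool usesSameLibrary;    // (3) can use a different volume in the same library
    long seekDistance;       // (4) distance from current media position (tape)
};

std::size_t pickNext(const std::vector<IoRequest>& queue) {
    auto score = [](const IoRequest& r) {
        long s = 0;
        if (r.highPriority)      s += 1000;
        if (r.usesMountedVolume) s += 100;  // avoids a mount/unmount cycle
        if (r.usesSameLibrary)   s += 10;   // avoids fetching from the shelf
        return s - r.seekDistance;          // prefer short tape repositioning
    };
    // Caller must ensure the queue is non-empty.
    return static_cast<std::size_t>(
        std::max_element(queue.begin(), queue.end(),
                         [&](const IoRequest& a, const IoRequest& b) {
                             return score(a) < score(b);
                         }) -
        queue.begin());
}
```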
  • the selected request is passed off to an IO worker thread 48.
  • An IO worker thread 48 is dispatched to handle IO requests, especially those that may take some time.
  • These IO requests include issuing Volume requests to the Volume Manager 52.
  • the IO Scheduler requests the necessary Volumes from the Volume Manager 52 and handles the response.
  • the Volume Manager 52 will see that the volumes are mounted and the Data Movers 20, 21 are ready.
  • an IO Worker 48 is dispatched to direct the data transfer. When both the source and destination Data Movers 20, 21 are in place, the IO Worker 48 issues the command to the appropriate Data Mover to initiate the data transfer.
  • When an IO operation completes, the IO Completion functions 50 handle the completion, updating the status of the request on the IO Request Queue 56. If an IO completes abnormally, such as when the destination device runs out of space, the IO Completion routine 50 may create new IO requests to complete the IO operation. When all IO requests associated with a file request are complete, the file manager 58 is notified.
  • Volume Manager 52 carries out the following tasks: (1) maintains the records in the Volume and Store tables in the database; (2) aids in selecting appropriate destination volumes for IO Requests; (3) gets volumes ready for transferring data; and (4) when data transfer is complete, releases the volumes. In preparing volumes for transfer, volume manager 52: (a) issues Volume Mount requests to the Library Server 16 and handles the responses; and (b) issues requests to Volume Server 18 to prepare the drive and volume for file segment commands, and handles the responses.
  • Volume Manager 52 issues requests to Volume Server 18 to release the volume and drive, and issues requests to Library Server 16 to unmount the Volume.
  • Database Manager 64 provides a programming interface to the DSM database that is independent of the underlying database implementation.
  • a Library Server process 16 executes on each host that manages removable media for DSM system 10.
  • Library Server process 16 issues media movement commands to automated libraries, and interfaces to operator consoles for manual media operations such as shelf management.
  • The commands that Volume Manager 52 issues to Library Server 16 for normal operations are independent of the type of library involved.
  • a Library Server 16 has a well-known port address on which the DSM Server 14 can communicate.
  • a Library Server 16 will spawn multiple threads to allow concurrent operation of multiple pickers. For the cases where the Volumes in a Store are not on removable media, the Library Server function may not be necessary, or may be located on the DSM Server host even if the media resides remotely.
  • Volume Server process 18 executes on each host having drives that handle DSM volumes.
  • the roles of the Volume Server 18 are to: (1) issue device-oriented commands such as mount the file system and lock a volume in a drive; (2) perform volume-oriented commands such as (a) partition and format a volume, (b) read and write the volume label, (c) return volume statistics from the operating system, such as total space, space used, and space available, (d) enumerate files on a volume, and (e) perform I/O control such as rewind or position; and (3) set up a Data Mover 20, 21 for each concurrent file-related operation. For random-access devices that allow concurrent operations, such as hard disk and MO, a Data Mover would be established for each concurrent operation.
  • There is one Volume Server process 18 per host that controls DSM drives.
  • the Volume Server 18 has a well-known port that the DSM Server 14 can use to issue commands.
  • the Volume Server 18 is implemented as multiple processes or threads.
  • the DSM Data Mover 20, 21 objects spawned by the Volume Server 18 may be implemented as threads of the Volume Server process.
  • Data movers 20, 21 can be instantiated in two places: (1) on client hosts by the DSM client library 12 to read and write user files and streams; and (2) on hosts that have DSM drives by the Volume Server 18 to read and write DSM volumes.
  • A Data Mover process 20, 21 has two communication ports that are made known to the DSM Server 14.
  • One port is used by the IO Scheduler 54 to issue instructions for file- related operations.
  • Common commands include: (a) create a file; (b) delete a file; (c) return information about a file (metadata); (d) Read a file or portion of a file from media and transfer it to another Data Mover; (e) prepare to accept a file or portion of a file from another Data Mover and write it to media.
  • the other port is used to transfer data from one Data Mover 20, 21 to another data mover. Data transfer operations always involve two Data Mover processes 20, 21, and the two processes are each made aware of the communications link.
  • the implementation may have two connections between Data Movers, one for commands and one for data.
  • DSM Agents 22 are processes that typically run on the same host as the DSM Server 14, and work with the DSM server to implement policies set up by the DSM administrator 24.
  • One or more DSM Agents 22 may be responsible for the following types of tasks: (a) migrate files from one DSM Store to another, e.g., from a RAID disk to a magneto-optical (MO) drive and/or to a tape drive based on the time of last reference to the file; (b) delete files that have reached a certain age; (c) compress tape volumes by removing stale files (files that have been deleted by a user application); (d) retension tapes; (e) copy and remove aged media; (f) copy or move files to remote DSM systems; (g) import volumes from a foreign system; and (h) transfer a Store to a different media type.
  • DSM agents are privileged client applications that use the Client API and the DSM Administrator API.
  • Database Server 26 stores information vital to the operation of DSM system 10.
  • Database Server 26 provides secure, robust, transaction-oriented storage and efficient search and retrieval mechanisms.
  • the information maintained by the Database Server 26 may include: (a) the DSM security database containing user IDs, passwords, and privileges; (b) physical configuration, including local and remote libraries and drives, and external media storage; (c) remote DSM servers and their communication parameters; (d) media inventory, including the location of each piece of media, and the space available; (e) file metadata, including the security attributes and media locations of each file; (f) logical grouping of media into DSM Stores, and information about each Store; and (g) policy parameters used by the DSM server and DSM agents.
  • the database can be stored in tables and the database implementation provides a relational view of the data.
  • Policies can be placed in the following categories: (1) policies dealing with maintaining copies in stores; (2) policies dealing with media and volumes; and (3) miscellaneous policies.
  • Policies dealing with maintaining copies in stores may include (a) an initial store policy that specifies the default store(s) into which a new file is placed; (b) a maximum file size to store, which may be bypassed for larger files, in lieu of an alternative store; and (c) an alternate store to use if the file size exceeds the maximum specified.
  • a migration policy may (a) enable migration, in which case lowest level stores would not enable migration; (b) specify migration low and high watermarks; (c) specify the store to which files should be copied when the present store reaches some capacity threshold; (d) specify migration ranking criteria, such as oldest, least recently used, size, combination of age and size, (e) specify use of fileset migration, and allow the user to choose different levels of adherence to this policy; and (f) set a migration time window, i.e., a period of time in which to carry out the migration.
  • a deletion policy may (a) enable automatic deletion, which typically would not be enabled for lowest level store; (b) specify stores on which copies must exist before a file is deleted from this store; (c) specify deletion from the original store immediately upon migration to the new store; (d) set a suggested minimum age at which to delete a file; (e) set a suggested maximum age at which to delete a file, in which case the file may be deleted even if space is not needed; (f) specify marking of deleted files as obsolete without deletion, enabling recovery of a deleted file; (g) specify marking of overwritten files as obsolete without deletion, enabling recovery of any version of a file; (h) set a maximum time to retain an obsolete file beyond deletion (obsolescence); and (i) set a maximum time to retain any file beyond last reference.
  • a reload policy may require reloading of an entire fileset when a file is reloaded.
  • a chunking policy may: (a) allow chunking files on the given store although some users may choose to disallow splitting of files across volumes; in this case, files larger than the size of one volume would be rejected; and (b) set a minimum chunk size to prevent a proliferation of small chunks, which may not apply to a final chunk.
  • a volume selection policy may specify that the selected volume will be (a) the first available; (b) the best fit in terms of storage space or other characteristics; (c) the most recently written, to keep files more or less chronologically ordered on media; (d) the least recently used, to keep volumes cycled through drives and spread new files across media for concurrent retrieval; (e) chosen in a round robin format in which volumes are cycled through the drives; or (f) the volume holding the most current file in the same fileset, or any file in the fileset.
  • a drive selection policy may specify that files be stored (a) on a drive that is the first available; or (b) to balance usage among drives.
  • a shelf management policy may (a) enable shelf management for the given store although shelf management for intermediate stores may not be desired; (b) use Free Store from which new volumes can be drawn; (c) set a tape retension interval; (d) set a maximum media age according to which the system will copy to new media and scrap old media when the age is exceeded; (e) set a maximum mount count by which the system will copy to new media and scrap old media when the count is exceeded; (f) set a tape compaction threshold specifying the minimum bytes to try to recover; (g) specify merging files in a fileset onto media or, alternatively, only upon migration; (h) set a shelf management time interval that specifies a best time to compact and retension tapes and merge filesets; and (i) specify import/export volumes and notify caller when an offline file is accessed.
  • An inventory policy may provide that the library be periodically inventoried, inventoried using a barcode inventory method, or using a volume label inventory method where each volume is loaded.
  • a number of miscellaneous policies may be employed including a logging policy whereby the system logs, e.g., deletions, overwrites, stores, and retrievals.
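To make the policy categories above concrete, the following Python sketch collects a few of the named parameters into one structure. The attribute names and default values are illustrative assumptions only; the actual parameter set is maintained by the Database Server.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class StorePolicy:
        # Assumed names loosely mirroring the policy parameters above.
        file_bypass_size: int = 0                  # bytes; 0 = no bypass
        file_bypass_store: Optional[str] = None    # alternate store name
        migrate_to_store: Optional[str] = None     # migration target
        copy_hwm: float = 80.0                     # percent held bytes
        copy_lwm: float = 60.0
        delete_hwm: float = 90.0                   # percent allocated bytes
        delete_lwm: float = 70.0
        retain_deleted: bool = False               # keep obsolete versions
        max_lifetime_days: Optional[int] = None    # delete after this age
        allow_chunking: bool = True                # may split across volumes
        min_chunk_size: int = 4 << 20              # 4 MB, arbitrary example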
  • DSM system 10 makes use of a set of policy algorithms.
  • when DSM system 10 receives a new store-file request from an Endpoint Client, for example, it will store the file in the Store specified in the request, if one is specified. If no Store is specified, DSM chooses the Default Store for the User's Active Group. Typically, this will be a direct-access storage device, such as a RAID disk. In any case, exactly one Store is chosen. Whenever the size of the file to be stored exceeds the non-zero FileBypassSize attribute of the selected store, then the store specified in the FileBypassStore attribute is selected instead, as sketched below.
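A minimal sketch of this selection rule follows, in Python; the Store class and the chained-bypass loop are assumptions patterned on the description and on the policy examples later in this text.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Store:
        name: str
        file_bypass_size: int = 0          # 0 means no bypass limit
        file_bypass_store: Optional["Store"] = None

    def choose_initial_store(requested: Optional[Store],
                             default_store: Store,
                             file_size: int) -> Store:
        # Exactly one Store is chosen: the requested Store, else the
        # Default Store for the user's Active Group.
        store = requested or default_store
        # While the file exceeds a non-zero FileBypassSize, fall through
        # to the FileBypassStore instead.
        while (store.file_bypass_size
               and file_size > store.file_bypass_size
               and store.file_bypass_store is not None):
            store = store.file_bypass_store
        return store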
  • file manager 58 creates a new File object in the Store and schedules a file copy from the Client Endpoint to the Store. Whenever a copy is scheduled, the destination Store is added to the File's vsScheduled attribute. Whenever a copy completes, the destination Store is removed from vsScheduled and added to vsResidence for that file.
  • the File's vsHeld is updated to indicate whether or not this file is a candidate for deletion (primarily whether it has been migrated), and the Store's BytesHeld property is updated accordingly. Since a new file typically cannot be deleted until it is migrated, the Initial Store is usually added to vsHeld following the store, and the file size is added to BytesHeld.
  • if a maximum lifetime is specified for the file, it is placed in the file's Lifetime attribute. If none is specified, then the MaxLifetime attribute of the Store is placed in the file's Lifetime attribute. Whenever a copy to a Store completes, DSM will look at a vsCopyTo indicator for that store, and immediately schedule copies from this Store to all the Stores specified if copies do not already exist or are not already scheduled. This is essentially the same as an immediate migration from the Initial Store to the Stores in the vsCopyTo.
  • Overwriting a file is equivalent to deleting the file and storing a new file but reusing the old FileID.
  • a file is deleted by giving the file a new FileID and then scheduling the deletion using the new FileID. The act of physically deleting the file can be queued. Once the file is given a new FileID, other requests using the old FileID, such as an overwrite request, can proceed ahead of the delete. Since copies of the file may exist on a number of different stores, deleting the file may require mounting one or more volumes, which could take some time. A client desiring to overwrite the file will not have to wait for the previous copy to be physically deleted in order to write the new copy. Overwriting a file requires that the File be Locked.
  • a Delete File request is always the last request allowed for a file.
  • the file is marked for deletion. If a RetainDeleted property is set for the Store, then the file is flagged as Obsolete; otherwise the file is flagged as Deleted. All subsequent requests for that file, except those dealing with deleted files or versions, will return a status indicating that the file does not exist.
  • the Delete File request is processed by copying the FileID property into an OrigFileID property, and then giving the file a new unique FileID. In doing so, DSM server 14 can process the Delete File request as it would a normal file, while retaining the information needed to undo the delete if the configuration allows it. Furthermore, this supports multiple deleted (obsolete) file versions, while maintaining a unique FileID for every unique file version in the system.
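A minimal Python sketch of this FileID rotation follows; the dictionary record and the uuid-based identifier are illustrative assumptions.

    import uuid

    def process_delete_request(file_record: dict, retain_deleted: bool) -> None:
        # Preserve the old identifier so the delete can be undone and so
        # multiple obsolete versions keep unique FileIDs.
        file_record["OrigFileID"] = file_record["FileID"]
        file_record["FileID"] = str(uuid.uuid4())  # new unique FileID
        # Flag per the Store's retention setting; physical deletion of the
        # copies is queued separately under the new FileID.
        file_record["state"] = "Obsolete" if retain_deleted else "Deleted"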
  • deleting the file from DSM is equivalent to deleting it from all stores in which it is resident, which can be determined from a vsResidence vector.
  • the file record is then deleted if a bRetainDeleted property is not enabled for this store.
  • the act of physically deleting the file can be queued, and other requests for the same FileID, such as an overwrite request, can proceed ahead of it. Because copies of the file may exist on a number of different stores, deleting the file may require mounting one or more volumes, which could take some time. A client wanting to overwrite the file will not have to wait for the previous copy to be physically deleted in order to write the new copy. The file must be Locked in order to be deleted.
  • Deleting a file from a Store is not the same as deleting a file from DSM system 10.
  • the file may be deleted from one Store and still exist in other Stores.
  • a file is deleted from a Store as part of the migration scenario. Deletion is a scheduled task, since it may involve mounting one or more volumes. In most cases, it is a higher priority task than writing to the volume, so that the space freed by the deletion will become available sooner. For most types of media that support deletion, as opposed to marking a file as stale, the delete operation is very fast once the volume is mounted.
  • FileSegmenter process schedules the deletion of the individual chunk, and the number of bytes in each chunk is subtracted from a BytesRemovable or BytesHeld property and is added to a BytesDeleting property of the volume and store.
  • An IOScheduler process handles deleting the chunks. As the physical deletion occurs, the size of the chunk is subtracted from the BytesDeleting property and added to the BytesFree property of the volume and store. When all chunks are deleted, the Store is removed from a vsDeleting property for the file. If a copy of a File into a Store is requested, e.g., a Reload, while the file is being deleted from the Store, the deletion must complete before the copy begins.
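The byte accounting described in the last two items can be sketched as follows, for illustration only; the dictionary counters stand in for Volume and Store properties, and the function names are assumptions.

    def schedule_chunk_delete(volume: dict, store: dict, nbytes: int,
                              held: bool) -> None:
        # Scheduling: bytes leave BytesRemovable (or BytesHeld) and enter
        # BytesDeleting on both the volume and its store.
        source = "BytesHeld" if held else "BytesRemovable"
        for counters in (volume, store):
            counters[source] -= nbytes
            counters["BytesDeleting"] += nbytes

    def complete_chunk_delete(volume: dict, store: dict, nbytes: int) -> None:
        # Physical deletion: bytes leave BytesDeleting and become BytesFree.
        for counters in (volume, store):
            counters["BytesDeleting"] -= nbytes
            counters["BytesFree"] += nbytes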
  • Copying a file is one of the more important operations in DSM. It generally would be used in all the following situations: (a) to place files into DSM system 10 by doing a copy from a Client Endpoint to a Store; (b) to return files to a client, by doing a copy from a Store to a Client Endpoint; (c) to migrate a file from one store to another; (d) to reload a file into the cache store; (e) to compact media; (f) to age media; and (g) to migrate to new media types.
  • File copy requests are handled by a FileSegmenter process. The actions described below take place in response to a file copy request.
  • the element corresponding to the source Store is incremented in the vsSourcing property for the file.
  • the destination Store is added to a vsScheduled vector for the file.
  • the FileSegmenter process is called to generate IO Requests to perform the copy.
  • Volume Manager 52 will select one or more destination volumes and reserve space by incrementing BytesReserved and decrementing BytesFree.
  • IO Scheduler 54 and Data Mover 20, 21 will copy the bytes.
  • a BytesReserved property will be decremented and a BytesHeld or BytesRemovable property will be incremented for both the destination volume(s) and the destination Store. If the file is held, the vsHeld bit for the destination store will be set in a File.vsHeld vector.
  • When the copy is complete, the destination Store will be removed from the file's vsScheduled property and added to a vsResidence property for the destination file. The source is evaluated to see if the file is a candidate for deletion and the File's vsHeld vector is updated accordingly.
  • Whenever a copy to a Store completes, DSM server 14 will look at the vsCopyTo vector for that store, and immediately schedule copies from this Store to all the Stores specified in that vector if copies do not already exist, as determined by the vsResidence vector, or are not already scheduled, as determined by the vsScheduled vector.
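The copy-completion bookkeeping of the last two items can be sketched as follows; the set-valued vectors and the schedule_copy stub are assumptions for illustration.

    def schedule_copy(file: dict, src: str, dst: str) -> None:
        # Stand-in for the IO Scheduler; a real system would enqueue an
        # IO Request for the FileSegmenter and Data Movers.
        print(f"copy {file['FileID']} from {src} to {dst}")

    def on_copy_complete(file: dict, dest: str, stores: dict) -> None:
        # Move the destination from vsScheduled to vsResidence.
        file["vsScheduled"].discard(dest)
        file["vsResidence"].add(dest)
        # Chase the destination store's vsCopyTo vector, skipping stores
        # that already hold, or are already scheduled to hold, a copy.
        for nxt in stores[dest].get("vsCopyTo", ()):
            if nxt not in file["vsResidence"] and nxt not in file["vsScheduled"]:
                file["vsScheduled"].add(nxt)
                schedule_copy(file, src=dest, dst=nxt)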
  • When a client application requests a file from DSM system 10, the following steps take place.
  • the file may exist on multiple Stores.
  • the file's vsResidence property indicates the Stores in which the file is resident.
  • the Store that is chosen for retrieval will be the Store with the lowest value of a priority property. This will typically be the fastest store.
  • DSM system 10 may determine if a mounted volume contains the file or if a drive is available in a library that contains the file. If the file is not found on the local DSM Server, the Servers in a vhSearch vector for the Store are searched.
  • Second, a copy of the file from the selected Store to the Client application is scheduled.
  • Third, a reload of the file is scheduled if the applicable policy indicates to do so.
  • if the ReloadToStore policy property for the chosen store is not null, and the user request did not specifically suppress reload, then a copy from the chosen Store to the Store specified by the ReloadToStore property is scheduled.
  • Reloads are not scheduled if the retrieve request specified NO_RELOAD.
  • Reloads are not scheduled if the retrieve is from a remote DSM server, or if the file would be reloaded to the Store from which the file was originally requested.
  • migration is accomplished by performing two operations: (a) copying files from a Store to its Migration store so that a copy is retained; and (b) deleting files from a Store that exist in one or more other Stores.
  • the migration policy has an effect on when these operations are performed.
  • DSM system 10 makes use of four thresholds that control these operations. Copy High-Water Mark (CopyHWM) starts copying to the migration store when the held (unmigrated) bytes exceed this threshold. Copy Low-Water Mark (CopyLWM) stops copying when the held (unmigrated) bytes go below this threshold.
  • Delete High-Water Mark (DeleteHWM) starts deleting copied files when the allocated bytes go over this threshold.
  • Delete Low-Water Mark (DeleteLWM) stops deleting when the allocated bytes go under this threshold, as sketched below.
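As a sketch only, and assuming the thresholds are expressed as percentages of a store's total capacity (an assumption; the text does not fix the units), the four watermarks drive two independent decisions:

    def migration_decisions(store: dict) -> list:
        # Held bytes have no copy elsewhere; allocated bytes are
        # everything that is not free.
        total = store["BytesTotal"]
        held_pct = 100.0 * store["BytesHeld"] / total
        alloc_pct = 100.0 * (total - store["BytesFree"]) / total
        actions = []
        if held_pct > store["CopyHWM"]:
            actions.append("copy to MigrationStore until held <= CopyLWM")
        if alloc_pct > store["DeleteHWM"]:
            actions.append("delete Removable files until allocated <= DeleteLWM")
        return actions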
  • FIG. 3 illustrates the concept of watermarks.
  • Bytes that have not been copied to the Migration Store are not eligible to be removed (deleted) from the Store, and are referred to as "Held" bytes.
  • Bytes that have been deleted or never allocated are eligible to be reused, and are referred to as "Free” bytes.
  • the other bytes in the Store, which have been allocated and migrated, are "Removable" bytes.
  • the goal of the Migration Agent is to keep the level of allocated bytes between the DeleteHWM and the DeleteLWM so that some Free bytes are always available, and to keep the level of Held (unmigrated) bytes between the CopyHWM and the CopyLWM.
  • the higher-priority task is to keep Free bytes available to be used for files that are being written to the Store by deleting Removable Bytes.
  • the Migration Agent cannot delete more Removable Bytes than exist, so it may have to copy files to the Migration Store to make those bytes Removable.
  • Water Mark Migration with Deletion is a strategy that combines the copy and delete operations.
  • CopyLWM is equal to DeleteLWM
  • CopyHWM is equal to DeleteHWM.
  • the migration agent starts to copy files to the Migration Store when the CopyHWM is reached. It copies as many files as necessary to the Migration Store, and deletes them from the current Store, until the CopyLWM is reached.
  • This strategy attempts to always maintain an amount of free space on the Store between the CopyHWM and the CopyLWM. This is the classic water mark migration strategy.
  • the Store can be viewed as a tank of water with a fill pipe and a pump.
  • the pump (the migration agent) turns on and begins emptying the tank.
  • the pump shuts off.
  • water can be entering from the fill pipe. If water is not pumped out faster than it is coming in, the incoming water is backed up or the tank overflows.
  • because files are deleted soon after they are copied, the amount of caching can go quite low. It may be difficult to determine the best setting for the Water Marks, especially if the workload fluctuates. Furthermore, the water marks may have to be adjusted as the average workload increases or decreases over time.
  • a variation on the first migration strategy can be referred to as water mark migration without deletion. According to this strategy, files are copied to the Migration Store in the manner described above, but the files are not deleted from the current Store until the space is actually required. In this manner, the DeleteHWM and DeleteLWM are set above the CopyHWM and CopyLWM. This approach is generally acceptable, because once the copy has been made, it generally takes relatively little time to delete the file. In theory, it may be possible to set the Delete water marks at 100% to cause files to be deleted exactly when the space is needed.
  • deleting is not instantaneous, and in fact may require mounting one or more volumes. Therefore, it may be desirable to keep some amount of space immediately available by deleting files before the space is needed.
  • An advantage of this strategy is that more of the Store is available for caching files. The disadvantage is that some overhead is required to delete the file when the space is needed, and that overhead is not postponed to off-peak times. As with the first strategy, it may be difficult to determine the optimum water marks and they may need periodic adjustment.
  • a variation of the second migration strategy is to schedule the migration of files as soon as they enter a Store, in essence setting the CopyHWM and CopyLWM to zero. This strategy can be referred to as immediate migration.
  • the advantages are that migration can occur as a continuous low-priority activity, and caching efficiency is optimized as it is in the second strategy. This strategy also has less dependence on the selection of optimal water marks.
  • the disadvantages are that neither copying nor deleting files is postponed to off-peak times, so they may compete with normal activity. However, both may occur as low-priority tasks.
  • there are six counts associated with each Volume that indicate the states of the space (bytes) on that volume. The sums of the counts for all the Volumes in the Store are maintained in the Store. The bytes are always in exactly one of the states: Free, Reserved, Held, Migrating, Removable, or Deleting. The common state transitions are shown in FIG. 5 and described below.
  • FIG. 5 completes the picture shown in Fig. 3 by mapping the six states to the watermarks.
  • the specific goal of the migration agent is to keep the number of allocated bytes between the Delete LWM and the Delete HWM, and to keep the number of Unmigrated bytes between the Copy HWM and the Copy LWM.
  • under the migration delete algorithm, the Migration Agent monitors the stores.
  • When the Migration Agent finds that the level of allocated bytes in a Store is greater than the DeleteHWM, it selects files to be deleted, and schedules them for deletion.
  • the Migration Agent will continue to select files to be deleted until the number of allocated bytes is less than or equal to the DeleteLWM.
  • the Migration Agent does not consider Stores whose DeleteCriteria is DELETE_NEVER.
  • the percentage of bytes allocated is calculated using the following equation.
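The equation itself does not appear in this text; one reconstruction consistent with the byte counters used throughout this description, offered as an assumption only and not as the original formula, is:

    \text{PercentAllocated} = 100 \times \frac{\text{BytesTotal} - \text{BytesFree}}{\text{BytesTotal}}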
  • the migration agent will select physical files (Chunks) for deletion by age (current time minus the LastAccessTime). A file is not eligible to be deleted from a Store if any of the following are true:
  • (1) the file is being read from any store (File.ReadCount > 0); (2) the file is scheduled to be deleted or written to the stores, as indicated by the File.vsScheduled and File.vsDeleting vectors; (3) the DeleteCriteria of the Store is DELETE_ANY_STORE and the file does not exist on at least one other Store (determined from the vsResidence property); (4) the DeleteCriteria of the Store is
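A Python sketch of this eligibility test follows, covering only the criteria recoverable above; the vector properties are modeled as sets, and the names are assumptions.

    def eligible_for_delete(file: dict, store: dict) -> bool:
        name = store["name"]
        if file["ReadCount"] > 0:                 # being read from any store
            return False
        if name in file["vsScheduled"] or name in file["vsDeleting"]:
            return False                          # write or delete in flight
        criteria = store["DeleteCriteria"]
        if criteria == "DELETE_NEVER":
            return False
        if criteria == "DELETE_ANY_STORE":
            # Must exist on at least one other Store (vsResidence).
            return len(file["vsResidence"] - {name}) >= 1
        return True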
  • the Migration Agent monitors the stores, and when it finds that the percentage of unmigrated bytes in a Store is greater than the CopyHWM, it schedules files to be copied to the Migration Store.
  • the steps in migration are as follows: First, it is determined if migration is necessary.
  • the Migration Agent will set the Store's bMigrating flag, select files to be migrated (step 2 below), and schedule the migration (step 3 below).
  • the BytesMigrating variable will be incremented and the BytesHeld variable will be decremented.
  • the Migration Agent will continue to select files for migration until the ratio in the equation above is less than the low water mark, CopyLWM. It will then clear the Store's bMigrating flag.
  • the file to be migrated is selected.
  • a file will be selected for migration from a Store based on the Store migration algorithm.
  • a file is eligible for migration from a Store if: (a) the file is not marked as deleted, where obsolete files are migrated; (b) the file is not being deleted from the present store, i.e., the store is not in the vsDeleting vector; and (c) the file is not already being copied to the migration store, i.e., the Store is not in the vsScheduled vector.
  • the Migration Agent will add the store to vsScheduled and schedule a copy to the migration Store. If the file is part of a fileset and other files from the same fileset are present in the Store, then those files will be selected next for migration if fileset migration is selected.
  • a request to copy the file to the migration Store is made if the file does not exist there.
  • the FileSegmenter will add the destination store to the vsScheduled property of the file and handle the physical IO of the file chunk(s).
  • the BytesMigrating property of the source store will be incremented by the file size, and the BytesHeld will be decremented.
  • the IOScheduler is invoked to handle the physical copy.
  • the Store is removed from the destination's vsScheduled and added to vsResidence for the file.
  • the file size is decremented from the source store's BytesMigrating. If all of the criteria to release the hold on the file are satisfied, the file size is added to BytesRemovable. Otherwise, the file size is added to BytesHeld, as sketched below.
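The migration accounting in the steps above can be sketched as follows, for illustration; the function names and dictionary layout are assumptions.

    def migration_scheduled(file: dict, src: dict, dst_name: str) -> None:
        # Scheduling: destination joins vsScheduled; the file's bytes move
        # from BytesHeld to BytesMigrating on the source store.
        file["vsScheduled"].add(dst_name)
        src["BytesMigrating"] += file["size"]
        src["BytesHeld"] -= file["size"]

    def migration_complete(file: dict, src: dict, dst_name: str,
                           hold_released: bool) -> None:
        # Completion: destination joins vsResidence; source bytes become
        # Removable only if the hold criteria are satisfied.
        file["vsScheduled"].discard(dst_name)
        file["vsResidence"].add(dst_name)
        src["BytesMigrating"] -= file["size"]
        src["BytesRemovable" if hold_released else "BytesHeld"] += file["size"]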
  • In reloading a file, a reload request typically causes a file to be copied to the highest-priority Store where the size of the file does not exceed the FileBypassSize of the Store.
  • the source of the copy is the highest-priority Store that contains a copy of the file.
  • a reload request can be handled in the following manner: (a) select the reload source; (b) select the Store with the lowest value of the Priority property that contains a copy of the file as determined by the File's vsResidence property; (c) if the file is not found locally and there are remote Servers configured, search the remote servers for the file; (d) select the reload destination; (e) select the Store specified by the ReloadToStore property of the source Store selected in step (a) if the source Store is local, but if ReloadToStore is null, then no reload is performed; (f) if the source store is on a remote server, then select an Initial Store on the local server.
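A local-only Python sketch of steps (a) through (e) follows; the remote-server search is omitted, and the names and dictionary layout are assumptions.

    from typing import Optional, Tuple

    def plan_reload(file: dict, stores: dict) -> Optional[Tuple[str, str]]:
        # Source: the resident Store with the lowest Priority value.
        resident = [stores[s] for s in file["vsResidence"] if s in stores]
        if not resident:
            return None                    # would fall back to remote servers
        src = min(resident, key=lambda s: s["Priority"])
        # Destination: the source's ReloadToStore, unless null or unless the
        # file exceeds that store's non-zero FileBypassSize.
        dst_name = src.get("ReloadToStore")
        if not dst_name:
            return None                    # no reload performed
        dst = stores[dst_name]
        if dst["FileBypassSize"] and file["size"] > dst["FileBypassSize"]:
            return None
        return src["name"], dst_name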
  • files will be queued for migration.
  • the actual migration may happen during a preset time window, for example, overnight. A maximum queue length will be set by policy.
  • the migration queue will not be affected by any subsequent action once the file or fileset is already in the queue.
  • migration policy may be to migrate the least recently used file first.
  • when the HWM is reached, a least recently used file A may be put in the migration queue; before the actual migration takes place, a request for file A may arrive.
  • DSM system 10 will fulfill that request, and file A is then no longer least recently used, but it will still be migrated according to the existing queue.
  • the CopyTo function can be queued and the action will be taken at a preferred time interval or according to some other condition.
  • a warning can be given when BytesTotal falls below a given minimum.
  • the system should be prepared to add new volumes into FreeStore.
  • Media added to FreeStore must be DSM-compatible media and can be formatted or unformatted.
  • the added media is given a DSM global unique identifier (guid).
  • the age of a file is the difference between the current time and the time the file was last referenced as recorded in the LastReference property. After being deleted, the file is unrecoverable if the Store's bRetainDeleted property is not set. If bRetainDeleted is set, then the file is marked obsolete but can be recovered. If the file is marked obsolete, its LastReference property is set to the time that it was marked as obsolete.
  • the agent also scans for obsolete files and deletes any whose age exceeds the ObsoleteLifetime property of the Store. An agent also scans files in Stores for files whose age exceeds the MaxSaveTime of the Store, and deletes those files from the Stores (migrating when necessary).
  • each media can specify a MaxAge that sets the age at which media should be copied to new media and discarded.
  • An agent monitors the age of media, i.e., current time minus CreationTime, and instigates the replacement process at the appropriate time.
  • the following steps can be used: (a) configure a new Store (NewStore) that is to replace an existing Store (OldStore); (b) configure NewStore to have the same Priority as OldStore; (c) configure the Stores that migrated to OldStore to migrate to NewStore; (d) configure the Stores that sent bypassed files to OldStore to send bypassed files to NewStore; (e) configure NewStore to copy to the Stores to which OldStore copied; (f) configure NewStore to migrate to the stores to which OldStore migrated; (g) configure NewStore to bypass to the store to which OldStore bypassed; (h) configure OldStore to migrate to NewStore; and (i) set the ForceMigration flag in the Policy for OldStore.
  • An agent will force the migration of all the files in OldStore to NewStore. Since the new configuration does not store or migrate any files to OldStore, it can be removed once the migration is complete. During the migration, retrievals and reloads of files that are still on OldStore will continue to operate in the usual manner.
  • an agent can be made responsible for periodically checking the LastRetensioned property of the media against the Retensionlnterval property of the media type and performing the retension operation at the proper time.
  • For compactable media, typically tapes, an agent will be responsible for periodically comparing the BytesRemovable of a volume against the MinBytesToRecover property of the media type. When BytesRemovable exceeds MinBytesToRecover, a compaction of the media will take place.
  • An agent can be provided to merge oldest files onto volumes for export. When the library capacity reaches some threshold, an agent will combine the oldest files on loaded volumes onto a single volume for export. That volume can then be exported and a fresh volume can be imported.
  • a number of policy examples are set forth below for purposes of illustration.
  • a first example pertains to a policy that provides for immediate migration.
  • files that are smaller than the Store 0 FileBypassSize are written to Store 0, and then copied to both Store 1 and Store 2.
  • Files that are larger than the Store 0 FileBypassSize and smaller than the Store 1 FileBypassSize are written to Store 1, and then copied to Store 2.
  • Files that are larger than the Store 1 FileBypassSize are written to Store 2. Because the copies to the other stores are scheduled right away, there is no need for high water mark migration.
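One hypothetical configuration matching this first example follows; the sizes are arbitrary assumptions chosen only to make the tiers concrete.

    stores = {
        "Store0": {"FileBypassSize": 64 << 20,   # 64 MB (assumed)
                   "FileBypassStore": "Store1",
                   "vsCopyTo": ["Store1", "Store2"]},
        "Store1": {"FileBypassSize": 1 << 30,    # 1 GB (assumed)
                   "FileBypassStore": "Store2",
                   "vsCopyTo": ["Store2"]},
        "Store2": {"FileBypassSize": 0,          # lowest level: no bypass
                   "FileBypassStore": None,
                   "vsCopyTo": []},
    }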
  • a second example provides for HWM migration and backup. There are three cases of interest in this example.
  • In the first case, the file is smaller than Store 0's FileBypassSize.
  • The file is written to Store 0. When the write is complete, the file is copied to Store 3. When the file ages from Store 0, it migrates to Store 1. Because it already exists on Store 3, it is not copied there again. When it ages from Store 1, it migrates to Store 2. Because it already exists on Store 3, it is not copied there again. In the second case, the file is larger than the Store 0 FileBypassSize and smaller than the Store 1 FileBypassSize. Here, the file is written to Store 1. After the write is complete, the file is copied to Store 3. When it ages from Store 1, it migrates to Store 2.
  • Store 2 and Store 3 are both at the lowest level, so vsCopyTo and MigrateToStore would be null.
  • Store 3 is a backup store, so bRetainDeleted and bRetainVersions would be set TRUE.
  • This type of Store is essentially a serial record of everything written to DSM. Any version of any file would be recoverable from this media. This policy can be modified by the ObsoleteLifetime and MaxLifetime properties of the Store.
  • a third example concerns migration to a new media type. This is similar to the second example, except that a new Store, Store 4, has been added to the system, with the intention of migrating all files from Store 1 to Store 4. Note that Store 4 takes the place of Store 1, and Store 1 now just migrates to Store 4. The ForceMigration property would be set in Store 1 to cause all the files in Store 1 to migrate to Store 4 even though they might not otherwise migrate.
  • MigrateToStore should usually be the same as the FileBypassStore if there is a FileBypassStore. If a Store (StoreA) specifies a FileBypassStore (StoreB), then StoreB should have the same CopyTo stores as StoreA. This ensures that a copy is always done to StoreB.
  • the FileBypassSize should increase for each level of Store. Each Store related through policy should have a unique value in its Priority attribute. Setting the "bRetainDeleted" property of a Store may be inconsistent with specifying a MaxLifetime for the Store unless an ObsoleteLifetime is also specified.
  • DSM system 10 may incorporate the Fileset feature whereby groups of images associated with a particular customer, project, or transaction are generally stored together on common media. It is noted, however, that multiple images that are associated with one another are stored together in thumbnail and screen resolution versions on short-term media for ease of access, while associated high resolution versions are stored, preferably together, on longer-term media.
  • the fileset feature allows a user to group files into a logical collection, and perform operations on the files as a group.
  • An example is a group of image files representing images scanned from a single roll of film.
  • the thumbnail and screen resolution versions of each image are stored together with those of associated images in the roll of film to facilitate quick web access to the images.
  • the high resolution versions of the film images can be migrated offline.
  • the thumbnail, screen resolution, and high resolution images can all be stored on the short-term media, subject to subsequent migration of the high resolution images.
  • system 10 can be configured to include in the fileset images from multiple rolls of film submitted by a common customer.
  • the fileset could include audio, video, or other content originated by the common customer.
  • filesets can be configured to have their member files reside together on media, so that operations on the fileset, such as archival, migration, retrieval, and deletion, can be performed significantly faster than operating on the individual files, which might otherwise be distributed across multiple media.
  • DSM system 10 can be further configured to take advantage of metadata associated with each media volume and each file, thereby providing features referred to as media independence and self-describing media. Metadata can be generated for each volume in the form of a file that uniquely identifies the volume with a guid and other useful information. In this case, system 10 may generate the metadata. For individual files, however, it may be desirable for the client application to generate metadata, including a guid and information concerning the particular customer, project, or transaction associated with the file.
  • the client application can pass through to DSM system 10 simply the content of the image file as a blob and the content of the metadata file as a blob.
  • DSM system 10 need not be concerned with the content of the image/metadata file. Rather, the client application provides sufficient information to the database server, such as a SQL server, to allow DSM system 10 to locate the file and retrieve it for direct access by the client application.
  • the client application is responsible for the format of the image file.
  • system 10 may further include a feature whereby volume metadata is stored on each physical volume to track the volume within a local server or across a network.
  • the metadata can be useful in tracking volumes and files, verification of the identities of loaded volumes, and database recovery.
  • FIG. 7 is a functional block diagram illustrating the interaction between a client application and a server component in a system as shown in FIG. 1.
  • a client application programming interface also can be included to provide a number of features to a programmatic user of the system and method.
  • DSM client 66 can use the client API to allow for file storage and retrieval by DSM server 14 that is either directed by a user application 68 to or from a particular collection of media 70, or is undirected.
  • the client API can facilitate file grouping by DSM server 14 by allowing fileset creation and modification by user application 68 to generate logical collections of files on storage media 70.
  • the client API also can be used to provide restricted, password-protected logons to distributed server systems, user and user group permissions on files, and a set of privileged, administrator-only functions. For access to a remote DSM server, the client API implemented between DSM client 66 and the remote DSM server can provide additional levels of security.
  • the server system can be protected first by a firewall on the server side of the network 72, second by a HSM system server logon and password, third by file system access control lists, such as the NTFS ACL provided by the Windows NT operating system, and finally by database server logon and password.
  • the client API enables policy creation and modification to provide settings for determining default behavior of the storage management function and its background agents.
  • System configuration also can be managed via the client API, such that hardware and software setup is API-configurable.
  • the server systems can be local or remote.
  • the client API implementation can be made to communicate through industry-standard TCP/IP protocols across network 72 to distributed server systems running on any machine that is IP-addressable by the machine on which a user application 68 runs.
  • policies can be defined to control the set of media 70 to be used for storage of files associated with particular clients, the movement and replication of the files to other sets of media, and related behaviors such as the timing of data movement and replication.
  • FIG. 8 is a functional block diagram illustrating the interaction between a web- based client application 74 and a DSM server component 14, via an automation server component 76 that links the client application, DSM client 66, and server 14, over a network 72, in a system 10 as shown in FIG. 1.
  • System 10 also can be configured to provide direct access to the storage function instead of indirect access, e.g., via a SQL server.
  • the system can be arranged to give privileged clients direct read-only access to the file-oriented storage media, with guarantees that the files they specify will remain on that media until they specify otherwise. This provides, for example, the fastest and most direct access to media files that need to be published to web sites for use by web applications 74, by allowing web servers to publish stored files without having to make a copy of the file elsewhere.
  • This direct access feature also can support specific user-specified file extensions so that web applications 74 that trigger on file extension can use this feature.
  • the system and method may employ an architecture that allows a variety of functions to be allocated to the same or different computer servers in a single storage management system.
  • system 10 can use an interface referred to as an automation server 76.
  • Automation server 76 is created as prescribed by applicable Microsoft specifications.
  • Storage Automation Server: This aspect of automation server 76 allows a web interface to attach and utilize the client API implemented in DSM client 66 as described above. This interface is portable across any user application that requires a web browser interface 74 to the storage management system 10. Another unique component is the locking down of individual images on a "Hot" cache to avoid migration. Finally, in storing the images, this automation server specifies a file extension to append to a GUID assigned to the file. This provides great advantage when viewing the images from a browser 74 that does not support any type of image file decoding. For example, when viewing an image, the Netscape Navigator web browser only examines the file name extension, such as .jpg, .gif, or .tif.
  • Exe Automation Server: This aspect of automation server 76 allows a user to kick off, and get a return result from, any executable program, such as one within DSM client 66 or DSM server 14, out of a web browser interface 74.
  • Image Tool Automation Server: This aspect of automation server 76 allows the calling of image processing tools within DSM client 66 and DSM server 14 from a web browser interface 74.
  • Event Log Automation Server: Because web interfaces are page driven and stateless, debugging is extremely difficult. This aspect of automation server 76 allows any web interface 74 to log errors encountered from within a web page to the NT event logging facility on any NT server, e.g., DSM server 14. In this manner, any event at web interface 74 is translated for programmatic delivery to the NT server, and can be logged for analysis.
  • System 10 also can be configured to facilitate web-based workflow within a web application 74 that invokes the storage management function provided by DSM client 66 and DSM server 14. From various web pages, and calling the above automation server 76, for example, an application can scan and/or acquire images from a high speed film scanner, photoscanner, flatbed scanner, PCMCIA cards (digital cameras), or any TWAIN compatible device (this is configurable). Those images then can be compressed into three image file types: (1) a thumbnail image, (2) a screen resolution image, and (3) a high resolution image. All three versions of each image are sent and "checked-in" to the storage management function with a file extension known to be processable by web browser 74, e.g., .jpg, via automation server 76.
  • the client application supporting the web-based workflow within web browser 74 converts each acquired image to the supported format such that the user can have direct access to the images maintained by the storage management function of DSM server 14 within the web browser.
  • the screen resolution and thumbnail versions can be "locked" down by DSM server 14 on the short-term media, such as a RAID, and never allowed to migrate offline to tape.
  • the high resolution image may be allowed to migrate according to user-defined migration policies then in effect. In this manner, internet access to the images is as quick as possible.
  • the locked-down images are not stored by the client database server, but rather by the HSM server for direct access by the user. By migrating the high resolution image, however, storage costs for RAID are dramatically reduced.
  • an application can be configured to store images as economically and efficiently as possible using system 10, with the potential for growth of storage capacity being unlimited and scalable.
  • system 10 may provide a "digital vault" function within a client application implemented using web browser 74.
  • This function can be supported in part by the fileset feature.
  • each consumer has his/her own unique digital vault that is accessible as a web page and resembles an electronic safety deposit box.
  • This digital vault contains images that are categorized and stored in folders that are graphically represented in web browser 74 as a set of drawers.
  • Each set consists of a single image or multiple images acquired from one of the acquisition devices described above.
  • One set could be a single roll of film, another could be a scanned legal document or documents, another set can be a VHS tape or tape library.
  • This vault can be password and login protected. All image transmissions can be done under SSL.
  • the vault image viewing is also secured through a virtual directory service, another sequence of logins into the storage management system 10, a Microsoft SQL Server, and the Windows NT NTFS file system itself via ACL.
  • From the vault, the consumer can proof and create his/her own media order. That media order is placed into a "shopping basket" for the consumer, and totaled for payment and shipping.
  • the consumer may also send those images to a third party via internet mail. Those images stored within the vault are set on an aging algorithm where after a predetermined number of days, the images are deleted from system 10.

Abstract

A system and method for managing the storage of files within an HSM system incorporate an architecture and methodology that facilitate the storage and retrieval of large image files as part of an overall image processing workflow. In particular, the system and method may find ready application in a workflow that involves the processing of groups of images associated with particular customers, projects, or transactions, and may act as a storage server for a client application that implements the workflow. The system and method may be useful, for example, in handling the storage of images uploaded from scanned photographic film, or digital images submitted to a photo-processing shop by amateur or professional photographers. In this case, the client application can be a photo-processing application that could provide for various media formats, sizes, and quantities of image reproductions for a consumer. As another example, the system and method may be useful in handling the storage of medical diagnostic images associated with a particular medical patient or study. In this case, the client application could be a picture archival communication system (PACS) that manages the archival of imagery for viewing by physicians. Further, the system and method may be useful in handling the storage of images associated with particular printing jobs, e.g., for publishers, advertising customers, and the like. In this case, the client application could be a digital prepress workflow application.

Description

HIERARCHICAL DATA STORAGE MANAGEMENT
TECHNICAL FIELD
The present invention relates to data storage and, more particularly, to systems and methods for hierarchical storage management.
BACKGROUND
Hierarchical storage management (HSM) systems allow the managed storage of data files among a variety of media such as magnetic hard disks, magneto-optic disks, and magnetic tape. The various media differ in access time, capacity, and cost. Thus, HSM systems typically are configured such that files that are accessed more frequently or created more recently are stored on "short-term" media having the shortest access time. Short-term media often includes a group of magnetic disks, which may be arranged as a redundant array of independent disks (RAID). Files that are accessed less frequently, created less recently, or have larger sizes are stored on "long-term" media having longer access times and larger storage capacities. Long- term media in an HSM system may include rewritable optical disks and magnetic tape media, which can be arranged in a jukebox of magneto-optical disks or a tape library, respectively.
Existing HSM systems typically allocate individual files across the hierarchy of storage media based on frequency of use, creation date, or file size, as discussed above. Accordingly, HSM systems generally seek to avoid excessive access delays in retrieving information that is likely to be accessed most often or most frequently by users. As new files are generated, the system stores the files on the short-term media using a "best-fit" approach. In this manner, the system distributes files across the short-term media in order to minimize wasted storage space. Thus, each file may be stored on a different medium in order to most efficiently manage storage space.
A central database maintains the storage location of each file within the HSM system. If users do not request a particular file for an extended period of time, the system automatically migrates the corresponding file to the longer-term storage media and updates the file location database. Again, the system distributes the relocated file across the long-term storage media in a manner calculated to minimize wasted storage space. For image files, an HSM system may store a number of copies at different display resolutions and on different media to facilitate identification and retrieval by a user. When a user requests a particular file, the system accesses the database to determine the current location of the file. If the desired files reside on longer-term storage media, the system automatically retrieves the files and moves them to the short-term media. If some of the media is not currently loaded into the longer-term storage device, the system generates a request for personnel to physically locate the media and load it into the storage device.
SUMMARY
The present invention is directed to a system and method for managing the storage of files within an HSM system. The system and method are especially useful in managing the storage of larger files that include graphic imagery. The system and method may incorporate an architecture and methodology that facilitate the storage and retrieval of image files as part of an overall image processing workflow. In particular, the system and method may find ready application in a workflow that involves the processing of groups of images associated with particular customers, projects, or transactions, and may act as a storage server for a client application that implements the workflow. The system and method may be useful, for example, in handling the storage of images uploaded from scanned photographic film, or digital images submitted to a photo-processing shop by amateur or professional photographers. In this case, the client application can be a photo-processing application that could provide for various media formats, sizes, and quantities of image reproductions for a consumer. As another example, the system and method may be useful in handling the storage of medical diagnostic images associated with a particular medical patient or study. In this case, the client application could be a picture archival communication system (PACS) that manages the archival of imagery for viewing by physicians. Further, the system and method may be useful in handling the storage of images associated with particular printing jobs, e.g., for publishers, advertising customers, and the like. In this case, the client application could be a digital prepress workflow application. The system and method may incorporate a number of architectural and functional features capable of improving overall economy of storage while maintaining workflow efficiency for the user, as described below. Such features may be used together or independently of one another according to the needs of the particular storage and workflow applications.
As an example, a fileset feature can be provided that permits grouping of images associated with a particular customer, project, or transaction. The fileset feature allows a user to group files into a logical collection, and perform operations on the files as a group. Moreover, filesets can be configured to have their member files reside together on common media, so that operations on the fileset such as archival, migration, retrieval, and deletion can be performed significantly faster than operating on the individual files, which might otherwise be distributed across multiple media.
In this manner, the system and method are capable of intelligently grouping images for storage together on common storage media to alleviate excessive access times in retrieving the images. Thus, the fileset feature can be used to avoid the scattering of associated images across diverse storage media having different access times, and resulting delays in incorporating such images into the workflow of the client application. In addition, a fileset can be configured such that activity on an individual file member triggers the same activity on the other file members of the fileset.
Also, a media independence feature can be incorporated whereby data can be stored on particular volumes without knowledge of the media type of the volume. In this case, the system and method provide an application programming interface that is substantially identical for all media types, for both direct access and sequential access, and for removable and non-removable media.
Along with media independence, the system and method may further include a feature that allows the storage volumes to support self-describing media. Normally, an inventory of the contents of a volume is maintained as entries in a database. This self-describing media feature provides the capability to reconstruct the database entries for the files stored on a volume when a volume is physically moved to another system that is not on the same network. A volume may be physically moved, for example, when the database has been corrupted, when identifying unlabeled or mislabeled media from shelf storage, or when a volume is moved to a remote server.
The self-describing media feature can be implemented by storing volume metadata on each physical volume and by storing file metadata for each data file on the same physical volume on which the data file is stored. Thus, each file may include two files: a blob file with the actual file contents and a metadata file with identifying information. The metadata provides sufficient information to rebuild an inventory of the contents of a volume without access to the original file location database. The metadata can be useful in tracking volumes and files, verification of the identities of loaded volumes, and database recovery. The metadata can be specified by a client application and can be substantially transparent to the storage management server, providing information known to the client application such as the particular client applications and users associated with the volume or file. For unique identification of volumes and files, the metadata may incorporate a global unique identifier (guid) that is unique on a single server, as well as across a multi-server system. Also, a guid can be generated for each storage device. The guid can be used to track the location of a volume or file in the event the volume is moved across the system, or the file is moved across the various media volumes on a server or across the system. The guid for a particular file can be generated by the storage management system, but preferably is generated by the client application along with other metadata useful in referencing the file.
A client application programming interface (API) also can be included to provide a number of features to a programmatic user of the system and method. In particular, the client API may allow for file storage and retrieval that is either directed by a user to or from a particular collection of media, or is undirected. Further, the client API can facilitate file grouping by allowing fileset creation and modification to generate logical collections of files.
The client API also can be used to provide restricted, password-protected logons to distributed server systems, user and user group permissions on files, and a set of privileged, administrator-only functions. In a network configuration, in particular, the client API can provide additional levels of security. For example, the server system can be protected first by a firewall, second by a HSM system server logon and password, third by file system access control lists, such as the NTFS ACL provided by the Windows NT operating system, and finally by database server logon and password. In addition, the client API enables policy creation and modification to provide settings for determining default behavior of the storage management function and its background agents.
System configuration also can be managed via the client API, such that hardware and software setup is API-configurable. With the client API, the server systems can be local or remote. In particular, the client API implementation can be made to communicate through industry-standard TCP/IP protocols to distributed server systems running on any machine that is IP-addressable by the machine on which an application runs.
The policies implemented by an administrator govern several behavioral aspects of the data and media management functions. For example, policies can be defined to control the set of media to be used for storage of files associated with particular clients, the movement and replication of the files to other sets of media, peer-to-peer data movement between servers, the timing of data movement and replication, maximum file size allowed for various sets of media, retention intervals on various sets of media, versioning, fileset behaviors, media lifetime, tape retensioning, tape compaction, automated media replacement, and automated media upgrades.
Further, the system and method may employ a unique algorithm for migration of data. This algorithm migrates data based on four distinct watermark levels: the deletion high-watermark and low-watermark, and the copy high-watermark and low-watermark. Different combinations of values for these watermarks allow the single migration algorithm to function in various ways, based on customer requirements. Three of the common scenarios that can be supported by the migration algorithm are (1) watermark migration with deletion; (2) copy without deletion; and (3) copy immediately. For scenario (1), files are copied to another storage device when a high-watermark is reached, and deleted from one storage level soon after they are copied to another level. This keeps large areas of space available to handle burst activity, and allows migration to occur in batches and be performed at low-load intervals. For scenario (2), files are copied to another storage device when a high-watermark is reached, but not deleted from a storage until space on that storage is low. This can improve caching by retaining files for a longer period of time, but may cause delays during large peak loads while files are being deleted. It also allows migration to occur in batches, and can be performed at low-load intervals. For scenario (3), files are scheduled to be copied soon after they are stored, but not deleted from the storage until space on the storage is low. Here, migration can occur as a continuous, low-priority task.
The system and method also can be configured to provide direct access to the storage function instead of indirect access, e.g., via a SQL server. In particular, the system and method can be arranged to give privileged clients direct read-only access to the file-oriented storage media, with guarantees that the files they specify will remain on that media until they specify otherwise. This provides, for example, the fastest and most direct access to media files that need to be published to web sites, by allowing web servers to publish stored files without having to make a copy of the file elsewhere.
This direct access feature also can support specific user-specified file extensions so that applications that trigger on file extension, such as certain web browsers, can use this feature.
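By way of illustration, the three migration scenarios above differ only in where the four watermark values sit. The following C++ sketch shows one plausible encoding; the StorePolicy type, its field names, and the percentage values are assumptions made for illustration, not parameters drawn from this description.

    #include <cstdio>

    // Fill levels are expressed as fractions of store capacity (an assumption
    // made for this sketch; the description does not fix the units).
    struct StorePolicy {
        double copyHWM;    // begin copying files to the migration store
        double copyLWM;    // stop copying
        double deleteHWM;  // begin deleting files that have already been copied
        double deleteLWM;  // stop deleting
    };

    int main() {
        // Scenario (1): watermark migration with deletion -- copy and delete
        // together, keeping a large free region for burst activity.
        StorePolicy scenario1 = {0.80, 0.50, 0.80, 0.50};

        // Scenario (2): copy without deletion -- copy in batches, but retain
        // files for caching until space actually runs low.
        StorePolicy scenario2 = {0.60, 0.40, 0.95, 0.90};

        // Scenario (3): copy immediately -- zero copy watermarks schedule a
        // copy as soon as a file is stored; delete only when space is low.
        StorePolicy scenario3 = {0.00, 0.00, 0.95, 0.90};

        std::printf("scenario 1 starts copying at %.0f%% held bytes\n",
                    scenario1.copyHWM * 100.0);
        (void)scenario2; (void)scenario3;
        return 0;
    }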
As another feature, the system and method may employ an architecture that allows a variety of functions to be allocated to the same or different computer servers in a single storage management system. For example, a central controller, e.g., one per system, handles initial requests from client applications, communicates with library controllers to get volumes mounted, and sets up communication paths to data movers. Library controllers, e.g., one or more per system, handle the movement of media in manual, fixed-media, and/or automated media libraries. Data Movers, e.g., one or more per DSM system, handle the actual movement of data to and from clients and other data movers. A database, e.g., one per DSM system, provides database services to the central controller.
The system and method also can make use of peer-to-peer configurations in which multiple storage management systems can participate as peers in storage management policies. Files that are stored in one system can be migrated or copied to other systems either automatically through administrator-defined policies, or explicitly through client application control. The system and method are configured to maintain file identity, even across multiple servers, by assigning each file a globally unique identifier (guid), as described above.
In order to link web pages to executable programs, the system and method can use an interface referred to as an "automation server." The automation server is created as prescribed by applicable Microsoft specifications. Automation servers having features unique to the system and method of the present invention are (1) a Storage Automation Server; (2) an Exe Automation Server; (3) an Image Tool
Automation Server; and (4) an Event Log Automation Server. The Storage Automation Server (1) allows a web interface to attach to and utilize the client API described above. This interface is portable across any user that requires a web connection to the storage management system. Another unique component is the locking down of individual images on a "hot" cache to avoid migration. Finally, in storing the images, this automation server specifies a file extension to append to the GUID assigned to each file. This provides a great advantage when viewing the images from a browser that does not support any type of image file decoding. For example, when viewing an image, the Netscape Navigator web browser only examines the file name extension, such as .jpg, .gif, .tif, etc.
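A minimal sketch of the extension-append step just described, assuming the GUID is already in hand as a string; the helper name and the sample values are hypothetical.

    #include <iostream>
    #include <string>

    // Append the original file's extension to its GUID so that browsers that
    // key off the extension (e.g., .jpg, .gif, .tif) can decode the image.
    std::string storedImageName(const std::string& guid,
                                const std::string& originalName) {
        std::string::size_type dot = originalName.rfind('.');
        std::string ext = (dot == std::string::npos)
                              ? std::string()
                              : originalName.substr(dot);
        return guid + ext;
    }

    int main() {
        std::cout << storedImageName("9f3a2b7c115e4d218c4a0e6f5a1b2c3d",
                                     "vacation.jpg")
                  << "\n";  // prints 9f3a2b7c115e4d218c4a0e6f5a1b2c3d.jpg
        return 0;
    }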
The Exe Automation Server (2) allows a user to launch any executable program from a web page and receive its return result. The Image Tool Automation Server (3) allows the calling of image processing tools from a web page. Because web interfaces are page-driven and stateless, debugging is extremely difficult. The Event Log Automation Server (4) allows any web interface to log errors encountered from within a web page to the NT event logging facility on any NT server. In this manner, any event at the web interface is translated for programmatic delivery to the NT server, and can be logged for analysis.
The system and method also can be configured to facilitate web-based workflow within an application that invokes the storage management function. From various web pages, and calling the above Automation Servers, for example, an application can scan and/or acquire images from a high-speed film scanner, photoscanner, flatbed scanner, PCMCIA cards (digital cameras), or any TWAIN-compatible device. Those images then can be compressed into three image file types: (1) a thumbnail image, (2) a screen resolution image, and (3) a high resolution image. All three versions of each image are sent and "checked in" to the storage management function with a file extension known to be processable by the web browser, e.g., .jpg, via the automation server.
Thus, the client application supporting the web-based workflow converts each acquired image to the supported format such that the user can have direct access to the images maintained by the storage management function within the web browser. No images need to be kept directly within the client database. The screen resolution and thumbnail versions can be "locked" down on the short-term media, such as a RAID, and never allowed to migrate offline to tape. At the same time, the high resolution image may be allowed to migrate according to user-defined migration policies then in effect. In this manner, internet access to the images is as quick as possible. Also, the locked-down images are not stored by the database server, but rather by the HSM server for direct access by the user. By migrating the high resolution image, however, storage costs for RAID are dramatically reduced. In other words, an application can be configured to store images as economically and efficiently as possible using the system and method of the present invention, with the potential for growth of storage capacity being unlimited and scalable.
As a further feature, the present invention may provide a "digital vault" function within a client application. This function can be supported in part by the fileset feature. With this feature, each consumer has his/her own unique digital vault that is accessible as a web page and resembles an electronic safety deposit box. This digital vault contains images that are categorized and stored in folders that are graphically represented on the web page as a set of drawers. Each folder holds a single set or multiple sets of images acquired from one of the acquisition devices described above. One set could be a single roll of film, another could be a scanned legal document or documents, and another could be a VHS tape or tape library. This vault can be password and login protected. All image transmissions can be done under SSL. Vault image viewing is also secured through a virtual directory service, another sequence of logins into the storage management system, a Microsoft SQL Server, and the Windows NT NTFS file system itself. From the vault, the consumer can proof and create his/her own media order. That media order is placed into a "shopping basket" for the consumer, and totaled for payment and shipping. The consumer may also send those images to a third party via internet mail. Images stored within the vault are subject to an aging algorithm whereby, after a predetermined number of days, the images are deleted from the system.
DESCRIPTION OF DRAWINGS

FIG. 1 is a functional block diagram illustrating the architecture of a hierarchical storage management system;
FIG. 2 is a functional block diagram of a server component in a system as shown in FIG. 1;
FIG. 3 is a diagram illustrating implementation of a migration policy in a system as shown in FIG. 1;
FIG. 4 is a diagram further illustrating implementation of a migration policy in a system as shown in FIG. 1;
FIG. 5 is a state diagram illustrating state transitions during execution of a migration policy in a system as shown in FIG. 1;
FIG. 6 is a diagram mapping migration states to watermarks in a system as shown in FIG. 1 ;
FIG. 7 is a functional block diagram illustrating the interaction between a client application and a server component in a system as shown in FIG. 1; and
FIG. 8 is a functional block diagram illustrating the interaction between a web-based client application and a server component, via an automation server component that links the client and server, in a system as shown in FIG. 1.
Like reference numbers and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION

FIG. 1 is a functional block diagram illustrating the architecture of a hierarchical storage management system 10, in accordance with an embodiment of the present invention. System 10 will be described herein as a "directed storage management (DSM)" system inasmuch as it allows a system administrator to introduce, on a selective and reconfigurable basis, significant direction concerning storage management policies including migration. That is, the client and an administrator have the ability to direct a file to a particular location. This differs from a simple hierarchical storage management (HSM) system that handles the selection internally. The difference is that DSM system 10 provides a much broader set of data-movement policies than a typical HSM, and gives a client ways to override those policies. System 10 can be used to implement a method for hierarchical storage management in accordance with an embodiment of the present invention.

The following terms will be used herein to describe the structure and functionality of DSM system 10, and are generally defined as follows. The term "media" refers to a physical piece of data storage media such as a tape, an MO cartridge, a CD, or a hard disk. A "Store" is any collection of like media. The media in a Store can exist in any number of locations, i.e., on shelves, in one or more robotic libraries, or in individual drives. A "freeStore" is a collection of "empty" volumes with the same MediaType. The media in a freeStore can exist in any number of locations, i.e., on shelves, in one or more robotic libraries, or in individual drives. A "File" is a user-defined blob of information that can be stored in DSM system 10. A DSM File will often correspond to a file in a file system. A File may exist in more than one Store at any given time, and will be moved or copied between Stores according to the policies that an administrator has put in place and specific user requests. A "policy" is a rule that governs the automatic movement, i.e., "migration," of files between Stores, and removal of files from Stores. A "ticket" or "token" is a unique identifier returned to a client application when a File is stored by DSM system 10. The client will use the ticket for all subsequent references to the File.
With further reference to FIG. 1, DSM system 10 is implemented as a software system having a plurality of logical software components. Such components may include a DSM client library 12, a DSM server process 14, library server processes 16, volume server processes 18, DSM data mover processes 20, client data mover processes 21, DSM agent processes 22, DSM administrator processes 24, and a database server 26. DSM client library 12 is linked with a client application to provide client-side data mover services. DSM server process 14 operates to control the storage and retrieval of files from various data storage devices. Library server processes 16 direct media handling in automated and manual libraries. Volume server processes 18 handle mounted volumes and set up data mover processes 20, 21 to transfer data.
Data mover processes 20 move files between DSM clients and mounted DSM volumes. DSM agent processes 22 perform routine management functions such as migration and compaction. DSM administrator applications 24 provide a special type of client application for managing DSM system 10. Finally, database server 26 provides robust, recoverable, database storage and services for the other processes.
The above component processes 12-26 can reside on a single host machine, or can be distributed across many hosts. The term "process" in this discussion should be interpreted broadly. A process could be a separate application, or one or more threads within an application. FIG. 1 illustrates how the component processes are related, with bold lines indicating networkable interfaces between component processes. Each host machine may be based on any conventional general purpose single- or multi-chip microprocessor such as, for example, a Pentium® processor, a Pentium Pro® processor, an 8051 processor, a MIPS processor, a Power PC® processor, or an Alpha® processor. Further, the processor can be integrated within a personal computer, computer workstation, or network server to form the host machine, and may be configured to run on any number of operating system environments. The Windows NT operating system platform may be particularly suitable to the storage management tasks carried out by DSM system 10, although the system can be readily adapted for other operating systems such as Unix.

DSM client library 12 may be configured to implement a set of client API
(application programming interface) functions that are available to user applications 28. User applications 28 are software applications for manipulation of files stored by DSM system 10. The client API functions can be callable, for example, from C and C++ application programs, and from other languages that can link in C libraries. The basic functions provided by the DSM client library are (1) establish secure connections to DSM server 14; (2) translate client API calls into networkable calls to DSM server 14; and (3) establish client-side data mover processes 21 that allow data to move efficiently between user data streams or files 30 and DSM devices and media.
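The following sketch illustrates how a user application might drive these basic functions. The names follow the DsmXxx convention used elsewhere in this description, but the signatures, the stub bodies, and the return conventions here are assumptions made solely for illustration; a real application would link against the DSM client library rather than these placeholders.

    #include <cstdio>

    typedef unsigned long DsmTicket;  // the "ticket" handed back on a store

    // Stub stand-ins for the client library, present only so this sketch
    // compiles and runs on its own.
    bool DsmLogon(const char* server, const char* user, const char* pwd) {
        std::printf("logon to %s as %s\n", server, user); (void)pwd;
        return true;
    }
    int DsmStoreFile(const char* localPath, DsmTicket* ticketOut) {
        std::printf("store %s\n", localPath); *ticketOut = 42;
        return 0;
    }
    int DsmRetrieveFile(DsmTicket ticket, const char* localPath) {
        std::printf("retrieve ticket %lu to %s\n", ticket, localPath);
        return 0;
    }

    int main() {
        if (!DsmLogon("dsm-server", "user1", "secret")) return 1;
        DsmTicket ticket = 0;
        if (DsmStoreFile("scan001.jpg", &ticket) == 0) {
            // The ticket is the client's sole handle for all later
            // references to the stored file.
            DsmRetrieveFile(ticket, "scan001_copy.jpg");
        }
        return 0;
    }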
DSM server process 14 is responsible for servicing multiple concurrent requests from its various clients, i.e., user applications 28, DSM agents 22, and DSM administrator applications 24. The basic responsibilities of DSM server process 14 are (a) provide logon and file-access security for its clients; (b) handle concurrent client requests to a given file by providing simultaneous or sequential access depending on the nature of the requests; (c) translate requests for files into requests for data transfer to or from specific media volume locations; (d) sequence data transfer requests to maximize the utilization and throughput of available devices, while providing a good quality of service to clients; (e) direct the library server process 16 to mount the required volumes at the appropriate points in time; (f) direct the volume server process 18 to establish DSM data mover processes 20 for mounted volumes, and connect those processes to client data mover processes 21, or other DSM data mover processes, as required; and (g) issue commands to data mover processes 20, 21 to effect the requested data transfer.
DSM server process 14 also is responsible for communicating with the DSM administrator processes 24 to report errors and effect changes requested by administrators. A DSM server 14 also can communicate with other DSM server processes 14 to allow clients to share remote files in a multi-server host environment.
A library server process 16 executes on each host that manages removable media for DSM system 10. Library server process 16 issues media movement commands to automated libraries 32, and interfaces to operator consoles for manual media operations, i.e., shelf management. Volume server process 18 handles mounted volumes, issuing drive-related commands, such as locking drives 34 and assigning drives to processes. Volume server process 18 also sets up DSM data movers 20 that are configured to read and write DSM media volumes, and to communicate with other client data movers 21 and DSM data movers 20. A data mover process 20, 21 is instantiated at each data endpoint. DSM client library 12 sets up client data movers 21 to communicate with the user application or the user file system 30. The volume server 18 sets up DSM data movers 20 to read and write DSM volumes. The basic functions of the data movers 20, 21 are (1) receive instructions from the DSM server 14 to begin a data transfer, and return status; (2) read from or write to a data endpoint, e.g., a DSM device, or user file/user stream 30; and (3) transfer data to, or receive data from, another (possibly remote) data mover.
A DSM agent 22 is a privileged client application that is tightly integrated into DSM system 10. DSM agents 22 help enforce policies that are set up by the DSM administrator 24. Examples of operations performed by DSM agents 22 are (a) migrate files from one DSM Store to another; for example, an agent may move files from a RAID disk to a magneto-optical (MO) drive and/or to a tape based on the time of their last reference; (b) delete files that have reached a certain age; (c) compress tape volumes by removing stale files (files that have been deleted by an application); (d) retension tapes; (e) copy and remove aged media; (f) copy or move files to remote DSM system(s); (g) import volumes from a foreign system; and (h) transfer a Store to a different (perhaps more modern) media type.
DSM administrator applications 24 are privileged applications used to configure, monitor, and manage the DSM system 10. In particular, DSM administrator applications 24 allow setting of migration policy and security levels. Administrator applications 24 are written using the privileged DSM administrator API functions. Database server 26 stores information vital to the operation of DSM system 10. The database server 26 provides secure, robust, transaction-oriented storage and efficient search and retrieval mechanisms.
DSM client library 12 provides the client-side implementation of the DSM Client API. The DSM Client API functions are presented to client applications as "C" functions. Internally, the functions are implemented in C++, and a C++ object interface could also be available to client applications. The client library is thread- safe, allowing for concurrent multi-threaded user applications. A user application thread can have multiple DSM connections, and there may be multiple DSM files open on each connection. The library maintains state for each connection and each open file.
All Client API functions can be made synchronous. Certain functions, such as file reloads (DsmReloadFile), can either schedule a task or wait for the task to complete. In either case, the function returns as soon as the operation it requests (the scheduling or the task) is complete. In a preferred embodiment, there are no asynchronous mechanisms such as callbacks, waits, or event notifications. For scheduled tasks, there are no specific API functions to find out if the task is completed, although the application can be configured to determine if the task is complete by examining the state of the File or Store involved.
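Since there are no callbacks or event notifications, an application that has scheduled a task can poll, for example by re-examining the state of the File involved, as in the following sketch. The DsmGetFileState stub is hypothetical and stands in for whatever state query the application uses.

    #include <chrono>
    #include <cstdio>
    #include <thread>

    enum FileState { MIGRATING, RESIDENT };

    // Hypothetical placeholder: pretends the file becomes resident after a
    // few polls; a real application would query the File's state via the API.
    FileState DsmGetFileState(unsigned long ticket) {
        (void)ticket;
        static int calls = 0;
        return (++calls < 3) ? MIGRATING : RESIDENT;
    }

    int main() {
        unsigned long ticket = 42;  // returned from an earlier store
        while (DsmGetFileState(ticket) != RESIDENT) {
            std::printf("reload still in progress...\n");
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
        std::printf("reload complete\n");
        return 0;
    }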
Most of the Client API functions preferably can be translated directly into DSM server function calls and processed entirely in the user thread space. The following functions can be provided to establish a data connection, and create a separate data mover thread to handle the movement of the data between the client file or stream 30 and the DSM device 31 : (1) DsmCreateFile; (2) DsmOpenFile; (3) DsmStoreFile; (4) DsmRetrieveFile; and (5) DsmReplaceFile. Data mover threads are used by DsmReadFile and DsmWriteFile functions. Threads created by DsmCreateFile and DsmOpenFile are destroyed by DsmCloseFile. Threads created by DsmStoreFile, DsmRetrieveFile, and DsmReplaceFile are destroyed at the completion of the operation.
FIG. 2 shows the major components of an exemplary DSM server 14 forming part of system 10 as shown in FIG. 1. In particular, FIG. 2 illustrates the relationships of the DSM server components to other components of DSM system 10. The bold boxes in FIG. 2 denote DSM server components. The first DSM server component to be described is request manager 36. Request manager 36 provides a network connection point for client and administrator applications. The DSM server 14 preferably is multi-threaded, and request manager 36 dispatches incoming requests to appropriate threads so that it can handle other new requests. The various components of DSM server 14 will be described in greater detail below.
With reference to FIG. 2, request manager 36 routes incoming requests based on the type of request. Requests that do not deal with specific files and filesets are handled by security manager 38. These requests include Connection API requests, e.g., Logon, and Security API requests. Requests that change the DSM configuration are handled by configuration manager 40. These requests include all requests that modify the physical configuration (Libraries and Media), and those that modify the logical configuration (Stores, Volumes, and Remote Servers). Fileset requests are handled by fileset manager 42. Each Fileset denotes a group of associated files. Fileset manager 42 decomposes Fileset requests into multiple File requests and passes them off to file manager 44. Requests that process individual files are handled by file manager 44. Requests for explicit file locks, i.e., a locking down of files onto a particular volume, are handled by lock manager 44. When an input/output operation
(IO) is completed by a DSM client 46 or an IO worker 48, IO completion function 50 is dispatched.
Reference will now be made to security management features of system 10. Security manager 38 interfaces to a DSM security database, and is responsible for establishing logon sessions and maintaining all session information. The tasks handled by security manager 38 include (1) connection management; (2) user administration; (3) file permissions; and (4) store permissions. The connection management task handles logon attempts, validates username and password, and issues a security token to the requesting application (via the client library). Also, the connection management task validates the security token on all other requests. The user administration task handles all maintenance of Users, Groups, and passwords. The file permissions task maintains and validates Users and Group permissions for accessing Files. The store permissions task maintains and validates Users and Group permissions for accessing Stores. Security manager 38 also maintains session-level locking information.
As an example, DSM system 10 may implement a security model having the following characteristics: (1) privileges; (2) DSM Users; and (3) Groups of Users. Privileges include Create, Read, and Write (where Write implies overwrite and delete). For every DSM User, a Logon and a Password are assigned. For Groups of Users, (a) every User is a member of one or more Groups; (b) every User has a Default Group of which he is a member; (c) for each session, the User is assigned an Active Group for that session, and Files created during that session are associated with the creator's Active Group; (d) at logon, the Active Group for the User is set equal to the User's Default Group; (e) every Group has a DefaultStoreID; (f) the administrator creates Users and Groups, assigns Users to Groups, and sets the User's Default Group; (g) every file has one Owner and one Group; (h) every file has separate read and write permissions for the owner, for the group associated with the file, and for others; and (i) every Store has a set of Create, Read, and Write permissions for each group or user that can directly access the files in that Store. The Active Group can be changed by the User through the API to any group of which the User is a member, and remains in effect for that session or until the User changes it again. If a User creates a new file without specifying a StoreID, the new file is created using the DefaultStoreID associated with the User's current Active Group. The User can change his Default Group, but cannot change the groups to which he belongs. By default, the Owner of a file is the user who creates the file and the Group is the owner's Active Group when the file is created. By default, the owner has read and write permissions, the group has read permissions, and others have neither. This default can be changed by the administrator, while file permissions can be set and changed by the owner and administrators. Finally, store permissions are set by the administrator.
Note that permissions only need to be set for Stores that the Administrator allows users to access directly, typically Initial Stores only. Other stores connected through policies do not need user or group permissions unless users require direct access to those Stores. Collections of Stores may have the following characteristics: (i) every Store belongs to zero or more Collections; (ii) user and group permissions may be changed on a Collection basis; and (iii) Collections are strictly an administrative convenience. Permissions are always stored on a Store basis, and never on a Collection basis.
When a user connects to DSM system 10, the DSM system verifies that the UserID and Password are known. When a user performs a function, DSM system 10 determines which Store, file, and operation (create, read, or write) is involved. There may be two files, and the files may be in the same or different Stores. Each file may have a different operation. For each file, DSM system 10 will: (1) verify that the user or some group the user belongs to has permission to perform that operation in the Store; and (2) verify that the user has permission to perform that operation. Verification of user permission proceeds as follows: (i) if the user is the owner, compare the operation with the owner permissions; (ii) else if the user is a member of the file group, compare the operation with the group permissions; (iii) else compare the operation with the permissions for other users. The Store permissions need only be checked against the Store specified in the user command. If the file is actually stored to or fetched from another Store, the permissions for the Store used are not checked. No security check is made on DSM system-initiated operations.
The following tables are exemplary of those associated with the security model within DSM system 10:
Users Table:
    UserID          Numeric GUID assigned to the user
    Logon           The user's logon used to connect to DSM
    Password        The (encrypted) password used to connect to DSM
    DefaultGroupID  The ID of the user's default group

Groups Table:
    GroupID         GUID assigned to the group
    GroupName       The name of the group
    DefaultStoreID  New files will be created in this Store if the user does not specify a StoreID, where the GroupID is the User's ActiveGroupID

GroupMember Table (primary key on UserID and GroupID fields):
    UserID          The user who is part of the group
    GroupID         A group the user is part of

File Table (only security-related attributes are listed):
    FileID          The GUID for this file
    OwnerID         The UserID of the owner of the file
    GroupID         The GroupID of the group the file is associated with
    Permissions     The read/write permissions for owner/group/other

Store Table (only security-related attributes are listed):
    StoreID         The GUID for this Store
    DefaultPerm     Default owner/group/other permissions for new files created in this Store

StoreAccess Table (indexes on StoreID and on AccessID):
    StoreID         The Store ID
    AccessID        A GroupID or UserID that has access to this Store
    Permissions     The encoded create/read/write permissions for this user or group in this Store

Collection Table:
    CollectionID    The GUID for this collection
    CollectionName  Name for this collection

CollectionMember Table:
    CollectionID    A collection a Store belongs to
    StoreID         The Store that is a member of the collection
When a user connects to DSM system 10, his Logon and Password are verified, and his ActiveGroupID is set to the User's DefaultGroupID. The Stores that the user can access and the permissions for those Stores are obtained once from the
GroupMember and StoreAccess tables and cached by DSM system 10 for the open connections. The number of Stores is typically small, but it could be relatively expensive to check if the user is a member of a number of groups. Once this information is cached, checking Store permissions during this session is trivial. If permissions change while the user is active, the cached permissions become invalid and need to be regenerated.
The cached permissions list for the user contains any Stores for which the user has specific permissions, and Stores for which any group that the user belongs to has permission. User permissions will override all group permissions. Multiple group permissions will be cumulative. That is, if there is a permission record for the individual user for a Store in the StoreAccess table, then records for that Store for any groups to which the user belongs are ignored. If there is no record for the user, then his permissions in a given Store will be the sum of the permissions for the groups to which he belongs within that Store. File permission for a user can be checked as follows: (1) the logged-on user's UserID is checked against the file OwnerID; if they match, the file Owner permissions are used; (2) otherwise, for each group to which the user belongs (from the GroupMember table), the GroupID is checked against the file's GroupID and, if a match is found, the file Group permissions are used; (3) otherwise, the file Other permissions are used. This algorithm is easily encoded in a stored procedure and executed by the database engine.
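The same three-step check, sketched in C++ rather than as a stored procedure. The bit encoding of the Permissions fields is an assumption, since the description states only that read and write permissions exist for owner, group, and other.

    #include <cstdio>
    #include <set>
    #include <string>

    // Assumed encoding: a read bit and a write bit for each of owner, group,
    // and other (here kept as three separate fields for clarity).
    enum Perm { READ = 1, WRITE = 2 };

    struct FileRecord {
        std::string ownerID;
        std::string groupID;
        unsigned ownerPerm, groupPerm, otherPerm;
    };

    bool mayAccess(const FileRecord& f, const std::string& userID,
                   const std::set<std::string>& userGroups, unsigned op) {
        if (userID == f.ownerID)            // (1) owner permissions
            return (f.ownerPerm & op) != 0;
        if (userGroups.count(f.groupID))    // (2) file-group permissions
            return (f.groupPerm & op) != 0;
        return (f.otherPerm & op) != 0;     // (3) other permissions
    }

    int main() {
        FileRecord f = {"alice", "imaging", READ | WRITE, READ, 0};
        std::set<std::string> bobsGroups = {"imaging", "billing"};
        std::printf("bob may read: %d, bob may write: %d\n",
                    mayAccess(f, "bob", bobsGroups, READ),
                    mayAccess(f, "bob", bobsGroups, WRITE));
        return 0;
    }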
The above security implementation provides a number of advantages. For example, it is easy to implement basic security, but possible to implement detailed security. Also, this security implementation requires very low overhead for sites that have relaxed security requirements, or implement security using native OS facilities.
The ability to secure individual files is also provided, along with the ability to restrict or enable individual users because user privileges override group privileges. Multiple-group membership makes it easy to add access to Stores and groups of Stores (collections) for individual users or groups of users. Further, temporary groups with limited access are easily created and removed. Finally, store permissions allow sites to basically ignore file privileges (enable all permissions for all users) and control access by Store only. Configuration Manager 40 interfaces to a Configuration Database, and handles all requests that modify the configuration. These include all requests that add, delete, or modify parameters of the following entities: (1) sites and DSM services; (2) racks and shelves; (3) libraries, drives, pickers, slots, and ports; and (4) Media and
Volumes. Configuration manager 40 interfaces with volume manager 52 and with IO Scheduler 54 for media-related and volume-related requests. For example, a request to format a volume will build an IO Request and add it to the IO Request Queue 56.
IO scheduler 54 will handle the request when a drive becomes available, and volume manager 52 will issue the appropriate requests to the library server 16 to mount the volume, and to the Volume Server 18 to format the volume. Fileset Manager 42 handles requests that deal with all files in a Fileset. Fileset manager 42 translates the Fileset request into a set of File requests for the file manager, tracks the completion of the associated File requests, and notifies the requestor when the Fileset operation is complete.
File manager 58 translates client requests and fileset manager requests into FileRequest objects and places them on the FileRequest Queue 60. File manager 58 tracks the completion of File requests and notifies the requestor when a file operation completes.
File scheduler 62 sequences requests for a given file when necessary.
Logically, File scheduler 62 maintains a First-In-First-Out (FIFO) queue of requests for each file. A new request to read a file can be started if there are no write requests for that file ahead of it in the queue. A write request for a file can be started only if there are no other requests ahead of it in the queue.
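A sketch of this start rule over a per-file FIFO queue; the request type is illustrative. Because requests remain queued until they complete, queue position stands in for both pending and in-progress work.

    #include <cstddef>
    #include <cstdio>
    #include <deque>

    struct FileRequest { bool isWrite; };

    // A read may start if no write sits ahead of it; a write may start only
    // if nothing at all sits ahead of it.
    bool canStart(const std::deque<FileRequest>& q, std::size_t idx) {
        if (q[idx].isWrite)
            return idx == 0;
        for (std::size_t i = 0; i < idx; ++i)
            if (q[i].isWrite)
                return false;
        return true;
    }

    int main() {
        std::deque<FileRequest> q = {{false}, {false}, {true}, {false}};
        for (std::size_t i = 0; i < q.size(); ++i)
            std::printf("request %zu (%s) can start: %d\n", i,
                        q[i].isWrite ? "write" : "read", canStart(q, i));
        return 0;
    }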
When File scheduler 62 determines that a file request can be started, it creates one or more IO Requests and places them on the lORequest Queue 56. The file request remains in the FileRequest Queue 60 until it is complete, so that file manager
58 can identify conflicts that might delay the start of new requests for the same file. A DSM file may be split into multiple chunks on one or more physical volumes. Furthermore, if both the source and destination of a file operation are on DSM media, the source and destination may have an unequal number of chunks and the chunks may be of unequal size. File Scheduler 62 converts a File Request into one or more segment IO requests, where each segment is of equal size on the source and destination volumes.
Note that a new volume may need to be mounted for the source or destination or both at each segment boundary. File Scheduler 62 generates an IO Request for each segment, which includes information about the volumes that are required to satisfy the request. The IO Requests are placed on the IO Request Queue 56 for IO
Scheduler 54.
For the destination of a write or copy request, only the destination Store may be known. Selecting the volume within the Store may be postponed until a drive is available to service the request. The task of selecting a volume is given to the IO Scheduler 54. The role of the IO Scheduler 54 is to select the next IO Request to be processed.
IO Scheduler 54 selects requests from the IO Request Queue 56 in order to maximize the utilization and throughput of available devices, while guaranteeing some level of service to all clients. File conflicts are resolved by File Manager 58, so the IO Scheduler 54 has a great deal of freedom in reordering IO requests. When an
IO request finishes using a drive, the algorithm for selecting the next request to process takes the following into account: (1) high-priority requests that can use the drive; (2) other requests that can use the volume that is in the drive; (3) requests that can use a different volume in the same library; and (4) for sequential devices, such as tapes, the file segment position on the media relative to the current media position. A request may have originated as a high-priority request, or may have had its priority elevated based on the length of time it has been queued. Other requests that can use the volume that is in the drive may include requests that do not specify a specific destination volume, and for which there is room on the volume in the drive. The selected request is passed off to an IO worker thread 48. An IO worker thread 48 is dispatched to handle IO requests, especially those that may take some time. IO requests include Issue Volume requests to the Volume Manager 52. When an IO request is selected, the IO Scheduler requests the necessary Volumes from the Volume Manager 52 and handles the response. The Volume Manager 52 will see that the volumes are mounted and the Data Movers 20, 21 are ready. For data transfers that involve two DSM endpoints (as opposed to one client endpoint and one DSM endpoint), an IO Worker 48 is dispatched to direct the data transfer. When both the source and destination Data Movers 20, 21 are in place, the IO Worker 48 issues the command to the appropriate Data Mover to initiate the data transfer.
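One plausible way to encode these selection criteria is a scoring pass over the queue each time a drive frees up, as sketched below; the weights and the IORequest fields are illustrative only.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    struct IORequest {
        bool highPriority;        // originated high-priority, or aged upward
        bool usesMountedVolume;   // can use the volume already in the drive
        bool volumeInSameLibrary; // satisfiable with a picker move only
        long mediaDistance;       // |segment position - current media position|
    };

    // Higher score = better candidate for the freed drive.
    long score(const IORequest& r) {
        long s = 0;
        if (r.highPriority)             s += 1000;
        if (r.usesMountedVolume)        s += 500;  // avoids a mount entirely
        else if (r.volumeInSameLibrary) s += 100;  // mount, but no operator
        s -= r.mediaDistance;  // favor short seeks on sequential media
        return s;
    }

    int main() {
        std::vector<IORequest> queue = {
            {false, true,  true,  40},
            {true,  false, true,  10},
            {false, false, false, 5},
        };
        std::size_t best = 0;
        for (std::size_t i = 1; i < queue.size(); ++i)
            if (score(queue[i]) > score(queue[best])) best = i;
        std::printf("next request: %zu\n", best);
        return 0;
    }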
When a data transfer is complete, the IO Completion function 50 handles the completion. It updates the status of the request on the IO Request Queue 56. If an IO completes abnormally, such as when the destination device runs out of space, the IO
Completion routine 50 may create new IO requests to complete the IO operation. When all IO requests associated with a file request are complete, the file manager 58 is notified.
Volume Manager 52 carries out the following tasks: (1) maintains the records in the Volume and Store tables in the database; (2) aids in selecting appropriate destination volumes for IO Requests; (3) gets volumes ready for transferring data; and (4) when data transfer is complete, releases the volumes. In preparing volumes for transfer, volume manager 52: (a) issues Volume Mount requests to the Library Server 16 and handles the responses; and (b) issues requests to Volume Server 18 to prepare the drive and volume for file segment commands, and handles the responses. Volume
Server 18 sets up Data Mover 20, 21. To release the volumes, Volume Manager 52 issues requests to Volume Server 18 to release the volume and drive, and issues requests to Library Server 16 to unmount the Volume.
Database Manager 64 provides a programming interface to the DSM database that is independent of the underlying database implementation. A Library Server process 16 executes on each host that manages removable media for DSM system 10. Library Server process 16 issues media movement commands to automated libraries, and interfaces to operator consoles for manual media operations such as shelf management. The commands that Volume Manager 52 issues to Library Server 16 for normal operations are independent of the type of library involved. Library Server
16 translates the Mount and Unmount commands into specific device commands that control pickers in automated libraries, and/or into operator commands for manual libraries and shelf management of automated libraries.
A Library Server 16 has a well-known port address on which the DSM Server 14 can communicate. A Library Server 16 will spawn multiple threads to allow concurrent operation of multiple pickers. For the cases where the Volumes in a Store are not on removable media, the Library Server function may not be necessary, or may be located on the DSM Server host even if the media resides remotely.
Volume Server process 18 executes on each host having drives that handle DSM volumes. The roles of the Volume Server 18 are to: (1) issue device-oriented commands such as mount the file system and lock a volume in a drive; (2) perform volume-oriented commands such as (a) partition and format a volume, (b) read and write the volume label, (c) return volume statistics from the operating system, such as total space, space used, and space available, (d) enumerate files on a volume, and (e) perform I/O control such as rewind or position; and (3) set up a Data Mover 20, 21 for each concurrent file-related operation. For random-access devices that allow concurrent operations, such as hard disk and MO, a Data Mover would be established for each concurrent operation.
There is one Volume Server process 18 per host that controls DSM drives. The Volume Server 18 has a well-known port that the DSM Server 14 can use to issue commands. The Volume Server 18 is implemented as multiple processes or threads.
The DSM Data Mover 20, 21 objects spawned by the Volume Server 18 may be implemented as threads of the Volume Server process.
Data movers 20, 21 can be instantiated in two places: (1) on client hosts by the DSM client library 12 to read and write user files and streams; and (2) on hosts that have DSM drives by the Volume Server 18 to read and write DSM volumes. A
Data Mover process 20, 21 has two communication ports that are made known to the DSM Server 14. One port is used by the IO Scheduler 54 to issue instructions for file-related operations. Common commands include: (a) create a file; (b) delete a file; (c) return information about a file (metadata); (d) read a file or portion of a file from media and transfer it to another Data Mover; and (e) prepare to accept a file or portion of a file from another Data Mover and write it to media. The other port is used to transfer data from one Data Mover 20, 21 to another data mover. Data transfer operations always involve two Data Mover processes 20, 21, and the two processes are each made aware of the communications link. The implementation may have two connections between Data Movers, one for commands and one for data.
DSM Agents 22 are processes that typically run on the same host as the DSM Server 14, and work with the DSM server to implement policies set up by the
Administrator 24. One or more DSM Agents 22 may be responsible for the following types of tasks: (a) migrate files from one DSM Store to another, e.g., from a RAID disk to a magneto-optical (MO) drive and/or to a tape drive based on the time of last reference to the file; (b) delete files that have reached a certain age; (c) compress tape volumes by removing stale files (files that have been deleted by a user application);
(d) retension tapes; (e) copy and remove aged media; (f) copy or move files to remote DSM system(s); (g) import volumes from a foreign system; and (h) transfer a Store to a different (perhaps more modern or up-to-date) media type. DSM agents are privileged client applications that use the Client API and the DSM Administrator API. Database Server 26 stores information vital to the operation of DSM system
10. Database Server 26 provides secure, robust, transaction-oriented storage and efficient search and retrieval mechanisms. The information maintained by the Database Server 26 may include: (a) the DSM security database containing user IDs, passwords, and privileges; (b) physical configuration, including local and remote libraries and drives, and external media storage; (c) remote DSM servers and their communication parameters; (d) media inventory, including the location of each piece of media, and the space available; (e) file metadata, including the security attributes and media locations of each file; (f) logical grouping of media into DSM Stores, and information about each Store; and (g) policy parameters used by the DSM server and DSM agents. The database can be stored in tables and the database implementation provides a relational view of the data.
Having described the architecture of DSM system 10, reference will now be made to the policy algorithms implemented by the system. Policies can be placed in the following categories: (1) policies dealing with maintaining copies in stores; (2) policies dealing with media and volumes; and (3) miscellaneous policies. Policies dealing with maintaining copies in stores may include (a) an initial store policy that specifies the default store(s) into which a new file is placed; (b) a maximum file size to store, beyond which a file bypasses the store; and (c) an alternate store to use if the file size exceeds the maximum specified.
A migration policy may (a) enable migration, in which case lowest-level stores would not enable migration; (b) specify migration low and high watermarks; (c) specify the store to which files should be copied when the present store reaches some capacity threshold; (d) specify migration ranking criteria, such as oldest, least recently used, size, or a combination of age and size; (e) specify use of fileset migration, and allow the user to choose different levels of adherence to this policy; and (f) set a migration time window, i.e., a period of time in which to carry out the migration. A deletion policy may (a) enable automatic deletion, which typically would not be enabled for a lowest-level store; (b) specify stores on which copies must exist before a file is deleted from this store; (c) specify deletion from the original store immediately upon migration to the new store; (d) set a suggested minimum age at which to delete a file; (e) set a suggested maximum age at which to delete a file, in which case the file may be deleted even if space is not needed; (f) specify marking of deleted files as obsolete without deletion, enabling recovery of a deleted file; (g) specify marking of overwritten files as obsolete without deletion, enabling recovery of any version of a file; (h) set a maximum time to retain an obsolete file beyond deletion (obsolescence); and (i) set a maximum time to retain any file beyond last reference. A reload policy may (a) require reloading of an entire fileset when a file is reloaded; and
(b) specify a maximum total size of files in a fileset to reload.
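Grouped per Store, the copy-maintenance policies above might be collected into a single settings record, as in the following sketch; every field name is illustrative, chosen to mirror the prose rather than to reproduce actual attribute names.

    #include <string>

    struct StoreCopyPolicies {
        // Initial store policy
        std::string initialStore;      // default store for new files
        long        maxFileSize;       // bypass threshold; 0 = no limit
        std::string bypassStore;       // used when maxFileSize is exceeded

        // Migration policy
        bool        migrationEnabled;  // off for lowest-level stores
        double      migrationLWM, migrationHWM;
        std::string migrateToStore;
        int         migrationWindowStartHour, migrationWindowEndHour;

        // Deletion policy
        bool        autoDeleteEnabled;
        int         minAgeDays, maxAgeDays;
        bool        deleteImmediatelyOnMigrate;
        bool        retainDeletedAsObsolete;

        // Reload policy
        bool        reloadWholeFileset;
        long        maxFilesetReloadBytes;
    };

    int main() {
        StoreCopyPolicies p = {};
        p.initialStore = "RAID-cache";     // hypothetical store names
        p.migrationEnabled = true;
        p.migrationLWM = 0.60;
        p.migrationHWM = 0.80;
        p.migrateToStore = "tape-store";
        return 0;
    }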
Among the policies dealing with media and volumes, a chunking policy may: (a) allow chunking of files on the given store, although some users may choose to disallow splitting of files across volumes, in which case files larger than the size of one volume would be rejected; and (b) set a minimum chunk size to prevent a proliferation of small chunks, which may not apply to a final chunk. A volume selection policy may specify that the selected volume will be (a) the first available; (b) the best fit in terms of storage space or other characteristics; (c) the most recently written, to keep files more or less chronologically ordered on media; (d) the least recently used, to keep volumes cycled through drives and spread new files across media for concurrent retrieval; (e) a round-robin format in which volumes are cycled through the drives; or (f) the volume holding the most current file in the same fileset, or any file in the fileset. A drive selection policy may specify that files be stored (a) on a drive that is the first available; or (b) to balance usage among drives. A shelf management policy may (a) enable shelf management for the given store, although shelf management for intermediate stores may not be desired; (b) use a Free Store from which new volumes can be drawn; (c) set a tape retension interval; (d) set a maximum media age according to which the system will copy to new media and scrap old media when the age is exceeded; (e) set a maximum mount count by which the system will copy to new media and scrap old media when the count is exceeded; (f) set a tape compaction threshold specifying the minimum bytes to try to recover; (g) specify merging of files in a fileset onto media or, alternatively, only upon migration; (h) set a shelf management time interval that specifies a best time to compact and retension tapes and merge filesets; and (i) specify import/export volumes and notify the caller when an offline file is accessed.
An inventory policy may provide that the library be periodically inventoried, inventoried using a barcode inventory method, or using a volume label inventory method where each volume is loaded. A number of miscellaneous policies may be employed including a logging policy whereby the system logs, e.g., deletions, overwrites, stores, and retrievals.
To implement the various policy categories described above, DSM system 10 makes use of a set of policy algorithms. When DSM system 10 receives a new store-file request from an Endpoint Client, for example, it will store the file in the Store specified in the request if one is specified. If no Store is specified, DSM chooses the Default Store for the User's Active Group. Typically, this will be a direct-access storage device, such as a RAID disk. In any case, exactly one Store is chosen. Whenever the size of the file to be stored exceeds the non-zero FileBypassSize attribute of the selected store, the store specified in the FileBypassStore attribute is selected instead. If no store is selected because the file size exceeds all non-zero FileBypassSize values, then the store operation fails, the failure is logged, and the requestor is notified. When a Store is selected, file manager 58 creates a new File object in the Store and schedules a file copy from the Client Endpoint to the Store. Whenever a copy is scheduled, the destination Store is added to the File's vsScheduled attribute. Whenever a copy completes, the destination Store is removed from vsScheduled and added to vsResidence for that file. Also, when a copy completes, the File's vsHeld is updated to indicate whether or not this file is a candidate for deletion (primarily whether it has been migrated), and the Store's BytesHeld property is updated accordingly. Since a new file typically cannot be deleted until it is migrated, the Initial Store is usually added to vsHeld following the store, and the file size is added to BytesHeld.
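A sketch of this selection logic follows. The attribute spellings FileBypassSize and FileBypassStore come from the description above, while the lookup structure, the store names, and the failure handling are assumptions.

    #include <cstdio>
    #include <map>
    #include <string>

    struct Store {
        long fileBypassSize;          // 0 means "no size limit"
        std::string fileBypassStore;  // empty means "no bypass store"
    };

    // Follow the bypass chain until a store accepts the file, or fail.
    // 'requested' may be empty, in which case the Active Group's default
    // store is used; exactly one Store is chosen on success.
    std::string selectStore(const std::map<std::string, Store>& stores,
                            const std::string& requested,
                            const std::string& groupDefault, long fileSize) {
        std::string cur = requested.empty() ? groupDefault : requested;
        while (!cur.empty()) {
            const Store& s = stores.at(cur);
            if (s.fileBypassSize == 0 || fileSize <= s.fileBypassSize)
                return cur;
            cur = s.fileBypassStore;  // file too big: try the bypass store
        }
        return "";  // store fails; caller logs and notifies the requestor
    }

    int main() {
        std::map<std::string, Store> stores = {
            {"RAID-cache", {1000000, "tape-store"}},
            {"tape-store", {0, ""}},
        };
        // A 5 MB file bypasses the 1 MB RAID cache and lands on tape.
        std::printf("%s\n",
                    selectStore(stores, "", "RAID-cache", 5000000).c_str());
        return 0;
    }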
If a maximum lifetime is specified for the file, it is placed in the file's Lifetime attribute. If none is specified, then the MaxLifetime attribute of the Store is placed in the file's Lifetime attribute. Whenever a copy to a Store completes, DSM will look at a vsCopyTo indicator for that store, and immediately schedule copies from this Store to all the Stores specified if copies do not already exist or are not already scheduled. This is essentially the same as an immediate migration from the Initial Store to the Stores in the vsCopyTo.
Overwriting a file is equivalent to deleting the file and storing a new file but reusing the old FileID. A file is deleted by giving the file a new FileID and then scheduling the deletion using the new FileID. The act of physically deleting the file can be queued. Once the file is given a new FileID, other requests using the old FileID, such as an overwrite request, can proceed ahead of the delete. Since copies of the file may exist on a number of different stores, deleting the file may require mounting one or more volumes, which could take some time. A client desiring to overwrite the file will not have to wait for the previous copy to be physically deleted in order to write the new copy. Overwriting a file requires that the File be Locked. A Delete File request is always the last request allowed for a file. When a Delete File request is received, the file is marked for deletion. If a RetainDeleted property is set for the Store, then the file is flagged as Obsolete; otherwise the file is flagged as Deleted. All subsequent requests for that file, except those dealing with deleted files or versions, will return a status indicating that the file does not exist. The Delete File request is processed by copying the FileID property into an OrigFileID property, and then giving the file a new unique FileID. In doing so, DSM server 14 can process the Delete File request as it would a normal file, while retaining the information needed to undo the delete if the configuration allows it. Furthermore, this supports multiple deleted (obsolete) file versions, while maintaining a unique FileID for every unique file version in the system.
Once a new FileID has been assigned to a file, deleting the file from DSM is equivalent to deleting it from all stores in which it is resident, which can be determined from a vsResidence vector. The file record is then deleted if a bRetainDeleted property is not enabled for this store. Once a new FileID has been assigned to the file, the act of physically deleting the file can be queued, and other requests for the same FileID, such as an overwrite request, can proceed ahead of it. Because copies of the file may exist on a number of different stores, deleting the file may require mounting one or more volumes, which could take some time. A client wanting to overwrite the file will not have to wait for the previous copy to be physically deleted in order to write the new copy. The file must be Locked in order to be deleted.
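A sketch of the FileID-reassignment step; the record layout, the placeholder GUID generator, and the deletion queue are stand-ins for illustration.

    #include <cstdio>
    #include <queue>
    #include <string>

    struct FileRecord {
        std::string fileID;      // unique for every unique file version
        std::string origFileID;  // preserved so the delete can be undone
        bool obsolete;           // flagged instead of Deleted when retained
    };

    static int counter = 0;
    std::string newGuid() {  // placeholder for a real GUID generator
        return "guid-" + std::to_string(++counter);
    }

    // Rename the file, then queue the physical deletion (which may require
    // mounting volumes). Requests that use the old FileID, such as an
    // overwrite, can now proceed ahead of the delete.
    void deleteFile(FileRecord& f, bool retainDeleted,
                    std::queue<std::string>& physicalDeleteQueue) {
        f.origFileID = f.fileID;
        f.fileID = newGuid();
        f.obsolete = retainDeleted;
        physicalDeleteQueue.push(f.fileID);
    }

    int main() {
        FileRecord f = {"guid-original", "", false};
        std::queue<std::string> q;
        deleteFile(f, /*retainDeleted=*/true, q);
        std::printf("renamed to %s (was %s), obsolete=%d\n",
                    f.fileID.c_str(), f.origFileID.c_str(), f.obsolete);
        return 0;
    }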
Deleting a file from a Store is not the same as deleting a file from DSM system 10. The file may be deleted from one Store and still exist in other Stores. A file is deleted from a Store as part of the migration scenario. Deletion is a scheduled task, since it may involve mounting one or more volumes. In most cases, it is a higher priority task than writing to the volume, so that the space freed by the deletion will become available sooner. For most types of media that support deletion, as opposed to marking a file as stale, the delete operation is very fast once the volume is mounted.
To delete a copy of a file from a store, the following steps occur: (a) the store is removed from a vsResidence vector for the file; consequently, future read requests for this file on this Store will find the file is not resident, even if the physical deletion has not yet occurred; (b) the Store is added to the file's vsDeleting vector; (c) a FileSegmenter process is invoked to delete the individual file chunks; (d) the
FileSegmenter process schedules the deletion of the individual chunks, and the number of bytes in each chunk is subtracted from a BytesRemovable or BytesHeld property and is added to a BytesDeleting property of the volume and store. An IOScheduler process handles deleting the chunks. As the physical deletion occurs, the size of the chunk is subtracted from the BytesDeleting property and added to the BytesFree property of the volume and store. When all chunks are deleted, the Store is removed from a vsDeleting property for the file. If a copy of a File into a Store is requested, e.g., a Reload, while the file is being deleted from the Store, the deletion must complete before the copy begins. Otherwise, multiple copies of the file would exist in the same Store, which may be difficult to support. Note that this does not apply to Overwriting a file because the FileID of the old copy is changed prior to the Overwrite operation. No file lock is required to delete a File from a Store.
Copying a file is one of the more important operations in DSM. It generally would be used in all of the following situations: (a) to place files into DSM system 10 by doing a copy from a Client Endpoint to a Store; (b) to return files to a client, by doing a copy from a Store to a Client Endpoint; (c) to migrate a file from one store to another; (d) to reload a file into the cache store; (e) to compact media; (f) to age media; and (g) to migrate to new media types.
File copy requests are handled by a FileSegmenter process. The actions described below take place in response to a file copy request. The element corresponding to the source Store is incremented in the vsSourcing property for the file. The destination Store is added to a vsScheduled vector for the file. The FileSegmenter process is called to generate IO Requests to perform the copy. Volume Manager 52 will select one or more destination volumes and reserve space by incrementing BytesReserved and decrementing BytesFree. IO Scheduler 54 and Data Mover 20, 21 will copy the bytes. A BytesReserved property will be decremented and a BytesHeld or BytesRemovable property will be incremented for both the destination volume(s) and the destination Store. If the file is held, the vsHeld bit for the destination store will be set in a File.vsHeld vector.
When the copy is complete, the destination Store will be removed from the file's vsScheduled property and added to a vsResidence property for the destination file. The source is evaluated to see if the file is a candidate for deletion and the File's vsHeld vector is updated accordingly. Whenever a copy to a Store completes, DSM server 14 will look at the vsCopyTo vector for that store, and immediately schedule copies from this Store to all the Stores specified in that vector if copies do not already exist, as determined by the vsResidence vector, or are not already scheduled, as determined by the vsScheduled vector. When a client application requests a file from DSM system 10, the following steps take place. First, the store from which the file is to be retrieved must be chosen. The file may exist on multiple Stores. The file vsResidence indicates the Stores in which the file is resident. The Store that is chosen for retrieval will be the Store with the lowest value of a priority property. This will typically be the fastest store. Alternatively, DSM system 10 may determine if a mounted volume contains the file or if a drive is available in a library that contains the file. If the file is not found on the local DSM Server, the Servers in a vhSearch vector for the Store are searched. Second, a copy of the file from the selected Store to the Client application is scheduled. Third, a reload of the file is scheduled if the applicable policy indicates to do so. If the ReloadToStore policy property for the chosen store is not null, and the user request did not specifically suppress reload, then a copy from the chosen Store to the Store specified by the ReloadToStore property is scheduled. Reloads are not scheduled if the Retrieve request specified NO_RELOAD. Reloads are not scheduled if the retrieve is from a remote DSM server, or if the file would be reloaded to the Store from which the file was originally requested.
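A sketch of the first retrieval step, choosing the Store: among the Stores in which the file is resident, take the one with the lowest priority value (typically the fastest store). The container shapes here are illustrative, and the remote vhSearch fallback is omitted.

    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    // 'vsResidence' lists the Stores holding a copy of the file;
    // 'storePriority' maps each Store to its priority property.
    std::string chooseRetrievalStore(
        const std::vector<std::string>& vsResidence,
        const std::map<std::string, int>& storePriority) {
        std::string best;
        int bestPriority = 0;
        for (const std::string& s : vsResidence) {
            int p = storePriority.at(s);
            if (best.empty() || p < bestPriority) {
                best = s;
                bestPriority = p;
            }
        }
        return best;  // empty if the file is resident in no local Store
    }

    int main() {
        std::map<std::string, int> priority = {
            {"RAID-cache", 1}, {"MO-jukebox", 2}, {"tape-store", 3}};
        std::vector<std::string> residence = {"tape-store", "MO-jukebox"};
        std::printf("retrieve from %s\n",
                    chooseRetrievalStore(residence, priority).c_str());
        return 0;
    }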
For migration strategies, there are competing goals. First, it is desirable to keep as many files in a Store as possible so that fast retrieval is possible. Second, it is necessary that enough space in a Store remain available to handle incoming files. In general, migration is accomplished by performing two operations: (a) copying files from a Store to its Migration store so that a copy is retained; and (b) deleting files from a Store that exist in one or more other Stores. The migration policy has an effect on when these operations are performed. DSM system 10 makes use of four thresholds that control these operations. Copy High-Water Mark (CopyHWM) starts copying to the migration store when the held (unmigrated) bytes exceed this threshold. Copy Low-Water Mark (CopyLWM) stops copying when the held (unmigrated) bytes fall below this threshold.
Delete High-Water Mark (DeleteHWM) starts deleting copied files when the allocated bytes go over this threshold. Delete Low-Water Mark (DeleteLWM) stops deleting when the allocated bytes go under this threshold.
FIG. 3 illustrates the concept of watermarks. Bytes that have not been copied to the Migration Store are not eligible to be removed (deleted) from the Store, and are referred to as "Held" bytes. Bytes that have been deleted or never allocated are eligible to be reused, and are referred to as "Free" bytes. The other bytes in the Store, which have been allocated and migrated, are "Removable" bytes. The goal of the Migration Agent is to keep the level of allocated bytes between the DeleteHWM and the DeleteLWM so that some Free bytes are always available, and to keep the level of
Held bytes between the CopyHWM and the CopyLWM so that some allocated bytes can be quickly converted to Free bytes if needed. The higher-priority task is to keep Free bytes available to be used for files that are being written to the Store by deleting Removable Bytes. However, the Migration Agent cannot delete more Removable Bytes than exist, so it may have to copy files to the Migration Store to make those
Held bytes Removable before it can delete them.
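The interaction of the four thresholds can be sketched as follows. This is a minimal sketch assuming the marks are expressed in the same units as the byte counters; the Store fields are hypothetical stand-ins for the counters described in the text.

```python
# Illustrative watermark logic; Store fields are hypothetical stand-ins
# for the counters described in the text.

def migration_actions(store):
    held = store.bytes_held                      # allocated, not yet migrated
    allocated = store.bytes_total - store.bytes_free

    # Keep Held bytes between CopyHWM and CopyLWM by copying to the
    # Migration Store.
    if held > store.copy_hwm:
        store.copying = True                     # start migrating copies
    elif held < store.copy_lwm:
        store.copying = False                    # enough bytes are Removable

    # Higher-priority task: keep Free bytes available by deleting
    # Removable bytes once allocation passes the DeleteHWM.
    if allocated > store.delete_hwm:
        store.deleting = True
    elif allocated <= store.delete_lwm:
        store.deleting = False
```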
There are at least three migration strategies supported by DSM system 10, as illustrated in FIG. 4 and described below. Water Mark Migration with Deletion (moving files from one Store to another) is a strategy that combines the copy and delete operations. In essence, CopyLWM is equal to DeleteLWM and CopyHWM is equal to DeleteHWM. In this strategy, the migration agent starts to copy files to the Migration Store when the CopyHWM is reached. It copies as many files as necessary to the Migration Store, and deletes them from the current Store, until the CopyLWM is reached. This strategy attempts to always maintain an amount of free space on the Store between the CopyHWM and the CopyLWM. This is the classic water mark migration strategy. The Store can be viewed as a tank of water with a fill pipe and a pump. When the water (incoming files) reaches the High-Water Mark, the pump (the migration agent) turns on and begins emptying the tank. When the water reaches the Low-Water Mark, the pump shuts off. While the pump is running, water can still be entering from the fill pipe. If water is not pumped out faster than it is coming in, the incoming water backs up or the tank overflows.
One feature of this approach is that migration tends to occur in batches. This can be a positive thing if the batches are such that migration occurs only at non-peak intervals. However, if the High Water Mark is reached during a peak period (which intuitively might be expected), then large amounts of migration will compete with peak activity. Furthermore, the migration may become critical in order to keep a Store from filling up. To keep migration at off-peak times, it may be necessary to set the Low and High Water Marks far apart. For example, there may have to be enough room to store the new files for an entire day. Because the Low Water Mark is typically quite low in this strategy, less efficient caching takes place. The amount of the Store that is being used for caching varies between the DeleteHWM and the DeleteLWM. In this strategy, DeleteLWM is the same as CopyLWM, and the amount of caching can go quite low. It may be difficult to determine the best setting for the Water Marks, especially if the workload fluctuates. Furthermore, the water marks may have to be adjusted as the average workload increases or decreases over time.

A variation on the first migration strategy can be referred to as water mark migration without deletion. According to this strategy, files are copied to the Migration Store in the manner described above, but the files are not deleted from the current Store until the space is actually required. In this manner, the DeleteHWM and DeleteLWM are set above the CopyHWM and CopyLWM. This approach is generally acceptable because, once the copy has been made, it generally takes relatively little time to delete the file. In theory, it may be possible to set the Delete water marks at 100% to cause files to be deleted exactly when the space is needed. In practice, deleting is not instantaneous, and in fact may require mounting one or more volumes. Therefore, it may be desirable to keep some amount of space immediately available by deleting files before the space is needed. An advantage of this strategy is that more of the Store is available for caching files. The disadvantage is that some overhead is required to delete the file when the space is needed, and that overhead is not postponed to off-peak times. As with the first strategy, it may be difficult to determine the optimum water marks, and they may need periodic adjustment.

A variation of the second migration strategy is to schedule the migration of files as soon as they enter a Store, in essence setting the CopyHWM and CopyLWM to zero. This strategy can be referred to as immediate migration. The advantages are that migration can occur as a continuous low-priority activity, and caching efficiency is optimized as it is in the second strategy. This strategy also has less dependence on the selection of optimal water marks. The disadvantage is that neither copying nor deleting files is postponed to off-peak times, so they may compete with normal activity. However, both may occur as low-priority tasks.

In the exemplary embodiment described herein, there are six counts associated with each Volume that indicate the states of the space (bytes) on that volume. The sums of the counts for all the Volumes in the Store are maintained in the Store. The bytes are always in exactly one of the states: Free, Reserved, Held, Migrating, Removable, or Deleting. The common state transitions are shown in FIG. 5 and described below. When a volume is created, all the bytes are in the "Free" state. That is, they are available to be used. When a request is received to store a file, a volume is chosen with enough space to accommodate the estimated file size, and those bytes are moved to the "Reserved" state. When the file is completely stored on the volume, the number of bytes corresponding to the actual file size are typically moved to the
"Held" state, indicating that the file must be migrated to another store before it can be deleted from this store. If more bytes were reserved than were actually used, then the excess bytes return to the "Free" state. When a file is selected for migration, the bytes are moved to the "Migrating" state. When a file has been migrated to the next level of storage, it is eligible to be deleted from the current store. The byte count is then moved to the "Removable" state.
When a file is selected to be deleted from a store, its byte count is moved to the "Deleting" state and the delete operation is scheduled. When the deletion is complete, those bytes are moved to the "Free" state. There also may be three long- term states: "Free," "Held," and "Removable." There are also three typically short- term or transitional states: "Reserved," "Migrating," and "Deleting." Other transitions may occur, depending on the migration policy. For example, the policy may define a temporary store, where old files are deleted without ever being migrated. In that case, bytes would transition directly from "Reserved" to "Removable." Another example is when a file is explicitly deleted by a client. In that case, the bytes may transition from
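The common transitions just described can be collected into a small table. The sketch below is illustrative only; the six state names come from the description, while the event labels are informal paraphrases.

```python
# Illustrative table of the common byte-state transitions; the six state
# names come from the description, the event labels are informal.
TRANSITIONS = {
    ("Free", "store request accepted"):           "Reserved",
    ("Reserved", "file stored (actual size)"):    "Held",
    ("Reserved", "excess reservation returned"):  "Free",
    ("Held", "file selected for migration"):      "Migrating",
    ("Migrating", "migration complete"):          "Removable",
    ("Removable", "file selected for deletion"):  "Deleting",
    ("Deleting", "deletion complete"):            "Free",
    # Policy-dependent transitions:
    ("Reserved", "temporary store, aged out"):    "Removable",
    ("Held", "explicit client delete"):           "Deleting",
}
```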
"Held" to "Deleting." FIG. 5 completes the picture shown in Fig. 3 by mapping the six states to the watermarks. The specific goal of the migration agent is to keep the number of allocated bytes between the Delete LWM and the Delete HWM, and to keep the number of Unmigrated bytes between the Copy HWM and the Copy LWM. According to the migration delete algorithm, the Migration Agent monitors the
Stores, and when it finds that the level of allocated bytes in a Store is greater than the DeleteHWM, it selects files to be deleted, and schedules them for deletion. The Migration Agent will continue to select files to be deleted until the number of allocated bytes is less than or equal to the DeleteLWM. The Migration Agent does . not consider Stores whose DeleteCriteria is DELETE_NEVER. The percentage of bytes allocated is calculated using the following equation.
PercentAllocated = (BytesTotal - BytesFree - BytesDeleting) / BytesTotal
The migration agent will select physical files (Chunks) for deletion by age (current time minus the LastAccessTime). A file is not eligible to be deleted from a Store if any of the following are true: (1) the file is being read from any Store (File.ReadCount > 0); (2) the file is scheduled to be deleted from, or written to, Stores, as indicated by the File.vsScheduled and File.vsDeleting vectors; (3) the DeleteCriteria of the Store is DELETE_ANY_STORE and the file does not exist on at least one other Store (determined from the vsResidence property); (4) the DeleteCriteria of the Store is DELETE_SPECIFIC_STORE and the file does not exist on all of the Stores specified in the vsDeleteSpecific vector (determined from the vsResidence property); or (5) the DeleteCriteria of the Store is DELETE_MIGRATE_STORE and the file does not exist on the Migration Store (determined from the vsResidence property). The first two of the above conditions will typically not be true for the oldest files on a Store. The other conditions will no longer be true once the Migration Copy process is complete.
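Taken together, the PercentAllocated equation and the five eligibility conditions suggest a delete pass along the following lines. This is an illustrative sketch: the watermarks are assumed to be expressed as fractions, and the objects, field names, and scheduling helpers are hypothetical.

```python
# Illustrative delete pass; watermarks are assumed to be fractions, and
# the objects, fields, and helpers are hypothetical.

def percent_allocated(store):
    return (store.bytes_total - store.bytes_free
            - store.bytes_deleting) / store.bytes_total

def eligible_for_delete(f, store):
    if f.read_count > 0:                                   # condition (1)
        return False
    if store in f.vsScheduled or store in f.vsDeleting:    # condition (2)
        return False
    crit = store.delete_criteria
    if crit == "DELETE_ANY_STORE":                         # condition (3)
        return any(s is not store for s in f.vsResidence)
    if crit == "DELETE_SPECIFIC_STORE":                    # condition (4)
        return all(s in f.vsResidence for s in store.vsDeleteSpecific)
    if crit == "DELETE_MIGRATE_STORE":                     # condition (5)
        return store.migration_store in f.vsResidence
    return True

def delete_pass(store):
    if store.delete_criteria == "DELETE_NEVER":
        return
    if percent_allocated(store) <= store.delete_hwm:
        return
    # Oldest first: largest (now - LastAccessTime), i.e. smallest timestamp.
    for f in sorted(store.files, key=lambda f: f.last_access_time):
        if percent_allocated(store) <= store.delete_lwm:
            break
        if eligible_for_delete(f, store):
            store.schedule_delete(f)
```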
According to the migration copy algorithm, the Migration Agent monitors the Stores, and when it finds that the percentage of unmigrated bytes in a Store is greater than the CopyHWM, it schedules files to be copied to the Migration Store. The steps in migration are as follows. First, it is determined whether migration is necessary. Migration will begin if the ratio of unmigrated bytes to the total number of bytes in the Store is greater than CopyHWM, computed by the following formula:
PercentUnmigrated = (BytesHeld + BytesReserved) / BytesTotal
If migration is necessary, the Migration Agent will set the Store's bMigrating flag, select files to be migrated (step 2 below), and schedule the migration (step 3 below).
As files are selected for migration, the BytesMigrating variable will be incremented and the BytesHeld variable will be decremented. The Migration Agent will continue to select files for migration until the ratio in the equation above is less than the low water mark, CopyLWM. It will then clear the Store's bMigrating flag.
Second, the file to be migrated is selected. A file will be selected for migration from a Store based on the Store migration algorithm. A file is eligible for migration from a Store if: (a) the file is not marked as deleted (except where obsolete files are also migrated); (b) the file is not being deleted from the present Store, i.e., the Store is not in the vsDeleting vector; and (c) the file is not already being copied to the Migration Store, i.e., the Store is not in the vsScheduled vector. When a file is selected for migration, the Migration Agent will add the Store to vsScheduled and schedule a copy to the Migration Store. If the file is part of a fileset and other files from the same fileset are present in the Store, then those files will be selected next for migration if fileset migration is selected.
Third, a request to copy the file to the Migration Store is made if the file does not exist there. The FileSegmenter will add the destination Store to the vsScheduled property of the file and handle the physical IO of the file chunk(s). The BytesMigrating property of the source Store will be incremented by the file size, and the BytesHeld will be decremented. The IOScheduler is invoked to handle the physical copy. When the physical IO is complete, the destination Store is removed from vsScheduled and added to vsResidence for the file. The file size is decremented from the source BytesMigrating. If all of the criteria to release the hold on the file are satisfied, the file size is added to BytesRemovable. Otherwise, the file size is added to BytesHeld.
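The three steps of the migration copy algorithm can be sketched in the same style. Again, this is illustrative only; the bMigrating flag, byte counters, and vector names follow the description, while the objects and helpers are hypothetical.

```python
# Illustrative migration copy pass; the bMigrating flag, byte counters,
# and vector names follow the text, everything else is hypothetical.

def percent_unmigrated(store):
    return (store.bytes_held + store.bytes_reserved) / store.bytes_total

def eligible_for_migration(f, store):
    return (not f.marked_deleted                        # (a) not deleted
            and store not in f.vsDeleting               # (b) not being deleted here
            and store.migration_store not in f.vsScheduled)  # (c) not already copying

def copy_pass(store):
    if percent_unmigrated(store) <= store.copy_hwm:
        return                                          # step 1: not necessary
    store.bMigrating = True
    for f in sorted(store.files, key=lambda f: f.last_access_time):
        if percent_unmigrated(store) < store.copy_lwm:
            break
        if eligible_for_migration(f, store):            # step 2: select file
            f.vsScheduled.add(store.migration_store)    # step 3: schedule copy
            store.bytes_migrating += f.size
            store.bytes_held -= f.size
            store.schedule_copy(f, store.migration_store)
    store.bMigrating = False
```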
In reloading a file, a reload request typically causes a file to be copied to the highest-priority Store where the size of the file does not exceed the FileBypassSize of the Store. The source of the copy is the highest-priority Store that contains a copy of the file. Specifically, a reload request can be handled in the following manner: (a) select the reload source; (b) select the Store with the lowest value of the Priority property that contains a copy of the file, as determined by the file's vsResidence property; (c) if the file is not found locally and there are remote servers configured, search the remote servers for the file; (d) select the reload destination; (e) select the Store specified by the ReloadToStore property of the source Store selected in step (a) if the source Store is local, but if ReloadToStore is null, then no reload is performed; (f) if the source Store is on a remote server, then select an Initial Store on the local system; (g) if the size of the file to be reloaded exceeds the FileBypassSize of this Store, then select the Store with the next lowest value in its Priority property and continue until a Store is found where the file size does not exceed FileBypassSize; and (h) if the source and destination are the same, no reload is required; otherwise, schedule a copy of the file from the source Store to the destination Store. Normal copy and migration rules apply to reloaded files, including scheduling additional copies per the vsCopyTo vector. Reloading a fileset is generally equivalent to reloading all the files within the required fileset. If the media type is sequential access, all the files will be queued for reloading according to their location.
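Steps (a) through (h) can be condensed into the following sketch, which is illustrative only; the traversal helper for stepping to the next-lowest-priority Store is hypothetical.

```python
# Illustrative reload handling per steps (a)-(h); all objects and the
# next_lower_priority_store helper are hypothetical.

def reload(dsm, f):
    # (a)-(c): source is the lowest-Priority resident Store, falling back
    # to configured remote servers if the file is not found locally.
    local = [s for s in dsm.local_stores if s in f.vsResidence]
    source = min(local, key=lambda s: s.priority) if local else dsm.search_remote(f)
    if source is None:
        return

    # (d)-(f): destination is the source's ReloadToStore (null means no
    # reload) for a local source, or the Initial Store for a remote one.
    if source.is_local:
        dest = source.policy.ReloadToStore
        if dest is None:
            return
    else:
        dest = dsm.initial_store

    # (g): walk down the hierarchy until the file fits under FileBypassSize.
    while f.size > dest.file_bypass_size:
        dest = dsm.next_lower_priority_store(dest)

    # (h): schedule the copy unless source and destination coincide.
    if dest is not source:
        dsm.schedule_copy(f, source, dest)
```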
It is preferable that Migration and CopyTo operations happen at times when the system is not busy. For this reason, when the criteria for Migration or CopyTo are reached, DSM queues requests so they occur during a preferred time interval. When the high water mark is reached, files will be queued for migration. The actual migration may happen during a preset time, for example, overnight. A maximum queue length can be set by policy. The migration queue is not affected by activity that occurs after a file or fileset is already in the queue. For example, the migration policy may be to migrate the least recently used file first. Suppose that when the HWM is reached, a least recently used file A is put in the migration queue, and before the actual migration takes place a request arrives for file A. DSM system 10 will fulfill the request, and although file A is then no longer the least recently used file, it will still be migrated according to the existing queue.
The CopyTo function can likewise be queued, with the action taken in a preferred time interval or according to some other condition. For free store management, a warning can be given when BytesTotal falls below a given minimum. The system should then be prepared to add new volumes to the FreeStore. Media added to the FreeStore must be DSM-compatible media and can be formatted or unformatted. The added media is given a DSM globally unique identifier (guid). The following discussion deals with operations that are performed by DSM agents for shelf management. To facilitate deletion of aged files, each file has a Lifetime property that is set when the file is created. An agent scans the files periodically and deletes any files whose age exceeds their specified lifetime. The age of a file is the difference between the current time and the time the file was last referenced, as recorded in the LastReference property. After being deleted, the file is unrecoverable if the Store's bRetainDeleted property is not set. If bRetainDeleted is set, then the file is marked obsolete but can be recovered. If the file is marked obsolete, its LastReference property is set to the time that it was marked as obsolete. The agent also scans for obsolete files and deletes any whose age exceeds the ObsoleteLifetime property of the Store. An agent also scans files in Stores for files whose age exceeds the MaxSaveTime of the Store, and deletes those files from the Stores (migrating when necessary).
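The aged-file agent can be sketched as a periodic scan. The property names (Lifetime, LastReference, bRetainDeleted, ObsoleteLifetime, MaxSaveTime) follow the description; everything else in this sketch is hypothetical.

```python
import time

# Illustrative aged-file scan; property names follow the text, the objects
# and delete helper are hypothetical.

def shelf_scan(store, now=None):
    now = now if now is not None else time.time()
    for f in list(store.files):
        age = now - f.last_reference   # time since last reference
        if f.obsolete:
            if age > store.obsolete_lifetime:
                store.delete(f)        # purge aged obsolete copies
        elif age > f.lifetime:
            if store.bRetainDeleted:
                f.obsolete = True      # recoverable; restart the clock
                f.last_reference = now
            else:
                store.delete(f)        # unrecoverable
    # A similar periodic scan compares file age against the Store's
    # MaxSaveTime and deletes such files (migrating first when necessary).
```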
For replacement of aged media, each media can specify a MaxAge that sets the age at which media should be copied to new media and discarded. An agent monitors the age of media, i.e., current time minus CreationTime, and instigates the replacement process at the appropriate time. For migration to a new or different media type, the following steps can be used: (a) configure a new Store (NewStore) that is to replace an existing Store (OldStore); (b) configure NewStore to have the same Priority as OldStore; (c) configure the Stores that migrated to OldStore to migrate to NewStore; (d) configure the Stores that sent bypassed files to OldStore to send bypassed files to NewStore; (e) configure NewStore to copy to the Stores to which OldStore copied; (f) configure NewStore to migrate to the stores to which
OldStore migrated; (g) configure NewStore to bypass to the store to which OldStore bypassed; (h) configure OldStore to migrate to NewStore; (i) set the ForceMigration flag in the Policy for OldStore. An agent will force the migration of all the files in OldStore to NewStore. Since the new configuration does not store or migrate any files to OldStore, it can be removed once the migration is complete. During the migration, retrievals and reloads of files that are still on OldStore will continue to operate in the usual manner.
To retension media, an agent can be made responsible for periodically checking the LastRetensioned property of the media against the Retensionlnterval property of the media type and performing the retension operation at the proper time.
For compactable media, typically tapes, an agent will be responsible for periodically comparing the BytesRemovable of a volume against the MinBytesToRecover property of the media type. When BytesRemovable exceeds MinBytesToRecover, a compaction of the media will take place. An agent can be provided to merge oldest files onto volumes for export. When the library capacity reaches some threshold, an agent will combine the oldest files on loaded volumes onto a single volume for export. That volume can then be exported and a fresh volume can be imported.
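The media-maintenance checks described above reduce to simple comparisons, sketched below with hypothetical objects; only the property names (MaxAge, CreationTime, LastRetensioned, RetensionInterval, BytesRemovable, MinBytesToRecover) come from the description.

```python
# Illustrative media-maintenance checks; the objects are hypothetical.

def needs_replacement(media, now):
    # Media age is current time minus CreationTime.
    return now - media.creation_time > media.max_age

def needs_retension(media, now):
    return now - media.last_retensioned > media.media_type.retension_interval

def needs_compaction(volume):
    return volume.bytes_removable > volume.media_type.min_bytes_to_recover
```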
A number of policy examples are set forth below for purposes of illustration. A first example pertains to a policy that provides for immediate migration. In this example, files that are smaller than the Store 0 FileBypassSize are written to Store 0, and then copied to both Store 1 and Store 2. Files that are larger than the Store 0 FileBypassSize and smaller than the Store 1 FileBypassSize are written to Store 1, and then copied to Store 2. Files that are larger than the Store 1 FileBypassSize are written to Store 2. Because the copies to the other Stores are scheduled right away, there is no need for high water mark migration.
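The size-based routing in this example can be sketched as follows; the Store objects are hypothetical, and only the FileBypassSize and Priority notions come from the description.

```python
# Illustrative size-based routing for the immediate-migration example;
# the Store objects are hypothetical.

def choose_initial_store(stores, file_size):
    ordered = sorted(stores, key=lambda s: s.priority)   # Store 0 first
    for store in ordered:
        if file_size <= store.file_bypass_size:
            return store                  # first (fastest) Store that fits
    return ordered[-1]                    # largest files land on the lowest level
```

With three Stores configured as in the example, a small file routes to Store 0 and is then copied onward to Store 1 and Store 2 per its vsCopyTo vector.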
A second example provides for HWM migration and backup. There are three cases of interest in this example. In the first case, the file is smaller than Store 0's FileBypassSize. The file is written to Store 0. When the write is complete, the file is copied to Store 3. When the file ages from Store 0, it migrates to Store 1. Because it already exists on Store 3, it is not copied there again. When it ages from Store 1, it migrates to Store 2. Because it already exists on Store 3, it is not copied there again. In the second case, the file is larger than the Store 0 FileBypassSize and smaller than the Store 1 FileBypassSize. The file is written to Store 1. After the write is complete, the file is copied to Store 3. When it ages from Store 1, it migrates to Store 2. Because it already exists on Store 3, the file is not copied there again. In the third case, the file is larger than Store 0's FileBypassSize and larger than Store 1's FileBypassSize. The file is written to Store 2. After the write is complete, the file is copied to Store 3.
Store 2 and Store 3 are both at the lowest level, so vsCopyTo and MigrateToStore would be null. Store 3 is a backup store, so bRetainDeleted and bRetainVersions would be set TRUE. This type of Store is essentially a serial record of everything written to DSM. Any version of any file would be recoverable from this media. This policy can be modified by the ObsoleteLifetime and MaxLifetime properties of the
Store. A third example concerns migration to a new media type. This is similar to the second example, except that a new Store, Store 4, has been added to the system with the intention of migrating all files from Store 1 to Store 4. Note that Store 4 takes the place of Store 1, and Store 1 now just migrates to Store 4. The ForceMigration property would be set in Store 1 to cause all the files in Store 1 to migrate to Store 4, even though they might not otherwise migrate.
The following are several guidelines for practice of the policies described above. If the policy is intended to provide a permanent storage facility, configure a CopyTo from the InitialStore to at least one lowest-level store. If a maximum file size is specified for a Store (FileBypassSize), then a FileBypassStore is configured. The
MigrateToStore should usually be the same as the FileBypassStore if there is a FileBypassStore. If a Store (StoreA) specifies a FileBypassStore (StoreB), then StoreB should have the same CopyTo stores as StoreA. This ensures that the copies are always made, even for files that bypass to StoreB. The FileBypassSize should increase for each level of Store. Each Store related through policy should have a unique value in its Priority attribute. Setting the "bRetainDeleted" property of a Store may be inconsistent with specifying a MaxLifetime for the Store unless an ObsoleteLifetime is also specified. Specifying the MaxLifetime will cause the file to be deleted (presumably to recover the space it used), but "bRetainDeleted" will cause the space to be retained while making the file invisible to most client applications. However, if an ObsoleteLifetime is also specified, then the deleted copy will eventually be eliminated.
As described above, DSM system 10 may incorporate the Fileset feature, whereby groups of images associated with a particular customer, project, or transaction are generally stored together on common media. It is noted, however, that multiple images that are associated with one another are stored together in thumbnail and screen resolution versions on short-term media for ease of access, while associated high resolution versions are stored, preferably together, on longer-term media. The fileset feature allows a user to group files into a logical collection, and perform operations on the files as a group.
An example is a group of image files representing images scanned from a single roll of film. The thumbnail and screen resolution versions of each image are stored together with those of associated images in the roll of film to facilitate quick web access to the images. The high resolution versions of the film images can be migrated offline. At the outset, when the images are checked into DSM system 10, the thumbnail, screen resolution, and high resolution images can all be stored on the short-term media, subject to subsequent migration of the high resolution images.
In addition to creating a fileset for a single roll of film, system 10 can be configured to include in the fileset images from multiple rolls of film submitted by a common customer. Also, the fileset could include audio, video, or other content originated by the common customer. Accordingly, filesets can be configured to have their member files reside together on media, so that operations on the fileset, such as archival, migration, retrieval, and deletion, can be performed significantly faster than operating on the individual files, which might otherwise be distributed across multiple media.
DSM system 10 can be further configured to take advantage of metadata associated with each media volume and each file, thereby providing features referred to as media independence and self-describing media. Metadata can be generated for each volume in the form of a file that uniquely identifies the volume with a guid and other useful information. In this case, system 10 may generate the metadata. For individual files, however, it may be desirable for the client application to generate metadata, including a guid and information concerning the particular customer, project, or transaction associated with the file.
In this manner, the client application can pass through to DSM system 10 simply the content of the image file as a blob and the content of the metadata file as a blob. In other words, DSM system 10 need not be concerned with the content of the image/metadata file. Rather, the client application provides sufficient information to the database server, such as a SQL server, to allow DSM system 10 to locate the file and retrieve it for direct access by the client application. The client application is responsible for the format of the image file.
Thus, for a web-based client application, the image file can be converted to a browser-recognizable format prior to submission to DSM system 10. Along with media independence, system 10 may further include a feature whereby volume metadata is stored on each physical volume to track the volume within a local server or across a network. The metadata can be useful in tracking volumes and files, in verifying the identities of loaded volumes, and in database recovery.
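Self-describing media of this kind might be implemented along the following lines. This is a minimal sketch assuming a JSON metadata file and a hypothetical field layout; only the guid concept comes from the description.

```python
import json
import uuid

# Illustrative self-describing volume metadata; the JSON layout and field
# names are assumptions, only the guid concept comes from the description.

def write_volume_metadata(volume_path, store_name):
    metadata = {
        "guid": str(uuid.uuid4()),      # DSM globally unique identifier
        "store": store_name,            # where the volume currently belongs
    }
    with open(volume_path + "/volume.meta", "w") as fp:
        json.dump(metadata, fp)
    return metadata["guid"]
```

A metadata file of this kind, read back when a volume is loaded, supports identity verification and, in the aggregate, reconstruction of a lost file-location database.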
FIG. 7 is a functional block diagram illustrating the interaction between a client application and a server component in a system as shown in FIG. 1. As shown in FIG. 7, a client application programming interface (API) also can be included to provide a number of features to a programmatic user of the system and method. In particular, DSM client 66 can use the client API to allow for file storage and retrieval by DSM server 14, either directed by a user application 68 to or from a particular collection of media 70, or undirected. Further, the client API can facilitate file grouping by DSM server 14 by allowing fileset creation and modification by user application 68 to generate logical collections of files on storage media 70.
The client API also can be used to provide restricted, password-protected logons to distributed server systems, user and user group permissions on files, and a set of privileged, administrator-only functions. For access to a remote DSM server 14b and the associated collection of media 70b over a network 72, the client API implemented between DSM client 66 and the remote DSM server can provide additional levels of security. For example, the server system can be protected first by a firewall on the server side of network 72, second by an HSM system server logon and password, third by file system access control lists, such as the NTFS ACL provided by the Windows NT operating system, and finally by a database server logon and password. In addition, the client API enables policy creation and modification to provide settings for determining default behavior of the storage management function and its background agents. System configuration also can be managed via the client API, such that hardware and software setup is API-configurable. With the client API, the server systems can be local or remote. In particular, the client API implementation can be made to communicate through industry-standard TCP/IP protocols across network 72 to distributed server systems running on any machine that is IP-addressable by the machine on which a user application 68 runs.
The policies implemented by an administrator govern several behavioral aspects of the data and media management functions. For example, policies can be defined to control the set of media 70 to be used for storage of files associated with particular clients, the movement and replication of the files to other sets of media, peer-to-peer data movement between servers, the timing of data movement and replication, the maximum file size allowed for various sets of media, retention intervals on various sets of media, versioning, fileset behaviors, media lifetime, tape retensioning, tape compaction, automated media replacement, and automated media upgrades.
FIG. 8 is a functional block diagram illustrating the interaction between a web-based client application 74 and a DSM server component 14, via an automation server component 76 that links the client application, DSM client 66, and server 14 over a network 72, in a system 10 as shown in FIG. 1. System 10 also can be configured to provide direct access to the storage function instead of indirect access, e.g., via a SQL server. In particular, the system can be arranged to give privileged clients direct read-only access to the file-oriented storage media, with guarantees that the files they specify will remain on that media until they specify otherwise. This provides, for example, the fastest and most direct access to media files that need to be published to web sites for use by web applications 74, by allowing web servers to publish stored files without having to make a copy of the file elsewhere.
This direct access feature also can support specific user-specified file extensions so that web applications 74 that trigger on file extension can use this feature. As another feature, the system and method may employ an architecture that allows a variety of functions to be allocated to the same or different computer servers in a single storage management system. In order to link web page applications 74 to the DSM executable programs, system 10 can use an interface referred to as an automation server 76. Automation server 76 is created as prescribed by applicable Microsoft specifications. Various automation servers having features unique to system 10 can be incorporated together or individually as follows:
(1) Storage Automation Server: This aspect of automation server 76 allows a web interface to attach to and utilize the client API implemented in DSM client 66 as described above. This interface is portable across any user application that requires a web browser interface 74 to the storage management system 10. Another unique component is the locking down of individual images on a "Hot" cache to avoid migration. Finally, in storing the images, this automation server specifies a file extension to append to a GUID assigned to the file (see the sketch following this list). This provides great advantage when viewing the images from a browser 74 that does not support any type of image file decoding. For example, when viewing an image, the Netscape Navigator web browser only examines the file name extension, such as .jpg, .gif, or .tif.
(2) Exe Automation Server: This aspect of automation server 76 allows a user to launch, and get a return result from, any executable program within DSM client 66 and DSM server 14 from a web browser interface 74.
(3) Image Tool Automation Server: This aspect of automation server 76 allows the calling of image processing tools within DSM client 66 and DSM server 14 from a web browser interface 74.
(4) Event Log Automation Server: Because web interfaces are page driven and stateless, debugging is extremely difficult. This aspect of automation server 76 allows any web interface 74 to log errors encountered from within a web page to the NT event logging facility on any NT server, e.g., DSM server 14. In this manner, any event at web interface 74 is translated for programmatic delivery to the NT server, and can be logged for analysis.
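As referenced in item (1) above, the GUID-plus-extension naming can be sketched in a few lines; the helper below is hypothetical.

```python
import uuid

# Illustrative GUID-plus-extension naming (see item (1) above); the helper
# is hypothetical.

def stored_file_name(extension=".jpg"):
    # Extension-driven browsers only examine the suffix, so a decodable
    # extension is appended to the assigned GUID.
    return str(uuid.uuid4()) + extension
```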
System 10 also can be configured to facilitate web-based workflow within a web application 74 that invokes the storage management function provided by DSM client 66 and DSM server 14. From various web pages, and calling the above automation server 76, for example, an application can scan and/or acquire images from a high speed film scanner, photoscanner, flatbed scanner, PCMCIA cards (digital cameras), or any TWAIN compatible device (this is configurable). Those images then can be compressed into three image file types: (1) a thumbnail image, (2) a screen resolution image, and (3) a high resolution image. All three versions of each image are sent and "checked in" to the storage management function with a file extension known to be processable by web browser 74, e.g., .jpg, via automation server 76.
Thus, the client application supporting the web-based workflow within web browser 74 converts each acquired image to the supported format such that the user can have direct access to the images maintained by the storage management function of DSM server 14 within the web browser. No images need to be kept directly within the client database. The screen resolution and thumbnail versions can be "locked" down by DSM server 14 on the short-term media, such as a RAID, and never allowed to migrate offline to tape. At the same time, the high resolution image may be allowed to migrate according to user-defined migration policies then in effect. In this manner, internet access to the images is as quick as possible. Also, the locked-down images are not stored by the client database server, but rather by the HSM server for direct access by the user. By migrating the high resolution image, however, storage costs for RAID are dramatically reduced. In other words, an application can be configured to store images as economically and efficiently as possible using system 10, with the potential for growth of storage capacity being unlimited and scalable.
As a further feature, system 10 may provide a "digital vault" function within a client application implemented using web browser 74. This function can be supported in part by the fileset feature. With this feature, each consumer has his/her own unique digital vault that is accessible as a web page and resembles an electronic safety deposit box. This digital vault contains images that are categorized and stored in folders that are graphically represented on web page browser 74 as a set of drawers. Each folder holds a single set or multiple sets of images that were acquired from one of the acquisition devices described above. One set could be a single roll of film, another a scanned legal document or documents, and another a VHS tape or tape library.
This vault can be password and login protected. All image transmissions can be done under SSL. The vault image viewing is also secured through a virtual directory service, another sequence of logins into the storage management system 10, a Microsoft SQL Server, and the Windows NT NTFS file system itself via ACL. From the vault, the consumer can proof and create his/her own media order. That media order is placed into a "shopping basket" for the consumer, and totaled for payment and shipping. The consumer may also send those images to a third party via internet mail. Images stored within the vault are subject to an aging algorithm whereby, after a predetermined number of days, the images are deleted from system 10.


CLAIMS:
1. A method for storing files comprising: assigning the files to filesets, each of the files assigned to a respective one of the filesets sharing one or more common attributes with the other files assigned to the respective fileset; storing the files assigned to the respective fileset together on a common data storage medium; and moving the files assigned to the respective fileset together from the common data storage medium to another common data storage medium.
2. The method of claim 1, wherein the one or more common attributes include at least one of a common entity and a common project associated with each of the files assigned to the respective fileset.
3. The method of claim 1, wherein the one or more common attributes include a common transaction associated with each of the files assigned to the respective fileset.
4. The method of claim 1, wherein the files include medical diagnostic imagery, and the one or more common attributes include at least one of a common patient and a common medical practitioner associated with each of the files assigned to a respective one of the filesets.
5. The method of claim 1, wherein the files include advertising imagery, and the one or more common attributes include a common advertising customer associated with each of the files assigned to the respective fileset.
6. The method of claim 1, further comprising retrieving the files assigned to the respective fileset together as a group.
7. The method of claim 1, further comprising deleting the files assigned to the respective fileset together as a group.
8. A method for storing files, the method comprising: storing the files on data storage media volumes; writing, to each of the media volumes, volume metadata that uniquely describes the respective volume; writing, with each of the files stored on the media volumes, file metadata that uniquely describes the respective file; and moving one or more of the files from one of the media volumes to another of the media volumes, wherein each of the moved files carries with it the file metadata for the respective moved file.
9. The method of claim 8, wherein each of the volume metadata and the file metadata is based on a globally unique identifier (guid).
10. The method of claim 8, wherein storing each of the files includes storing a data file containing the contents of the respective file and storing a metadata file containing the file metadata.
11. The method of claim 8, further comprising: constructing original file location database entries specifying the arrangement of the files on each of the media volumes; and in the event the original database entries become corrupted, reconstructing the database entries using the file metadata written with each of the files stored on the respective media volume.
12. The method of claim 11, wherein the file metadata for the files stored on the respective media volume provides sufficient information to reconstruct the database entries without access to the original database entries.
13. A method for storing image files on a hierarchy of data storage media, the method comprising: storing a first-resolution copy and a second-resolution copy of an image file together on a first type of media, wherein the second-resolution copy has a higher resolution than the first-resolution copy; migrating the second-resolution copy of the image file to a second type of media according to a migration policy; and preventing migration of the first-resolution copy of the image from the first type of media.
14. The method of claim 13, wherein the second type of media has a slower access time than the first type of media.
15. The method of claim 13, wherein the second type of media is less costly than the first type of media.
16. The method of claim 13, wherein the first type of media includes a hard disk array.
17. The method of claim 13, wherein the second type of media includes magnetic tape.
18. The method of claim 13, further comprising: storing a third-resolution copy of the image file on the first type of media, the third-resolution copy of the image file having a resolution that is greater than the first-resolution copy and less than the second-resolution copy; and preventing migration of the third-resolution copy of the image from the first type of media.
19. A system for hierarchical storage management comprising: a web-based client application that submits images for storage; a storage server that stores the images; and an automation server that converts script-based web events in the client application into programmatic executable commands for execution by the storage server.
20. A system for hierarchical storage management comprising: a web-based client application that submits images for storage and requests images from storage for viewing by a user; a storage server that stores and retrieves the images; and an automation server that converts script-based web events in the client application into programmatic executable commands for execution by the storage server, and converts the images submitted by the client application to an image format recognizable by the web browser application on which the client application runs.
PCT/US1999/016051 1998-07-15 1999-07-15 Hierarchical data storage management WO2000004483A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US9285398P 1998-07-15 1998-07-15
US60/092,853 1998-07-15

Publications (2)

Publication Number Publication Date
WO2000004483A2 true WO2000004483A2 (en) 2000-01-27
WO2000004483A3 WO2000004483A3 (en) 2000-06-29

Family

ID=22235488

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/016051 WO2000004483A2 (en) 1998-07-15 1999-07-15 Hierarchical data storage management

Country Status (2)

Country Link
US (1) US6330572B1 (en)
WO (1) WO2000004483A2 (en)


US20060235716A1 (en) * 2005-04-15 2006-10-19 General Electric Company Real-time interactive completely transparent collaboration within PACS for planning and consultation
US20060235936A1 (en) * 2005-04-15 2006-10-19 General Electric Company System and method for PACS workstation conferencing
US8315993B2 (en) * 2005-05-13 2012-11-20 International Business Machines Corporation Policy decision stash for storage lifecycle management
US20060259461A1 (en) * 2005-05-16 2006-11-16 Rajesh Kapur Method and system for preserving access to deleted and overwritten documents by means of a system recycle bin
US7293170B2 (en) * 2005-06-06 2007-11-06 Tranxition Corporation Changing the personality of a device by intercepting requests for personality information
US20070028302A1 (en) * 2005-07-29 2007-02-01 Bit 9, Inc. Distributed meta-information query in a network
US20070033247A1 (en) * 2005-08-02 2007-02-08 The Mathworks, Inc. Methods and system for distributing data to technical computing workers
US7660834B2 (en) * 2005-08-17 2010-02-09 International Business Machines Corporation Maintaining an aggregate including active files in a storage pool
US7634516B2 (en) * 2005-08-17 2009-12-15 International Business Machines Corporation Maintaining an aggregate including active files in a storage pool in a random access medium
US9401080B2 (en) 2005-09-07 2016-07-26 Verizon Patent And Licensing Inc. Method and apparatus for synchronizing video frames
US20070107012A1 (en) * 2005-09-07 2007-05-10 Verizon Business Network Services Inc. Method and apparatus for providing on-demand resource allocation
US8631226B2 (en) 2005-09-07 2014-01-14 Verizon Patent And Licensing Inc. Method and system for video monitoring
US9076311B2 (en) 2005-09-07 2015-07-07 Verizon Patent And Licensing Inc. Method and apparatus for providing remote workflow management
US7941620B2 (en) * 2005-09-12 2011-05-10 International Business Machines Corporation Double-allocation data-replication system
US20070083482A1 (en) * 2005-10-08 2007-04-12 Unmesh Rathi Multiple quality of service file system
US7343447B2 (en) 2005-11-08 2008-03-11 International Business Machines Corporation Method and system for synchronizing direct access storage volumes
US20070130232A1 (en) * 2005-11-22 2007-06-07 Therrien David G Method and apparatus for efficiently storing and managing historical versions and replicas of computer data files
US7543125B2 (en) 2005-12-19 2009-06-02 Commvault Systems, Inc. System and method for performing time-flexible calendric storage operations
EP2312470B1 (en) * 2005-12-21 2018-09-12 Digimarc Corporation Rules driven pan ID metadata routing system and network
TWI301021B (en) * 2005-12-27 2008-09-11 Ind Tech Res Inst File distribution and access system and method for file management
US20070156691A1 (en) * 2006-01-05 2007-07-05 Microsoft Corporation Management of user access to objects
US7265928B2 (en) * 2006-01-24 2007-09-04 Imation Corp. Data storage cartridge with worm write-protection
US7475077B2 (en) * 2006-01-31 2009-01-06 International Business Machines Corporation System and method for emulating a virtual boundary of a file system for data management at a fileset granularity
US20070208780A1 (en) * 2006-03-02 2007-09-06 Anglin Matthew J Apparatus, system, and method for maintaining metadata for offline repositories in online databases for efficient access
KR100888593B1 (en) * 2006-03-14 2009-03-12 Samsung Electronics Co., Ltd. Method and apparatus for contents management
US20070220029A1 (en) * 2006-03-17 2007-09-20 Novell, Inc. System and method for hierarchical storage management using shadow volumes
WO2008027626A2 (en) 2006-04-25 2008-03-06 Secure Network Systems, Llc Logical and physical security
JP4912026B2 (en) * 2006-04-27 2012-04-04 Canon Inc. Information processing apparatus and information processing method
US20070255714A1 (en) * 2006-05-01 2007-11-01 Nokia Corporation XML document permission control with delegation and multiple user identifications
US20080021865A1 (en) * 2006-07-20 2008-01-24 International Business Machines Corporation Method, system, and computer program product for dynamically determining data placement
JP4082437B2 (en) * 2006-08-18 2008-04-30 Fuji Xerox Co., Ltd. Information management apparatus and information management program
US20080068673A1 (en) * 2006-09-14 2008-03-20 Steinar Kolbu Digital image information system and method for providing an image object in a digital image information system
US20080077638A1 (en) * 2006-09-21 2008-03-27 Microsoft Corporation Distributed storage in a computing environment
US7539783B2 (en) 2006-09-22 2009-05-26 Commvault Systems, Inc. Systems and methods of media management, such as management of media to and from a media storage library, including removable media
CA2665556A1 (en) 2006-10-04 2008-04-17 Welch Allyn, Inc. Dynamic medical object information base
US8484108B2 (en) * 2006-11-17 2013-07-09 International Business Machines Corporation Tracking entities during identity resolution
US8719809B2 (en) 2006-12-22 2014-05-06 Commvault Systems, Inc. Point in time rollback and un-installation of software
US7831566B2 (en) 2006-12-22 2010-11-09 Commvault Systems, Inc. Systems and methods of hierarchical storage management, such as global management of storage operations
US8312323B2 (en) 2006-12-22 2012-11-13 Commvault Systems, Inc. Systems and methods for remote monitoring in a computer network and reporting a failed migration operation without accessing the data being moved
US9940345B2 (en) * 2007-01-10 2018-04-10 Norton Garfinkle Software method for data storage and retrieval
US20080183680A1 (en) * 2007-01-31 2008-07-31 Laurent Meynier Documents searching on peer-to-peer computer systems
US20080222513A1 (en) * 2007-03-07 2008-09-11 Altep, Inc. Method and System for Rules-Based Tag Management in a Document Review System
JP5081498B2 (en) * 2007-05-24 2012-11-28 Hitachi, Ltd. Computer system and control method thereof
US7788233B1 (en) * 2007-07-05 2010-08-31 Amazon Technologies, Inc. Data store replication for entity based partition
US8412792B2 (en) * 2007-07-31 2013-04-02 Brent Young Network file transfer and caching system
KR101498673B1 (en) 2007-08-14 2015-03-09 Samsung Electronics Co., Ltd. Solid state drive, data storing method thereof, and computing system including the same
US8706976B2 (en) 2007-08-30 2014-04-22 Commvault Systems, Inc. Parallel access virtual tape library and drives
US7895242B2 (en) * 2007-10-31 2011-02-22 Microsoft Corporation Compressed storage management
US8201145B2 (en) * 2007-11-13 2012-06-12 International Business Machines Corporation System and method for workflow-driven data storage
US20090150355A1 (en) * 2007-11-28 2009-06-11 Norton Garfinkle Software method for data storage and retrieval
US8572043B2 (en) * 2007-12-20 2013-10-29 International Business Machines Corporation Method and system for storage of unstructured data for electronic discovery in external data stores
US8112406B2 (en) * 2007-12-21 2012-02-07 International Business Machines Corporation Method and apparatus for electronic data discovery
US8140494B2 (en) 2008-01-21 2012-03-20 International Business Machines Corporation Providing collection transparency information to an end user to achieve a guaranteed quality document search and production in electronic data discovery
JP5136159B2 (en) * 2008-03-31 2013-02-06 Fujitsu Limited Configuration information management apparatus, configuration information management program, and configuration information management method
US20090286219A1 (en) * 2008-05-15 2009-11-19 Kisin Roman Conducting a virtual interview in the context of a legal matter
US8620923B1 (en) 2008-05-30 2013-12-31 Adobe Systems Incorporated System and method for storing meta-data indexes within a computer storage system
US8549007B1 (en) 2008-05-30 2013-10-01 Adobe Systems Incorporated System and method for indexing meta-data in a computer storage system
US8135839B1 (en) 2008-05-30 2012-03-13 Adobe Systems Incorporated System and method for locking exclusive access to a divided resource
US8275720B2 (en) * 2008-06-12 2012-09-25 International Business Machines Corporation External scoping sources to determine affected people, systems, and classes of information in legal matters
US9830563B2 (en) 2008-06-27 2017-11-28 International Business Machines Corporation System and method for managing legal obligations for data
US8484069B2 (en) * 2008-06-30 2013-07-09 International Business Machines Corporation Forecasting discovery costs based on complex and incomplete facts
US8327384B2 (en) * 2008-06-30 2012-12-04 International Business Machines Corporation Event driven disposition
US20100017239A1 (en) * 2008-06-30 2010-01-21 Eric Saltzman Forecasting Discovery Costs Using Historic Data
US8489439B2 (en) * 2008-06-30 2013-07-16 International Business Machines Corporation Forecasting discovery costs based on complex and incomplete facts
US8073729B2 (en) * 2008-09-30 2011-12-06 International Business Machines Corporation Forecasting discovery costs based on interpolation of historic event patterns
US7792945B2 (en) * 2008-06-30 2010-09-07 Pss Systems, Inc. Method and apparatus for managing the disposition of data in systems when data is on legal hold
US8515924B2 (en) * 2008-06-30 2013-08-20 International Business Machines Corporation Method and apparatus for handling edge-cases of event-driven disposition
US20100042655A1 (en) * 2008-08-18 2010-02-18 Xerox Corporation Method for selective compression for planned degradation and obsolescence of files
US20100070466A1 (en) 2008-09-15 2010-03-18 Anand Prahlad Data transfer techniques within data storage devices, such as network attached storage performing data migration
US8204869B2 (en) 2008-09-30 2012-06-19 International Business Machines Corporation Method and apparatus to define and justify policy requirements using a legal reference library
GB2464980A (en) * 2008-10-31 2010-05-05 Symbian Software Ltd Method of associating and labeling primary and secondary files
US8838796B2 (en) 2008-12-19 2014-09-16 Adobe Systems Incorporated System and method for allocating online storage to computer users
US20100191708A1 (en) * 2009-01-23 2010-07-29 International Business Machines Corporation Synchronous Deletion of Managed Files
US20100218122A1 (en) * 2009-02-20 2010-08-26 Microsoft Corporation Asynchronously uploading and resizing content in web-based applications
WO2010102176A1 (en) 2009-03-06 2010-09-10 Vetrix, Llc Systems and methods for mobile tracking, communications and alerting
US8027960B2 (en) * 2009-03-11 2011-09-27 International Business Machines Corporation Intelligent deletion of elements to maintain referential integrity of dynamically assembled components in a content management system
EP2270692A1 (en) * 2009-06-30 2011-01-05 Hasso-Plattner-Institut für Softwaresystemtechnik GmbH Lifecycle-based horizontal partitioning
US9448730B2 (en) * 2009-09-30 2016-09-20 International Business Machines Corporation Method and apparatus for dispersed storage data transfer
US20170147219A1 (en) * 2009-09-30 2017-05-25 International Business Machines Corporation Utilization of solid-state memory devices in a dispersed storage network
US8655856B2 (en) * 2009-12-22 2014-02-18 International Business Machines Corporation Method and apparatus for policy distribution
US8250041B2 (en) * 2009-12-22 2012-08-21 International Business Machines Corporation Method and apparatus for propagation of file plans from enterprise retention management applications to records management systems
US8285692B2 (en) * 2010-01-15 2012-10-09 Oracle America, Inc. Method and system for attribute encapsulated data resolution and transcoding
US8578107B2 (en) * 2010-02-16 2013-11-05 International Business Machines Corporation Extent migration scheduling for multi-tier storage architectures
US8463828B2 (en) * 2010-02-24 2013-06-11 Salesforce.Com, Inc. System, method and computer program product for storing file system content in a multi-tenant on-demand database system
US8850003B2 (en) 2010-04-20 2014-09-30 Zte Corporation Method and system for hierarchical tracking of content and cache for networking and distribution to wired and mobile devices
CN102947681B (en) 2010-04-20 2016-05-18 Hewlett-Packard Development Company, L.P. Self-arranging, luminescence-enhancement device for surface-enhanced luminescence
US20110314071A1 (en) * 2010-06-17 2011-12-22 Openwave Systems Inc. Metadata-based data access and control
US8566903B2 (en) 2010-06-29 2013-10-22 International Business Machines Corporation Enterprise evidence repository providing access control to collected artifacts
US8832148B2 (en) 2010-06-29 2014-09-09 International Business Machines Corporation Enterprise evidence repository
US8402359B1 (en) 2010-06-30 2013-03-19 International Business Machines Corporation Method and apparatus for managing recent activity navigation in web applications
US8732219B1 (en) * 2010-08-25 2014-05-20 United Services Automobile Association (Usaa) Method and system for determining correlated geographic areas
US9244779B2 (en) 2010-09-30 2016-01-26 Commvault Systems, Inc. Data recovery operations, such as recovery from modified network data management protocol data
WO2012054027A1 (en) 2010-10-20 2012-04-26 Hewlett-Packard Development Company, L.P. Chemical-analysis device integrated with metallic-nanofinger device for chemical sensing
US9274058B2 (en) 2010-10-20 2016-03-01 Hewlett-Packard Development Company, L.P. Metallic-nanofinger device for chemical sensing
US9021198B1 (en) 2011-01-20 2015-04-28 Commvault Systems, Inc. System and method for sharing SAN storage
US9721033B2 (en) 2011-02-28 2017-08-01 Micro Focus Software Inc. Social networking content management
US9116914B1 (en) * 2011-04-18 2015-08-25 American Megatrends, Inc. Data migration between multiple tiers in a storage system using policy based ILM for QOS
US9563714B2 (en) 2011-06-16 2017-02-07 Microsoft Technology Licensing Llc. Mapping selections between a browser and the original file fetched from a web server
US9753699B2 (en) 2011-06-16 2017-09-05 Microsoft Technology Licensing, Llc Live browser tooling in an integrated development environment
US9460224B2 (en) * 2011-06-16 2016-10-04 Microsoft Technology Licensing Llc. Selection mapping between fetched files and source files
US8799358B2 (en) 2011-11-28 2014-08-05 Merge Healthcare Incorporated Remote cine viewing of medical images on a zero-client application
US9529871B2 (en) 2012-03-30 2016-12-27 Commvault Systems, Inc. Information management of mobile device data
US10831728B2 (en) * 2012-05-29 2020-11-10 International Business Machines Corporation Application-controlled sub-LUN level data migration
US10817202B2 (en) * 2012-05-29 2020-10-27 International Business Machines Corporation Application-controlled sub-LUN level data migration
US10831727B2 (en) * 2012-05-29 2020-11-10 International Business Machines Corporation Application-controlled sub-LUN level data migration
US8738585B2 (en) 2012-07-13 2014-05-27 Symantec Corporation Restore software with aggregated view of site collections
US8712971B2 (en) 2012-07-13 2014-04-29 Symantec Corporation Restore software with aggregated view of content databases
US9596288B2 (en) 2012-12-04 2017-03-14 Pixia Corp. Method and system of requesting information from a server computer
US10379988B2 (en) 2012-12-21 2019-08-13 Commvault Systems, Inc. Systems and methods for performance monitoring
US9069799B2 (en) 2012-12-27 2015-06-30 Commvault Systems, Inc. Restoration of centralized data storage manager, such as data storage manager in a hierarchical data storage system
US9087138B2 (en) 2013-01-15 2015-07-21 Xiaofan Zhou Method for representing and storing hierarchical data in a columnar format
WO2014149025A1 (en) * 2013-03-18 2014-09-25 Ge Intelligent Platforms, Inc. Apparatus and method for optimizing time series data store usage
US9971796B2 (en) 2013-04-25 2018-05-15 Amazon Technologies, Inc. Object storage using multiple dimensions of object information
US10102148B2 (en) 2013-06-13 2018-10-16 Microsoft Technology Licensing, Llc Page-based compressed storage management
EA201301239A1 (ru) * 2013-10-28 2015-04-30 Limited Liability Company "Parallels" Method of hosting a network site using virtual hosting
US9400739B2 (en) 2013-11-01 2016-07-26 International Business Machines Corporation Capacity forecasting based on capacity policies and transactions
US9684625B2 (en) 2014-03-21 2017-06-20 Microsoft Technology Licensing, Llc Asynchronously prefetching sharable memory pages
JP6245045B2 (en) * 2014-04-08 2017-12-13 Konica Minolta, Inc. Medical imaging system for diagnosis
US9886447B2 (en) * 2014-08-22 2018-02-06 International Business Machines Corporation Performance of asynchronous replication in HSM integrated storage systems
US20170331772A1 (en) * 2014-10-27 2017-11-16 Clutch Group, Llc Chat Log Analyzer
GB2532039B (en) * 2014-11-06 2016-09-21 Ibm Secure database backup and recovery
US9898213B2 (en) 2015-01-23 2018-02-20 Commvault Systems, Inc. Scalable auxiliary copy processing using media agent resources
US9904481B2 (en) 2015-01-23 2018-02-27 Commvault Systems, Inc. Scalable auxiliary copy processing in a storage management system using media agent resources
US9632924B2 (en) 2015-03-02 2017-04-25 Microsoft Technology Licensing, Llc Using memory compression to reduce memory commit charge
US9928144B2 (en) 2015-03-30 2018-03-27 Commvault Systems, Inc. Storage management of data using an open-archive architecture, including streamlined access to primary data originally stored on network-attached storage and archived to secondary storage
US10037270B2 (en) 2015-04-14 2018-07-31 Microsoft Technology Licensing, Llc Reducing memory commit charge when compressing memory
US20170060980A1 (en) * 2015-08-24 2017-03-02 International Business Machines Corporation Data activity tracking
US10101913B2 (en) 2015-09-02 2018-10-16 Commvault Systems, Inc. Migrating data to disk without interrupting running backup operations
JP6418400B2 (en) * 2015-12-17 2018-11-07 Kyocera Document Solutions Inc. Electronic equipment and information processing program
US9973647B2 (en) * 2016-06-17 2018-05-15 Microsoft Technology Licensing, Llc. Suggesting image files for deletion based on image file parameters
US10585854B2 (en) * 2016-06-24 2020-03-10 Box, Inc. Establishing and enforcing selective object deletion operations on cloud-based shared content
US10379743B1 (en) * 2016-06-24 2019-08-13 EMC IP Holding Company LLC Offloaded delete operations
US10715629B2 (en) 2017-02-28 2020-07-14 Google Llc Seamless context switch
US11010261B2 (en) 2017-03-31 2021-05-18 Commvault Systems, Inc. Dynamically allocating streams during restoration of data
US11048671B2 (en) * 2017-10-18 2021-06-29 Quantum Corporation Automated storage tier copy expiration
US10742735B2 (en) 2017-12-12 2020-08-11 Commvault Systems, Inc. Enhanced network attached storage (NAS) services interfacing to cloud storage
US11782882B2 (en) 2018-01-22 2023-10-10 Jpmorgan Chase Bank, N.A. Methods for automated artifact storage management and devices thereof
US10908940B1 (en) 2018-02-26 2021-02-02 Amazon Technologies, Inc. Dynamically managed virtual server system
US20190304609A1 (en) * 2018-03-28 2019-10-03 Konica Minolta Healthcare Americas, Inc. Deletion of medical images in cloud-based storage
US11175845B2 (en) * 2018-04-05 2021-11-16 International Business Machines Corporation Adding a migration file group to a hierarchical storage management (HSM) system for data co-location
US10776009B2 (en) * 2019-01-03 2020-09-15 International Business Machines Corporation Journaling on an appendable non-volatile memory module
US11175846B2 (en) 2019-04-11 2021-11-16 International Business Machines Corporation Data co-location in a hierarchical storage management (HSM) system
US11238107B2 (en) * 2020-01-06 2022-02-01 International Business Machines Corporation Migrating data files to magnetic tape according to a query having one or more predefined criterion and one or more query expansion profiles
US11593223B1 (en) 2021-09-02 2023-02-28 Commvault Systems, Inc. Using resource pool administrative entities in a data storage management system to provide shared infrastructure to tenants

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5276860A (en) 1989-12-19 1994-01-04 Epoch Systems, Inc. Digital data processor with improved backup storage
US5276867A (en) 1989-12-19 1994-01-04 Epoch Systems, Inc. Digital data storage system with improved data migration
US5317728A (en) 1990-09-07 1994-05-31 International Business Machines Corporation Storage management of a first file system using a second file system containing surrogate files and catalog management information
JP2550239B2 (en) 1991-09-12 1996-11-06 Hitachi, Ltd. External storage system
US5367698A (en) 1991-10-31 1994-11-22 Epoch Systems, Inc. Network file migration system
JPH0659982A (en) 1992-08-10 1994-03-04 Hitachi Ltd Method and device for controlling virtual storage
US5991753A (en) 1993-06-16 1999-11-23 Lachman Technology, Inc. Method and system for computer file management, including file migration, special handling, and associating extended attributes with files
US5537585A (en) 1994-02-25 1996-07-16 Avail Systems Corporation Data storage management for network interconnected processors
JP3796551B2 (en) 1994-04-25 2006-07-12 Sony Corporation Information storage processing device
US5557790A (en) 1994-06-21 1996-09-17 International Business Machines Corp. Facility for the generic storage and management of multimedia objects
US5579516A (en) * 1994-12-15 1996-11-26 Hewlett-Packard Company Method for storing data files on a multiple volume media set
US5778395A (en) * 1995-10-23 1998-07-07 Stac, Inc. System for backing up files from disk volumes on multiple nodes of a computer network
US5878410A (en) * 1996-09-13 1999-03-02 Microsoft Corporation File system sort order indexes
US6026474A (en) * 1996-11-22 2000-02-15 Mangosoft Corporation Shared client-side web caching using globally addressable memory
US6148377A (en) * 1996-11-22 2000-11-14 Mangosoft Corporation Shared memory computer networks
US5822780A (en) 1996-12-31 1998-10-13 Emc Corporation Method and apparatus for hierarchical storage management for data base management systems
US6108713A (en) * 1997-02-11 2000-08-22 Xaqti Corporation Media access control architectures and network management systems

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0474395A2 (en) * 1990-09-07 1992-03-11 International Business Machines Corporation Data storage hierarchy with shared storage level

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"GRAPHICAL USER INTERFACE FOR THE DISTRIBUTED COMPUTING ENVIRONMENT" IBM TECHNICAL DISCLOSURE BULLETIN, vol. 38, no. 1, 1 January 1995 (1995-01-01), page 409/410 XP000498815 ISSN: 0018-8689 *
"HIERARCHICAL VIEW OF FILESETS" IBM TECHNICAL DISCLOSURE BULLETIN, vol. 36, no. 5, 1 May 1993 (1993-05-01), page 167 XP000408950 ISSN: 0018-8689 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG103289A1 (en) * 2001-05-25 2004-04-29 Meng Soon Cheo System for indexing textual and non-textual files
EP1412886A1 (en) * 2001-08-02 2004-04-28 Christian R. M. Singfield Network image server
EP1412886A4 (en) * 2001-08-02 2006-08-16 Sautec Pty Ltd Network image server
AU2002322153B2 (en) * 2001-08-02 2007-10-04 Sautec Pty Ltd Network image server
EP1643352A1 (en) * 2003-07-02 2006-04-05 Satoshi Yamatake Database system
EP1643352A4 (en) * 2003-07-02 2009-03-04 Satoshi Yamatake Database system
DE10332245A1 (en) * 2003-07-16 2005-02-17 Siemens Ag Operating method for an imaging medical device
DE10332245B4 (en) * 2003-07-16 2006-07-13 Siemens Ag Operating method for an imaging medical device with on-line detection of useful signals with simultaneous permanent storage of the measurement signals or raw signals on which the useful signals are based
US7316234B2 (en) 2003-07-16 2008-01-08 Siemens Aktiengesellschaft Medical imaging installation and operating method for reading stored signals to reconstruct a three-dimensional image of a subject
WO2006040258A2 (en) * 2004-10-15 2006-04-20 Agfa Inc. Image archiving system and method
WO2006040258A3 (en) * 2004-10-15 2008-02-28 Agfa Inc Image archiving system and method
US7281084B1 (en) 2005-01-12 2007-10-09 Emc Corporation Method and apparatus for modifying a retention period
WO2006076482A1 (en) * 2005-01-12 2006-07-20 Emc Corporation Methods and apparatus for managing deletion of data
US7428621B1 (en) 2005-01-12 2008-09-23 Emc Corporation Methods and apparatus for storing a reflection on a storage system
US7698516B2 (en) 2005-01-12 2010-04-13 Emc Corporation Methods and apparatus for managing deletion of data
US8055861B2 (en) 2005-01-12 2011-11-08 Emc Corporation Methods and apparatus for managing deletion of data
WO2007088084A3 (en) * 2006-02-03 2007-12-13 Ibm Restoring a file to its proper storage tier in an information lifecycle management environment
US8229897B2 (en) 2006-02-03 2012-07-24 International Business Machines Corporation Restoring a file to its proper storage tier in an information lifecycle management environment
WO2009077789A1 (en) * 2007-12-18 2009-06-25 Bae Systems Plc Improvements relating to data curation
US8874628B1 (en) * 2009-10-15 2014-10-28 Symantec Corporation Systems and methods for projecting hierarchical storage management functions

Also Published As

Publication number Publication date
US6330572B1 (en) 2001-12-11
WO2000004483A3 (en) 2000-06-29

Similar Documents

Publication Publication Date Title
US6330572B1 (en) Hierarchical data storage management
US7418464B2 (en) Method, system, and program for storing data for retrieval and transfer
US5956733A (en) Network archiver system and storage medium storing program to construct network archiver system
US7890554B2 (en) Apparatus and method of exporting file systems without first mounting the file systems
US6714952B2 (en) Method for backup and restore of a multi-lingual network file server
US7801850B2 (en) System of and method for transparent management of data objects in containers across distributed heterogenous resources
US5829001A (en) Database updates over a network
US8412685B2 (en) Method and system for managing data
US6922761B2 (en) Method and system for migrating data
US8392477B2 (en) Seamless remote traversal of multiple NFSv4 exported file systems
US9449007B1 (en) Controlling access to XAM metadata
US7376681B1 (en) Methods and apparatus for accessing information in a hierarchical file system
US20070112836A1 (en) Systems, methods and apparatus for creating stable disk images
US7366836B1 (en) Software system for providing storage system functionality
US7080102B2 (en) Method and system for migrating data while maintaining hard links
US6182151B1 (en) Method and apparatus for batch storage of objects in a client-server storage management system
US6952699B2 (en) Method and system for migrating data while maintaining access to data with use of the same pathname
EP1215590B1 (en) Method and system for scalable, high performance hierarchical storage management
US6954762B2 (en) System and method for characterizing logical storage devices
US7979665B1 (en) Method and apparatus for processing access requests in a computer system
US6968433B2 (en) System and method for controlling the creation of stable disk images
Schrodel Integrating UniTree with the data migration API
Beizer An architecture for large-scale work management systems
Komo et al. Multimedia medical data archive and retrieval server on the Internet
Foster et al. Managing the network computer and its storage requirements

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: The EPO has been informed by WIPO that EP was designated in this application
AK Designated states

Kind code of ref document: A3

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

122 Ep: PCT application non-entry in European phase