WO2016122699A1

WO2016122699A1 - Failure atomic update of application data files

Info

Publication number: WO2016122699A1
Application number: PCT/US2015/027195
Authority: WO
Inventors: Anton Ajay MENDEZ; Rajat VERMA; Sandya Srivilliputtur Mannarswamy; Terence P. Kelly; James Hyungsun PARK
Original assignee: Hewlett Packard Enterprise Development Lp
Priority date: 2015-01-30
Filing date: 2015-04-23
Publication date: 2016-08-04

Abstract

In one example, disclosed are techniques to update application data files between open and sync and/or between two consecutive syncs using a syncv operation, file descriptors associated with each modified application data file and associated transaction records in a transaction log file.

Description

FAILURE ATOMIC UPDATE OF APPLICATION DATA FILES

Background

[001] Many applications modify data o durable media, such as storage devices, and any untimely failures during updates/modifications, for example, application process crashes, operating system (OS) kernel panics, power outages and the like, may jeopardize the integrity of the application data.

Brief Description of the Drawings

[002] Examples of the disclosure will now be described in detail with reference to the accompanying drawings, in which:

[003] FIG. 1 illustrates a block diagram of an example system for a mechanism for failure atomic update of application data in application data files in a file system;

[004] FIG 2 illustrates a block diagram of another example system for mechanism for failure atomic update of application data in application data files in a file system;

[005] FIG. 3 illustrates a block diagram illustrating an example implementation of a mechanism for failure atomic updates of application data in application data files in a file system, such as those shown in FIGS . 1 and 2 ;

[006] FIG 4 illustrates a flow chart of an example method for failure atomic update of application data in a application data files in a file system: and

[007] FIG. 5 illustrates a block diagram of an example computing device for a mechanism for applications for failure atomic update of application data in application data files in a file system. Detailed Description

[008] In the following detailed description of the examples of the present subject matter, references are made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific examples in which the present subject matter may be practiced. These examples are described in sufficient detail to practice the present subject matter, and it is to be understood that other examples may be utilized and that changes may be made without departing from the scope of the present subject matter. The following detailed description is. therefore, not to be taken i a limiting sense.

[009] Examples described herein provide enhanced methods, techniques, and systems for a mechanism for applications to perform failure atomic update of application data in application data files in a file system. Generally, failure atomic updates, (consistent modification of application durable data, i.e., the problem of evolving durable application data without fear that failure will preclude recovery to a consistent state) protect integrity of application data from failures, such as process, crashes, kernel panics and/or power outages.

[0010] Typically, file systems strive to protect internal metadata from corruption; however, file systems may not offer corresponding protection for application data, providing neither transactions on application data nor other unified solution to the failure atomic updates problem. Instead, file systems may offer primitives for controlling the order in which application data attains durability; applications may shoulder the burden of restoring consistency to their data following failures. Consider for example, the task of failure-atomically updating a set of configuration files scattered throughout a directory tree atop a post operating system for Unix (POSIX)- like file systems. In such a scenario, vast majority of file systems may not provide a straightforward operation that the failure atomic updates demand: the ability to modify application in (sets of) files failure atomic-ally and efficiently. [001 ί] Some existing mechanisms may provide imperfect support for solving failure atomic updates problem. Further, existing file systems may offer limited support for failure atomic updates, may be due to problems associated with operation system (OS) interfaces. For example, POSFX may permit write to succeed partially, making it difficult to define atomic semantics for this call. Further for example, synchronization calls, such as fsync and msync may constrain the order in which application data reaches durable media. However, applications generally remain responsible for reconstructing a consistent state of their data following a crash. Sometimes, applications may circumvent the need for recovery by using the one failure-atomic mechanism provided in conventional file systems, i.e., the file rename. For example, desktop applications can open a temporary file, write the entire modified contents of a file to it, then use the rename to implement an atomic file update - a reasonable expedient for small files but may be untenable for large files. Further, some existing mechanisms may require special hardware and may apply only to single-file updates, and may not address modifications to memory-mapped files. Furthermore in some existing mechanisms, transaction size, i.e., size of atomically modified data in a file may be limited by the size of the journal, which may carry substantial overheads. In addition, a journal based implementations of failure-atomic sync operation may suffer at least two shortcomings, one being a need to run a modified kernel that may impede adoption, and the other being use of the file system journal that can limit transaction sizes.

[0012] To help address these issues, the present disclosure describes various example mechanisms for applications for failure atomic update of application data in application data files in a file system. In one example, a simple interface to file system may offer applications a guarantee that the application data in files always reflects the most recent successful sync mechanism, such as syncv operation, on the files. In this example, syncv operation takes as arguments an array of file descriptors. On return from a call to syncv the file system guarantees that all passed cached blocks in the list of files are flushed to a stable storage. If all files passed to syncv operation were opened with the atomic flag, then syncv operation guarantees that all the files are atomically flushed to a stable storage. Further, the interface to the file system offers the syncv mechanism along with transaction records in transaction log file that faihire-atomically commits changes to files. Furthermore, failure-injection test verifies that the file system protects the integrity of application data from crashes. In addition, the interface to the file system runs on conventional hardware and operating system and the mechanism is implementable in any file system that supports per-file writable snapshots.

[0013] In addition, the example implementations describe a simple interlace to the file system that generalizes failure-atomic variants of write and sync operations. If files are opened with atomic flags, the state of their application data always reflects the most recent successful sync operation, such as syncv. Further, the size of atomic updates to the files is only limited by the tree space in the file system and not by the file system journal. Furthermore, opening each of the files including an atomic flag guarantees that the file's application data reflects the most recent synchronization operation regardless of whether the file was modified with interlaces, such as write and/or mmap families of interfaces. Atomic flag may be implemented in a file system that supports per-file writable snapshots. Also, the syncv operation along with the using the transaction records in a transaction log file described in the present disclosure ensures that the updates to files are atomic in nature. The file system may not rely solely on the file system journal to implement atomic updates, and the size of atomic updates may be limited only by the amount of free space in the file system. Adding the interface to the file system may be relatively easy as it can run on any conventional operating system kernels and requires no special hardware. In this example, syncv operation may be implemented in the existing kernels using a device input and output control (IOCTL) interface.

[001 ] Moreover, the example implementations describe a file system that supports multi-file atomic durability via a syncv mechanism. The example syncv mechanism attains failure atomicity by leveraging the write-ahead logging feature in the journal mechanism of the file system. Further, the example syncv mechanism attains failure atomicity either by updating all modifications made to data blocks of open files or not making any of the updates to a system storage disc, During operation, any modifications to the metadata are written to a transaction log file before the changes are written to a storage disk and further the content of the transaction log file is written to storage disk at regular intervals. During each recovery, file system may read the transaction log file to confirm the file system transactions. All completed transactions may be committed to the storage disk and uncompleted transactions may be undone. In such a scenario, it can be seen that the number of uncommitted records i the transaction log file and not the amount of data in the file system may decide the speed of recovery from a crash.

[0015] The terms "storage media", "durable media", "storag device", and "disk drive" are used interchangeably throughout the document. Also, the terms "file" and "application data file" are used interchangeably throughout the document. Further the terms "sync" "sync operation" refers to "synchronization operation". Furthermore, the term "application" refers to "application software". In addition, the term "file clone" refers to "file's clone". Moreover, the terms "system failure", "untimely failure", and "untimely system failure", as used herein, may refer to process crashes, OS kernel panics, power outages and the like. Also, the terms "syncv^' and "syncv operation" are used interchangeably throughout the document.

[0016] FIG 1 illustrates a block diagram of an example system 100 for a mechanism for applications for failure atomic update of application data in application data files in a file system 106. The system 100 may represent any type of computing device capable of reading machine-executable instructions. Examples of computing device may include, without limitation, a server, a desktop computer, a notebook computer, a tablet computer, a thin client, a mobile device, a personal digital assistant (PDA), a tablet, and the like.

[0017] In the example of FIG. 1 , the system 100 may include a processor 102 and storage device 104 coupled to the processor 102. In an example, the storage device 104 may be a machine readable storage medium (e.g., a disk drive). The machine-readable storage medium may also be an external medium that may be accessible to the system 100. Further, the storage device 104 may include the file system 106. Furthermore, the file system 106 may include a failure atomic update module 108. [0018] For example, the failure atomic update module 108 may refer to software components (machine executable instructions), a hardware component or a combination thereof. The failure atomic update module 108 may include, by way of example, components, such as software components, processes, tasks, co-routines, functions, attributes, procedures, drivers, firmware, data, databases, data structures and Application Specific Integrated Circuits (ASIC). The failure atomic update module 108 may reside in a volatile or non- volatile storage medium and configured to interact with a processor 102 of the system 100.

[0019] In one example, the file system 106 may include data blocks, application data files, snapshots of files, directory and/or file clones implemented by atomic updates as shown in FIG. 3. For example, file clones may include shared data blocks of a file (i.e., primary file) in the file system that are implemented by atomic updates. The file system may decouple logical file hierarchy from the physical storage. The logical file hierarchy layer may implement the naming scheme and portable operating system interface (POSIX) complaint functions, such as, creating, opening, reading, and writing files. The physical storage layer implements write-ahead logging, caching, file storage allocation, file migration, and/or physical disk input/output (I O) functions. This is explained in more detail with reference to FIG. 3.

[0020] In operation, files including associated atomic flags may be opened upon invoking open operations by an application. For example, each opened file may include data blocks: Block 0, Block I, and Block 2 as shown at 302 in FIG3. The atomic flag may indicate the application's desire that changes to the application data in each file may be atomic.

[0021] A file clone including shared data blocks of the file may then be created by the application upon opening each file including the atomic flag. File clone may be a writable snapshot of the file at the time it is opened with using the atomic flag. The file clone may not change with any modification to the data blocks in each file. Further, the file clones may not be visible to the user visible namespace and may exist in a non-visible (hidden) namespace that may be accessible to the operating system (OS). For example, each of file clones CLONE 0 iNODE 1 to CLONE 0 iNODE N may be implemented utilizing a variant of eopy-on-write (COW) operation as shown at 304 in FIG. 3. Further for example, when a file is cloned, a copy of the file's iNODE may be made as sho wn in FIG. 3. Each of iNODE 1 to iNODE N may include each associated file's block map, a data structure that maps logical file offsets to block numbers on the underlying block device as show in FIG 3. For example, it ca be seen in FIG 3, that each of original files FILE iNODE 1 to FILE iNODE N and its associated file clone CLONE 0 iNODE to CLONE 0 iNODE N, respectively, may have identical copies of the block map, and they may initially share the same storage.

[0022] Any modified data blocks in each opened file may be remapped by the file system upon a subsequent modificatio and/or addition to the file by the application. For example, modified data blocks in each file may be remapped using COW operation and leaving the file clone's view of the file unchanged. For example, addition of Block 3 and remapping of added Block 3 via COW is shown at 306 in FIG. 3. It can be seen that the file clone CLONE 0 iNODE still points to the blocks: Block 0, Block I and Block 2 of the file at the time it was opened.

[0023] A operation for synching may then be initiated by the application, which in turn passes file descriptors of each opened file associated with any modified data blocks. Any modified data blocks i each file associated with any of the file descriptors passed via the syncv operation are then flushed into a stable storage media, such as a disk drive. The created t le clone of each opened file associated with the file descriptors passed via the syncv operation may then be deleted using transaction records in a transaction log file residing in a journal sub-system, which facilitates in doing a single transaction deletes of all files passed to syncv to make the application data files failure atomic. In this example, using transaction records in a transaction log such that the deletes of each passed file to syticv appears as a single file system transaction, thereby making the application data failure atomic.

[0024] A new file clone including any modified and unmodified data blocks of each file associated with the file descriptor passed via the

operation may then be created. The state of each file may reflect a logical state of the file at the time the applicatio synched using the sync operation. For example, syncv operation is sync vector that is similar in operation to fsync operation, msync operation and/or fdatasync operation and further capable of operating substantially simultaneously on multiple files. For example, sync operation replacing created file clone CLONE 0 iNODE with new file clone CLONE 1 iNODE is shown at 308 in FIG. 3. In one example, the last close of a file opened with atomic flag and all cached blocks of the file are flushed and any existing file clones are deleted. In another example, the above mechanism repeats itself until the file is closed by the application.

[0025] In one example, the failure atomic update module 108 determines if there was an untimely system failure. Based on the outcome of the determination, if the untimely system failure occurs before deleting the file clones, the failure atomic update module 108 then replaces the files with file clones next time the files are opened by the application. Based on the outcome of the determination, if there was no untimely system failure and the file clones are deleted, the failure atomic update module 108 then creates the new file clones including any modified and unmodified data blocks. In another example, an intermediary approach may include a background daemon to search the file system for recoverable files after mount but before files are opened. Further in this example, the failure atomic update module 108 determines if there was an untimely failure during delete operation. Based on the outcome of the determination, if there was an untimely failure during the delete operation on multiple files that results in incomplete failure atomic update, the failure atomic update module 1 8 then replaces files with file clones the next time the files are opened.

[0026] In one example, if the system fails, recovery of files may be delayed until the files are accessed again. The file system's path name lookup function may check if each of the file's clones exists in the hidden namespace. The file clones are then renamed to the user visible file and a handle to the file clones may be returned if the file clones exist in the hidden namespace. The per-file recovery offers several attractions, for example, consider an OS kernel panic that occurs while many processes are updating many files. Upon reboot, the file system may recover quickly because the in-progress updates, interrupted by the crash trigger no recover}'' actions when the file system is mounted. In such a scenario, the applications that may not need recover)' from interrupted atomic updates (e.g., applications that are merely reading files) may not share the recovery-time penalty incurred by the crash; only those applications that benefit from application-consistent recover)' may pay the penalty. In this example, the above described atomic failure update mechanism is built on top of the file clone feature of file system, it can be envisioned that alternative implementations, such as using delayed journal writeback may also be possible implementations.

[0027] FIG. 4 illustrates a flow chart of an example method 400 for failure atomic update of application data in application data files in a file system. The method 400, which is described below, may be executed on a system such as a system 100 of FIG. 1 or a system 200 of FIG. 2. However, other systems may be used as well. At block 402, files including data blocks in each file and an associated atomic flag are opened upon invoking open operations by an application. The atomic flag may indicate the application's desire that any changes to the file be atomic.

[0028] Further, in block 402, a file clone is created upon opening each file including the atomic flag by the application. The file clone may be a writable snapshot of the file at the time it is opened using the atomic flag. In one example, a file clone including shared blocks of the primary file is created upon opening the file including the atomic flag by the application. The primary file and the file clone may share same blocks until one or more blocks in the primary file are modified.

[0029] At block 404, any modified data blocks of each opened file are remapped upon a subsequent modification and/or addition to the file by the application. In one example, any modified data blocks of each file are remapped via copy of write (COW) operation and leaving the file clone's view of each file unchanged by the file system upon a subsequent modification and/or addition to each file by the application.

[0030] At block 406, a syncv operation to sync may be initiated by the application to pass file descriptors of each opened file associated with any modified data blocks. For example, syncv operation is a sync vector operation that is similar" to an fsync operation, a mysnc operation and/or a fdatasync operation and further capable of operating substantially simultaneously on multiple files. In one example, file descriptors associated with a subset of any opened files associated with modified data blocks are passed upon initiating a syncv operation.

[0031] At block 408, any modified data blocks in each opened file associated with file descriptor passed via the syncv operation are flushed to a stable storage media, such as a storage disk. At block 410, the created file clone of each opened file associated with the file descriptor sent via the syncv operation is then deleted using transaction records in a transaction log file residing in a journal sub-system by the file system. In one example, any modified data blocks in each opened file associated with the file descriptor sent via the syncv operation is flushed into a stable storage media such that the state of the file reflects a logical state of each file at the time the application syncs using the syncv operation, and then each created file clone is deleted by the file system.

[0032] At block 412, a new file clone is created including any modified and unmodified data blocks for an opened file associated with a file descriptor sent via the

operation. At block 414. a determination is made as to whether the application has closed all of the files. Based on the outcome of the determination at block 414, the process 400 goes to block 404 and repeats the steps outlined in blocks 404 to 414 if any of the opened files are still open and not closed by the application. Further, based on the outcome of the determination at block 414, the process 400 goes to block 416 and stops if all the opened files are closed by the application.

[0033] In one example, the failure atomic update module 108 determines whether there was an untimely system failure. If the untimely system failure occurs before the deleting each file cone, the file is then replaced with the file clone the next time each file is opened by the application. Based on the outcome of the determination, if there was no untimely system failure and ail the file clones are deleted, new file clones are created including any modified and unmodified data blocks. [0034] FIG. 5 illustrates a block diagram of an example computing device 500 for a mechanism for failure atomic update of application data in single application data file in a file system. The computing device 500 includes a processor 502 and a machine- readable storage medium 504 communicatively coupled through a system bus. The processor 502 may be any type of central processing unit (CPU), microprocessor, or processing logic that interprets and executes machine-readable instructions stored in tlie machine-readable storage medium 504. The machine-readable storage medium 504 may be a random access memory (RAM) or another type of dynamic storage device that may store information and machine-readable instructions that may be executed by the processor 502. For example, the machine-readable storage medium 504 may be synchronous DRAM (SDRAM), double data rate (DDR), rambus DRAM (RDRAM), rambus RAM, etc., or storage memory media such as a floppy disk., a hard disk, a CD-ROM, a DVD, a pen drive, and the like. In an example, the machine- readable storage medium 504 may be a non-transitory machine-readable medium. In an example, the machine -readable storage medium 504 may be remote but accessible to the computing device 500.

[0035] The machine-readable storage medium 504 may store instructions 402, 404, 406, 408, 410, 412, 414 and 416. In an example, instructions 402, 404, 406, 408, 410, 412, 414 and 416 may be executed by processor 502 to provide a mechanism for failure atomic update of application data in single application data file in a file system. Instructions 402, 404, 406, 408, 410, 412, 414 and 416 may be executed by processor 502 to implement failure atomic updates of application data. Instructions 402 , 404, 406, 408, 410, 412, 414 and 416 may be executed by processor 502 to protect integrity of application data from failures, such as process crashes, OS kernel panics, and/or power outages.

[0036] It may be noted that the above-described examples of tlie present solution is for the purpose of illustration only. Although the solution has been described in conjunction with a specific embodiment thereof, numerous modifications may be possible without materially departing from the teachings and advantages of the subject matter described herein. Other substitutions, modifications and changes may be made without departing from the spirit of the present solution. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.

Claims

Claims:

1. A system to update application data files, comprising:

a processor: and

a storage device communicatively coupled to the processor, wherein the storage device comprises a failure atomic update module to:

create a file clone for each opened file including an atomic flag by an application, wherein each opened file includes data blocks;

remap any modified data blocks of each opened file upon a subsequent modification and/or addition to the opened file by the application;

sync by initiating a syncv operation and passing file descriptor of each opened file associated with any modified data blocks by the application;

flush any modified data blocks in each opened file associated with the file descriptor passed via the syncv operation into a stable storage media;

delete the created file clone of each opened file associated with the file descriptor passed via the syncv operation using transaction records in a transaction log file residing in a journal sub-system by the file system; and

create a new file clone including any modified and unmodified data blocks of each file associated with the file descriptor passed via the syncv operation.

2. The system of claim 1, wherei the failure atomic update module is further configured to:

go to the step of remapping; and

repeat the steps of remap any modified data blocks, sync by initiating a syncv operation, flush any modified data blocks and deleting the created file clone, and create a new clone upon each modification to the data blocks in each opened file by the file system until all the files are closed by the application.

3. The system of claim 2, wherein the failure atomic update module determines whether there was an untimely system failure, based on the outcome of the determination, if the untimely system failure occurs before deleting the file clone, then replaces the files with the file clones the next time the files are opened by the application, further based on the outcome of the determination, if there was no untimely system failure and the file clones are deleted, then creates new file clones including any modified and unmodified data blocks, and wherein the failure atomic update module further determines if there was an untimely failure during delete operation and if there was a untimely failure during the delete operation on files resulting in incomplete failure atomic update, then the failure atomic update module replaces the files with file clones the next time the files are opened.

4. The system of claim 1, wherein the atomic flag indicates the application's desire that changes to the file be atomic.

5. The system of clam 1, wherein the file clone comprises a writable snapshot of the file at the time it is opened with the atomic flag.

6. The system of claim 1, wherein the syncv operation comprises & fiyne vector operation and is similar to fsync operation, msync operation, and/or fdat sync operation and further capable of operating substantially simultaneously on files.

7. A method for failure atomic update of application data files, comprising:

creating a file clone for each opened file including an atomic flag by an application, wherein each opened file includes data blocks;

remapping any modified data blocks of each opened file upon a subsequent modification and/or addition to the opened file by the application;

syncing by initiating a syncv operation and passing file descriptor of each opened file associated with any modified data blocks by the application;

flushing any modified data blocks in each opened file associated with the file descriptor passed via the syncv operation into a stable storage media;

deleting the created file clone of each opened file associated with the file descriptor passed via the syncv operation using transaction records in a transaction log file residing in a journal sub-system by the file system; and

creating a new file clone including any modified and unmodified data blocks of each file associated with the file descriptor passed via the syncv operation.

S. The method of claim 7, further comprising:

going to the step of remapping; and

repeating the steps of remapping any modified data blocks, syncing by initiating a syn v operation, flushing any modified data blocks and deleting the created file clone, and creating a new clone upon each modification to the data blocks in each opened file by the file system until all the files are closed by the application,

9. The method of claim 8, wherein creating the new file clones including any modified and unmodified data blocks, comprises:

determinin whether there was an untimely system failure;

based on the outcome of the determination, if the untimely system failure occurs before deleting the file clones, then replacing the files with the file clones the next time the files are opened by the application; and

based on the outcome of the determination, if there was no untimely system failure and the file clones are deleted, and then creating new file clones including any modified and unmodified data blocks.

10. The method of claim 7, wherein the atomic flag indicates the application's desire that changes to the file be atomic.

11. The method of clam 7, wherein the file clone comprises a writable snapshot of the file at the time it is opened with the atomic flag.

12. The method of claim 7, wherein the syncv operation comprises afsync vector operation and is similar to fsync operation, msync operation, and/or fdatasync operation and further capable of operating substantially simultaneously on files.

13. A non-transitory machine-readable storage medium comprising instructions for a mechanism for applications for failure atomic update of application data in application data files, the instructions executable by a processor to:

create a file clone for each opened file including an atomic flag by an application, wherein each opened file includes data blocks; remap any modified data blocks of each opened file upon a subsequent modification and/or addition to the opened file by the application;

14. The article of claim 13, further configured to:

go to the step of remapping; and

repeat the steps of remap any modified data blocks, sync by initiating a syncx' operation. Hush any modified data blocks and deleting the created file clone, and create a new clone upon each modificatio to the data blocks i each opened file by the file system until all tlie files are closed by tlie application.

15. The article of claim 14, wherein the failure atomic update module determines whether there was an untimely system failure, based on the outcome of the determination, if the untimely system failure occurs before deleting the file clone, then replaces the files with the file clones the next time the files are opened by the application, and further based on the outcome of the determination, if there was no untimely svstem failure and the file clones are deleted, then creates new file clones including any modified and unmodified data blocks.