US20070198979A1 - Methods and apparatus to implement parallel transactions - Google Patents


Info

Publication number
US20070198979A1
Authority
US
United States
Prior art keywords
transaction
lock
shared
processes
variable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/475,716
Inventor
David Dice
Nir Shavit
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Microsystems Inc
Original Assignee
Sun Microsystems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Microsystems Inc
Priority to US11/475,716
Assigned to SUN MICROSYSTEMS, INC. Assignment of assignors interest (see document for details). Assignors: DICE, DAVID; SHAVIT, NIR N.
Publication of US20070198979A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 — Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 — Addressing or allocation; Relocation
    • G06F 12/08 — Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806 — Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0815 — Cache consistency protocols
    • G06F 12/0893 — Caches characterised by their organisation or structure
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/466 — Transaction processing
    • G06F 9/52 — Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F 9/526 — Mutual exclusion algorithms

Definitions

  • Parallelism is the practice of executing or performing multiple things simultaneously. Parallelism is possible on multiple levels, from executing multiple instructions at the same time, to executing multiple threads at the same time, to executing multiple programs at the same time, and so on. Instruction Level Parallelism or ILP is parallelism at the lowest level and involves executing multiple instructions simultaneously. Processors that exploit ILP are typically called multiple-issue processors, meaning they can issue multiple instructions in a single clock cycle to the various functional units on the processor chip.
  • One type of multiple-issue processor is a superscalar processor, in which a sequential list of program instructions is dynamically scheduled. A respective processor determines which instructions can be executed on the same clock cycle and sends them out to their respective functional units to be executed.
  • This type of multi-issue processor is called an in-order-issue processor since issuance of instructions is performed in the same sequential order as the program sequence, but issued instructions may complete at different times (e.g., short instructions requiring fewer cycles may complete before longer ones requiring more cycles).
  • Another type of multiple-issue processor is a VLIW (Very Long Instruction Word) processor. A VLIW processor depends on a compiler to do all the work of instruction reordering, and the processor executes the instructions that the compiler provides as fast as possible according to the compiler-determined order.
  • Other types of multi-issue processors issue instructions out of order, meaning the instruction issue order need not be the same as the order in which the instructions appear in the program.
  • Another type of conventional parallel processing involves a use of fine-grain locking. As its name suggests, fine-grain locking prevents conflicting instructions from being simultaneously executed in parallel based on use of lockouts. This technique enables non-conflicting instructions to execute in parallel.
  • embodiments herein include techniques for enhancing performance associated with transactions executing in parallel.
  • a transactional memory programming technique provides an alternative type of “lock” method over the conventional techniques as discussed above.
  • one embodiment herein involves use and/or maintenance of version information indicating whether any of multiple “globally” shared variables has been modified during a course of executing a respective transaction (e.g., a set of software instructions initiating a respective computation). Any one of multiple possible processes executing in parallel can update respective version information associated with a globally shared variable (e.g., a shared variable accessible by any of multiple processes) in order to indicate that the shared variable has been modified. Accordingly, other processes keeping track of the version information during execution of their own respective transactions can identify if and when any shared variables have been modified during a window of use. If any critical variables have been modified, a respective process can prevent corresponding computational results from being committed to memory.
  • results of the respective transaction can be committed globally without causing data corruption by one or more processes simultaneously using the shared variable.
  • version information associated with one or more respective shared variables used to produce the transaction results
  • a respective process can identify that another process modified the one or more respective shared variables during execution and prevent global committal of the respective results.
  • the transaction can repeat itself (e.g., execute again or retry) until the process is able to commit respective results without causing data corruption.
  • each of multiple processes executing in parallel can “blindly” initiate computations using the shared variables even though there is a chance that another process executing in parallel modifies a mutually used shared variable and prevents the process from globally committing its results.
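The "blind" execute-then-validate-then-retry behavior described above can be sketched as a simple loop. This is an illustrative sketch, not code from the patent; all function names are hypothetical:

```python
# Hypothetical sketch of the optimistic retry loop: a process speculatively
# executes its transaction body and, if a conflicting update is detected at
# commit time, aborts and re-executes until the commit succeeds.

def run_transaction(body, try_commit, max_retries=100):
    """Repeatedly execute `body` until `try_commit` accepts its outcome."""
    for _ in range(max_retries):
        outcome = body()          # speculative execution using shared data
        if try_commit(outcome):   # validation plus global committal
            return outcome
        # another process modified a shared variable we used; retry
    raise RuntimeError("transaction failed to commit")
```

Here `try_commit` stands in for the validation/committal phase discussed below; it returns False when version information has changed underneath the transaction.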
  • a computer environment includes segments of information (e.g., groupings, sections, portions, etc., of a repository for storing data values associated with one or more variables) that are shared by multiple processes executing in parallel. For each of at least two of the segments, the computer environment includes a corresponding location to store a respective version value (e.g., version information) representing a relative version of a respective segment.
  • a relative version associated with a segment is changed or updated by a respective process each time any contents (e.g., data values of one or more respective shared variables) in a respective segment has been modified. Accordingly, other processes keeping track of version information associated with a respective segment can identify if and when contents of the respective segment have been modified.
  • one or more processes in the computer environment can use contents stored in the one or more segments to generate new data values for storage in a segment.
  • a respective process can initiate modification of a data value associated with a shared variable.
  • the processes can compete to secure an exclusive access lock with respect to each of multiple segments to prevent other processes from modifying a respective locked segment.
  • Locking of a segment also may provide notification to other processes that the other processes should not use contents of a respective segment for a current transaction and/or that previous computations associated with a current transaction must be aborted.
  • a computer environment can be configured to maintain, for each of multiple segments of shared data, a corresponding location to store globally accessible lock information indicating whether one of the multiple processes executing in parallel has locked a respective segment for: i) changing a respective one or more data value therein, and ii) preventing other processes from reading respective data values from the respective segment.
  • acquiring a lock on a segment prevents other processes from accessing data values in the locked segment.
  • the computer environment can enable the multiple processes to maintain (e.g., store, retrieve, use, etc.) version information associated with the respective multiple segments to identify whether contents of a respective segment have changed over time.
  • a computer environment can include globally accessible version information enabling a respective one of the processes to modify respective version value information associated with shared variables.
  • the version value information can represent a relative version value associated with a given segment; a respective process modifies it to a new unique data value to indicate that the respective process modified a data value associated with the given segment.
  • a first process can retrieve a data value associated with a shared variable as well as retrieve a current version value associated with the shared variable when the shared variable is accessed.
  • the first process stores the version value associated with the shared variable and then can perform computations (e.g., a transaction) using the shared variable.
  • the first process can verify that no other process modified the shared variable by checking current version information associated with the shared variable. If the version information associated with one or more shared variables at a committal phase of the transaction matches corresponding originally obtained version information associated with the one or more shared variables during an execution phase of the transaction, then the first process can globally commit results of the transaction to memory.
  • the first process can abort and repeat a transaction until it is able to complete without interference. If and when the first process is able to globally commit its results from a respective transaction to memory, then the first process updates version information associated with any data values (or segments) that are modified during the commit phase. Accordingly, a second process (or multiple other processes) can identify if and when a data value associated with the one or more shared variables changes and prevent or initiate its own global committal depending on current processing circumstances.
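The validation step described above — comparing versions recorded at first access against versions current at the committal phase — can be sketched as follows. Container names are assumptions for illustration:

```python
# Minimal sketch of commit-time validation: the read-set maps each shared
# variable to the version value observed when the variable was first read.
# The transaction may commit only if every recorded version still matches
# the current version, i.e. no other process modified the variable.

def validate_read_set(read_set, current_versions):
    """Return True if no variable in the read-set changed version."""
    return all(current_versions[var] == ver
               for var, ver in read_set.items())
```

If this check fails at the committal phase, the transaction aborts and retries rather than committing potentially corrupt results.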
  • a computerized device (e.g., a host computer, workstation, etc.) can be configured to perform the parallel processing techniques described herein.
  • a computer environment includes a memory system, a processor (e.g., a processing device), a respective display, and an interconnect connecting the processor and the memory system.
  • the interconnect can also support communications with the respective display (e.g., display screen or display medium).
  • the memory system is encoded with an application that, when executed on the processor, supports parallel processing according to techniques herein.
  • one embodiment herein includes a computer program product (e.g., a computer-readable medium).
  • the computer program product includes computer program logic (e.g., software instructions) encoded thereon.
  • Such computer instructions can be executed on a computerized device to support parallel processing according to embodiments herein.
  • the computer program logic when executed on at least one processor associated with a computing system, causes the processor to perform the operations (e.g., the methods) indicated herein as embodiments of the present disclosure.
  • Such arrangements as further disclosed herein can be provided as software, code and/or other data structures arranged or encoded on a computer readable medium such as an optical medium (e.g., CD-ROM), floppy or hard disk, or other medium such as firmware or microcode in one or more ROM or RAM or PROM chips or as an Application Specific Integrated Circuit (ASIC).
  • a computer readable medium such as an optical medium (e.g., CD-ROM), floppy or hard disk, or other medium such as firmware or microcode in one or more ROM or RAM or PROM chips or as an Application Specific Integrated Circuit (ASIC).
  • the software or firmware or other such configurations can be installed on a computerized device to cause one or more processors in the computerized device to perform the techniques explained herein.
  • Yet another more particular technique of the present disclosure is directed to a computer program product that includes a computer readable medium having instructions stored thereon to facilitate use of shared information among multiple processes.
  • the instructions when carried out by a processor of a respective computer device, cause the processor to perform the steps of: i) executing a transaction defined by a corresponding set of instructions to produce a respective transaction outcome based on use of at least one shared variable; ii) after producing the respective transaction outcome, initiating a lock on a given shared variable to prevent other processes from modifying a data value associated with the given shared variable; and iii) initiating a modification of the data value associated with the given shared variable based on the respective transaction outcome even though at least one of the other processes performed a computation using the data value associated with the given shared variable before the lock.
  • Other embodiments of the present application include software programs to perform any of the method embodiment steps and operations summarized above and disclosed in detail below.
  • system of the invention can be embodied as a software program, as software and hardware, and/or as hardware alone.
  • Example embodiments of the invention may be implemented within computer systems, processors, and computer program products and/or software applications manufactured by Sun Microsystems Inc. of Palo Alto, Calif., USA.
  • FIG. 1 is a diagram illustrating a computer environment enabling multiple processes to access shared variable data according to embodiments herein.
  • FIG. 2 is a diagram illustrating maintenance and use of version and lock information associated with shared data according to embodiments herein.
  • FIG. 3 is a diagram of a sample process including a read-set and write-set according to embodiments herein.
  • FIG. 4 is a diagram of a flowchart illustrating execution of a transaction according to an embodiment herein.
  • FIG. 5 is a diagram of a flowchart illustrating execution of a transaction according to embodiments herein.
  • FIG. 6 is a diagram of a sample architecture supporting shared use of data according to embodiments herein.
  • FIG. 7 is a diagram of a flowchart according to an embodiment herein.
  • FIG. 8 is a diagram of a flowchart according to an embodiment herein.
  • FIG. 9 is a diagram of a flowchart according to an embodiment herein.
  • results of the respective transaction can be globally committed to memory without causing data corruption. If version information associated with one or more corresponding shared variables (used to produce the transaction results for the respective transaction) happens to change thus indicating that another process modified shared data used to generate results associated with the respective transaction, then results associated with the respective transaction are not committed to memory for global access. In this latter case, the respective transaction repeats itself until the respective transaction is able to commit respective results without causing potential data corruption as a result of data changing during execution of the respective transaction.
  • FIG. 1 is a block diagram of a computer environment 100 according to an embodiment herein.
  • computer environment 100 includes shared data 125 and corresponding metadata 135 (e.g., in a respective repository) that is globally accessible by multiple processes 140 such as process 140 - 1 , process 140 - 2 , . . . process 140 -M.
  • each of processes 140 is a processing thread.
  • Metadata 135 enables each of processes 140 to identify whether portions of shared data 125 have been “locked” and/or whether any portions of shared data 125 have changed during execution of a respective transaction.
  • Each of processes 140 includes a respective read-set 150 and write-set 160 for storing information associated with shared data used to carry out computations with respect to a transaction.
  • process 140 - 1 includes read-set 150 - 1 and write-set 160 - 1 to carry out a respective one or more transactions associated with process 140 - 1 .
  • Process 140 - 2 includes read-set 150 - 2 and write-set 160 - 2 to carry out a respective transaction associated with process 140 - 2 .
  • Process 140 -M includes read-set 150 -M and write-set 160 -M to carry out one or more transactions associated with process 140 -M.
  • Transactions executed by respective processes 140 can be defined by one or more instructions of software code. Accordingly, each of processes 140 can execute a respective set of instructions to carry out a respective transaction. In one embodiment, the transactions executed by the processes 140 come from the same overall program or application running on one or more computers. Alternatively, the processes 140 execute transactions associated with different programs.
  • each of processes 140 accesses shared data 125 to generate computational results (e.g., transaction results) that are eventually committed for storage in a respective repository storing shared data 125 .
  • shared data 125 is considered to be globally accessible because each of the multiple processes 140 can access the shared data 125 .
  • Each of processes 140 can store data values locally that are not accessible by the other processes 140 .
  • process 140 - 1 can globally access a data value and store a respective copy locally in write-set 160 - 1 that is not accessible by any of the other processes.
  • the process 140 - 1 is able to locally modify the data value in its write-set 160 .
  • one purpose of write-set 160 is to store globally accessed data that is modified locally.
  • the results of executing the respective transaction can be globally committed back to a respective repository storing shared data 125 depending on whether globally accessed data values happened to change during the course of the transaction executed by process 140 - 1 .
  • a respective read-set 150 - 1 associated with each process stores information for determining which shared data 125 has been accessed during a respective transaction and whether any respective data values associated with globally accessed shared data 125 happens to change during execution of a respective transaction.
  • each of one or more processes 140 complies with a respective rule or set of rules indicating transaction size limitations associated with the parallel transactions. This enhances efficiency when multiple processes execute different transactions using a same set of shared variables (including the given shared variable) to produce respective transaction outcomes.
  • each transaction can be limited to a certain number of lines of code, a number of data value modifications, time limit, etc. so that potentially competing transactions do not end up in a deadlock.
  • embodiments herein include: i) maintaining a locally managed and accessible write set of data values associated with each of multiple shared variables that are locally modified during execution of the transaction, the local write set representing data values that are a) not yet globally committed and b) not yet accessible by the other processes; ii) initiating locks on each of the multiple shared variables specified in the write set which were locally modified during execution of the transaction to prevent the other processes from changing data values associated with the multiple shared variables to be modified; iii) verifying that respective data values associated with the multiple shared variables accessed during the transaction have not been globally modified by the other processes during execution of the transaction by checking that respective version values associated with the multiple shared variables have not changed during execution of the transaction; and iv) after modifying data values associated with the multiple shared variables, releasing the locks on each of the multiple shared variables.
  • FIG. 2 is a diagram illustrating shared data 125 and corresponding metadata 135 according to embodiments herein.
  • shared data 125 can be partitioned to include segment 210 - 1 , segment 210 - 2 , . . . , segment 210 -J.
  • a respective segment of shared data 125 can be a resource such as a single variable, a set of variables, an object, a stripe, a portion of memory, etc.
  • Metadata 135 includes respective version information 220 and lock information 230 associated with each corresponding segment 210 of shared data 125 .
  • version information 220 is a multi-bit value that is incremented each time a respective process 140 modifies contents of a corresponding segment 210 of shared data 125 .
  • the lock information 230 and version information 220 can make up a single 64-bit word.
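The patent states only that the lock information 230 and version information 220 can make up a single 64-bit word; the bit layout below (low bit as the lock flag, upper 63 bits as the version) is an illustrative assumption:

```python
# Hypothetical packing of lock information and version information into one
# 64-bit word. Storing both fields in one word lets a process read the lock
# state and the version with a single atomic load of that word.

LOCK_BIT = 1  # low-order bit holds the single-bit lock flag

def make_word(version, locked=False):
    """Pack a 63-bit version count and a lock flag into one word."""
    return (version << 1) | (LOCK_BIT if locked else 0)

def is_locked(word):
    return bool(word & LOCK_BIT)

def version_of(word):
    return word >> 1
```

A real implementation would update such a word with an atomic compare-and-swap so that acquiring the lock and reading the version cannot interleave with another process's update.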
  • each of processes 140 (e.g., software) need not be responsible for updating the version information 220 .
  • a monitor function, separate from or integrated with processes 140 , automatically initiates changing version information 220 each time contents of a respective segment are modified.
  • suppose that process 140 - 2 modifies contents of segment 210 - 1 during a commit phase of a respective executed transaction.
  • Prior to committing transaction results globally to shared data 125 , process 140 - 2 reads and stores version information 220 - 1 associated with segment 210 - 1 (or a shared variable therein). After modifying contents of segment 210 - 1 during the commit phase, process 140 - 2 modifies the version information 220 - 1 in metadata 135 to a new value. More specifically, prior to modifying segment 210 - 1 , the version information 220 - 1 may have been a count value of 1326.
  • the process 140 - 2 updates (e.g., increments) the version information 220 - 1 to be a count value of 1327.
  • Each of the processes 140 performs a similar updating of corresponding version information 220 each time a respective process 140 modifies a respective segment 210 of shared data 125 . Accordingly, the processes can monitor the version information 220 - 1 to identify when changes have been made to a respective segment 210 of shared data 125 .
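The update from 1326 to 1327 in the example above amounts to a per-segment counter increment. A minimal helper (names assumed) might look like:

```python
# Illustrative per-segment version bump performed during the commit phase.
# `versions` maps a segment identifier to its current version count.

def bump_version(versions, seg_id):
    """Increment and return the version count for a modified segment."""
    versions[seg_id] = versions.get(seg_id, 0) + 1
    return versions[seg_id]
```

Other processes comparing a stored version against the current one will then observe the mismatch and know the segment changed.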
  • metadata 135 also maintains lock information 230 associated with each respective segment 210 of shared data 125 .
  • the lock information 230 associated with each segment 210 is a globally accessible single bit indicating whether one of processes 140 currently has “locked” a corresponding segment for purposes of modifying its contents.
  • a respective process such as process 140 - 1 can set the lock information 230 -J to a logic one indicating that segment 210 -J has been locked for use.
  • Other processes know that contents of segment 210 -J should not be accessed, used, modified, etc. during the lock phase initiated by process 140 - 1 .
  • When finished modifying segment 210 -J, process 140 - 1 sets the lock information 230 -J back to a logic zero. All processes 140 can then compete again to obtain a lock with respect to segment 210 -J.
  • FIG. 3 is a diagram more particularly illustrating details of respective read-sets 150 and write-sets 160 associated with processes 140 according to embodiments herein.
  • process 140 - 1 executes transaction 351 (e.g., a set of software instructions).
  • Read-set 150 - 1 stores retrieved version information 320 - 1 , retrieved version information 320 - 2 , . . . , retrieved version information 320 -K associated with corresponding data values (or segments) accessed from shared data 125 during execution of transaction 351 .
  • the process 140 - 1 can keep track of version information associated with any globally accessed data.
  • Write-set 160 - 1 stores shared variable identifier information 340 (e.g., address information, variable identifier information, etc.) for each respective globally shared variable that is locally modified during execution of the transaction 351 .
  • Local modification involves maintaining and modifying locally used values of shared variables in write-set 160 - 1 rather than actually modifying the global variables during execution of transaction 351 .
  • the process 140 - 1 attempts to globally commit information in write-set 160 - 1 to shared data 125 upon completion of transaction 351 .
  • process 140 - 1 maintains write-set 160 - 1 to include i) shared variable identifier information 340 - 1 (e.g., segment or variable identifier information) of a respective variable accessed from shared data 125 and corresponding locally used value of shared variable 350 - 1 , ii) shared variable identifier information 340 - 2 (e.g., segment or variable identifier information) of a variable or segment accessed from shared data 125 and corresponding locally used value of shared variable 350 - 2 , and so on. Accordingly, process 140 - 1 uses write-set 160 - 1 as a scratch-pad to carry out execution of transaction 351 and keep track of locally modified variables and corresponding identifier information.
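The scratch-pad role of a write-set can be sketched as follows. The class and method names are assumptions, not from the patent; the key behavior is that lookups consult local modifications first, so a transaction sees its own updates while shared data stays untouched:

```python
# Hypothetical write-set: a local scratch pad mapping shared-variable
# identifiers to locally modified values. Global shared data is only
# updated later, during the commit phase.

class WriteSet:
    def __init__(self):
        self._local = {}                      # variable id -> local value

    def record(self, var_id, value):
        """Locally modify a variable without touching shared data."""
        self._local[var_id] = value

    def lookup(self, var_id, shared):
        """Prefer the locally modified value; fall back to shared data."""
        return self._local.get(var_id, shared.get(var_id))

    def items(self):
        """Entries to be globally committed at the end of the transaction."""
        return self._local.items()
```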
  • FIG. 4 is a flowchart illustrating a more specific use of read-sets 150 , write-sets 160 , version information 220 , and lock information 230 according to embodiments herein.
  • flowchart 400 indicates how each of multiple processes 140 uses read-sets 150 and write-sets 160 while carrying out a respective transaction.
  • Step 405 indicates a start of a respective transaction.
  • a transaction can include a set of software instructions indicating how to carry out one or more computations using shared data 125 .
  • a respective process 140 executes an instruction associated with the transaction identifying a specific variable in shared data 125 .
  • In step 415 , the respective process checks whether the variable exists in its respective write-set 160 . If the variable already exists in its respective write-set 160 , then processing continues at step 440 in which the respective process 140 fetches a locally maintained value from its write-set 160 .
  • Otherwise, in step 420 , the respective process 140 attempts to globally fetch a data value associated with the variable based on a respective access to shared data 125 .
  • the process 140 checks whether the variable to be globally fetched is locked by another process. As previously discussed, another process may lock variables, segments, etc. of shared data 125 to prevent others from accessing the variables.
  • Globally accessible lock information 230 (e.g., a single bit of information in metadata 135 ) indicates which variables have been locked for use.
  • If the variable is locked in step 425 , the respective process initiates step 430 to abort and retry a respective transaction or to initiate execution of a so-called back-off function to access the variable.
  • the back-off function can specify a random or fixed amount of time for the process to wait before attempting to read the variable again with hopes that a lock will be released.
  • the respective lock on the variable may have been released by the time of a second or subsequent attempt to read the variable.
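A back-off function of the kind described above can be sketched as follows. The exponential growth and randomization are illustrative assumptions; the patent says only that the wait may be a random or fixed amount of time:

```python
import random
import time

# Hypothetical back-off: wait a bounded, randomized, growing delay before
# re-attempting to read a locked variable, in hopes the lock is released.

def backoff(attempt, base=0.001, cap=0.1):
    """Sleep for a random delay that grows with the attempt number."""
    delay = min(cap, base * (2 ** attempt)) * random.random()
    time.sleep(delay)
    return delay
```

Randomizing the delay reduces the chance that two competing processes repeatedly retry in lockstep and collide again.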
  • If the variable is not locked, the respective process initiates step 435 to globally fetch a data value associated with the specified variable from shared data 125 .
  • the respective process retrieves version information 220 associated with the globally fetched variable. The process stores retrieved version information associated with the variable in its respective read-set 150 for later use during a commit phase.
  • In step 445 , the respective process utilizes the fetched data value associated with the variable to carry out one or more computations associated with the transaction. Based on the paths discussed above, the data value associated with the variable can be obtained from either write-set 160 or shared data 125 .
  • In step 460 , the respective process identifies whether a respective transaction has completed. If not, the process continues at step 410 to perform a similar loop for each additional variable used during a course of executing the transaction. If the transaction has completed in step 460 , the respective process continues at step 500 (e.g., the flowchart 500 in FIG. 5 ), in which the process attempts to globally commit values in its write-set 160 to globally accessible shared data 125 .
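The read path of flowchart 400 can be sketched as a single function. All container names are assumptions; the step numbers in the comments refer to the flowchart described above:

```python
# Hypothetical transactional read: consult the write-set first, then check
# the lock, then fetch globally and record the observed version for the
# later commit-phase validation.

class Locked(Exception):
    """Raised so the caller can abort/retry or back off (steps 425/430)."""

def tx_read(var, write_set, read_set, shared, versions, locks):
    if var in write_set:                      # steps 415/440: local value
        return write_set[var]
    if locks.get(var):                        # step 425: locked by another
        raise Locked(var)
    read_set.setdefault(var, versions[var])   # record version for commit
    return shared[var]                        # step 435: global fetch
```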
  • In response to identifying that a corresponding data value associated with one or more shared variables was modified during execution of the transaction, a respective process can abort the transaction in lieu of modifying a data value associated with shared data 125 and initiate execution of the transaction again at a later time to attempt to produce a respective transaction outcome.
  • FIG. 5 is a flowchart 500 illustrating a technique for committing results of a transaction to shared data 125 according to embodiments herein.
  • the process executing the respective transaction has not yet initiated any locks on any shared data, although the process does initiate execution of computations associated with accessed shared data 125.
  • Waiting to obtain locks at the following “commit phase” enables other processes 140 to perform other transactions in parallel because a respective process initiating storage of results during the commit phase holds the locks for a relatively short amount of time.
  • the respective process that executed the transaction attempts to obtain locks associated with each variable in its write-set 160 .
  • the process checks whether lock information in metadata 135 indicates that the variables to be written to (e.g., specific portions of globally accessible shared data 125) are locked by another process.
  • the process initiates locking the variables (or segments as the case may be) to block other processes from using or locking the variables.
  • a respective process attempts to obtain locks according to a specific ordering such as an order of initiating local modifications to retrieved shared variables during execution of a respective transaction, addresses associated with the globally shared variables, etc.
  • If all locks cannot be immediately obtained in step 510, then the process can abort and retry the transaction or initiate a back-off function to acquire locks associated with the variables that are locally modified during execution of the transaction.
  • In step 520, the process obtains the stored version information associated with variables read from shared data 125.
  • the version information 220 of metadata 135 indicates a current version of the respective variables at a time when they were read during execution of the transaction.
  • In step 525, the respective process compares the retrieved version information in the read-set 150, saved at the time of accessing the shared variables, to the current globally available version information 220 from metadata 135 for each variable in the read-set 150.
  • In step 530, if the version information differs in step 525, then the process acknowledges that another process modified the variables used to carry out the present transaction. Accordingly, the process releases any obtained locks and retries the transaction. This prevents the respective process from causing data corruption.
  • In step 535, if the version information is the same in step 525, then the process acknowledges that no other process modified the variables used to carry out the present transaction. Accordingly, the process can initiate modification of shared data to reflect the data values in the write-set 160. This prevents the respective process from causing data corruption during the commit phase.
  • In step 540, after updating the shared data 125 with the data values in the write-set 160, the process updates version information 220 associated with the modified variables or segments and releases the locks.
  • the locks can be released in any order or in a reverse order relative to the order of obtaining the locks.
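The commit sequence above (steps 505 through 540) can be summarized as a minimal sketch, assuming per-variable locks and version counters held in ordinary dictionaries; try_commit and owner are illustrative names:

```python
# Hedged sketch of the commit phase in FIG. 5; names are illustrative.
# Locks are acquired in a fixed (sorted) order, read-set versions are
# revalidated, and only then is shared data written back.

def try_commit(shared, locks, versions, read_set, write_set, owner):
    acquired = []
    try:
        # Steps 505-515: lock every variable in the write-set, in a fixed order.
        for var in sorted(write_set):
            if locks[var] is not None:        # held by another process
                return False                  # caller aborts or backs off (step 510)
            locks[var] = owner
            acquired.append(var)
        # Steps 520-530: validate that no read variable changed since it was fetched.
        if any(versions[var] != ver for var, ver in read_set.items()):
            return False                      # another process committed first
        # Steps 535-540: write back the deferred stores and bump the versions.
        for var, value in write_set.items():
            shared[var] = value
            versions[var] += 1
        return True
    finally:
        for var in acquired:                  # locks may be released in any order
            locks[var] = None
```

A failed validation releases the acquired locks and leaves shared data untouched, matching the abort-and-retry behavior of step 530.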
  • each of the respective processes 140 can be programmed to occasionally, periodically, sporadically, intermittently, etc. check (prior to the committal phase in flowchart 500 ) whether current version information 220 in metadata 135 matches retrieved version information in its respective read-set 150 for all variables read from shared data 125 . Additionally, each of the respective processes 140 can be programmed to also check (in a similar way) whether a data value and/or corresponding segment has been locked by another process prior to completion.
  • FIG. 6 is a block diagram illustrating an example computer system 610 (e.g., an architecture associated with computer environment 100) for executing parallel processes 140 and other related processes according to embodiments herein.
  • Computer system 610 can be a computerized device such as a personal computer, workstation, portable computing device, console, network terminal, processing device, etc.
  • computer system 610 of the present example includes an interconnect 111 that couples a memory system 112 storing shared data 125 and metadata 135 , one or more processors 113 executing processes 140 , an I/O interface 114 , and a communications interface 115 .
  • Peripheral devices 116 (e.g., one or more optional user-controlled devices such as a keyboard, mouse, display screens, etc.) connect to computer system 610 via I/O interface 114.
  • I/O interface 114 also enables computer system 610 to access repository 180 (that also potentially stores shared data 125 and/or metadata 135 ).
  • Communications interface 115 enables computer system 610 to communicate over network 191 to transmit and receive information from different remote resources.
  • functionality associated with processes 140 can be embodied as software code such as data and/or logic instructions (e.g., code stored in the memory or on another computer readable medium such as a disk) that support functionality according to different embodiments described herein.
  • the functionality associated with processes 140 can be implemented via hardware or a combination of hardware and software code.
  • a respective one of multiple processes 140 executes a transaction defined by a corresponding set of instructions to produce a respective transaction outcome based on use of at least one shared variable from shared data 125 .
  • In step 720, after producing the respective transaction outcome (e.g., locally storing computational results in its respective write-set 160), the respective process 140 initiates a lock on a given shared variable of shared data 125 to prevent other processes from modifying a data value associated with the given shared variable.
  • In step 730, the respective process 140 initiates a modification of the data value associated with the given shared variable based on the respective transaction outcome even though at least one of the other processes 140 in computer environment 100 also performed a computation using the data value associated with the given shared variable before the lock and during execution of the transaction by the respective one of multiple processes 140.
  • FIG. 8 is a flowchart 800 illustrating processing steps associated with processes 140 according to an embodiment herein. Note that techniques discussed in flowchart 800 overlap with the techniques discussed above in the previous figures.
  • each of multiple processes 140 maintains version information in a respective locally managed read set 150 associated with an executed transaction.
  • the read set 150 is generally not accessible by the other processes 140 using the shared variables from shared data 125 .
  • the read set 150 and write-set 160 serve as a local scratch-pad.
  • the read set 150 can store and identify version information (e.g., includes retrieved version information) associated with each of multiple shared variables used to generate a respective transaction outcome associated with a given process.
  • the version information stored in the read-set 150 indicates respective versions of the multiple shared variables in shared data 125 at a time when the transaction retrieves respective data values associated with the multiple shared variables (e.g., shared data 125 ) from a corresponding globally accessible repository.
  • each of multiple processes 140 potentially competes to initiate a respective lock on a given one or more shared variables (e.g., portions of shared data 125 ) locally modified (as indicated in write-set 160 ) during the transaction to prevent other processes from modifying a data value associated with the given one or more shared variables.
  • In step 820, after acquiring respective locks associated with the given one or more shared variables and before globally modifying respective data values associated with the given one or more shared variables, a respective process attempting to globally commit its results verifies that newly read (e.g., present or current) version information associated with each of the given one or more shared variables used to generate the respective transaction outcome matches the version information in the locally managed read set associated with the transaction.
  • the newly read version information can be used to identify whether the data values associated with the multiple shared variables were changed by the other processes during execution of the transaction. There was no change if the newly retrieved version information matches the version information in the read-set 150.
  • FIG. 9 is a flowchart 900 illustrating another technique associated with use of lock and version information according to embodiments herein. Note that techniques discussed in flowchart 900 overlap and summarize some of the techniques discussed above.
  • In step 910, computer environment 100 maintains segments 210 of information (e.g., shared data 125) that are shared by multiple processes 140 executing in parallel.
  • In step 915, for each of multiple segments 210, the computer environment 100 maintains a corresponding location (e.g., a portion of storage) to store a respective version value representing a relative version of contents in a respective segment 210.
  • the relative version associated with a segment is updated by a respective process each time contents of the respective segment are modified. For example, after committing results to shared data 125, a respective process can increment the version value by one over the previous version value to notify other processes 140 that the shared data 125 has changed.
  • In step 920, computer environment 100 enables the multiple processes to compete for and secure an exclusive access lock with respect to each of the multiple segments 210 to prevent other processes 140 from modifying a respective locked segment.
  • In step 930, computer environment 100 enables the multiple processes 140 to retrieve version information 220 associated with the respective multiple segments 210 to identify whether contents of a respective segment have changed over time.
  • one embodiment of computer environment 100 enables a respective one of the processes 140 to modify the version value associated with a given segment 210 to a new unique data value to indicate that contents of the given segment have been modified.
  • Transactional memories can be static or dynamic, indicating whether the locations transacted on are known in advance (like an n-location CAS) or decided dynamically within the scope of the transaction's execution, the latter type being more general and expressive.
  • Dynamic non-blocking software transactional memories are referred to herein as STMs.
  • a goal of current multiprocessor software design is to introduce parallelism into software applications by allowing operations that do not conflict in accessing memory to proceed concurrently.
  • a key tool in designing concurrent data structures has been the use of locks.
  • coarse-grained locking is easy to program with, but provides very poor performance because of limited parallelism, while designing fine-grained lock-based concurrent data structures has long been recognized as a difficult task better left to experts.
  • if concurrent programming and data structure design is to become ubiquitous, researchers agree that one must develop alternative approaches that simplify code design and verification.
  • This disclosure addresses “mechanical” methods for transforming sequential code or coarse-grained lock-based code to concurrent code. In one embodiment, by mechanical we mean that the transformation, whether done by hand, by a preprocessor, or by a compiler, does not require any program-specific information (such as the programmer's understanding of the data flow relationships).
  • a preferred unifying theme of parallel processing is that the transactions provided to the programmer, in either hardware or software, will be non-blocking, unbounded, and dynamic.
  • Non-blocking means that transactions do not use locks, and are thus obstruction-free, lock-free, or wait-free.
  • Unbounded means that there is no limit on the number of locations accessed by the transaction.
  • Dynamic means that the set of locations accessed by the transaction is not known in advance and is determined during its execution. Providing all three properties in hardware seems to introduce large degrees of complexity into the design. Providing them in software seems to limit performance: hand-crafted lock-based code, though hard to program and prove correct, greatly outperforms the most effective current software STMs, even when they are programmed using an understanding of the data access relationships. When the STM programmer does not make use of such information, performance of STMs is in general an order of magnitude slower than the hand-crafted counterparts.
  • Transactional Locking (TL) is a blocking approach to designing software-based transactional memory mechanisms that transforms sequential code into unbounded concurrent dynamic transactions that synchronize using deadlock-free fine-grained locking.
  • the scheme itself is highly efficient because it does not try to provide a non-blocking progress guarantee for the transaction as a whole. Instead, static (and therefore simple) non-blocking transactions are used only to provide deadlock freedom when acquiring the set of locks needed to safely complete a transaction.
  • static transactions can be implemented in a trivial manner using today's hardware synchronization operations such as compare-and-swap (CAS), or using hardware transactions when these become available.
  • At a high level, TL converts coarse-grained lock operations into transactions, where the transactional infrastructure is implemented with fine-grained locks.
  • the acquisition of the locks in step 2 is essentially a static obstruction-free transaction, one in which the set of accessed locations is known in advance. It can alternately be sped up using a hardware transaction such as an n-location compare-and-swap (CAS) operation. As noted earlier, this type of operation is simpler than the dynamic hardware transaction.
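Acquiring a known set of locks deadlock-free can be sketched as follows. The cas() helper merely simulates a hardware compare-and-swap on one array slot, and all names are illustrative; because the lock set is known in advance, acquisition can simply roll back on any conflict, which is what makes the static transaction obstruction-free:

```python
# Sketch of acquiring a known lock set as a simple "static transaction".

def cas(array, index, expected, new):
    """Simulated single-word compare-and-swap on array[index]."""
    if array[index] == expected:
        array[index] = new
        return True
    return False

def acquire_all(locks, wanted, owner):
    # The lock set is known up front, so on any conflict we release what we
    # hold and report failure instead of waiting -- no deadlock is possible.
    taken = []
    for i in wanted:
        if cas(locks, i, None, owner):
            taken.append(i)
        else:
            for j in taken:        # roll back: release everything acquired
                locks[j] = None
            return False
    return True
```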
  • One aspect associated with TL is the observation that the blocking part of a transaction can be limited to the acquisition of a set of lock records. This observation has significant performance implications because it allows one to eliminate all the overheads associated with the mechanisms providing the non-blocking progress guarantee for the transaction as a whole. As we show, this is a major source of overhead in current STM systems.
  • TL's rollback mechanism is simple and local. There are no transaction records, and the collected read-set and write-set is never shared with other threads.
  • a versioned write-lock is a simple spinlock that uses a compare-and-swap (CAS) operation to acquire the lock and a store to release it. Since one only needs a single bit to indicate that the lock is taken, the rest of the lock word holds a version number. This number is incremented by every successful lock-release.
  • the read-set entries contain the address of the lock and the observed version number of the lock associated with the transactionally loaded variable.
  • the write-set entries contain the address of the variable, the value to be written to the variable, and the address of the lock that “covers” the variable.
  • the write-set is kept in chronological order to avoid write-after-write hazards.
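The entry layouts described in the three bullets above might look like the following sketch; the field names are assumptions for illustration only:

```python
# Illustrative record layouts for read-set and write-set entries.

from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class ReadEntry:
    lock_addr: int        # address of the versioned write-lock
    version: int          # lock version observed at transactional-load time

@dataclass
class WriteEntry:
    var_addr: int         # address of the variable
    value: Any            # deferred value to be written at commit
    lock_addr: int        # address of the lock that "covers" the variable

@dataclass
class TxnSets:
    read_set: List[ReadEntry] = field(default_factory=list)
    # Kept in chronological order so that replaying the entries at write-back
    # time makes the last write to an address win, avoiding write-after-write
    # hazards.
    write_set: List[WriteEntry] = field(default_factory=list)
```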
  • a backoff policy could, for example, be to spin for a certain amount of time before re-attempting to acquire the lock.
  • TL provides a hybrid backup mechanism to extend bounded-size dynamic hardware transactions to arbitrary size.
  • Unlike OSTM and HyTM, instead of their object records we use the versioned write-lock associated with each location.
  • Hardware transactions need to verify, for each location that they read or write, that the associated versioned write-lock is free. For every write they also need to update the version number of the associated stripe lock. This suffices to provide inter-operability between hardware and software transactions. Any software read will detect concurrent modifications of locations by hardware writes because the version number of the associated lock will have changed. Any hardware transaction will fail if a concurrent software transaction is holding the lock to write. Software transactions attempting to write will also fail in acquiring a lock because they use a synchronization operation (such as CAS or a single-location transaction) that fails if the version number of the location was modified by the hardware transaction.
  • One goal of the present disclosure is to allow the programmer to convert coarse-grain locked data structures to TL so as to enjoy the benefits of parallelism. This can be helpful when transitioning to high-order parallelism with SMT/CMT processors such as Niagara™.
  • One key attribute of TL is simplicity. It allows the programmer to extract additional parallelism but without unduly increasing the complexity of their code. The programmer can “think serially” but the code will “execute concurrently”.
  • TL is successful if the resultant performance exceeds that of the original coarse-grain locked form.
  • the TL form is competitive with the best-of-breed STM forms. That having been said, for any given problem a specialized, hand-coded, form written by a synchronization expert is likely to be faster than the TL form.
  • An expert in synchronization, developing with concurrency in mind as a first-order requirement, may be aware of relaxed data dependencies in the algorithm and take advantage of domain-specific optimizations.
  • a red-black tree transformed with TL will outperform a red-black tree protected by a naive lock.
  • an exotic ad-hoc red-black tree designed by concurrency experts and subject to considerable research, such as Hanke's red-black algorithm will generally outperform the TL-transformed red-black tree.
  • TL works by transforming an operation protected by a coarse-grained lock into optimistic transactional form. We then implement the transactional infrastructure with fine-grained locks, enabling additional parallelism as the access patterns permit.
  • OSTM works by opening and closing records for reading and writing. TL, in a sense, performs the open operations automatically at transactional load- and store-time but leaves the record open until commit time. TL has no way of knowing whether prior loads executed within a transaction might have any bearing on results produced by the transaction.
  • Thread T1 traverses a long hash bucket chain searching for a value associated with a certain key, iterating over “next” fields. We'll say that T1 locates the appropriate node at or near the end of the linked list. T2 concurrently deletes an unrelated node earlier in the same linked list. T2 commits. At commit-time T1 will abort because the linked-list “next” field written to by T2 is in T1's read-set. T1 must retry the lookup operation (ostensibly locating the same node).
  • TL admits live-lock failure.
  • thread T1's read-set is A and its write-set is B.
  • T2's read-set is B and write-set is A.
  • T1 tries to commit and locks B.
  • T2 tries to commit and acquires A.
  • T1 validates A, in its read-set, and aborts as B is locked by T2.
  • T2 validates B in its read-set and aborts as B was locked by T1.
  • To improve “liveness” we use a back-off delay at abort-time, similar in spirit to that found in CSMA-CD MAC protocols.
  • the delay interval is a function of (a) a random number generated at abort-time, (b) the length of the prior (aborted) write-set, and (c) the number of prior aborts for this transactional attempt.
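A sketch of such a delay function follows. The disclosure specifies only the three inputs, not the exact formula, so the weights and the cap below are illustrative assumptions:

```python
# Sketch of the abort-time back-off delay: a function of (a) a random draw,
# (b) the length of the aborted write-set, and (c) the number of prior aborts.
# The scaling and the exponent cap are assumptions for this sketch.

import random

def backoff_delay_us(write_set_len, prior_aborts, rng=random.random):
    base = 1 + write_set_len                        # bigger transactions back off longer
    window = base * (2 ** min(prior_aborts, 10))    # exponential window, capped
    return rng() * window                           # randomized, CSMA/CD style
```

Repeated aborts widen the window exponentially, so two live-locking transactions quickly desynchronize.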
  • the entire commit operation might be feasible as a single ROCK hardware transaction even where the original application transaction was too big (too many loads and stores) to be feasible as a single ROCK transaction.
  • the commit operation will be able to make an accurate estimate of ROCK-feasibility given that the size of the read-set and write-set are available (or cheap to compute) at commit-time.
  • if the entire commit is feasible as a ROCK hardware transaction, we can avoid changing the lock word from unlocked, to locked, to unlocked (but incremented) by simply fetching the lock word at the start of the commit, verifying that it is unlocked, and then increasing the version sub-field at the end of the transaction, after the data writes are complete.
  • TL operates better in environments with lower mutation rates (that is, where the store:load ratio is low). For example, consider a red-black tree and a skip-list that are protected by a single lock and where the data structure is subject to many concurrent modifications. The relative speedup achieved with TL as compared to the classic lock will usually be higher with the skip-list than with the red-black tree, as mutations to a skip-list usually only require a few stores, whereas mutations to a red-black tree may require adjustments to the tree structure that require many stores.
  • Our example embodiment describes a 64-bit lock-word, partitioned into a single lock bit and a 63-bit version sub-field. Assuming a 4 GHz processor and a maximum update rate of one transaction per clock, the version sub-field will overflow in 68 years. Other example embodiments allow for use of a 32-bit lock-word field. When a counter overflows, for instance, a so-called stop-the-world epoch might be used to stop all threads outside transactions. At that point no thread can have a previously fetched instance of the overflowed lock-word in its read-set; the lock-word version can safely be reset to 0. All threads can then be allowed to resume normal execution.
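The lock-word partitioning described above can be sketched with plain bit operations; the helper names are illustrative, and the masking models the 63-bit version sub-field eventually wrapping to 0:

```python
# Sketch of the 64-bit lock-word layout: bit 0 is the lock bit and the
# upper 63 bits hold the version number.

MASK64 = (1 << 64) - 1

def make_lockword(version, locked):
    return ((version << 1) | (1 if locked else 0)) & MASK64

def is_locked(word):
    return bool(word & 1)

def version_of(word):
    return word >> 1

def release_and_increment(word):
    # A successful release stores the word back unlocked with version + 1.
    return make_lockword(version_of(word) + 1, False)
```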
  • TL allows the C programmer to use normal malloc() and free() operations to manage the lifecycle of structures containing transactionally accessed shared variables.
  • the only requirement imposed by TL is that a structure being free()-ed must be allowed to quiesce. That is, any pending transactional stores, detectable by checking the lock-bit in the associated locks, must be allowed to drain into the structure before the structure is freed. After the structure is quiesced it can be accessed with normal load and store operations outside the transactional framework.
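Quiescing can be sketched as a spin on the lock bits of the covering locks. In this sketch, read_lock_words is a hypothetical callback that re-reads the current lock-word values for the structure's stripes on each call:

```python
# Sketch of quiescing a structure before free(): wait until the lock bit
# (bit 0) of every covering lock word is clear, so that pending transactional
# stores have drained into the structure.

def is_quiesced(lock_words):
    return all((w & 1) == 0 for w in lock_words)

def sterilize(read_lock_words):
    # Spin until none of the covering locks is held. A production version
    # would back off rather than busy-wait.
    while not is_quiesced(read_lock_words()):
        pass
```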
  • Node A's Key field contains “1”, its Value field contains “1001” and its Next field refers to B.
  • B's Key field contains “2”, its Value field contains “1002” and its Next field refers to C. C's Key field contains “3”, its Value field contains “1003” and its Next field is NULL.
  • Thread T1 calls Set(“2”, “2002”).
  • the TL-based Set() operator traverses the linked list using transactional loads and finds node B with a key value of “2”. T1 then executes a transactional store into B.Value to change “1002” to “2002”.
  • T1's read-set consists of A.Key, A.Next, B.Key and the write-set consists of “B.Value.”
  • T1 attempts to commit; it acquires the lock covering B.Value.
  • T2's transaction completes. T2 then calls free(B). T1 resumes in the midst of its commit and stores into B.Value. We have a classic modify-after-free pathology. To avoid such problems T2 calls sterilize(B) after the commit finishes but before free()ing B. This allows T1's latent transactional store to drain into B before B is free()ed and potentially reused. Note, however, that TL (using sterilization) did not admit any outcomes that were not already possible under the original coarse-grained lock.
  • Thread T1 calls Set(“2”, “2002”).
  • the TL-based Set() method traverses the list and locates node B having a key value of “2”.
  • Thread T2 then calls Delete(“2”).
  • the Delete( ) operator commits successfully.
  • T2 sterilizes B and then calls free(B).
  • the memory underlying B is recycled and used by some other thread T3.
  • T1 attempts to commit by acquiring the lock covering B.Value.
  • the lock-word is collocated with B.
  • T1 validates the read-set, recognizes that A.Next changed (because of T2's Delete()) and aborts, restoring the original lock-word value.
  • T1 has caused the memory word underlying the lock for B.Value to “flicker”, however. Such modifications are unacceptable; we have a classic modify-after-free error.
  • TL might allow a skilled programmer to explicitly control the mapping by allowing the programmer to define a custom VariableToLock( ) function which takes a variable address as input and returns a lock address.
  • the VariableToLock( ) function is optional.
  • TL can easily be combined with STM interfaces or transactional infrastructures such as Herlihy's SXM.
  • READWRITE corresponds to UNLOCKED
  • EXCLUSIVE corresponds to LOCKED
  • the new state, READONLY is an interim state used only at commit-time.
  • the commit operator is modified to attempt to change all locks in the write-set from READWRITE to READONLY with CAS. The commit operator must spin if the lock is found to be in READONLY or EXCLUSIVE state. Once the write-set locks have been made READONLY, the commit operator ratifies the versions of the read-set locks and ensures that the read-set locks are in READWRITE state.
  • the commit operator would use CAS to try to change all the write-set locks from READWRITE to READONLY. Once in READONLY state commit would then use normal atomic stores to upgrade the locks from READONLY to EXCLUSIVE. The commit operator would then validate the read-set and, conditionally, write-back the deferred stores saved in the write-set and release the locks, incrementing the version subfields. This adaptation minimizes aggregate lock-hold times. Recall that CAS has high local latency even when successful. Consider a transaction containing stores to variables V1 and V2 covered by distinct locks W1 and W2. The basic commit operator, described earlier, uses CAS to lock W1 and then another CAS to lock W2. The hold-time for W1 is increased because of the latency of the CAS needed to acquire W2. The mechanism described here lessens the impact of CAS latency.
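The two-phase upgrade above can be sketched as follows. For brevity this sketch aborts where the text says the commit operator would spin on a busy lock, and all names are illustrative; the point is that only the first phase pays CAS latency per lock, while the upgrade to EXCLUSIVE uses plain stores:

```python
# Sketch of the three-state lock commit: READWRITE -> READONLY with CAS,
# validate the read-set, then upgrade READONLY -> EXCLUSIVE with plain stores.

READWRITE, READONLY, EXCLUSIVE = 0, 1, 2

def cas_state(locks, i, expected, new):
    """Simulated compare-and-swap on one lock's state."""
    if locks[i] == expected:
        locks[i] = new
        return True
    return False

def three_state_commit(locks, write_locks, validate_read_set):
    taken = []
    # Phase 1: one cheap CAS per lock, minimizing per-lock hold time.
    for i in write_locks:
        if cas_state(locks, i, READWRITE, READONLY):
            taken.append(i)
        else:                         # the real scheme would spin here
            for j in taken:
                locks[j] = READWRITE  # back out
            return False
    if not validate_read_set():
        for j in taken:
            locks[j] = READWRITE      # back out on validation failure
        return False
    # Phase 2: plain stores to upgrade; deferred write-back would follow.
    for i in write_locks:
        locks[i] = EXCLUSIVE
    return True
```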
  • Transactions may be nested by folding or “flattening” inner transactions into the outermost transaction. By nature, longer transactions have a higher chance of failing because of concurrent interference, however.
  • embodiments herein can operate in two modes which we will call encounter mode and commit mode. These modes indicate how locks are acquired and how transactions are committed or aborted. We will begin by further describing our commit mode algorithm, later explaining how TL operates in encounter mode.
  • a versioned-write-lock is a simple spin lock that uses a compare-and-swap (CAS) operation to acquire the lock and a store to release it. Since one only needs a single bit to indicate that the lock is taken, we use the rest of the lock word to hold a version number. This number is incremented by every successful lock-release. In encounter mode the version number is displaced and a pointer into a thread's private undo log is installed.
  • PO might be implemented, for instance, by leveraging the header words of Java™ objects.
  • a single PS stripe-lock array may be shared and used for different TL data structures within a single address-space. For instance an application with two distinct TL red-black trees and three TL hash-tables could use a single PS array for all TL locks.
  • For our default mapping we chose an array of 2^20 entries of 32-bit lock words, with the mapping function masking the variable address with 0x3FFFFC and then adding in the base address of the lock array to derive the lock address.
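The mapping arithmetic can be sketched directly from the constants above; with 4-byte lock words the mask yields 2^20 possible stripes, and addresses that agree in bits 2 through 21 share a lock:

```python
# Sketch of the default PS stripe mapping: mask the variable address with
# 0x3FFFFC, then add the lock-array base to find the covering lock word.

LOCK_WORD_BYTES = 4
MASK = 0x3FFFFC

def lock_addr_for(var_addr, lock_array_base):
    return lock_array_base + (var_addr & MASK)

def stripe_index(var_addr):
    # Index into the 2**20-entry lock array.
    return (var_addr & MASK) // LOCK_WORD_BYTES
```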
  • the read-set entries contain the address of the lock and the observed version number of the lock associated with the transactionally loaded variable.
  • the write-set entries contain the address of the variable, the value to be written to the variable, and the address of the lock that “covers” the variable.
  • the write-set is kept in chronological order to avoid write-after-write hazards.
  • a transactional load first checks (using a filter such as a Bloom filter) to see if the load address appears in the write-set; if so, the transactional load returns the last value written to the address. This provides the illusion of processor consistency and avoids so-called read-after-write hazards. If the address is not found in the write-set, the load operation then fetches the lock value associated with the variable, saving the version in the read-set, and then fetches from the actual shared variable. If the transactional load operation finds the variable locked, the load may either spin until the lock is released or abort the operation.
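This load path can be sketched with the filter modeled as a single 64-bit word whose set bits summarize the write-set addresses; a clear bit proves the address is absent, so the slower search is skipped. The one-bit hash and all names are illustrative assumptions:

```python
# Sketch of a transactional load with a single-word Bloom-style filter over
# the write-set. A clear filter bit means the address is definitely absent.

def filter_bit(addr):
    return 1 << (addr % 64)            # hash each address to one of 64 bits

def txn_load(addr, write_set, filter_word, fetch_shared):
    # Only search the write-set if the filter says the address may be present
    # (false positives merely cost a wasted search).
    if filter_word & filter_bit(addr):
        for a, v in reversed(write_set):   # latest write to addr wins
            if a == addr:
                return v
    # Not locally written: fetch globally (the full scheme would also record
    # the covering lock's version in the read-set here).
    return fetch_shared(addr)
```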
  • Transactional stores to shared locations are handled by saving the address and value into the thread's local write-set.
  • the shared variables are not modified during this step. That is, transactional stores are deferred and contingent upon successfully completing the transaction.
  • During the operation of the transaction we periodically validate the read-set. If the read-set is found to be invalid we abort the transaction. This prevents a doomed transaction (a transaction that has read inconsistent global state) from becoming trapped in an infinite loop.
  • The prior observed reads in step (1) have been validated as forming an atomic snapshot of memory. The transaction is now committed. Write back all the entries from the local write-set to the appropriate shared variables.
  • the write-locks have been held for a brief time when attempting to commit the transaction. This helps improve performance under high contention.
  • the Bloom filter allows us to determine, by reading a single filter word, that a value is not in the write-set and need not be searched for. Though locks could have been acquired in ascending address order to avoid deadlock, we found that sorting the addresses in the write-set was not worth the effort.
  • this mode assumes a type-stable closed memory pool or garbage collection.
  • Transactional stores to shared locations are handled by acquiring locks as they are encountered, saving the address and current value into the thread's local write-set, and pointing from the lock to the write-set entry.
  • the shared variables are written with the new value during this step.
  • a transactional load checks to see if the lock is free or is held by the current transaction and, if so, reads the value from the location. There is thus no need to look for the value in the write-set. If the transactional load operation finds that the lock is held, it will spin. During the operation of the transaction we periodically validate the read-set. If the read-set is found to be invalid we abort the transaction. This prevents a doomed transaction (a transaction that has read inconsistent global state) from becoming trapped in an infinite loop.
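The encounter-mode load check can be sketched as follows; because stores updated the shared location in place when they were encountered, a load only needs to confirm the covering lock is free or owned by this transaction. All names are illustrative:

```python
# Sketch of an encounter-mode transactional load: no write-set lookup is
# needed, since stores have already been applied in place under the lock.

def encounter_load(addr, shared, lock_owner, me):
    owner = lock_owner.get(addr)
    if owner is None or owner == me:
        return shared[addr]           # free, or our own lock: read directly
    # Held by another transaction: the real scheme spins here.
    raise RuntimeError("locked by another transaction")
```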
  • The prior observed reads in step (1) have been validated as forming an atomic snapshot of memory. The transaction is now committed.
  • TL can admit a live-lock failure.
  • thread T1's read-set is A and its write-set is B.
  • T2's read-set is B and write-set is A.
  • T1 tries to commit and locks B.
  • T2 tries to commit and acquires A.
  • T1 validates A in its read-set and aborts, as A is locked by T2.
  • T2 validates B in its read-set and aborts, as B is locked by T1.
  • To provide liveness we use bounded spin and a back-off delay at abort-time, similar in spirit to that found in CSMA/CD MAC protocols.
  • the delay interval is a function of (a) a random number generated at abort-time, (b) the length of the prior (aborted) write-set, and (c) the number of prior aborts for this transactional attempt. It is important to note that unlike conventional methods, we found that we do not need mechanisms for one transaction to abort another to allow progress/liveness even in encounter mode.
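The delay computation might look like the following sketch; the cap on exponential growth and the exact way the three inputs are combined are illustrative assumptions, not the actual formula.

```c
#include <stdint.h>

/* Abort-time back-off (sketch): the delay window grows with the
 * number of prior aborts (exponentially, CSMA/CD-style) and with the
 * length of the aborted write-set, and is jittered by a random value
 * generated at abort-time. The cap of 16 is an arbitrary choice. */
static uint64_t backoff_delay(uint64_t random_value,
                              unsigned writeset_len,
                              unsigned prior_aborts) {
    unsigned shift = prior_aborts < 16 ? prior_aborts : 16;
    uint64_t window = ((uint64_t)writeset_len + 1) << shift;
    return random_value % window;  /* spin iterations before retrying */
}
```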
  • TL lock-mapping policies PS, PO, or PW
  • modes Common or Encounter
  • the GC assures that transactionally accessed memory will only be released once no references remain to the object.
  • in C or C++, TL preferentially uses the PS/Commit locking scheme to allow the programmer to use normal malloc( ) and free( ) operations to manage the lifecycle of structures containing transactionally accessed shared variables.
  • a node contains Key, Value and Next fields.
  • the data structure implements a traditional key-value mapping.
  • the key-value map (the linked list) is protected by TL using PS.
  • Node A's Key field contains 1, its value field contains 1001 and its Next field refers to B.
  • B's Key field contains 2, its Value field contains 1002 and its Next field refers to C.
  • C's Key field contains 3, its Value field contains 1003 and its Next field is NULL.
  • Thread T1 calls put(2, 2002).
  • the TL-based put( ) operator traverses the linked list using transactional loads and finds node B with a key value of 2.
  • T1 then executes a transactional store into B.Value to change 1002 to 2002.
  • T1's read-set consists of A.Key, A.Next, B.Key and the write-set consists of B.Value.
  • T1 attempts to commit; it acquires the lock covering B.Value and then validates that the previously fetched read-set is consistent by checking the version numbers in the locks covering the read-set.
  • Thread T1 stalls.
  • Thread T2 executes delete(2).
  • the delete( ) operator traverses the linked list and attempts to splice-out Node B by setting A.Next to C. T2 successfully commits.
  • the commit operator stores C into A.Next. T2's transaction completes. T2 then calls free(B).
  • T1 resumes in the midst of its commit and stores into B.Value. We have a classic modify-after-free pathology. To avoid such problems T2 calls quiesce(B) after the commit finishes but before free( )ing B. This allows T1's latent transactional store to drain into B before B is free( )ed and potentially reused. Note, however, that TL (using quiescing) did not admit any outcomes that were not already possible under a simple coarse-grained lock. Any thread that attempts to write into B will, at commit-time, acquire the lock covering B, validate A.Next and then store into B. Once B has been unlinked there can be at most one thread that has successfully committed and is in the process of writing into B. Other transactions attempting to write into B will fail read-set validation at commit-time as A.Next has changed.
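A minimal sketch of the quiesce operation described above, assuming each node is covered by a versioned lock word whose low bit is the lock bit (function and parameter names are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

static bool lock_is_held(uint64_t lw) { return (lw & 1) != 0; }

/* quiesce (sketch): before free()ing a node, wait until the lock
 * covering it is released, so any latent transactional store from a
 * committing thread has drained into the node. A real implementation
 * would bound the spin and apply back-off. */
static void quiesce(const volatile uint64_t *lock_for_node) {
    while (lock_is_held(*lock_for_node))
        ;  /* spin until the committing writer releases the lock */
}
```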
  • Thread T1 calls put(2, 2002).
  • the TL-based put( ) method traverses the list and locates node B having a key value of 2.
  • Thread T2 then calls delete(2).
  • the delete( ) operator commits successfully.
  • T2 waits for B to quiesce and then calls free(B).
  • the memory underlying B is recycled and used by some other thread T3.
  • T1 attempts to commit by acquiring the lock covering B.Value.
  • the lock-word is collocated with B.Value, so the CAS operation transiently changes the lock-word contents.
  • T1 validates the read-set, recognizes that A.Next changed (because of T2's delete( )) and aborts, restoring the original lock-word value.
  • T1 has caused the memory word underlying the lock for B.Value to “flicker”, however. Such modifications are unacceptable; we have a classic modify-after-free error.
  • T1 calls put(2,2002). Put( ) traverses the list and locates node B. T2 then calls delete(2), commits successfully, calls quiesce(B) and free(B). T1 acquires the lock covering B.Value, saves the original B.Value (1002) into its private write undo log, and then stores 2002 into B.Value. Later, during read-set validation at commit time, T1 will discover that its read-set is invalid and abort, rolling back B.Value from 2002 to 1002. As above, this constitutes a modify-after-free pathology where B was recycled, but B.Value transiently “flickered” from 1002 to 2002 to 1002. We can avoid this problem by enhancing the encounter protocol to validate the read-set after each lock acquisition but before storing into the shared variable. This confers safety, but at some cost in performance.
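The enhanced encounter-mode protocol can be sketched as follows. The undo-log layout, the function name, and the readset_is_valid flag (standing in for a full read-set validation pass) are illustrative assumptions.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { uint64_t *addr; uint64_t old_val; } undo_entry;

/* Encounter-mode transactional store (sketch): acquire the lock,
 * validate the read-set BEFORE dirtying memory, log the old value
 * for roll-back on abort, then store the new value in place. */
static bool txn_store(uint64_t *addr, uint64_t *lw, uint64_t new_val,
                      bool readset_is_valid,
                      undo_entry *log, size_t *log_len) {
    if (*lw & 1)
        return false;                 /* held elsewhere: abort/retry */
    *lw |= 1;                         /* acquire (CAS in real code) */
    if (!readset_is_valid) {          /* validate before storing */
        *lw &= ~(uint64_t)1;          /* release; memory untouched */
        return false;
    }
    log[*log_len].addr = addr;        /* save old value for undo */
    log[*log_len].old_val = *addr;
    (*log_len)++;
    *addr = new_val;                  /* lock stays held until commit */
    return true;
}
```

Validating before the store is what prevents the transient “flicker” of B.Value: an invalid read-set causes an abort while the shared location is still unmodified.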
  • Thread T1 traverses a long hash bucket chain searching for the value associated with a certain key, iterating over “next” fields. We'll say that T1 locates the appropriate node at or near the end of the linked list. T2 concurrently deletes an unrelated node earlier in the same linked list. T2 commits. At commit-time T1 will abort because the linked-list “next” field written to by T2 is in T1's read-set. T1 must retry the lookup operation (ostensibly locating the same node).
  • Because TL is a software-based scheme, it can be made inter-operable with HTM systems on several levels.

Abstract

For each of multiple processes executing in parallel, as long as corresponding version information associated with a respective set of one or more shared variables used for computational purposes has not changed during execution of a respective transaction, results of the respective transaction can be globally committed to memory without causing data corruption. If version information associated with one or more respective shared variables (used to produce the transaction results) happens to change during a process of generating respective results, then a respective process can identify that another process modified the one or more respective shared variables during execution and that its transaction results should not be committed to memory. In this latter case, the transaction repeats itself until it is able to commit respective results without causing data corruption.

Description

    RELATED APPLICATION
  • This application claims the benefit of and priority to U.S. Provisional Patent Application Ser. No. 60/775,580 (Attorney's docket no. SUN06-02 (060720)p, filed on Feb. 22, 2006, entitled “Transactional Locking,” the entire teachings of which are incorporated herein by this reference.
  • This application is related to U.S. patent application identified by Attorney's docket no. SUN06-03(060711), filed on same date as the present application, entitled “METHODS AND APPARATUS TO IMPLEMENT PARALLEL TRANSACTIONS,” which itself claims the benefit of and priority to U.S. Provisional Patent Application Ser. No. 60/775,564 (Attorney's docket no. SUN06-01(060711)p, filed on Feb. 22, 2006, entitled “Switching Between Read-Write Locks and Transactional Locking,” the entire teachings of which are incorporated herein by this reference.
  • This application is related to U.S. patent application identified by Attorney's docket no. SUN06-06(060908), filed on same date as the present application, entitled “METHODS AND APPARATUS TO IMPLEMENT PARALLEL TRANSACTIONS,” which itself claims the benefit of and priority to U.S. Provisional Patent Application Ser. No. 60/789,483 (Attorney's docket no. SUN06-05(060908)p, filed on Apr. 5, 2006, entitled “Globally Versioned Transactional Locking,” the entire teachings of which are incorporated herein by this reference.
  • This application is related to U.S. patent application identified by Attorney's docket no. SUN06-08(061191), filed on same date as the present application, entitled “METHODS AND APPARATUS TO IMPLEMENT PARALLEL TRANSACTIONS,” which itself claims the benefit of and priority to U.S. Provisional Patent Application Ser. No. 60/775,564 (Attorney's docket no. SUN06-01(060711)p, filed on Feb. 22, 2006, entitled “Switching Between Read-Write Locks and Transactional Locking,” the entire teachings of which are incorporated herein by this reference.
  • BACKGROUND
  • There has been an ongoing trend in the information technology industry to execute software programs more quickly. For example, there are various conventional advancements that provide for increased execution speed of software programs. One technique for increasing execution speed of a program is called parallelism. Parallelism is the practice of executing or performing multiple things simultaneously. Parallelism can be possible on multiple levels, from executing multiple instructions at the same time, to executing multiple threads at the same time, to executing multiple programs at the same time, and so on. Instruction Level Parallelism or ILP is parallelism at the lowest level and involves executing multiple instructions simultaneously. Processors that exploit ILP are typically called multiple-issue processors, meaning they can issue multiple instructions in a single clock cycle to the various functional units on the processor chip.
  • There are different types of conventional multiple-issue processors. One type of multiple-issue processor is a superscalar processor in which a sequential list of program instructions is dynamically scheduled. A respective processor determines which instructions can be executed on the same clock cycle, and sends them out to their respective functional units to be executed. This type of multi-issue processor is called an in-order-issue processor since issuance of instructions is performed in the same sequential order as the program sequence, but issued instructions may complete at different times (e.g., short instructions requiring fewer cycles may complete before longer ones requiring more cycles).
  • Another type of multi-issue processor is called a VLIW (Very Long Instruction Word) processor. A VLIW processor depends on a compiler to do all the work of instruction reordering and the processor executes the instructions that the compiler provides as fast as possible according to the compiler-determined order. Other types of multi-issue processors issue instructions out of order, meaning the instruction issue order need not be the same as the order in which the instructions appear in the program.
  • Conventional techniques for executing instructions using ILP can utilize look-ahead techniques to find a larger number of instructions that can execute in parallel within an instruction window. Looking ahead often involves determining which instructions might depend upon others during execution for such things as shared variables, shared memory, interference conditions, and the like. When scheduling, a handler associated with the processor detects a group of instructions that do not interfere or depend on each other. The processor can then issue execution of these instructions in parallel, thus conserving processor cycles and resulting in faster execution of the program.
  • One type of conventional parallel processing involves a use of coarse-grained locking. As its name suggests, coarse-grained locking prevents conflicting groups of code from operating on different processes at the same time based on use of lockouts. Accordingly, this technique enables non-conflicting transactions or sets of instructions to execute in parallel.
  • Another type of conventional parallel processing involves a use of fine-grain locking. As its name suggests, fine-grain locking prevents conflicting instructions from being simultaneously executed in parallel based on use of lockouts. This technique enables non-conflicting instructions to execute in parallel.
  • SUMMARY
  • Conventional applications that support parallel processing can suffer from a number of deficiencies. For example, although easy to implement from the perspective of a software developer, coarse-grained locking techniques provide very poor performance because they can severely limit parallelism. Although fine-grain lock-based concurrent software can perform exceptionally well during run-time, developing such code can be a very difficult task for a respective one or more software developers.
  • Techniques discussed herein deviate with respect to conventional applications such as those discussed above as well as other techniques known in the prior art. For example, embodiments herein include techniques for enhancing performance associated with transactions executing in parallel.
  • In general, a transactional memory programming technique according to embodiments herein provides an alternative type of “lock” method over the conventional techniques as discussed above. For example, one embodiment herein involves use and/or maintenance of version information indicating whether any of multiple “globally” shared variables has been modified during a course of executing a respective transaction (e.g., a set of software instructions initiating a respective computation). Any one of multiple possible processes executing in parallel can update respective version information associated with a globally shared variable (e.g., a shared variable accessible by any of multiple processes) in order to indicate that the shared variable has been modified. Accordingly, other processes keeping track of the version information during execution of their own respective transaction can identify if and when any shared variables have been modified during a window of use. If any critical variables have been modified, a respective process can prevent corresponding computational results from being committed to memory.
  • That is, for each of multiple processes executing in parallel, as long as version information associated with a respective set of one or more shared variables used for computational purposes has not changed during execution of a respective transaction, results of the respective transaction can be committed globally without causing data corruption by one or more processes simultaneously using the shared variable. If version information associated with one or more respective shared variables (used to produce the transaction results) happens to change during a process of generating respective results, then a respective process can identify that another process modified the one or more respective shared variables during execution and prevent global committal of the respective results. In this latter case, the transaction can repeat itself (e.g., execute again or retry) until the process is able to commit respective results without causing data corruption. In this way, each of multiple processes executing in parallel can “blindly” initiate computations using the shared variables even though there is a chance that another process executing in parallel modifies a mutually used shared variable and prevents the process from globally committing its results.
  • In view of the specific embodiment discussed above, more general embodiments herein are directed to maintaining version information associated with shared variables. In one embodiment, a computer environment includes segments of information (e.g., groupings, sections, portions, etc. of a repository for storing data values associated with one or more variables) that are shared by multiple processes executing in parallel. For each of at least two of the segments, the computer environment includes a corresponding location to store a respective version value (e.g., version information) representing a relative version of a respective segment. A relative version associated with a segment is changed or updated by a respective process each time any contents (e.g., data values of one or more respective shared variables) in a respective segment have been modified. Accordingly, other processes keeping track of version information associated with a respective segment can identify if and when contents of the respective segment have been modified.
  • In one embodiment, one or more processes in the computer environment can use contents stored in the one or more segments to generate new data values for storage in a segment. A respective process can initiate modification of a data value associated with a shared variable. For example, in one embodiment, the processes can compete to secure an exclusive access lock with respect to each of multiple segments to prevent other processes from modifying a respective locked segment. Locking of a segment (e.g., a single or multiple shared variables) can prevent two or more processes from modifying a same data segment. Locking of a segment also may provide notification to other processes that the other processes should not use contents of a respective segment for a current transaction and/or that previous computations associated with a current transaction must be aborted.
  • According to further embodiments, a computer environment can be configured to maintain, for each of multiple segments of shared data, a corresponding location to store globally accessible lock information indicating whether one of the multiple processes executing in parallel has locked a respective segment for: i) changing a respective one or more data value therein, and ii) preventing other processes from reading respective data values from the respective segment. In other words, acquiring a lock on a segment prevents other processes from accessing data values in the locked segment.
  • Additionally, the computer environment can enable the multiple processes to maintain (e.g., store, retrieve, use, etc.) version information associated with the respective multiple segments to identify whether contents of a respective segment have changed over time. For example, a computer environment can include globally accessible version information enabling a respective one of the processes to modify respective version value information associated with shared variables. The version value information can represent a relative version value associated with a given segment as modified by a respective process to a new unique data value to indicate that the respective process modified a data value associated with the given segment.
  • As a more specific example, a first process can retrieve a data value associated with a shared variable as well as retrieve a current version value associated with the shared variable when the shared variable is accessed. The first process stores the version value associated with the shared variable and then can perform computations (e.g., a transaction) using the shared variable. Prior to globally committing results associated with the transaction, the first process can verify that no other process modified the shared variable by checking current version information associated with the shared variable. If the version information associated with one or more shared variables at a committal phase of the transaction matches corresponding originally obtained version information associated with the one or more shared variables during an execution phase of the transaction, then the first process can globally commit results of the transaction to memory. Alternatively, the first process can abort and repeat a transaction until it is able to complete without interference. If and when the first process is able to globally commit its results from a respective transaction to memory, then the first process updates version information associated with any data values (or segments) that are modified during the commit phase. Accordingly, a second process (or multiple other processes) can identify if and when a data value associated with the one or more shared variables changes and prevent or initiate its own global committal depending on current processing circumstances.
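The retrieve-and-verify flow described above can be sketched as a versioned read paired with a commit-time check. This is a single-threaded illustration that assumes the lock bit occupies bit 0 of a per-variable lock word; all names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Versioned read (sketch): record the version alongside the value so
 * the transaction can later verify, at commit time, that no other
 * process modified the shared variable in the interim. The lock word
 * is re-checked after the value read to catch concurrent changes. */
static bool versioned_read(const volatile uint64_t *val,
                           const volatile uint64_t *lw,
                           uint64_t *out_val, uint64_t *out_version) {
    uint64_t before = *lw;
    if (before & 1)
        return false;            /* locked: caller aborts or retries */
    *out_val = *val;
    if (*lw != before)
        return false;            /* lock word changed mid-read */
    *out_version = before >> 1;  /* stash for commit-time validation */
    return true;
}

/* Commit-time check for one shared variable: version unchanged and
 * not currently locked by another process. */
static bool version_unchanged(uint64_t lw_now, uint64_t recorded_version) {
    return (lw_now & 1) == 0 && (lw_now >> 1) == recorded_version;
}
```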
  • Techniques herein are well suited for use in applications such as those supporting parallel processing and use of shared data. However, it should be noted that configurations herein are not limited to such use and thus configurations herein and deviations thereof are well suited for use in other environments as well.
  • In addition to the embodiments discussed above, other embodiments herein include a computerized device (e.g., a host computer, workstation, etc.) configured to support the techniques disclosed herein such as supporting parallel execution of transactions performed by different processes. In such embodiments, a computer environment includes a memory system, a processor (e.g., a processing device), a respective display, and an interconnect connecting the processor and the memory system. The interconnect can also support communications with the respective display (e.g., display screen or display medium). The memory system is encoded with an application that, when executed on the processor, supports parallel processing according to techniques herein.
  • Yet other embodiments of the present disclosure include software programs to perform the method embodiment and operations summarized above and disclosed in detail below in the Detailed Description section of this disclosure. More specifically, one embodiment herein includes a computer program product (e.g., a computer-readable medium). The computer program product includes computer program logic (e.g., software instructions) encoded thereon. Such computer instructions can be executed on a computerized device to support parallel processing according to embodiments herein. For example, the computer program logic, when executed on at least one processor associated with a computing system, causes the processor to perform the operations (e.g., the methods) indicated herein as embodiments of the present disclosure. Such arrangements as further disclosed herein can be provided as software, code and/or other data structures arranged or encoded on a computer readable medium such as an optical medium (e.g., CD-ROM), floppy or hard disk, or other medium such as firmware or microcode in one or more ROM or RAM or PROM chips or as an Application Specific Integrated Circuit (ASIC). The software or firmware or other such configurations can be installed on a computerized device to cause one or more processors in the computerized device to perform the techniques explained herein.
  • Yet another more particular technique of the present disclosure is directed to a computer program product that includes a computer readable medium having instructions stored thereon to facilitate use of shared information among multiple processes. The instructions, when carried out by a processor of a respective computer device, cause the processor to perform the steps of: i) executing a transaction defined by a corresponding set of instructions to produce a respective transaction outcome based on use of at least one shared variable; ii) after producing the respective transaction outcome, initiating a lock on a given shared variable to prevent other processes from modifying a data value associated with the given shared variable; and iii) initiating a modification of the data value associated with the given shared variable based on the respective transaction outcome even though at least one of the other processes performed a computation using the data value associated with the given shared variable before the lock. Other embodiments of the present application include software programs to perform any of the method embodiment steps and operations summarized above and disclosed in detail below.
  • It is to be understood that the system of the invention can be embodied as a software program, as software and hardware, and/or as hardware alone. Example embodiments of the invention may be implemented within computer systems, processors, and computer program products and/or software applications manufactured by Sun Microsystems Inc. of Palo Alto, Calif., USA.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The foregoing and other objects, features, and advantages of the present application will be apparent from the following more particular description of preferred embodiments of the present disclosure, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, with emphasis instead being placed upon illustrating the embodiments, principles and concepts.
  • FIG. 1 is a diagram illustrating a computer environment enabling multiple processes to access shared variable data according to embodiments herein.
  • FIG. 2 is a diagram illustrating maintenance and use of version and lock information associated with shared data according to embodiments herein.
  • FIG. 3 is a diagram of a sample process including a read-set and write-set according to embodiments herein.
  • FIG. 4 is a diagram of a flowchart illustrating execution of a transaction according to an embodiment herein.
  • FIG. 5 is a diagram of a flowchart illustrating execution of a transaction according to embodiments herein.
  • FIG. 6 is a diagram of a sample architecture supporting shared use of data according to embodiments herein.
  • FIG. 7 is a diagram of a flowchart according to an embodiment herein.
  • FIG. 8 is a diagram of a flowchart according to an embodiment herein.
  • FIG. 9 is a diagram of a flowchart according to an embodiment herein.
  • DETAILED DESCRIPTION
  • For each of multiple processes executing in parallel, as long as corresponding version information associated with a respective set of one or more shared variables used for computational purposes has not changed during execution of a respective transaction, results of the respective transaction can be globally committed to memory without causing data corruption. If version information associated with one or more corresponding shared variables (used to produce the transaction results for the respective transaction) happens to change thus indicating that another process modified shared data used to generate results associated with the respective transaction, then results associated with the respective transaction are not committed to memory for global access. In this latter case, the respective transaction repeats itself until the respective transaction is able to commit respective results without causing potential data corruption as a result of data changing during execution of the respective transaction.
  • FIG. 1 is a block diagram of a computer environment 100 according to an embodiment herein. As shown, computer environment 100 includes shared data 125 and corresponding metadata 135 (e.g., in a respective repository) that is globally accessible by multiple processes 140 such as process 140-1, process 140-2, . . . process 140-M. In one embodiment, each of processes 140 is a processing thread. Metadata 135 enables each of processes 140 to identify whether portions of shared data 125 have been “locked” and/or whether any portions of shared data 125 have changed during execution of a respective transaction.
  • Each of processes 140 includes a respective read-set 150 and write-set 160 for storing information associated with shared data used to carry computations with respect to a transaction. For example, process 140-1 includes read-set 150-1 and write-set 160-1 to carry out a respective one or more transactions associated with process 140-1. Process 140-2 includes read-set 150-2 and write-set 160-2 to carry out a respective transaction associated with process 140-2. Process 140-M includes read-set 150-M and write-set 160-M to carry out one or more transactions associated with process 140-M.
  • Transactions executed by respective processes 140 can be defined by one or more instructions of software code. Accordingly, each of processes 140 can execute a respective set of instructions to carry out a respective transaction. In one embodiment, the transactions executed by the processes 140 come from the same overall program or application running on one or more computers. Alternatively, the processes 140 execute transactions associated with different programs.
  • In the context of a general embodiment herein such as computer environment 100 in which multiple processes 140 (e.g., processing threads) execute transactions in parallel, each of processes 140 accesses shared data 125 to generate computational results (e.g., transaction results) that are eventually committed for storage in a respective repository storing shared data 125. Shared data 125 is considered to be globally accessible because each of the multiple processes 140 can access the shared data 125.
  • Each of processes 140 can store data values locally that are not accessible by the other processes 140. For example, process 140-1 can globally access a data value and store a respective copy locally in write-set 160-1 that is not accessible by any of the other processes. During execution of a respective transaction, the process 140-1 is able to locally modify the data value in its write-set 160. Accordingly, one purpose of write-set 160 is to store globally accessed data that is modified locally.
  • As will be discussed later in this specification, the results of executing the respective transaction can be globally committed back to a respective repository storing shared data 125 depending on whether globally accessed data values happened to change during the course of the transaction executed by process 140-1. In general, a respective read-set 150-1 associated with each process stores information for determining which shared data 125 has been accessed during a respective transaction and whether any respective data values associated with globally accessed shared data 125 happens to change during execution of a respective transaction.
  • In one embodiment, each of one or more processes 140 complies with a respective rule or set of rules indicating transaction size limitations associated with the parallel transactions to enhance efficiency of multiple processes executing different transactions using a same set of shared variables including the given shared variable to produce respective transaction outcomes. For example, each transaction can be limited to a certain number of lines of code, a number of data value modifications, time limit, etc. so that potentially competing transactions do not end up in a deadlock.
  • As will be further discussed, embodiments herein include: i) maintaining a locally managed and accessible write set of data values associated with each of multiple shared variables that are locally modified during execution of the transaction, the local write set representing data values not yet a) globally committed and b) accessible by the other processes; ii) initiating locks on each of the multiple shared variables specified in the write set which were locally modified during execution of the transaction to prevent the other processes from changing data values associated with the multiple shared variables to be modified; iii) verifying that respective data values associated with the multiple shared variables accessed during the transaction have not been globally modified by the other processes during execution of the transaction by checking that respective version values associated with the multiple shared variables have not changed during execution of the transaction; and iv) after modifying data values associated with the multiple shared variables, releasing the locks on each of the multiple shared variables.
  • FIG. 2 is a diagram illustrating shared data 125 and corresponding metadata 135 according to embodiments herein. As shown, shared data 125 can be partitioned to include segment 210-1, segment 210-2, . . . , segment 210-J. A respective segment of shared data 125 can be a resource such as a single variable, a set of variables, an object, a stripe, a portion of memory, etc. Metadata 135 includes respective version information 220 and lock information 230 associated with each corresponding segment 210 of shared data 125. In one embodiment, version information 220 is a multi-bit value that is incremented each time a respective process 140 modifies contents of a corresponding segment 210 of shared data 125. The lock information 230 and version information 220 can make up a single 64-bit word.
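One way to pack the lock information 230 and version information 220 into a single 64-bit word, as suggested above, can be sketched as follows; the exact field split (one lock bit, 63 version bits) is an illustrative assumption.

```c
#include <stdint.h>
#include <stdbool.h>

/* One 64-bit metadata word per segment: bit 0 carries the lock
 * information, bits 1..63 carry the version information. */
static inline uint64_t make_metadata(uint64_t version, bool locked) {
    return (version << 1) | (locked ? 1u : 0u);
}
static inline uint64_t metadata_version(uint64_t w) { return w >> 1; }
static inline bool     metadata_locked(uint64_t w)  { return (w & 1) != 0; }

/* Version update on a successful commit: a new, monotonically
 * increasing version value with the lock released, in one word. */
static inline uint64_t commit_release(uint64_t w) {
    return make_metadata(metadata_version(w) + 1, false);
}
```

Keeping both fields in one word lets a process read the lock state and version with a single load, and release the lock while publishing the new version with a single store.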
  • In one embodiment, each of the processes 140 (e.g., software) need not be responsible for updating the version information 220. For example, a monitor function, separate from or integrated with the processes 140, automatically initiates changing the version information 220 each time contents of a respective segment are modified.
  • As an example, assume that process 140-2 (e.g., a software processing entity) modifies contents of segment 210-1 during a commit phase of a respective executed transaction. Prior to committing transaction results globally to shared data 125, process 140-2 would read and store version information 220-1 associated with segment 210-1 (or the corresponding shared variable). After modifying contents of segment 210-1 during the commit phase, the process 140-2 would modify the version information 220-1 in metadata 135 to a new value. More specifically, prior to modifying segment 210-1, the version information 220-1 may have been a count value of 1326. After modifying segment 210-1, the process 140-2 updates (e.g., increments) the version information 220-1 to be a count value of 1327. Each of the processes 140 performs a similar updating of corresponding version information 220 each time a respective process 140 modifies a respective segment 210 of shared data 125. Accordingly, the processes can monitor the version information 220-1 to identify when changes have been made to a respective segment 210 of shared data 125.
  • Note that metadata 135 also maintains lock information 230 associated with each respective segment 210 of shared data 125. In one embodiment, the lock information 230 associated with each segment 210 is a globally accessible single bit indicating whether one of processes 140 currently has “locked” a corresponding segment for purposes of modifying its contents. For example, a respective process such as process 140-1 can set the lock information 230-J to a logic one indicating that segment 210-J has been locked for use. Other processes know that contents of segment 210-J should not be accessed, used, modified, etc. during the lock phase initiated by process 140-1. Upon completing a respective modification to contents of segment 210-J, process 140-1 sets the lock information 230-J to a logic zero. All processes 140 can then compete again to obtain a lock with respect to segment 210-J.
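The per-segment pairing of a lock bit and a version counter described above can be sketched as follows. This is a minimal single-threaded Python model (class and method names such as `SegmentMeta` are illustrative, not from the disclosure) showing how a single word can carry both the lock bit and the version count:

```python
class SegmentMeta:
    """Models a per-segment metadata word: low-order bit = lock, remaining bits = version."""

    def __init__(self):
        self.word = 0  # version 0, unlocked

    @property
    def locked(self):
        return self.word & 1 == 1

    @property
    def version(self):
        return self.word >> 1

    def try_lock(self):
        """Set the lock bit if currently clear; return True on success."""
        if self.locked:
            return False
        self.word |= 1
        return True

    def unlock_and_bump(self):
        """Release the lock and increment the version in a single store."""
        assert self.locked
        self.word = ((self.word >> 1) + 1) << 1  # new version, lock bit clear

meta = SegmentMeta()
assert meta.try_lock()         # acquired the segment
assert not meta.try_lock()     # a second attempt fails while the lock is held
meta.unlock_and_bump()
assert meta.version == 1 and not meta.locked
```

In a real multiprocessor setting the check-then-set in `try_lock` would have to be a single atomic operation (e.g., a compare-and-swap); the sketch only models the word layout.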
  • FIG. 3 is a diagram more particularly illustrating details of respective read-sets 150 and write-sets 160 associated with processes 140 according to embodiments herein. As shown, process 140-1 executes transaction 351 (e.g., a set of software instructions). Read-set 150-1 stores retrieved version information 320-1, retrieved version information 320-2, . . . , retrieved version information 320-K associated with corresponding data values (or segments) accessed from shared data 125 during execution of transaction 351. Accordingly, the process 140-1 can keep track of version information associated with any globally accessed data.
  • Write-set 160-1 stores shared variable identifier information 340 (e.g., address information, variable identifier information, etc.) for each respective globally shared variable that is locally modified during execution of the transaction 351. Local modification involves maintaining and modifying locally used values of shared variables in write-set 160-1 rather than actually modifying the global variables during execution of transaction 351. As discussed above and as will be further discussed, the process 140-1 attempts to globally commit information in write-set 160-1 to shared data 125 upon completion of transaction 351. In the context of the present example, process 140-1 maintains write-set 160-1 to include i) shared variable identifier information 340-1 (e.g., segment or variable identifier information) of a respective variable accessed from shared data 125 and corresponding locally used value of shared variable 350-1, ii) shared variable identifier information 340-2 (e.g., segment or variable identifier information) of a variable or segment accessed from shared data 125 and corresponding locally used value of shared variable 350-2, and so on. Accordingly, process 140-1 uses write-set 160-1 as a scratch-pad to carry out execution of transaction 351 and keep track of locally modified variables and corresponding identifier information.
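The scratch-pad role of the write-set can be sketched as a per-transaction map from variable identifier to locally buffered value. This is a hypothetical single-threaded sketch (the names `txn_write`, `txn_commit`, etc. are illustrative); stores are deferred so the globally shared data is untouched until the commit phase:

```python
shared_data = {"x": 10}   # a globally accessible shared variable

write_set = {}   # variable identifier -> locally used value (the scratch-pad)

def txn_write(var, value):
    """During the transaction: buffer the store locally; shared data is untouched."""
    write_set[var] = value

def txn_commit():
    """Commit phase (greatly simplified): publish the buffered values globally."""
    for var, value in write_set.items():
        shared_data[var] = value
    write_set.clear()

txn_write("x", 11)
assert shared_data["x"] == 10   # global value unchanged during the transaction
txn_commit()
assert shared_data["x"] == 11   # the result becomes visible only at commit
```

A full commit would also acquire locks and validate versions as described below; the point here is only that `write_set` isolates in-flight modifications from other processes.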
  • FIG. 4 is a flowchart illustrating a more specific use of read-sets 150, write-sets 160, version information 220, and lock information 230 according to embodiments herein. In general, flowchart 400 indicates how each of multiple processes 140 utilizes read-sets 150 and write-sets 160 while carrying out a respective transaction.
  • Step 405 indicates a start of a respective transaction. As previously discussed, a transaction can include a set of software instructions indicating how to carry out one or more computations using shared data 125.
  • In step 410, a respective process 140 executes an instruction associated with the transaction identifying a specific variable in shared data 125.
  • In step 415, the respective process checks whether the variable exists in its respective write-set 160. If the variable already exists in its respective write-set 160, then processing continues at step 440 in which the respective process 140 fetches the locally maintained value from its write-set 160.
  • If a locally stored data value associated with the variable does not already exist in its respective write-set 160 (e.g., because the variable was never fetched yet and/or modified locally) as identified in step 415, then processing continues at step 420 in which the respective process 140 attempts to globally fetch a data value associated with the variable based on a respective access to shared data 125. For example, as further indicated in step 425, the process 140 checks whether the variable to be globally fetched is locked by another process. As previously discussed, another process may lock variables, segments, etc. of shared data 125 to prevent others from accessing the variables. Globally accessible lock information 230 (e.g., a single bit of information) in metadata 135 indicates which variables have been locked for use.
  • If an active lock is identified in step 425, the respective process initiates step 430 to abort and retry a respective transaction or initiate execution of a so-called back-off function to access the variable. In the latter instance, the back-off function can specify a random or fixed amount of time for the process to wait before attempting to read the variable again in the hope that the lock will be released. The respective lock on the variable may be released by the time of a second or subsequent attempt to read the variable.
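The back-off alternative mentioned above might look like the following sketch, which waits a bounded, randomized interval between retries before giving up and signaling an abort (the attempt limit and delay values chosen here are arbitrary assumptions, not from the disclosure):

```python
import random
import time

def read_with_backoff(is_locked, fetch, max_attempts=5, base_delay=0.001):
    """Retry a read a bounded number of times, sleeping between attempts.

    is_locked: callable returning True while another process holds the lock.
    fetch: callable performing the actual read once the lock is free.
    Returns the fetched value, or None to signal the transaction should abort.
    """
    for attempt in range(max_attempts):
        if not is_locked():
            return fetch()
        # randomized delay that grows with each failed attempt
        time.sleep(base_delay * (attempt + 1) * random.random())
    return None  # still locked: the caller aborts and retries the transaction

# Example: a lock that is observed held on the first two polls, then released.
state = {"polls": 0}
def locked():
    state["polls"] += 1
    return state["polls"] <= 2

assert read_with_backoff(locked, lambda: 42) == 42
```

Bounding the number of attempts prevents a transaction from spinning indefinitely on a lock that is never released.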
  • If no lock is present on the variable during execution of step 425, the respective process initiates step 435 to globally fetch a data value associated with the specified variable from shared data 125. In addition to globally accessing the data value associated with the shared variable, the respective process retrieves version information 220 associated with the globally fetched variable. The process stores retrieved version information associated with the variable in its respective read-set 150 for later use during a commit phase.
  • In step 445, the respective process utilizes the fetched data value associated with the variable to carry out one or more computations associated with the transaction. Based on the paths discussed above, the data value associated with the variable can be obtained from either write-set 160 or shared data 125.
  • In step 450, the process performs a check to identify whether use of the fetched variable (in the transaction) involves modifying a value associated with the fetched variable. If so, in step 455, the process modifies the locally used value of shared variable 350 in write-set 160. The respective process skips executing step 455 if use of the variable (as specified by the executed transaction) does not involve modification of the variable.
  • In step 460, the respective process identifies whether a respective transaction has completed. If not, the process continues at step 410 to perform a similar loop for each of additional variables used during a course of executing the transaction. If the transaction has completed in step 460, the respective process continues at step 500 (e.g., the flowchart 500 in FIG. 5) in which the process attempts to globally commit values in its write-set 160 to globally accessible shared data 125.
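The read path of steps 410–460 can be condensed into a single helper. The following hypothetical sketch folds in the write-set check of step 415, the lock check of step 425, and the version recording of step 435; an `Aborted` exception stands in for the abort/back-off handling of step 430:

```python
class Aborted(Exception):
    """Raised when a transactional read hits a locked variable (step 430)."""

def transactional_read(var, shared, meta, read_set, write_set):
    # Steps 415/440: a locally buffered value wins over the global one.
    if var in write_set:
        return write_set[var]
    # Step 425: a locked variable forces an abort (back-off not shown).
    if meta[var]["locked"]:
        raise Aborted(var)
    # Step 435: record the version at fetch time, then read the global value.
    read_set[var] = meta[var]["version"]
    return shared[var]

shared = {"a": 1}
meta = {"a": {"locked": False, "version": 3}}
rs, ws = {}, {"b": 99}
assert transactional_read("b", shared, meta, rs, ws) == 99  # from the write-set
assert transactional_read("a", shared, meta, rs, ws) == 1   # global fetch
assert rs == {"a": 3}                                       # version remembered

meta["a"]["locked"] = True
try:
    transactional_read("a", shared, meta, rs, ws)
    aborted = False
except Aborted:
    aborted = True
assert aborted
```

The recorded versions in `rs` are exactly what the commit phase of FIG. 5 later compares against the current metadata.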
  • Accordingly, in response to identifying that a corresponding data value associated with one or more shared variables was modified during execution of the transaction, a respective process can abort the respective transaction in lieu of modifying a data value associated with shared data 125, and initiate execution of the transaction again at a later time to attempt to produce a respective transaction outcome.
  • FIG. 5 is a flowchart 500 illustrating a technique for committing results of a transaction to shared data 125 according to embodiments herein. Up until this point, the process executing the respective transaction has not initiated any locks on any shared data, although the process does initiate execution of computations associated with accessed shared data 125. Waiting to obtain locks at the following “commit phase” enables other processes 140 to perform other transactions in parallel because a respective process initiating storage of results during the commit phase holds the locks for a relatively short amount of time.
  • In step 505, the respective process that executed the transaction attempts to obtain locks associated with each variable in its write-set 160. For example, the process checks whether lock information in metadata 135 indicates whether the variables to be written to (e.g., specific portions of globally accessible shared data 125) are locked by another process. The process initiates locking the variables (or segments as the case may be) to block other processes from using or locking the variables. In one embodiment, a respective process attempts to obtain locks according to a specific ordering such as an order of initiating local modifications to retrieved shared variables during execution of a respective transaction, addresses associated with the globally shared variables, etc.
  • If all locks cannot be immediately obtained in step 510, then the process can abort and retry a transaction or initiate a back-off function to acquire locks associated with the variables that are locally modified during execution of the transaction.
  • After all appropriate locks have been obtained by writing respective lock information 230, processing continues at step 520 in which the process obtains the stored version information associated with variables read from shared data 125. As previously discussed, the version information 220 of metadata 135 indicates a current version of the respective variables at a time when they were read during execution of the transaction.
  • In step 525, the respective process compares the retrieved version information in the read-set 150 saved at a time of accessing the shared variables to the current globally available version information 220 from metadata 135 for each variable in the read-set 150.
  • In step 530, if the version information is different in step 525, then the process acknowledges that another process modified the variables used to carry out the present transaction. Accordingly, the process releases any obtained locks and retries the transaction. This prevents the respective process from causing data corruption.
  • In step 535, if the version information is the same in step 525, then the process acknowledges that no other process modified the variables used to carry out the present transaction. Accordingly, the process can initiate modification of shared data to reflect the data values in the write-set 160. This prevents the respective process from causing data corruption during the commit phase.
  • Finally, in step 540, after updating the shared data 125 with the data values in the write-set 160, the process updates version information 220 associated with modified variables or segments and releases the locks. The locks can be released in any order or in a reverse order relative to the order of obtaining the locks.
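The commit sequence of steps 505–540 can be sketched end-to-end. This is a hypothetical single-threaded model (function and variable names are illustrative, and locking is a plain flag rather than a real atomic operation); locks are taken in a fixed sorted order, per the ordering remark in step 505, to avoid deadlock:

```python
def commit(shared, meta, read_set, write_set):
    """Steps 505-540: lock, validate versions, write back, bump versions, release."""
    # Step 505: acquire locks for all written variables in a fixed (sorted) order.
    acquired = []
    for var in sorted(write_set):
        if meta[var]["locked"]:
            for v in acquired:                 # step 510: cannot lock; release and abort
                meta[v]["locked"] = False
            return False
        meta[var]["locked"] = True
        acquired.append(var)
    # Steps 520-530: validate that versions seen at read time are unchanged.
    for var, seen_version in read_set.items():
        if meta[var]["version"] != seen_version:
            for v in acquired:                 # another process interfered; abort
                meta[v]["locked"] = False
            return False
    # Steps 535-540: publish values, bump versions, release the locks.
    for var, value in write_set.items():
        shared[var] = value
        meta[var]["version"] += 1
        meta[var]["locked"] = False
    return True

shared = {"x": 1}
meta = {"x": {"locked": False, "version": 4}}
assert commit(shared, meta, {"x": 4}, {"x": 2})          # clean commit succeeds
assert shared["x"] == 2 and meta["x"]["version"] == 5 and not meta["x"]["locked"]
assert not commit(shared, meta, {"x": 4}, {"x": 3})      # stale read-set: abort
assert shared["x"] == 2                                  # no partial update
```

Note that the failed commit leaves the shared data and lock state untouched, matching the corruption-avoidance property described in steps 530 and 535.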
  • Note that during the commit phase as discussed above in flowchart 500, if a lock associated with a location in the process's write-set 160 also appears in the read-set 150, then the process must atomically: a) acquire a respective lock and b) validate that current version information associated with the variable (or variables) is the same as the retrieved version information stored in the read-set 150. In one embodiment, a CAS (Compare and Swap) operation can be used to accomplish both a) and b).
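Because the lock bit and version share one word, the atomic acquire-and-validate described above reduces to a single compare-and-swap: the expected value encodes the saved version with the lock bit clear, and the new value encodes the same version with the lock bit set. The sketch below simulates the CAS in Python (real code would use a hardware CAS instruction or an atomic primitive such as Java's `AtomicLong.compareAndSet`); the helper names are illustrative:

```python
def cas(cell, expected, new):
    """Simulated compare-and-swap on a one-element list holding the lock word."""
    if cell[0] == expected:
        cell[0] = new
        return True
    return False

def acquire_and_validate(lock_word, saved_version):
    """Atomically a) acquire the lock and b) verify the version is unchanged."""
    expected = saved_version << 1          # saved version, lock bit clear
    desired = (saved_version << 1) | 1     # same version, lock bit set
    return cas(lock_word, expected, desired)

word = [7 << 1]                            # version 7, unlocked
assert acquire_and_validate(word, 7)       # succeeds: version matched, now locked
assert not acquire_and_validate(word, 7)   # fails: lock bit is already set

word = [8 << 1]                            # version advanced by another process
assert not acquire_and_validate(word, 7)   # fails: version changed
```

A single CAS either succeeds at both a) and b) or fails at both, which is exactly the atomicity the paragraph above requires.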
  • Also, note that each of the respective processes 140 can be programmed to occasionally, periodically, sporadically, intermittently, etc. check (prior to the commit phase in flowchart 500) whether current version information 220 in metadata 135 matches retrieved version information in its respective read-set 150 for all variables read from shared data 125. Additionally, each of the respective processes 140 can be programmed to also check (in a similar way) whether a data value and/or corresponding segment has been locked by another process prior to completion. If a change is detected in the version information 220 (e.g., there is a difference between retrieved version information 320 in read-set 150 and current version information 220) and/or a lock is implemented on a data value or segment used by a given process, the given process can abort and retry the current transaction prior to executing the transaction to the commit phase. Early abortion of transactions doomed to fail (because of another process locking and modifying) can increase overall efficiency associated with parallel processing.
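The early-abort check described above amounts to occasionally re-reading each metadata entry in the read-set before commit time. A minimal sketch (hypothetical names, single-threaded model):

```python
def still_valid(meta, read_set):
    """Return False if any read variable was locked or re-versioned since fetch."""
    for var, seen_version in read_set.items():
        if meta[var]["locked"] or meta[var]["version"] != seen_version:
            return False
    return True

meta = {"x": {"locked": False, "version": 2}}
assert still_valid(meta, {"x": 2})        # nothing changed: keep executing

meta["x"]["version"] = 3                  # another process committed a change
assert not still_valid(meta, {"x": 2})    # doomed transaction: abort early, retry
```

Running such a check between transactional steps trades a little polling overhead for not wasting further work on a transaction that can no longer commit.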
  • Use of version information and lock information according to embodiments herein can prevent corruption of data. For example, suppose that, as an alternative to the above technique of using version information to verify that relied-upon information (associated with a respective transaction) has not changed by the end of a transaction, a process reads data values (as identified in a respective read-set) from shared data 125 again at commit time to ensure that the data values are the same as they were when first fetched by the respective process. Unfortunately, this technique can be misleading and cause errors because of the occurrence of race conditions. For example, a first process may read and verify that a globally accessible data value in shared data 125 has not changed, while soon after (or at nearly the same time) another respective process modifies the globally accessible data value. This would result in corruption if the first process committed its results to shared data 125. The techniques herein are advantageous because use of version and lock information in the same word prevents corruption as a result of two different processes accessing the word at the same or nearly the same time.
  • FIG. 6 is a block diagram illustrating an example computer system 610 (e.g., an architecture associated with computer environment 100) for executing parallel processes 140 and other related processes according to embodiments herein. Computer system 610 can be a computerized device such as a personal computer, workstation, portable computing device, console, network terminal, processing device, etc.
  • As shown, computer system 610 of the present example includes an interconnect 111 that couples a memory system 112 storing shared data 125 and metadata 135, one or more processors 113 executing processes 140, an I/O interface 114, and a communications interface 115. Peripheral devices 116 (e.g., one or more optional user controlled devices such as a keyboard, mouse, display screens, etc.) can couple to processor 113 through I/O interface 114. I/O interface 114 also enables computer system 610 to access repository 180 (that also potentially stores shared data 125 and/or metadata 135). Communications interface 115 enables computer system 610 to communicate over network 191 to transmit and receive information from different remote resources.
  • Note that functionality associated with processes 140 can be embodied as software code such as data and/or logic instructions (e.g., code stored in the memory or on another computer readable medium such as a disk) that support functionality according to different embodiments described herein. Alternatively, the functionality associated with processes 140 can be implemented via hardware or a combination of hardware and software code.
  • It should be noted that, in addition to the processes 140 themselves, embodiments herein include a respective application and/or set of instructions to carry out processes 140. Such a set of instructions associated with processes 140 can be stored on a computer readable medium such as a floppy disk, hard disk, optical medium, etc. The set of instructions can also be stored in a memory type system such as in firmware, RAM (Random Access Memory), read only memory (ROM), etc. or, as in this example, as executable code.
  • Attributes associated with processes 140 will now be discussed with respect to flowcharts in FIG. 7-9. For purposes of this discussion, each of the multiple processes 140 in computer environment 100 can execute or carry out the steps described in the respective flowcharts. Note that the steps in the below flowcharts need not always be executed in the order shown.
  • Now, more particularly, FIG. 7 is a flowchart 700 illustrating a technique supporting execution of parallel transactions in computer environment 100 according to an embodiment herein. Note that techniques discussed in flowchart 700 overlap and summarize some of the techniques discussed above.
  • In step 710, a respective one of multiple processes 140 executes a transaction defined by a corresponding set of instructions to produce a respective transaction outcome based on use of at least one shared variable from shared data 125.
  • In step 720, after producing the respective transaction outcome (e.g., locally storing computational results in its respective write-set 160), the respective process 140 initiates a lock on a given shared variable of shared data 125 to prevent other processes from modifying a data value associated with the given shared variable.
  • In step 730, the respective process 140 initiates a modification of the data value associated with the given shared variable based on the respective transaction outcome even though at least one of the other processes 140 in computer environment 100 also performed a computation using the data value associated with the given shared variable before the lock and during execution of the transaction by the respective one of multiple processes 140.
  • FIG. 8 is a flowchart 800 illustrating processing steps associated with processes 140 according to an embodiment herein. Note that techniques discussed in flowchart 800 overlap with the techniques discussed above in the previous figures.
  • In step 810, each of multiple processes 140 maintains version information in a respective locally managed read set 150 associated with an executed transaction. In one embodiment, the read set 150 is generally not accessible by the other processes 140 using the shared variables from shared data 125. Accordingly, the read set 150 and write-set 160 serve as a local scratch-pad. As previously discussed, the read set 150 can store and identify version information (e.g., retrieved version information) associated with each of multiple shared variables used to generate a respective transaction outcome associated with a given process. The version information stored in the read-set 150 indicates respective versions of the multiple shared variables in shared data 125 at a time when the transaction retrieves respective data values associated with the multiple shared variables (e.g., shared data 125) from a corresponding globally accessible repository.
  • In step 815, after producing a respective transaction outcome associated with an executed transaction, each of multiple processes 140 potentially competes to initiate a respective lock on a given one or more shared variables (e.g., portions of shared data 125) locally modified (as indicated in write-set 160) during the transaction to prevent other processes from modifying a data value associated with the given one or more shared variables.
  • In step 820, after acquiring respective locks associated with the given one or more shared variables and before globally modifying respective data values associated with the given one or more shared variables, a respective process attempting to globally commit its results verifies that newly read (e.g., present or current) version information associated with each of the given one or more shared variables used to generate the respective transaction outcome matches the version information in the locally managed read set associated with the transaction. The newly read version information can be used to verify that the data values associated with the multiple shared variables have not been changed by the other processes during execution of the transaction. There was no change if the newly retrieved version information matches the version information in the read-set 150.
  • In step 825, after verifying that “before-and-after” version information matches and obtaining locks, a respective one of the multiple processes 140 initiates a modification of data values associated with the given one or more shared variables based on the respective transaction outcome. The respective process globally modifies the data values associated with the transaction outcome even though one or more of the other processes 140 performed a computation using the data value associated with the given shared variable before the respective process obtains the lock.
  • In step 830, after the modification of the data values in the shared data 125 associated with the given one or more shared variables in write-set 160, the respective process modifies globally accessible version information 220 associated with the modified segments of shared data 125 (e.g., one or more shared variable) to indicate to other processes that contents of a respective segment have been modified.
  • FIG. 9 is a flowchart 900 illustrating another technique associated with use of lock and version information according to embodiments herein. Note that techniques discussed in flowchart 900 overlap and summarize some of the techniques discussed above.
  • In step 910, computer environment 100 maintains segments 210 of information (e.g., shared data 125) that are shared by multiple processes 140 executing in parallel.
  • In step 915, for each of multiple segments 210, the computer environment 100 maintains a corresponding location (e.g., a portion of storage) to store a respective version value representing a relative version of contents in a respective segment 210. As previously discussed, the relative version associated with a segment is updated by a respective process each time contents of the respective segment are modified by a process. For example, after committing results to shared data 125, a respective process can increment the version value by one over the previous version value to notify other processes 140 that the shared data 125 has changed.
  • In step 920, computer environment 100 enables the multiple processes to compete and secure an exclusive access lock with respect to each of the multiple segments 210 to prevent other processes 140 from modifying a respective locked segment.
  • In step 925, for each of the multiple segments 210, computer environment 100 maintains a corresponding location to store globally accessible lock information (e.g., lock information 230) indicating whether one of the multiple processes 140 executing in parallel has locked a respective segment 210 for: i) changing a respective data value in the respective segment 210, and ii) preventing other processes from reading respective data values from the respective segment 210.
  • In step 930, computer environment 100 enables the multiple processes 140 to retrieve version information 220 associated with the respective multiple segments 210 to identify whether contents of a respective segment have changed over time.
  • In sub-step 935 of step 930, one embodiment of computer environment 100 enables a respective one of the processes 140 to modify a respective version value representing a relative version value associated with a given segment 210 to a new unique data value to indicate that a respective one of the processes has modified a data value associated with the given segment.
  • As discussed above, techniques herein are well suited for use in applications such as those that support parallel processing of threads in the same or different processors. However, it should be noted that configurations herein are not limited to such use and thus configurations herein and deviations thereof are well suited for use in other environments as well.
  • Further Embodiments Associated with Transactional Locking
  • A leading approach for simplifying concurrent programming is a class of non-blocking software (and hardware) mechanisms called transactional memories. Transactional memories can be static or dynamic, indicating whether the locations transacted on are known in advance (like an n-location CAS) or decided dynamically within the scope of the transaction's execution, the latter type being more general and expressive. Unfortunately, current implementations of dynamic non-blocking software transactional memories (STMs) have unsatisfactory performance.
  • This disclosure presents a new software based dynamic transactional memory mechanism which we call Transactional Locking (TL). TL is essentially a way of using static (and therefore simple) non-blocking transactions in software or hardware to transform sequential code into deadlock-free dynamic transactions based on fine grained locks.
  • Initial performance benchmarks of an “all-software” TL mechanism are surprisingly good. TL implementations of concurrent data structures significantly outperform the most effective STM based implementations, and, more importantly, are within a competitive margin from the most efficient hand crafted implementations. These surprising performance results bring us to question two assumptions that have recently taken hold in the transactional memory development community: that software transactions should be non-blocking, and that to be useful, hardware transactions need to be dynamic.
  • 1.0 Introduction
  • A goal of current multiprocessor software design is to introduce parallelism into software applications by allowing operations that do not conflict in accessing memory to proceed concurrently. As discussed above, a key tool in designing concurrent data structures has been the use of locks. Unfortunately, coarse-grained locking, though easy to program with, provides very poor performance because of limited parallelism, while designing fine-grained lock-based concurrent data structures has long been recognized as a difficult task better left to experts. If concurrent programming and data structure design is to become ubiquitous, researchers agree that one must develop alternative approaches that simplify code design and verification. This disclosure addresses “mechanical” methods for transforming sequential code or coarse-grained lock-based code to concurrent code. In one embodiment, by mechanical we mean that the transformation, whether done by hand, by a preprocessor, or by a compiler, does not require any program specific information (such as the programmer's understanding of the data flow relationships).
  • 1.1 Transactional Programming
  • The transactional memory programming paradigm is gaining momentum as the approach of choice for replacing locks in concurrent programming. Combining sequences of concurrent operations into atomic transactions seems to promise a great reduction in the complexity of both programming and verification, by making parts of the code appear to be sequential without the need to use fine-grained locking. Transactions will hopefully remove from the programmer the burden of figuring out the interaction among concurrent operations that happen to conflict when accessing the same locations in memory. Transactions that do not conflict in accessing memory will run uninterrupted in parallel, and those that do will be aborted and retried without the programmer having to worry about issues such as deadlock. There are currently proposals for hardware implementations of transactional memory (HTM), purely software based ones (i.e., software transactional memories (STMs)), and hybrid schemes that combine hardware and software.
  • A preferred unifying theme of parallel processing is that the transactions provided to the programmer, in either hardware or software, will be non-blocking, unbounded, and dynamic. Non-blocking means that transactions do not use locks, and are thus obstruction-free, lock-free, or wait-free. Unbounded means that there is no limit on the number of locations accessed by the transaction. Dynamic means that the set of locations accessed by the transaction is not known in advance and is determined during its execution. Providing all three properties in hardware seems to introduce large degrees of complexity into the design. Providing them in software seems to limit performance: hand-crafted lock-based code, though hard to program and prove correct, greatly outperforms the most effective current software STMs, even when they are programmed using an understanding of the data access relationships. When the STM programmer does not make use of such information, performance of STMs is in general an order of magnitude slower than the hand-crafted counterparts.
  • 1.2 Transactional Locking
  • This disclosure, according to one embodiment, suggests that it is perhaps time to re-examine these basic development requirements. We contend that on modern operating systems, deadlock avoidance is the only compelling reason for making transactions non-blocking, and that there is no reason to provide it for transactions at the user level. Conventional mechanisms already exist whereby threads might yield their quanta to other threads. In particular, one conventional method such as so-called “schedctl” (e.g., a feature in the Solaris™ operating system) allows threads to transiently defer preemption while holding locks. In a sense, rather than trying to improve on hand-crafted lock-based implementations by being non-blocking, we propose to get as close to their behavior as one can with a mechanical approach, that is, one that does not require the programmer to understand their data access relationships.
  • With this in mind, the disclosure introduces Transactional Locking (TL), a blocking approach to designing software based transactional memory mechanisms. TL according to embodiments herein transforms sequential code into unbounded concurrent dynamic transactions that synchronize using deadlock-free fine grained locking. The scheme itself is highly efficient because it does not try to provide a non-blocking progress guarantee for the transaction as a whole. Instead, static (and therefore simple) non-blocking transactions are used only to provide deadlock freedom when acquiring the set of locks needed to safely complete a transaction. These simple static transactions can be implemented in a trivial manner using today's hardware synchronization operations such as compare-and-swap (CAS), or using hardware transactions when these become available. We note that implementing static transactions in hardware may prove significantly simpler than implementing the more general dynamic ones proposed in current HTM schemes.
  • 1.3 A TL Approach in a Nutshell
  • One TL mechanism is based on coordination via a special versioned-read-write-lock. Each shared variable is associated with and protected by one lock. The mapping between variables and locks can be one-to-one or many-to-few. For instance there may be one lock per variable, where the lock is allocated adjacent to the variable; one lock per object; or a separate array of locks indexed by a hash of the variable address. Other mappings are possible as well. A versioned-read-write lock has a version field in the lock word and increments the lock's version number on every successful write attempt. In an example embodiment the versioned-read-write lock would consist of a word where the low-order bit served as a lock-bit and the remaining bits served as a version subfield. On a high level a dynamic transaction is executed as follows:
      • 1. Run the transactional code, reading the locks of all fetched-from shared locations and building a local read-set and write-set (use a safe load operation to avoid running off null pointers as a result of reading an inconsistent view of memory).
      • A transactional load first checks to see if the load address appears in the write-set. If so, the transactional load returns the last value written to the address. This provides the illusion of processor consistency and avoids so-called read-after-write hazards. If the address is not found in the write-set, the load operation then fetches the lock value associated with the variable, saving the version in the read-set, and then fetches from the actual shared variable. If the transactional load operation finds the variable locked, the load may either spin until the lock is released or abort the operation.
      • Transactional stores to shared locations are handled by saving the address and value into the thread's local write-set. The shared variables are not modified during this step. That is, transactional stores are deferred and contingent upon successfully completing the transaction.
      • 2. Attempt to commit the transaction. Acquire the locks of locations to be written. If a lock in the write-set (or more precisely a lock associated with a location in the write-set) also appears in the read-set then the acquire operation must atomically (a) acquire the lock and, (b) validate that the current lock version subfield agrees with the version found in the earliest read-entry associated with that same lock. An atomic CAS can accomplish both (a) and (b). In its simplest form, acquire locks in ascending lock address order, avoiding deadlocks.
      • Alternately, the implementation might acquire the locks in some other order, using bounded spinning to avoid indefinite deadlock.
      • 3. Re-read the locks of all read-only locations to make sure the version numbers haven't changed. If a version does not match, roll back (release) the locks, abort the transaction, and retry.
      • 4. The prior observed reads in step (1) have been validated as mutually consistent. The transaction is now committed. Write-back all the entries from the local write-set to the appropriate shared variables.
      • 5. Release all the locks identified in the write-set by atomically incrementing the version and clearing the write-lock bit. Critically, the write-locks are held for only a brief time.
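The lock-word manipulation underlying the steps above can be sketched in C. The encoding (low-order lock bit, remaining bits as a version subfield) follows the example embodiment; the names (`vlock_t`, `vlock_try_acquire`) and the use of C11 atomics are illustrative assumptions, not the patent's actual interface. Note how a single CAS both acquires the lock and validates the version, as step 2 requires, and how release (step 5) is a single store of version+1 with the lock bit clear.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: low-order bit = write-lock bit, upper 63 bits = version. */
typedef _Atomic uint64_t vlock_t;

#define LOCK_BIT 1ULL

static inline uint64_t vlock_version(uint64_t w)  { return w >> 1; }
static inline bool     vlock_is_locked(uint64_t w){ return (w & LOCK_BIT) != 0; }

/* Commit step 2: atomically (a) acquire the lock and (b) validate that the
 * current version still matches the one recorded in the read-set.
 * A single CAS accomplishes both. */
static inline bool vlock_try_acquire(vlock_t *l, uint64_t expected_version) {
    uint64_t expected = expected_version << 1;        /* unlocked, same version */
    uint64_t locked   = expected | LOCK_BIT;
    return atomic_compare_exchange_strong(l, &expected, locked);
}

/* Commit step 5: release by storing version+1 with the lock bit clear. */
static inline void vlock_release(vlock_t *l, uint64_t old_version) {
    atomic_store(l, (old_version + 1) << 1);
}
```

A transaction that observed version V at load time can only acquire the lock at commit time if the version is still V, which is exactly the validation described in step 2.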
  • At a high level TL according to embodiments herein converts coarse-grained lock operations into transactions, where the transactional infrastructure is implemented with fine-grained locks.
  • There are various other optimizations and contention reduction mechanisms that one should add to this basic scheme to improve performance, but, as can be seen, at its core it is painfully simple. The acquisition of the locks in step 2 is essentially a static obstruction-free transaction, one in which the set of accessed locations is known in advance. It can alternately be sped up using a hardware transaction such as an n-location compare-and-swap (CAS) operation. As noted earlier, this type of operation is simpler than the dynamic hardware transaction.
  • 1.4 TL vs. STM and Hand-Crafted Locking
  • One aspect associated with TL is the observation that the blocking part of a transaction can be limited to the acquisition of a set of lock records. This observation has significant performance implications because it allows one to eliminate all the overheads associated with the mechanisms providing the non-blocking progress guarantee for the transaction as a whole. As we show, this is a major source of overhead of current STM systems.
  • When compared to hand-crafted lock-based structures, one can think of TL as using a non-blocking transaction to overcome the need to understand the data-access relationships, while keeping the basic fine-grained locking structure of a lock per object or field.
  • A few more detailed differences are as follows.
  • Like OSTM (Object-based STM) or HyTM (Hybrid TM), TL associates a special coordination word with each transacted memory location. However, while STM systems like OSTM and HyTM use this word as a pointer to a transaction record, TL uses it as a lock, as in the hand-crafted fine-locked structure. One immediate implication is a saving of a level of indirection over STMs.
  • Unlike STMs, TL's rollback mechanism is simple and local. There are no transaction records, and the collected read-set and write-set is never shared with other threads.
  • OSTM derives a large part of its efficiency from the programmer's help in deciding when to “open” a transacted object for reading or writing. Without this help, it has been shown that OSTM's performance is rather poor. The TL transformation requires no programmer understanding of the data structure in order to make the transformation efficient. We believe it should not be difficult, given a simple set of constraints on program structure, to turn it into a straightforward mechanical transformation.
  • There is an inherent overhead of the general mechanical (and hence "dumb") transformation when compared to hand-crafted code. For example, in Fraser's elegant fine-locked skip-list implementation, he makes use of his understanding of the structure's semantics and the mechanics of his GC to allow list traversal to ignore locks on nodes, since the traversal still works even if a node is concurrently removed. It is hard to imagine that a mechanical approach could be made to ignore the fact that a node is locked and might be removed from the list.
  • 2. The TL Algorithm
  • According to one aspect of this disclosure, we associate a special versioned-write-lock with every transacted memory location. In the example embodiment a versioned write-lock is a simple spinlock that uses a compare-and-swap (CAS) operation to acquire the lock and a store to release it. Since one only needs a single bit to indicate that the lock is taken, we use the rest of the lock word to hold a version number. This number is incremented by every successful lock-release.
  • We allocate a collection of versioned-write-locks. We use various schemes for associating locks with shared variables: per object (PO), where a lock is assigned per shared object; per stripe (PS), where we allocate a separate large array of locks and memory is striped (divided up) using some hash function to map each location to a separate stripe; and per word (PW), where each transactionally referenced variable (word) is collocated adjacent to a lock. Other mappings between transactional shared variables and locks are possible. The PW and PO schemes require either manual or compiler-assisted automatic insertion of lock fields whereas PS can be used with unmodified data structures. PO might be implemented, for instance, by leveraging the header words of Java™ objects. A single PS stripe-lock array may be shared and used for different TL data structures within a single address-space. For instance an application with two distinct TL red-black trees and three TL hash-tables could use a single PS array for all TL locks.
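The PS mapping described above can be sketched as a fixed array of lock words indexed by a hash of the variable's address. The table size, stripe granularity, and multiplicative hash constant below are illustrative assumptions; the disclosure only requires "some hash function" mapping each location to a stripe.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-stripe (PS) lock table: 2^16 stripe locks shared by all
 * TL data structures in the address space. Sizes are illustrative. */
#define N_STRIPES (1u << 16)
static uint64_t stripe_locks[N_STRIPES];

static inline uint64_t *lock_for_addr(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    /* Drop the low 6 bits so nearby variables in one 64-byte stripe share a
     * lock, then fold into the table with a multiplicative hash. */
    size_t idx = (size_t)(((uint64_t)(a >> 6) * 0x9E3779B97F4A7C15ULL) >> 48)
                 & (N_STRIPES - 1);
    return &stripe_locks[idx];
}
```

The same array can serve every TL-protected structure in the process, as the text notes; distinct addresses may hash to the same stripe, which is safe (it merely coarsens the locking) but can cause false conflicts.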
  • The following is a description of the PS algorithm although most of the details carry through verbatim for PO and PW as well. We maintain thread local read- and write-sets as linked lists. The read-set entries contain the address of the lock and the observed version number of the lock associated with the transactionally loaded variable. The write-set entries contain the address of the variable, the value to be written to the variable, and the address of the lock that “covers” the variable. The write-set is kept in chronological order to avoid write-after-write hazards.
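The read-set and write-set entries described above might look as follows in C. The field layout mirrors the description (read-set: lock address and observed version; write-set: variable address, deferred value, covering lock, kept in chronological order); the struct and function names are illustrative, and the lookup helper shows how a transactional load consults the write-set first to avoid read-after-write hazards.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thread-local read-set entry: the covering lock's address and
 * the version observed at transactional load time. */
typedef struct read_entry {
    uint64_t *lock;
    uint64_t  version;
    struct read_entry *next;
} read_entry_t;

/* Hypothetical write-set entry: variable address, deferred value, and the
 * lock that "covers" the variable. The list is kept in chronological order
 * (oldest first, appended at the tail) to avoid write-after-write hazards. */
typedef struct write_entry {
    uint64_t *addr;
    uint64_t  value;
    uint64_t *lock;
    struct write_entry *next;
} write_entry_t;

/* A transactional load consults the write-set first so a transaction sees its
 * own deferred stores. Scanning the chronological list and keeping the last
 * match yields the most recent value written to the address. */
static int writeset_lookup(const write_entry_t *ws, const uint64_t *addr,
                           uint64_t *out) {
    int found = 0;
    for (const write_entry_t *e = ws; e != NULL; e = e->next) {
        if (e->addr == addr) { *out = e->value; found = 1; }
    }
    return found;
}
```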
  • We now describe how TL executes a sequential code fragment that was placed within a TL transaction. We later describe the limitations placed on the programmer in terms of structure of this code so as to allow it to be mechanically transformed into a TL transaction. The transaction proceeds through the code as follows:
  • 1. For every location read, read its lock value, and
  • (a) if it is not locked, add the lock's version number to the read-set. We use a safe load operation to avoid running off null pointers as a result of reading an inconsistent view of memory. Safe loads may be implemented with SPARC™ non-faulting loads or by a user-level trap handler that skips over potentially trapping safe-load instructions.
  • (b) if it is locked by another thread then we spin briefly. If the spin fails, abort the transaction and retry.
  • 2. For every location to be written, record the location and the value to be written.
  • Upon completion of the pass through the code, reread the version numbers of all locations in the read-set.
  • 1. Attempt to acquire all locks in the write-set in ascending lock address order. Upon failing to acquire a lock, apply some type of backoff policy or abort and retry the transaction. A backoff policy could, for example, be to spin for a certain amount of time before re-attempting to acquire the lock.
  • 2. Once all locks are acquired, re-read the locks of all read-set locations to make sure version numbers have not changed.
  • (a) If a location has changed, release locks, abort and retry the transaction.
  • (b) If not, perform stores in write set and release locks in any order. The transaction is complete.
  • The transaction's re-reading of all the locks of locations in the read-set before attempting to acquire the locks is only a performance optimization. It is not required for correctness. Empirically we have found that many transactions fail due to modifications before locks are acquired. Pre-validating the lock versions in the read-set avoids acquiring the locks for a transaction that is fated to abort. We note that spinning as a backoff policy does not introduce deadlocks because locks are acquired in ascending order. The above algorithm, which we call sorted TL, acquires locks in order. We have also experimented with an encounter-order variant of TL that acquires locks as they are encountered and uses randomized backoff to avoid deadlock. The advantage of the latter is that the transacting thread does not need to search the write-set for values of locations it updated, since locations are updated "in place."
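The sorted-TL acquisition step (acquire write-set locks in ascending address order, with bounded spin and rollback on failure) can be sketched as follows. `try_lock`, `unlock`, and `SPIN_LIMIT` are illustrative stand-ins for the CAS-based versioned-lock operations, not the patent's actual interfaces; for brevity the release here omits the version increment a real commit would perform.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SPIN_LIMIT 1000

static bool try_lock(_Atomic uint64_t *l) {
    uint64_t w = atomic_load(l);
    if (w & 1ULL) return false;                  /* lock bit already set */
    return atomic_compare_exchange_strong(l, &w, w | 1ULL);
}

static void unlock(_Atomic uint64_t *l) {
    atomic_store(l, atomic_load(l) & ~1ULL);     /* sketch: no version bump */
}

static int cmp_lock_ptr(const void *a, const void *b) {
    uintptr_t x = (uintptr_t)*(_Atomic uint64_t * const *)a;
    uintptr_t y = (uintptr_t)*(_Atomic uint64_t * const *)b;
    return (x > y) - (x < y);
}

/* Acquire every write-set lock in ascending address order (so concurrent
 * committers cannot deadlock). On failure after a bounded spin, release the
 * locks already taken and report abort-and-retry to the caller. */
static bool acquire_write_locks(_Atomic uint64_t **locks, size_t n) {
    qsort(locks, n, sizeof *locks, cmp_lock_ptr);
    for (size_t i = 0; i < n; i++) {
        int spins = 0;
        while (!try_lock(locks[i])) {
            if (++spins > SPIN_LIMIT) {
                for (size_t j = 0; j < i; j++) unlock(locks[j]);
                return false;
            }
        }
    }
    return true;
}
```

Because every committer sorts its lock set the same way, a cycle of waiters is impossible, which is why bounded spinning here cannot deadlock.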
  • 2.1 Intentionally Left Blank
  • 2.2 Mechanical Transformation
  • As we discussed earlier, the algorithm we describe can be added to code in a mechanical fashion, that is, without understanding anything about how the code works or what the program itself does. In our benchmarks, we performed the transformation by hand. We do, however, believe that it should not be hard to automate this process and allow a compiler to perform the transformation given a few rather simple limitations on the code structure within a transaction.
  • 2.3 Software-Hardware Inter-Operability
  • Though we have described TL as a software based scheme, it can be made inter-operable with HTM systems on several levels.
  • In its simplest form, one can use static bounded-size obstruction-free hardware transactions to speed up software TL. This is done by using the hardware transactions to acquire the write locks of a TL transaction in order. Since the write-set is known in advance, we require only static hardware transactions. Because for many data structures the number of writes is significantly smaller than the number of reads, it may well be that in most cases these hardware transactions can be bounded in size. If all the write locks do not fit in a single hardware transaction, one can apply several of them in sequence using the same scheme we currently use to acquire individual locks, avoiding deadlock because the locations are acquired in ascending order.
  • One can also use TL as a hybrid backup mechanism to extend bounded-size dynamic hardware transactions to arbitrary size. We can use a scheme similar to OSTM and HyTM where, instead of their object records, we use the versioned-write-lock associated with each location.
  • Hardware transactions need to verify, for each location that they read or write, that the associated versioned-write-lock is free. For every write they also need to update the version number of the associated stripe lock. This suffices to provide inter-operability between hardware and software transactions. Any software read will detect concurrent modification of a location by a hardware write because the version number of the associated lock will have changed. Any hardware transaction will fail if a concurrent software transaction is holding the lock to write. Software transactions attempting to write will also fail in acquiring a lock, because they use an atomic hardware synchronization operation (such as CAS or a single-location transaction) which will fail if the version number of the location was modified by the hardware transaction.
  • 3.0 Remarks
  • 1. One goal of the present disclosure is to allow the programmer to convert coarse-grain locked data structures to TL so as to enjoy the benefits of parallelism. This can be helpful when transitioning to high-order parallelism with SMT/CMT processors such as Niagara™. One key attribute of TL is simplicity. It allows the programmer to extract additional parallelism but without unduly increasing the complexity of their code. The programmer can “think serially” but the code will “execute concurrently”.
  • For a given problem we deem TL successful if the resultant performance exceeds that of the original coarse-grain locked form. In many cases the TL form is competitive with the best-of-breed STM forms. That having been said, for any given problem a specialized, hand-coded form written by a synchronization expert is likely to be faster than the TL form. An expert in synchronization, developing with concurrency in mind as a first-order requirement, may be aware of relaxed data dependencies in the algorithm and take advantage of domain-specific properties.
  • For example, a red-black tree transformed with TL will out-perform a red-black tree protected by a naive lock. But an exotic ad-hoc red-black tree designed by concurrency experts and subject to considerable research, such as Hanke's red-black algorithm, will generally outperform the TL-transformed red-black tree.
  • 2. Broadly, TL works by transforming an operation protected by a coarse-grained lock into an optimistic transactional form. We then implement the transactional infrastructure with fine-grained locks, enabling additional parallelism as the access patterns permit.
  • 3. OSTM works by opening and closing records for reading and writing. TL, in a sense, performs the open operations automatically at transactional load- and store-time but leaves the record open until commit time. TL has no way of knowing whether prior loads executed within a transaction have any bearing on the results produced by the transaction.
  • In such cases the load could safely be removed from the read-set, but TL doesn't currently provide that capability. As such, the TL transaction is exposed to false-positive failures.
  • Consider the following scenario where we have a TL-protected hash table. Thread T1 traverses a long hash bucket chain searching for a value associated with a certain key, iterating over “next” fields. We'll say that T1 locates the appropriate node at or near the end of the linked list. T2 concurrently deletes an unrelated node earlier in the same linked list. T2 commits. At commit-time T1 will abort because the linked-list “next” field written to by T2 is in T1's read-set. T1 must retry the lookup operation (ostensibly locating the same node). Given our domain-specific knowledge of the linked list we understand that the lookup and delete operations didn't really conflict and could have been allowed to operate concurrently with no aborts. A clever “hand over hand” ad-hoc hand-coded locking scheme would allow the desired concurrency.
  • 4. As described above, TL admits live-lock failure. Consider a case where thread T1's read-set is A and its write-set is B, while T2's read-set is B and its write-set is A. T1 tries to commit and locks B. T2 tries to commit and acquires A. T1 validates A, in its read-set, and aborts because A is locked by T2. T2 validates B in its read-set and aborts because B was locked by T1. We have mutual abort with no progress. To improve "liveness" we use a back-off delay at abort-time, similar in spirit to that found in CSMA-CD MAC protocols. The delay interval is a function of (a) a random number generated at abort-time, (b) the length of the prior (aborted) write-set, and (c) the number of prior aborts for this transactional attempt.
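A minimal sketch of such an abort-time backoff, combining the three inputs listed above. The exact window function and constants are illustrative assumptions; the disclosure only specifies which factors the delay depends on.

```c
/* Hypothetical abort-time backoff: the window grows with the aborted
 * write-set size and (exponentially, capped) with the number of prior aborts;
 * the random input breaks the symmetry that causes mutual aborts, as in
 * CSMA-CD exponential backoff. Returns a delay in arbitrary spin units. */
unsigned backoff_delay(unsigned rnd, unsigned write_set_len, unsigned aborts) {
    unsigned shift  = aborts < 10 ? aborts : 10;      /* cap the exponent */
    unsigned window = (write_set_len + 1) << shift;
    return rnd % window;
}
```

In the A/B live-lock scenario above, randomized delays make it overwhelmingly likely that one of T1 and T2 retries first and commits while the other is still backing off.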
  • 5. As described above, at commit-time the transactional mechanism will acquire write-set locks, validate the read-set, perform the write-back, and then release (and increment) the write-locks. Lock acquisition is accomplished with CAS and lock-release with a simple store. Given the availability of restricted capacity hardware transactional memory, such as will be available in Sun's forthcoming “ROCK” SPARC processor, we eliminate the CAS operations and try to acquire the locks in groups (replacing a set of CAS operations with a single ROCK hardware transaction).
  • In addition it is possible that the entire commit operation might be feasible as a single ROCK hardware transaction even where the original application transaction was too big (too many loads and stores) to be feasible as a single ROCK transaction. The commit operation will be able to make an accurate estimate of ROCK-feasibility given that the sizes of the read-set and write-set are available (or cheap to compute) at commit-time. Finally, if the entire commit is feasible as a ROCK hardware transaction, we can avoid changing the lock word from unlocked, to locked, to unlocked (but incremented) by simply fetching the lock word at the start of the commit, verifying that it is unlocked, and then increasing the version sub-field at the end of the transaction, after the data writes are complete.
  • 6. Changes to non-transactional variables, such as automatic variables, must not be allowed to escape or "leak" out of an aborted transaction. Where needed, the transactional infrastructure must log such changes and roll back any updates at abort-time. Similarly, exceptions in aborted or doomed transactions must not propagate out of the transactional infrastructure. The SXM scheme, where transactions are encapsulated in method calls, handily deals with this issue.
  • 7. All accesses to shared variables within a transformed TL critical section must be performed transactionally. Mixed-mode access can be unsafe. Transactions should not perform or initiate I/O or otherwise interface with non-transactional components. Transactions should not access device-memory (memory-mapped devices) with transactional loads and stores, as loads from device-memory are not necessarily idempotent and may have side effects.
  • 8. Under TL, pure read operations don't require any store operations. This is important as stores to shared variables under typical snoop- or directory-based coherency protocols can result in considerable coherency bus traffic. Such stores incur a local latency penalty and cause scalability issues, as the coherency traffic consumes precious bandwidth on the shared coherency bus.
  • 9. Write-locks are held for a brief time—just long enough to validate the read-set and write-back the deferred transactional stores.
  • 10. If a transaction acquires many distinct locks, it can suffer a local latency penalty as the CAS instruction is typically slow. A balance must be struck between lots of locks (and increased potential parallelism) and un-contended lock acquisition overhead. The mapping strategy between variables and locks is critical.
  • 11. As noted above, the PW scheme may suffer undue local CAS latency if many distinct write-locks must be acquired. One possible solution is to add an indirection-bit to the lock-word. When set, the lock-word contains a pointer to the actual lock. Multiple indirection is not allowed. Objects are initialized so that the per-field lock words point to either a canonical non-indirect field lock within the same object, or to a lock that protects the entire data structure (e.g., the entire red-black tree or skip-list). Initially we have coarse-grain locking with a many:1 relationship between lock fields and actual locks, but as we encounter contention we can convert automatically to fine-grain locking by replacing the indirection pointer with a normal non-indirected lock value. For safety, only the current lock-owner can "split" or upgrade the lock from the indirected form (coarse-grained) to a per-field lock (fine-grained). The transition is unidirectional; we never try to aggregate multiple fine-grain locks back into a single coarse-grain lock. The onset of contention (or more precisely, aborts caused by encountering a locked object) triggers splitting. When the contending thread eventually acquires the lock it can perform the split operation. By automatically splitting the locks and switching to finer-grained locking we minimize the number of high-latency CAS operations needed to lock low-contention fields, but maximize the potential parallelism for operations that access high-contention fields.
  • One can apply the same optimization to PS, where the first lock in the array is a normal lock and all other locks are indirect locks, pointing to the first element.
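The indirection-bit idea in point 11 can be sketched as follows. The bit assignments (bit 0 = lock, bit 1 = indirect, version above) and the function names are illustrative assumptions; the key properties shown are that at most one level of indirection is followed and that splitting replaces the pointer with a normal, non-indirected lock value.

```c
#include <stdint.h>

/* Hypothetical lock-word flags: bit 0 = lock bit, bit 1 = indirection bit.
 * When the indirection bit is set, the remaining bits hold a pointer to the
 * actual lock (lock words are 8-byte aligned, so the low bits are free). */
#define LOCK_BIT     1ULL
#define INDIRECT_BIT 2ULL

/* Follow at most one level of indirection; multiple indirection is not allowed. */
static uint64_t *resolve_lock(uint64_t *word) {
    if (*word & INDIRECT_BIT)
        return (uint64_t *)(uintptr_t)(*word & ~INDIRECT_BIT);
    return word;
}

/* "Splitting": the current lock owner replaces the indirection pointer with a
 * normal non-indirected (direct, unlocked) lock value, converting from
 * coarse-grain to fine-grain locking. The transition is one-way. */
static void split_lock(uint64_t *word, uint64_t initial_version) {
    *word = initial_version << 2;   /* version above the two flag bits */
}
```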
  • 12. Broadly, TL operates better in environments with lower mutation rates (that is, where the store:load ratio is low). For example, consider a red-black tree and a skip-list that are each protected by a single lock and where the data structure is subject to many concurrent modifications. The relative speedup achieved with TL as compared to the classic lock will usually be higher with the skip-list than with the red-black tree, as mutations to a skip-list usually require only a few stores, whereas mutations to a red-black tree may require adjustments to the tree structure that require many stores.
  • 13. We claim that TL admits no schedules that were not already possible for the data structure as protected by the coarse-grained lock.
  • 14. TL could be used to implement the “atomic . . . ” construct where no lock is specified.
  • 15. Our example embodiment describes a 64-bit lock-word, partitioned into a single lock bit and a 63-bit version subfield. Assuming a 4 GHz processor and a maximum update rate of 1 transaction per clock, the version sub-field will not overflow for roughly 73 years. Other example embodiments allow for use of a 32-bit lock-word field. When a counter overflows, for instance, a so-called stop-the-world epoch might be used to stop all threads outside transactions. At that point no thread can have a previously fetched instance of the overflowed lock-word in its read-set; the lock-word version can safely be reset to 0. All threads can then be allowed to resume normal execution.
  • 16. Unlike some other STMs which incorporate and depend on their own garbage-collection mechanisms, TL allows the C programmer to use normal malloc( ) and free( ) operations to manage the lifecycle of structures containing transactionally accessed shared variables. The only requirement imposed by TL is that a structure being free( )-ed must be allowed to quiesce. That is, any pending transactional stores, detectable by checking the lock-bit in the associated locks, must be allowed to drain into the structure before the structure is freed. After the structure is quiesced it can be accessed with normal load and store operations outside the transactional framework.
  • 17. Concurrent mixed-mode transactional and non-transactional accesses are proscribed. When a particular object is being accessed with transactional load and store operations it must not be accessed with normal non-transactional load and store operations. (When any accesses to an object are transactional, all accesses must be transactional). Before an object can exit the transactional domain and subsequently be accessed with normal non-transactional loads and stores, we must sterilize the object. To motivate the need for sterilization consider the following scenario. We have a linked list of 3 nodes identified by addresses A, B and C. A node contains Key, Value and Next fields. The data structure implements a traditional key-value mapping. The key-value map (the linked list) is protected by TL using PS. Node A's Key field contains "1", its Value field contains "1001" and its Next field refers to B. B's Key field contains "2", its Value field contains "1002" and its Next field refers to C. C's Key field contains "3", its Value field contains "1003" and its Next field is NULL. Thread T1 calls Set("2", "2002"). The TL-based Set( ) operator traverses the linked list using transactional loads and finds node B with a key value of "2". T1 then executes a transactional store into B.Value to change "1002" to "2002". T1's read-set consists of A.Key, A.Next, B.Key and the write-set consists of B.Value. T1 attempts to commit; it acquires the lock covering B.Value and then validates that the previously fetched read-set is consistent by checking the version numbers in the locks covering the read-set. Thread T1 stalls. Thread T2 executes Delete("2"). The Delete( ) operator traverses the linked list and attempts to splice out node B by setting A.Next to C. T2 successfully commits. The commit operator stores C into A.Next. T2's transaction completes. T2 then calls free(B). T1 resumes in the midst of its commit and stores into B.Value. We have a classic modify-after-free pathology. To avoid such problems T2 calls sterilize(B) after the commit finishes but before free( )ing B. This allows T1's latent transactional store to drain into B before B is free( )ed and potentially reused. Note, however, that TL (using sterilization) did not admit any outcomes that were not already possible under the original coarse-grained lock.
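The sterilization/quiescing step described above amounts to waiting for the lock-bit of each lock covering the structure to clear, so that any latent transactional store drains into the structure before it is freed. A minimal sketch, with an illustrative name and a bare spin (a real implementation would likely bound or yield the spin):

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_BIT 1ULL

/* Hypothetical sterilize: given the versioned locks covering a structure,
 * spin until each lock-bit is clear. Any pending commit holding one of these
 * locks finishes its write-back and releases before we return, so the caller
 * may then free() or access the structure non-transactionally. */
void sterilize(_Atomic uint64_t **covering_locks, size_t n) {
    for (size_t i = 0; i < n; i++) {
        while (atomic_load(covering_locks[i]) & LOCK_BIT)
            ;   /* wait for the pending transactional store to drain */
    }
}
```

In the scenario above, T2 would call this on B's covering lock(s) after its commit and before free(B), forcing T1's in-flight commit to complete or abort first.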
  • 18. Consider the following problematic lifecycle based on the A, B, C linked list, above. Let's say we are using TL in the "C" language to moderate concurrent access to the list, but with either PO or PW mode where the lock word(s) are embedded in the node. Thread T1 calls Set("2", "2002"). The TL-based Set( ) method traverses the list and locates node B having a key value of "2". Thread T2 then calls Delete("2"). The Delete( ) operator commits successfully. T2 sterilizes B and then calls free(B). The memory underlying B is recycled and used by some other thread T3. T1 attempts to commit by acquiring the lock covering B.Value. The lock-word is collocated with B.Value, so the CAS operation transiently changes the lock-word contents. T1 then validates the read-set, recognizes that A.Next changed (because of T2's Delete( )) and aborts, restoring the original lock-word value. T1 has caused the memory word underlying the lock for B.Value to "flicker", however. Such modifications are unacceptable; we have a classic modify-after-free error.
  • As such, we advocate using PS for normal C code, as the lock-words (metadata) are stored separately in type-stable memory distinct from the data protected by the locks. This proviso can be relaxed if the C code uses some type of garbage collection (such as Boehm-style conservative garbage collection for C, Michael-style hazard pointers or Fraser-style Epoch-Based Reclamation) or type-stable storage for the nodes. For type-safe garbage-collected managed runtime environments such as Java any of the mapping policies (PS, PO or PW) are safe. Relatedly, use-after-free errors are impossible in Java, so sterilization would be needed only for objects that escape the transactional domain and will subsequently be accessed with normal loads and stores.
  • Alternately, we could employ PO or PW with C code but replace the embedded lock-words with immutable words that point to type-stable or immortal lock-words. Under PO, for instance, the object would contain an immutable field that points to some other lock-word. The field would be initialized to point to the associated lock-word either at object construction-time, or initialization could be deferred until the first transactional store or load.
  • 19. It is possible to use C++ operator overloading and template functions to interpose on all load and store operations for variables defined to be used in a transactional fashion. This approach obviates the need to explicitly call transactional load and store operators, making the set of modifications required to switch to TL much smaller.
  • 20. We previously described the PW, PO and PS schemes for associating variables with locks. More generally, TL might allow a skilled programmer to explicitly control the mapping by allowing the programmer to define a custom VariableToLock( ) function which takes a variable address as input and returns a lock address. The VariableToLock( ) function is optional.
  • 21. TL can easily be combined with STM interfaces or transactional infrastructures such as Herlihy's SXM.
  • 22. TL protects data accessed within a critical section. TL should not be used where a lock is used as an execution barrier and shared data is accessed outside the lock. For instance, let's say thread T1 acquires lock A, spawns thread T2, increments some global variable B and then releases A. T2 will acquire A, release A, and then increment B. Access to the shared variable B is protected by the lock, but the accesses are outside the critical section. In fact the critical section is empty and degenerate.
  • 23. Code that assumes memory barrier (fence)-equivalent semantics for lock and unlock should not be transformed with TL.
  • 24. We can extend the lock-word encoding from LOCKED/UNLOCKED to READWRITE/READONLY/EXCLUSIVE as follows. READWRITE corresponds to UNLOCKED and EXCLUSIVE corresponds to LOCKED. The new state, READONLY, is an interim state used only at commit-time. The commit operator is modified to attempt to change all locks in the write-set from READWRITE to READONLY with CAS. The commit operator must spin if the lock is found to be in READONLY or EXCLUSIVE state. Once the write-set locks have been made READONLY, the commit operator validates the versions of the read-set locks and ensures that the read-set locks are in READWRITE state. If the read-set is invalid the commit operator restores the write-set locks to READWRITE and aborts the transaction. Otherwise the commit operator uses simple store operations to upgrade all the write-set locks from READONLY to EXCLUSIVE. The commit operator then writes back the deferred stores saved in the write-set and then releases the locks and increments the versions, changing (V, EXCLUSIVE) to (V+1, READWRITE) with a single atomic store. Note that the upgrade to EXCLUSIVE, write-back, and release can be fused into a single loop that iterates over the write-set in chronological order. This modification decreases the lock-hold time, that is, the time that locks are in EXCLUSIVE state. Critically, if a lock is in READONLY state because of a commit operation being executed by thread T1, concurrent transactional loads performed by thread T2 are allowed to proceed. (That is, when a thread executing a commit has placed a lock in READONLY state, concurrent transactional loads performed by other threads are allowed to proceed).
  • In yet another variation, the commit operator would use CAS to try to change all the write-set locks from READWRITE to READONLY. Once in READONLY state, commit would then use normal atomic stores to upgrade the locks from READONLY to EXCLUSIVE. The commit operator would then validate the read-set and, conditionally, write back the deferred stores saved in the write-set and release the locks, incrementing the version subfields. This adaptation minimizes aggregate lock-hold times. Recall that CAS has high local latency even when successful. Consider a transaction containing stores to variables V1 and V2 covered by distinct locks W1 and W2. The basic commit operator, described earlier, uses CAS to lock W1 and then another CAS to lock W2. The hold time for W1 is increased by the latency of the CAS needed to acquire W2. The mechanism described here lessens the impact of CAS latency.
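The three-state commit protocol of note 24 can be sketched in C11 atomics as follows. The bit layout (two low bits for the state, the remaining bits for the version), all names, and the single-attempt lock transitions are assumptions; the spin loops the text calls for are elided to a reported failure.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: low two bits hold the state, the rest the version. */
enum { READWRITE = 0, READONLY = 1, EXCLUSIVE = 2, STATE_MASK = 3 };

typedef _Atomic uint32_t vlock_t;

static inline uint32_t vl_version(uint32_t w) { return w >> 2; }
static inline uint32_t vl_state(uint32_t w)   { return w & STATE_MASK; }
static inline uint32_t vl_make(uint32_t v, uint32_t s) { return (v << 2) | s; }

/* Commit-time step: CAS a lock from READWRITE to READONLY. The text's
 * bounded spin is elided; a single attempt is made and failure reported. */
static bool try_readonly(vlock_t *l) {
    uint32_t w = atomic_load(l);
    if (vl_state(w) != READWRITE) return false;
    return atomic_compare_exchange_strong(l, &w, vl_make(vl_version(w), READONLY));
}

/* Upgrading READONLY -> EXCLUSIVE needs only a plain store: we own the lock. */
static void upgrade_exclusive(vlock_t *l) {
    uint32_t w = atomic_load(l);
    atomic_store(l, vl_make(vl_version(w), EXCLUSIVE));
}

/* Release: (V, EXCLUSIVE) -> (V+1, READWRITE) in a single atomic store. */
static void release_increment(vlock_t *l) {
    uint32_t w = atomic_load(l);
    atomic_store(l, vl_make(vl_version(w) + 1, READWRITE));
}
```

In a full commit these three steps would run over the whole write-set, with the upgrade, write-back, and release fused into one chronological loop as the text describes.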
  • 25. Transactions may be nested by folding or “flattening” inner transactions into the outermost transaction. By nature, longer transactions have a higher chance of failing because of concurrent interference, however.
  • 4.0 Additional Embodiments
  • In furtherance of the above discussion, embodiments herein can operate in two modes, which we will call encounter mode and commit mode. These modes determine how locks are acquired and how transactions are committed or aborted. We begin by further describing our commit-mode algorithm, and later explain how TL operates in encounter mode.
  • We associate a special versioned-write-lock with every transacted memory location. A versioned-write-lock is a simple spin lock that uses a compare-and-swap (CAS) operation to acquire the lock and a store to release it. Since only a single bit is needed to indicate that the lock is taken, we use the rest of the lock word to hold a version number. This number is incremented by every successful lock release. In encounter mode the version number is displaced and a pointer into a thread's private undo log is installed.
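A minimal C11 sketch of such a versioned-write-lock, assuming the low bit marks the lock as taken and the remaining bits hold the version; the names and field layout are illustrative, and the spin is reduced to a single try:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit marks the lock as taken; the remaining bits (w >> 1) hold the
 * version number, incremented on every successful release. */
typedef _Atomic uint64_t vwlock;

#define VWLOCK_TAKEN 1u

static bool vwlock_try_acquire(vwlock *l, uint64_t *saved) {
    uint64_t w = atomic_load(l);
    if (w & VWLOCK_TAKEN) return false;   /* already held: caller may spin */
    if (!atomic_compare_exchange_strong(l, &w, w | VWLOCK_TAKEN)) return false;
    *saved = w;                           /* word as it was at acquire time */
    return true;
}

static void vwlock_release(vwlock *l, uint64_t saved) {
    /* saved has the taken bit clear, so +2 bumps the version field and
     * leaves the lock free, all with one plain store. */
    atomic_store(l, saved + 2);
}

static uint64_t vwlock_version(vwlock *l) {
    return atomic_load(l) >> 1;
}
```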
  • We allocate a collection of versioned-write-locks. We use various schemes for associating locks with shared data: per object (PO), where a lock is assigned per shared object; per stripe (PS), where we allocate a separate large array of locks and memory is striped (divided up) using some hash function to map each location to a separate stripe; and per word (PW), where each transactionally referenced variable (word) is collocated adjacent to a lock. Other mappings between transactional shared variables and locks are possible. The PW and PO schemes require either manual or compiler-assisted automatic insertion of lock fields, whereas PS can be used with unmodified data structures. Since in general PO showed better performance than PW, we will focus on PO and do not discuss PW further. PO might be implemented, for instance, by leveraging the header words of Java™ objects. A single PS stripe-lock array may be shared and used for different TL data structures within a single address-space. For instance, an application with two distinct TL red-black trees and three TL hash-tables could use a single PS array for all TL locks. As our default mapping we chose an array of 2^20 entries of 32-bit lock words, with the mapping function masking the variable address with "0x3FFFFC" and then adding in the base address of the lock array to derive the lock address.
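The default PS mapping just described can be written out directly; the helper name is hypothetical, but the mask constant and the 2^20-entry array of 32-bit lock words follow the text. Because the mask keeps bits 2 through 21, the masked address is already a byte offset into an array of 4-byte lock words.

```c
#include <stdint.h>

#define STRIPE_MASK 0x3FFFFCu   /* selects 2^20 word-aligned stripes */

/* Per-stripe (PS) mapping: mask the variable's address and add the base
 * of the lock array to derive the address of the covering lock word. */
static uint32_t *ps_lock_for(uint32_t *lock_array, const void *var) {
    uintptr_t a = (uintptr_t)var;
    return (uint32_t *)((uintptr_t)lock_array + (a & STRIPE_MASK));
}
```

Two variables whose addresses differ only above bit 21 hash to the same stripe, which is why a single PS array can serve every TL data structure in an address space.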
  • The following is a description of the PS algorithm, although most of the details carry through verbatim for PO and PW as well. We maintain thread-local read- and write-sets as linked lists. Each read-set entry contains the address of the lock and the observed version number of the lock associated with the transactionally loaded variable. Each write-set entry contains the address of the variable, the value to be written to the variable, and the address of the lock that "covers" the variable. The write-set is kept in chronological order to avoid write-after-write hazards.
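The thread-local sets might be declared as follows; all type and field names are assumptions. The tail-append helper preserves the chronological order the text requires, so replaying the list front-to-back replays the stores in program order.

```c
#include <stddef.h>
#include <stdint.h>

/* Read-set entry: which lock covered the load, and its version then. */
typedef struct read_entry {
    uint64_t *lock;
    uint64_t  version;
    struct read_entry *next;
} read_entry;

/* Write-set entry: deferred store plus the lock covering the variable. */
typedef struct write_entry {
    uint64_t *addr;
    uint64_t  value;
    uint64_t *lock;
    struct write_entry *next;
} write_entry;

/* Append at the tail so iteration replays stores chronologically,
 * avoiding write-after-write hazards at write-back time. */
static write_entry *ws_append(write_entry **head, write_entry *e) {
    write_entry **p = head;
    while (*p) p = &(*p)->next;
    e->next = NULL;
    *p = e;
    return e;
}
```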
  • 4.1 Commit Mode
  • We now describe how TL executes in commit mode a sequential code fragment that was placed within a TL transaction. As we explain, this mode does not require type-stable garbage collection, and works seamlessly with the memory life-cycle of languages like C and C++.
  • 1. Run the transactional code, reading the locks of all fetched-from shared locations and building a local read-set and write-set (use a safe load operation to avoid running off null pointers as a result of reading an inconsistent view of memory).
  • A transactional load first checks (using a filter such as a Bloom filter) whether the load address appears in the write-set; if so, the transactional load returns the last value written to the address. This provides the illusion of processor consistency and avoids so-called read-after-write hazards. If the address is not found in the write-set, the load operation fetches the lock value associated with the variable, saving the version in the read-set, and then fetches from the actual shared variable. If the transactional load operation finds the variable locked, the load may either spin until the lock is released or abort the operation.
  • Transactional stores to shared locations are handled by saving the address and value into the thread's local write-set. The shared variables are not modified during this step. That is, transactional stores are deferred and contingent upon successfully completing the transaction. During the operation of the transaction we periodically validate the read-set. If the read-set is found to be invalid we abort the transaction. This prevents a doomed transaction (one that has read inconsistent global state) from becoming trapped in an infinite loop.
  • 2. Attempt to commit the transaction. Acquire the locks of locations to be written. If a lock in the write-set (or more precisely a lock associated with a location in the write-set) also appears in the read-set then the acquire operation must atomically (a) acquire the lock and, (b) validate that the current lock version sub-field agrees with the version found in the earliest read-entry associated with that same lock. An atomic CAS can accomplish both (a) and (b). Acquire the locks in any convenient order using bounded spinning to avoid indefinite deadlock.
  • 3. Re-read the locks of all read-only locations to make sure version numbers haven't changed. If a version does not match, roll-back (release) the locks, abort the transaction, and retry.
  • 4. The prior observed reads in step (1) have been validated as forming an atomic snapshot of memory. The transaction is now committed. Write-back all the entries from the local write-set to the appropriate shared variables.
  • 5. Release all the locks identified in the write-set by atomically incrementing the version and clearing the write-lock bit (using a simple store).
  • A few things to note. The write-locks are held only briefly while committing the transaction, which helps improve performance under high contention. The Bloom filter allows us to determine, by reading a single filter word, that a value is not in the write-set and need not be searched for. Though locks could have been acquired in ascending address order to avoid deadlock, we found that sorting the addresses in the write-set was not worth the effort.
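The single-word filter used in step 1 can be sketched as follows. The mixing function is an assumption, but the key property it illustrates is real: a clear bit proves the address is not in the write-set, so a negative lookup costs one word read, while a set bit only means "maybe" and the list must then be searched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Map an address to one bit of a 64-bit filter word (hash is assumed). */
static uint64_t bloom_bit(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    return 1ull << ((a ^ (a >> 7)) & 63);
}

/* Record a write-set address in the filter. */
static void bloom_add(uint64_t *filter, const void *addr) {
    *filter |= bloom_bit(addr);
}

/* False => definitely absent; true => possibly present, search the list. */
static bool bloom_maybe_contains(uint64_t filter, const void *addr) {
    return (filter & bloom_bit(addr)) != 0;
}
```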
  • 4.2 Encounter Mode
  • The following is the TL encounter mode transaction. For reasons we explain later, this mode assumes a type-stable closed memory pool or garbage collection.
  • 1. Run the transactional code, reading the locks of all fetched-from shared locations and building a local read-set and write-set (the write set is an undo set of the values before the transactional writes).
  • Transactional stores to shared locations are handled by acquiring locks as they are encountered, saving the address and current value into the thread's local write-set, and pointing from the lock to the write-set entry. The shared variables are written with the new value during this step.
  • A transactional load checks to see if the lock is free or is held by the current transaction and, if so, reads the value from the location. There is thus no need to look for the value in the write-set. If the transactional load operation finds that the lock is held it will spin. During the operation of the transaction we periodically validate the read-set. If the read-set is found to be invalid we abort the transaction. This prevents a doomed transaction (one that has read inconsistent global state) from becoming trapped in an infinite loop.
  • 2. Attempt to commit the transaction. Acquire the locks associated with the write-set in any convenient order, using bounded spinning to avoid deadlock.
  • 3. Re-read the locks of all read-only locations to make sure version numbers haven't changed. If a version does not match, restore the values using the write-set, roll-back (release) the locks, abort the transaction, and retry.
  • 4. The prior observed reads in step (1) have been validated as forming an atomic snapshot of memory. The transaction is now committed.
  • 5. Release all the locks identified in the write-set by atomically incrementing the version and clearing the write-lock bit.
  • We note that the locks in encounter mode are held for a longer duration than in commit mode, which accounts for its weaker performance under contention. However, one does not need to look aside and search through the write-set on every read.
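The in-place store and rollback of steps 1 and 3 can be sketched as an undo log; this is a single-threaded illustration with lock acquisition elided, and all names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Encounter-mode write-set entry: an undo record for one in-place store. */
typedef struct undo_entry {
    uint64_t *addr;            /* shared variable written in place       */
    uint64_t  old_value;       /* value saved before the store           */
    struct undo_entry *next;   /* newest first, so rollback reverses     */
} undo_entry;

/* In-place transactional store: save the old value, then write the new. */
static void tx_store_encounter(undo_entry **log, undo_entry *e,
                               uint64_t *addr, uint64_t v) {
    e->addr = addr;
    e->old_value = *addr;
    e->next = *log;            /* push: the last store is undone first */
    *log = e;
    *addr = v;
}

/* On abort, walk the log restoring the saved values. */
static void tx_rollback(undo_entry *log) {
    for (; log; log = log->next)
        *log->addr = log->old_value;
}
```

Because the log is a stack, overlapping stores to the same location unwind in the right order; this is also why encounter mode needs type-stable memory, since the rollback writes through saved addresses.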
  • 4.3 Contention Management
  • As described above, TL can admit a live-lock failure. Consider the case where thread T1's read-set is A and its write-set is B, while T2's read-set is B and its write-set is A. T1 tries to commit and locks B. T2 tries to commit and acquires A. T1 validates A, in its read-set, and aborts because A is locked by T2. T2 validates B in its read-set and aborts because B was locked by T1. We have mutual abort with no progress. To provide liveness we use bounded spin and a back-off delay at abort-time, similar in spirit to that found in CSMA-CD MAC protocols. The delay interval is a function of (a) a random number generated at abort-time, (b) the length of the prior (aborted) write-set, and (c) the number of prior aborts for this transactional attempt. It is important to note that, unlike conventional methods, we found that we do not need mechanisms for one transaction to abort another to ensure progress/liveness, even in encounter mode.
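One plausible way to combine the three inputs named above is randomized exponential backoff scaled by write-set length. The text only names the inputs, so the formula, cap, and names below are all assumptions.

```c
#include <stdint.h>

/* Abort-time backoff: draw a delay (in spin units) from a window that
 * grows with the aborted write-set's length and doubles per prior abort. */
static uint64_t abort_backoff(uint64_t rnd, unsigned ws_len, unsigned aborts) {
    unsigned shift = aborts < 16 ? aborts : 16;       /* cap the exponent */
    uint64_t window = ((uint64_t)ws_len + 1) << shift;
    return rnd % window;
}
```

The randomness desynchronizes the two mutually aborting threads in the scenario above, so one of them eventually commits while the other is still delaying.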
  • These mechanisms are unnecessary for performance or deadlock avoidance, and in a sense contradict the very philosophy behind transactional locking: rather than trying to improve on hand-crafted lock-based implementations by being non-blocking (hand-crafted lock-based data structures are not obstruction-free), we try to build lock-based STMs that get as close to their behavior as one can with a completely mechanical approach, that is, one that truly simplifies the job of the concurrent programmer.
  • 4.4 The Pathology of Transactional Memory Management
  • For type-safe, garbage-collected managed runtime environments such as Java, any of the TL lock-mapping policies (PS, PO, or PW) and modes (Commit or Encounter) are safe, as the GC assures that transactionally accessed memory will only be released once no references to the object remain. In C or C++, TL preferentially uses the PS/Commit locking scheme to allow the C programmer to use normal malloc( ) and free( ) operations to manage the lifecycle of structures containing transactionally accessed shared variables.
  • Concurrent mixed-mode transactional and non-transactional accesses are proscribed. When a particular object is being accessed with transactional load and store operations it must not be accessed with normal non-transactional load and store operations. (When any accesses to an object are transactional, all accesses must be transactional.) In PS/Commit mode an object can exit the transactional domain and subsequently be accessed with normal non-transactional loads and stores, but we must wait for the object to quiesce before it leaves. There can be at most one transaction holding the transactional lock, and quiescing means waiting for that lock to be released (implying that all pending transactional stores to the location have been "drained") before allowing the object to exit the transactional domain and subsequently be accessed with normal load and store operations. Once it has quiesced, the memory can be freed and recycled in the normal fashion, because any transaction that may acquire the lock and reach the disconnected location will fail its read-set validation.
  • To motivate the need for quiescing, consider the following scenario with PS/Commit. We have a linked list of 3 nodes identified by addresses A, B and C. A node contains Key, Value and Next fields. The data structure implements a traditional key-value mapping. The key-value map (the linked list) is protected by TL using PS. Node A's Key field contains 1, its Value field contains 1001 and its Next field refers to B. B's Key field contains 2, its Value field contains 1002 and its Next field refers to C. C's Key field contains 3, its Value field 1003 and its Next field is NULL. Thread T1 calls put(2, 2002). The TL-based put( ) operator traverses the linked list using transactional loads and finds node B with a key value of 2. T1 then executes a transactional store into B.Value to change 1002 to 2002. T1's read-set consists of A.Key, A.Next, B.Key and the write-set consists of B.Value. T1 attempts to commit; it acquires the lock covering B.Value and then validates that the previously fetched read-set is consistent by checking the version numbers in the locks covering the read-set. Thread T1 stalls. Thread T2 executes delete(2). The delete( ) operator traverses the linked list and attempts to splice out node B by setting A.Next to C. T2 successfully commits. The commit operator stores C into A.Next. T2's transaction completes. T2 then calls free(B). T1 resumes in the midst of its commit and stores into B.Value. We have a classic modify-after-free pathology. To avoid such problems T2 calls quiesce(B) after the commit finishes but before free( )ing B. This allows T1's latent transactional store to drain into B before B is free( )ed and potentially reused. Note, however, that TL (using quiescing) did not admit any outcomes that were not already possible under a simple coarse-grained lock. Any thread that attempts to write into B will, at commit-time, acquire the lock covering B, validate A.Next and then store into B. Once B has been unlinked there can be at most one thread that has successfully committed and is in the process of writing into B. Other transactions attempting to write into B will fail read-set validation at commit-time as A.Next has changed.
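Assuming a lock word whose low bit indicates the lock is taken, quiesce( ) as used in this scenario reduces to waiting for that bit to clear on the lock covering the object; obtaining the covering lock (for example via a PS stripe lookup) is left to the caller, and the name follows the text while the rest is a sketch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Wait until the versioned-write-lock covering an object is not held, so
 * any latent transactional store has drained before the memory is freed.
 * At most one committed writer can hold the lock, so the wait is short. */
static void quiesce(_Atomic uint64_t *lock) {
    while (atomic_load(lock) & 1u)
        ;   /* spin until the taken bit clears */
}
```

In the scenario above, T2 would call quiesce on B's covering lock after its commit and before free(B), letting T1's pending store into B.Value complete first.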
  • Consider another problematic lifecycle scenario based on the A,B,C linked list above. Let's say we're using TL in the C language to moderate concurrent access to the list, but with either PO or PW mode, where the lock word(s) are embedded in the node. Thread T1 calls put(2, 2002). The TL-based put( ) method traverses the list and locates node B having a key value of 2. Thread T2 then calls delete(2). The delete( ) operator commits successfully. T2 waits for B to quiesce and then calls free(B). The memory underlying B is recycled and used by some other thread T3. T1 attempts to commit by acquiring the lock covering B.Value. The lock-word is collocated with B.Value, so the CAS operation transiently changes the lock-word contents. T1 then validates the read-set, recognizes that A.Next changed (because of T2's delete( )) and aborts, restoring the original lock-word value. T1 has caused the memory word underlying the lock for B.Value to "flicker", however. Such modifications are unacceptable; we have a classic modify-after-free error.
  • Finally, consider the following pathological scenario admitted by PS/Encounter. T1 calls put(2, 2002). Put( ) traverses the list and locates node B. T2 then calls delete(2), commits successfully, calls quiesce(B) and free(B). T1 acquires the lock covering B.Value, saves the original B.Value (1002) into its private write undo log, and then stores 2002 into B.Value. Later, during read-set validation at commit time, T1 will discover that its read-set is invalid and abort, rolling back B.Value from 2002 to 1002. As above, this constitutes a modify-after-free pathology in which B was recycled but B.Value transiently "flickered" from 1002 to 2002 and back to 1002. We can avoid this problem by enhancing the encounter protocol to validate the read-set after each lock acquisition but before storing into the shared variable. This confers safety, but at some cost in performance.
  • As such, we advocate using PS/Commit for normal C code, as the lock-words (metadata) are stored separately in type-stable memory distinct from the data protected by the locks. This provision can be relaxed if the C code uses some type of garbage collection (such as Boehm-style conservative garbage collection for C, Michael-style hazard pointers, or Fraser-style epoch-based reclamation) or type-stable storage for the nodes.
  • 4.5 Mechanical Transformation of Sequential Code
  • As we discussed earlier, the algorithm we describe can be added to code in a mechanical fashion, that is, without understanding anything about how the code works or what the program itself does. In our benchmarks, we performed the transformation by hand. We do however believe that it may be feasible to automate this process and allow a compiler to perform the transformation given a few rather simple limitations on the code structure within a transaction.
  • We note that hand-crafted data structures can always have an advantage over TL, as TL has no way of knowing that prior loads executed within a transaction might no longer have any bearing on results produced by the transaction.
  • Consider the following scenario where we have a TL-protected hash table. Thread T1 traverses a long hash bucket chain searching for the value associated with a certain key, iterating over "next" fields. Say T1 locates the appropriate node at or near the end of the linked list. T2 concurrently deletes an unrelated node earlier in the same linked list. T2 commits. At commit-time T1 will abort because the linked-list "next" field written to by T2 is in T1's read-set. T1 must retry the lookup operation (ostensibly locating the same node). Given our domain-specific knowledge of the linked list, we understand that the lookup and delete operations didn't really conflict and could have been allowed to operate concurrently with no aborts. A clever "hand over hand" ad-hoc hand-coded locking scheme would have the advantage of allowing this desired concurrency. Nevertheless, as our empirical analysis later in the paper shows, for the data structures we tested, the beneficial effect of this added concurrency on overall application scalability does not seem to be as profound as one would think.
  • 4.6 Software-Hardware Inter-Operability
  • Though we have described TL as a software based scheme, it can be made inter-operable with HTM systems on several levels.
  • On a machine supporting dynamic hardware transactions, transactions executed in hardware need only verify, for each location that they read or write, that the associated versioned-write-lock is free. There is no need for the hardware transaction to store an intermediate locked state into the lock word(s). For every write, the hardware transaction also needs to update the version number of the associated stripe lock upon completion. This suffices to provide inter-operability between hardware and software transactions. Any software read will detect concurrent modification of a location by a hardware write because the version number of the associated lock will have changed. Any hardware transaction will fail if a concurrent software transaction is holding the lock to write. Software transactions attempting to write will also fail to acquire a lock on a location, since lock acquisition is done using an atomic hardware synchronization operation (such as CAS or a single-location transaction) which will fail if the version number of the location was modified by the hardware transaction.
  • One can also think of using a static, bounded-size, obstruction-free hardware transaction to speed up software TL. This may be done variously by attempting to complete the entire commit operation with a single hardware transaction, or, alternately, by using hardware transactions to acquire the write-locks "in bulk". The latter approach is beneficial if bulk acquisition of the write-locks via hardware transactions is faster (has lower latency) than acquiring one write-lock at a time with CAS. Since the write-set is known in advance, we require only static hardware transactions. Because for many data structures the number of writes is significantly smaller than the number of reads, it may well be that in most cases these hardware transactions can be bounded in size. If all write-locks do not fit in a single hardware transaction, one can apply several of them in sequence using the same scheme we currently use to acquire individual locks. However, as we report above, we found the relative contribution of lock acquisition time to latency to be small, so it is not clear how much of a saving a hardware transaction will provide over the use of CAS operations.
  • One can also use TL as a hybrid backup mechanism to extend bounded size dynamic hardware transactions to arbitrary size. Again, our empirical testing suggests that there is not much of a gain in this approach.
  • While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application as defined by the appended claims. Such variations are covered by the scope of this present disclosure. As such, the foregoing description of embodiments of the present application is not intended to be limiting. Rather, any limitations to the invention are presented in the following claims. Note that the different embodiments disclosed herein can be combined or utilized individually with respect to each other.
  • We claim:


1. A method comprising:
executing a transaction defined by a corresponding set of instructions to produce a respective transaction outcome based on use of at least one shared variable;
in lieu of locking and modifying a given shared variable during execution of the transaction, initiating a lock on the given shared variable after producing the respective transaction outcome via use of locally modified data values, the lock preventing other processes from modifying a data value associated with the given shared variable; and
after obtaining the lock, initiating a modification of the data value associated with the given shared variable even though at least one of the other processes performed a computation using the data value associated with the given shared variable before the lock and during execution of the transaction.
2. A method as in claim 1, wherein executing the transaction includes:
maintaining version information in a locally managed read set associated with the transaction, the read set not being accessible by the other processes using the shared variables, the read set identifying versions associated with each of multiple shared variables used to generate the respective transaction outcome, the version information indicating respective versions of the multiple shared variables at a time when the transaction retrieves respective data values associated with the multiple shared variables from a globally accessible repository.
3. A method as in claim 2, wherein executing the transaction further includes:
after acquiring the lock associated with the given shared variable and before modifying the data value associated with the given shared variable, verifying that newly read version information associated with each of the multiple shared variables used to generate the respective transaction outcome matches the version information in the locally managed read set associated with the transaction.
4. A method as in claim 3, wherein the newly read version information indicates that the data values associated with the multiple shared variables used to generate the transaction outcome have not been changed by the other processes during execution of the transaction to produce the respective transaction outcome.
5. A method as in claim 2, wherein initiating the lock includes:
identifying that another process has a respective lock on the given shared variable; and
utilizing a specified backoff time to acquire the lock on the given shared variable, the backoff time being a random value relative to the other processes that also attempt to acquire the lock associated with the given shared variable.
6. A method as in claim 1, wherein executing the transaction includes:
complying with a respective rule indicating size limitations associated with the transaction to enhance efficiency of multiple processes executing different transactions using a same set of shared variables including the given shared variable to produce respective transaction outcomes.
7. A method as in claim 1 further comprising:
maintaining version information associated with each of multiple shared variables, the version information indicating occurrences of data value changes associated with each of the multiple shared variables; and
wherein initiating the lock on the given shared variable includes:
if the given shared variable was read at any time during execution of the transaction, atomically: i) acquiring the lock on the shared variable, and ii) validating that a present version value associated with the given shared variable matches a previous version value of the given shared variable when read during execution of the transaction.
8. A method as in claim 1 further comprising:
in response to identifying that a corresponding data value associated with the at least one shared variable was modified during execution of the transaction, aborting the transaction in lieu of modifying the data value associated with the given shared variable; and
initiating execution of the transaction again to produce the respective transaction outcome.
9. A method as in claim 1 further comprising:
maintaining a locally managed and accessible write set of data values associated with each of multiple shared variables that are locally but not globally modified during execution of the transaction, the local write set representing data values: i) not yet globally committed and ii) not yet globally accessible by the other processes.
10. A method as in claim 9 further comprising:
after completing execution of the transaction, initiating locks on each of the multiple shared variables specified in the write set which were modified during execution of the transaction, the locks preventing the other processes from changing data values associated with the multiple shared variables.
11. A method as in claim 10 further comprising:
utilizing a hash-based filter function during execution of the transaction to identify whether a corresponding data value associated with a respective globally accessible variable already exists locally in the write set and should be modified in lieu of performing a respective read to globally accessible shared data.
12. A method as in claim 1 further comprising:
after the modification of the data value associated with the given shared variable in a global environment accessible by the other processes, incrementing globally accessible version information associated with the shared variable to indicate that the given shared variable has been modified.
13. A method as in claim 1 further comprising:
initiating a compare function to verify that the at least one shared variable has not been modified during execution of the corresponding set of instructions prior to initiating the lock on the given shared variable; and
aborting execution of the transaction if the at least one shared variable has been modified.
14. A method as in claim 1, wherein steps of executing the transaction, initiating the lock, and initiating the modification are carried out in software, the method further comprising
utilizing hardware transactional memory as an accelerator for executing the transaction.
15. A method as in claim 1 further comprising:
maintaining a locally managed and accessible write set of data values associated with each of multiple shared variables that are locally but not globally modified during execution of the transaction, the local write set representing data values: i) not yet globally committed and ii) not yet globally accessible by the other processes;
initiating locks on each of the multiple shared variables specified in the write set which were modified during execution of the transaction to prevent the other processes from changing data values associated with the multiple shared variables;
verifying that respective data values associated with the multiple shared variables accessed during the transaction have not been globally modified by the other processes during execution of the transaction by checking that respective version values associated with the multiple shared variables have not changed during execution of the transaction; and
after modifying data values associated with the multiple shared variables, releasing the locks on each of the multiple shared variables.
16. A method comprising:
maintaining segments of information that are shared by multiple processes executing in parallel;
for each of at least two of the segments, maintaining a corresponding location to store a respective version value representing a relative version of a respective segment, the relative version being changed each time contents of the respective segment is modified; and
enabling the multiple processes to compete and secure an exclusive access lock with respect to each of the at least two segments to prevent other processes from modifying a respective locked segment.
17. A method as in claim 16 further comprising:
for each of at least two of the segments, maintaining a corresponding location to store globally accessible lock information indicating whether one of the multiple processes executing in parallel has locked a respective segment for: i) changing a respective data value therein, and ii) preventing other processes from reading respective data values from the respective segment; and
enabling the multiple processes to retrieve version information associated with the respective at least two segments to identify whether contents of a respective segment have changed over time.
18. A method comprising:
in a given process of multiple processes executing in parallel:
maintaining a locally managed write set of data values associated with globally accessible shared variables, the locally managed write set accessible only by the given process, the globally accessible shared variables accessible by the multiple processes;
while executing a transaction including multiple instructions, modifying data values associated with the locally managed write set in lieu of modifying the globally accessible shared variables; and
after completion of execution of the transaction, initiating locks on each of the globally accessible shared variables specified in the write set in order to: i) prevent other processes from changing data values associated with respective locked shared variables and ii) commit data values in the locally managed write set to the globally accessible shared variables.
19. A method comprising:
performing at least one transactional access to segments of information in transactional memory that are shared by multiple processes executing in parallel; and
competing amongst multiple other processes to secure an exclusive access lock with respect to a segment in the transactional memory to prevent other processes from modifying a respective locked segment, use of respective access locks enabling transactional memory to interoperate with any malloc and free operations.
20. A method as in claim 19 further comprising:
utilizing a hash-based filter function during execution of a respective transaction to identify whether a corresponding data value associated with a respective globally accessible variable already exists locally in a write set, the write set being a scratchpad for temporarily maintaining data values locally in lieu of modifying the data values in the transactional memory.
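The hash-based filter of claim 20 can be sketched as a single-word Bloom-style filter over the write set: a clear bit proves the variable is absent, so most reads skip the write-set search entirely. The filter width, hash choice, and class shape here are assumptions for illustration:

```python
FILTER_BITS = 64  # one machine word of hash bits (arbitrary choice)

class WriteSet:
    """Scratchpad of buffered values plus a hash filter for fast lookup."""

    def __init__(self):
        self.entries = {}   # variable -> locally buffered value
        self.filter = 0     # Bloom-style membership bits

    def _bit(self, var):
        return 1 << (hash(var) % FILTER_BITS)

    def record(self, var, value):
        self.entries[var] = value
        self.filter |= self._bit(var)

    def lookup(self, var):
        # Fast negative: a clear bit means the variable is definitely not
        # in the write set, so the reader goes straight to shared memory.
        if not (self.filter & self._bit(var)):
            return None
        # Bit set: possibly present (false positives allowed) -- search.
        return self.entries.get(var)
```

False positives only cost an extra dictionary probe; false negatives cannot occur, which is what makes the filter safe as a pre-check.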
US11/475,716 2006-02-22 2006-06-27 Methods and apparatus to implement parallel transactions Abandoned US20070198979A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/475,716 US20070198979A1 (en) 2006-02-22 2006-06-27 Methods and apparatus to implement parallel transactions

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US77558006P 2006-02-22 2006-02-22
US77556406P 2006-02-22 2006-02-22
US78948306P 2006-04-05 2006-04-05
US11/475,716 US20070198979A1 (en) 2006-02-22 2006-06-27 Methods and apparatus to implement parallel transactions

Publications (1)

Publication Number Publication Date
US20070198979A1 true US20070198979A1 (en) 2007-08-23

Family

ID=38429749

Family Applications (5)

Application Number Title Priority Date Filing Date
US11/475,262 Active 2028-03-20 US8065499B2 (en) 2006-02-22 2006-06-27 Methods and apparatus to implement parallel transactions
US11/475,604 Abandoned US20070198978A1 (en) 2006-02-22 2006-06-27 Methods and apparatus to implement parallel transactions
US11/475,716 Abandoned US20070198979A1 (en) 2006-02-22 2006-06-27 Methods and apparatus to implement parallel transactions
US11/475,814 Active 2028-01-05 US7669015B2 (en) 2006-02-22 2006-06-27 Methods and apparatus to implement parallel transactions
US11/488,618 Active 2027-04-25 US7496716B2 (en) 2006-02-22 2006-07-18 Methods and apparatus to implement parallel transactions

Family Applications Before (2)

Application Number Title Priority Date Filing Date
US11/475,262 Active 2028-03-20 US8065499B2 (en) 2006-02-22 2006-06-27 Methods and apparatus to implement parallel transactions
US11/475,604 Abandoned US20070198978A1 (en) 2006-02-22 2006-06-27 Methods and apparatus to implement parallel transactions

Family Applications After (2)

Application Number Title Priority Date Filing Date
US11/475,814 Active 2028-01-05 US7669015B2 (en) 2006-02-22 2006-06-27 Methods and apparatus to implement parallel transactions
US11/488,618 Active 2027-04-25 US7496716B2 (en) 2006-02-22 2006-07-18 Methods and apparatus to implement parallel transactions

Country Status (1)

Country Link
US (5) US8065499B2 (en)

Cited By (69)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070299864A1 (en) * 2006-06-24 2007-12-27 Mark Strachan Object storage subsystem computer program
US20080005498A1 (en) * 2006-06-09 2008-01-03 Sun Microsystems, Inc. Method and system for enabling a synchronization-free and parallel commit phase
US20080092140A1 (en) * 2006-10-03 2008-04-17 Doninger Cheryl G Systems and methods for executing a computer program in a multi-processor environment
US20080098374A1 (en) * 2006-09-29 2008-04-24 Ali-Reza Adl-Tabatabai Method and apparatus for performing dynamic optimization for software transactional memory
US20090031309A1 (en) * 2007-07-27 2009-01-29 Yosef Lev System and Method for Split Hardware Transactions
US20090133032A1 (en) * 2007-11-21 2009-05-21 Stuart David Biles Contention management for a hardware transactional memory
US20090132563A1 (en) * 2007-11-19 2009-05-21 Sun Microsystems, Inc. Simple optimistic skiplist
US20090172306A1 (en) * 2007-12-31 2009-07-02 Nussbaum Daniel S System and Method for Supporting Phased Transactional Memory Modes
US20090183159A1 (en) * 2008-01-11 2009-07-16 Michael Maged M Managing concurrent transactions using bloom filters
US20100004930A1 (en) * 2008-07-02 2010-01-07 Brian Strope Speech Recognition with Parallel Recognition Tasks
US20100100885A1 (en) * 2008-10-20 2010-04-22 Microsoft Corporation Transaction processing for side-effecting actions in transactional memory
US20100100689A1 (en) * 2008-10-20 2010-04-22 Microsoft Corporation Transaction processing in transactional memory
US20100169895A1 (en) * 2008-12-29 2010-07-01 David Dice Method and System for Inter-Thread Communication Using Processor Messaging
US20100228929A1 (en) * 2009-03-09 2010-09-09 Microsoft Corporation Expedited completion of a transaction in stm
US20100332765A1 (en) * 2009-06-29 2010-12-30 Sun Microsystems, Inc. Hierarchical bloom filters for facilitating concurrency control
US20100332807A1 (en) * 2009-06-26 2010-12-30 Microsoft Corporation Performing escape actions in transactions
US20100332808A1 (en) * 2009-06-26 2010-12-30 Microsoft Corporation Minimizing code duplication in an unbounded transactional memory system
US20100332771A1 (en) * 2009-06-26 2010-12-30 Microsoft Corporation Private memory regions and coherence optimizations
US20100332721A1 (en) * 2009-06-26 2010-12-30 Microsoft Corporation Operating system virtual memory management for hardware transactional memory
US20100332538A1 (en) * 2009-06-30 2010-12-30 Microsoft Corporation Hardware accelerated transactional memory system with open nested transactions
US20110145498A1 (en) * 2009-12-15 2011-06-16 Microsoft Corporation Instrumentation of hardware assisted transactional memory system
US20110145304A1 (en) * 2009-12-15 2011-06-16 Microsoft Corporation Efficient garbage collection and exception handling in a hardware accelerated transactional memory system
US20110145516A1 (en) * 2007-06-27 2011-06-16 Ali-Reza Adl-Tabatabai Using buffered stores or monitoring to filter redundant transactional accesses and mechanisms for mapping data to buffered metadata
US20110145553A1 (en) * 2009-12-15 2011-06-16 Microsoft Corporation Accelerating parallel transactions using cache resident transactions
US20110246428A1 (en) * 2010-04-01 2011-10-06 Research In Motion Limited Method for communicating device management data changes
US8161247B2 (en) 2009-06-26 2012-04-17 Microsoft Corporation Wait loss synchronization
US20120117317A1 (en) * 2009-08-20 2012-05-10 Rambus Inc. Atomic memory device
US20120246662A1 (en) * 2011-03-23 2012-09-27 Martin Vechev Automatic Verification of Determinism for Parallel Programs
US20120254139A1 (en) * 2009-04-22 2012-10-04 Microsoft Corporation Providing lock-based access to nodes in a concurrent linked list
US20120310987A1 (en) * 2011-06-03 2012-12-06 Aleksandar Dragojevic System and Method for Performing Memory Management Using Hardware Transactions
US20130018860A1 (en) * 2007-09-18 2013-01-17 Microsoft Corporation Parallel nested transactions in transactional memory
US8370577B2 (en) 2009-06-26 2013-02-05 Microsoft Corporation Metaphysically addressed cache metadata
US8402061B1 (en) 2010-08-27 2013-03-19 Amazon Technologies, Inc. Tiered middleware framework for data storage
US8412691B2 (en) 2010-09-10 2013-04-02 International Business Machines Corporation Maintenance and access of a linked list
US8510304B1 (en) 2010-08-27 2013-08-13 Amazon Technologies, Inc. Transactionally consistent indexing for data blobs
US8510344B1 (en) 2010-08-27 2013-08-13 Amazon Technologies, Inc. Optimistically consistent arbitrary data blob transactions
US8539465B2 (en) 2009-12-15 2013-09-17 Microsoft Corporation Accelerating unbounded memory transactions using nested cache resident transactions
US8555161B2 (en) 2010-05-27 2013-10-08 Microsoft Corporation Concurrent editing of a document by multiple clients
US8621161B1 (en) 2010-09-23 2013-12-31 Amazon Technologies, Inc. Moving data between data stores
US8688666B1 (en) 2010-08-27 2014-04-01 Amazon Technologies, Inc. Multi-blob consistency for atomic data transactions
US8838908B2 (en) * 2007-06-27 2014-09-16 Intel Corporation Using ephemeral stores for fine-grained conflict detection in a hardware accelerated STM
US20140289739A1 (en) * 2013-03-20 2014-09-25 Hewlett-Packard Development Company, L.P. Allocating and sharing a data object among program instances
US8856089B1 (en) * 2010-08-27 2014-10-07 Amazon Technologies, Inc. Sub-containment concurrency for hierarchical data containers
US8862561B1 (en) * 2012-08-30 2014-10-14 Google Inc. Detecting read/write conflicts
US20150113535A1 (en) * 2012-05-31 2015-04-23 Hitachi, Ltd. Parallel data processing system, computer, and parallel data processing method
US20150277993A1 (en) * 2012-12-14 2015-10-01 Huawei Technologies Co., Ltd. Task Processing Method and Virtual Machine
US20160011912A1 (en) * 2014-07-10 2016-01-14 Oracle International Corporation Process scheduling and execution in distributed computing environments
US20160098294A1 (en) * 2014-10-01 2016-04-07 Red Hat, Inc. Execution of a method at a cluster of nodes
US20160147827A1 (en) * 2010-04-08 2016-05-26 Microsoft Technology Licensing, Llc In-memory database system
US9542164B1 (en) * 2011-03-02 2017-01-10 The Mathworks, Inc. Managing an application variable using variable attributes
US20170083237A1 (en) * 2015-09-23 2017-03-23 Hanan Potash Computing device with frames/bins structure, mentor layer and plural operand processing
WO2017053828A1 (en) * 2015-09-23 2017-03-30 Hanan Potash Computing device with frames/bins structure, mentor layer and plural operand processing
US9652440B2 (en) 2010-05-27 2017-05-16 Microsoft Technology Licensing, Llc Concurrent utilization of a document by multiple threads
US9690507B2 (en) 2015-07-15 2017-06-27 Innovium, Inc. System and method for enabling high read rates to data element lists
US9753660B2 (en) 2015-07-15 2017-09-05 Innovium, Inc. System and method for implementing hierarchical distributed-linked lists for network devices
US9767014B2 (en) 2015-07-15 2017-09-19 Innovium, Inc. System and method for implementing distributed-linked lists for network devices
US9785367B2 (en) 2015-07-15 2017-10-10 Innovium, Inc. System and method for enabling high read rates to data element lists
CN107844385A (en) * 2017-11-08 2018-03-27 Beijing Panda Mutual Entertainment Technology Co., Ltd. Variable read/write method and device based on shared memory
US9977693B2 (en) 2015-09-23 2018-05-22 Hanan Potash Processor that uses plural form information
US10067878B2 (en) 2015-09-23 2018-09-04 Hanan Potash Processor with logical mentor
US10095641B2 (en) 2015-09-23 2018-10-09 Hanan Potash Processor with frames/bins structure in local high speed memory
US10140122B2 (en) 2015-09-23 2018-11-27 Hanan Potash Computer processor with operand/variable-mapped namespace
US10140021B2 (en) * 2015-12-23 2018-11-27 Netapp, Inc. Adaptive data-partitioning model that responds to observed workload
US10430187B2 (en) * 2017-08-15 2019-10-01 Oracle International Corporation Persistent transactional memory metadata-based buffer caches
US10467198B2 (en) * 2016-09-15 2019-11-05 Oracle International Corporation Network partition tolerance in a high available centralized VCS implementation
US10528479B2 (en) * 2017-06-02 2020-01-07 Huawei Technologies Co., Ltd. Global variable migration via virtual memory overlay technique for multi-version asynchronous dynamic software update
US11093286B2 (en) 2016-04-26 2021-08-17 Hanan Potash Computing device with resource manager and civilware tier
US11409559B2 (en) * 2019-10-24 2022-08-09 EMC IP Holding Company, LLC System and method for weak lock allowing force preemption by high priority thread
US11593275B2 (en) 2021-06-01 2023-02-28 International Business Machines Corporation Operating system deactivation of storage block write protection absent quiescing of processors

Families Citing this family (137)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7362762B2 (en) * 2003-11-12 2008-04-22 Cisco Technology, Inc. Distributed packet processing with ordered locks to maintain requisite packet orderings
US7626987B2 (en) * 2003-11-12 2009-12-01 Cisco Technology, Inc. Using ordered locking mechanisms to maintain sequences of items such as packets
US7551617B2 (en) 2005-02-08 2009-06-23 Cisco Technology, Inc. Multi-threaded packet processing architecture with global packet memory, packet recirculation, and coprocessor
US7739426B1 (en) 2005-10-31 2010-06-15 Cisco Technology, Inc. Descriptor transfer logic
US7877565B1 (en) * 2006-01-31 2011-01-25 Nvidia Corporation Constant versioning for multi-threaded processing
US7861093B2 (en) * 2006-08-30 2010-12-28 International Business Machines Corporation Managing data access via a loop only if changed locking facility
US7610448B2 (en) * 2006-12-27 2009-10-27 Intel Corporation Obscuring memory access patterns
US8719807B2 (en) * 2006-12-28 2014-05-06 Intel Corporation Handling precompiled binaries in a hardware accelerated software transactional memory system
US20080229062A1 (en) * 2007-03-12 2008-09-18 Lorenzo Di Gregorio Method of sharing registers in a processor and processor
JP4444305B2 (en) * 2007-03-28 2010-03-31 Toshiba Corporation Semiconductor device
US7908255B2 (en) * 2007-04-11 2011-03-15 Microsoft Corporation Transactional memory using buffered writes and enforced serialization order
US7962456B2 (en) * 2007-06-27 2011-06-14 Microsoft Corporation Parallel nested transactions in transactional memory
US7991956B2 (en) * 2007-06-27 2011-08-02 Intel Corporation Providing application-level information for use in cache management
US7890707B2 (en) * 2007-06-27 2011-02-15 Microsoft Corporation Efficient retry for transactional memory
US7991967B2 (en) * 2007-06-29 2011-08-02 Microsoft Corporation Using type stability to facilitate contention management
US7698504B2 (en) * 2007-07-03 2010-04-13 Oracle America, Inc. Cache line marking with shared timestamps
US7890725B2 (en) * 2007-07-09 2011-02-15 International Business Machines Corporation Bufferless transactional memory with runahead execution
US7996621B2 (en) * 2007-07-12 2011-08-09 International Business Machines Corporation Data cache invalidate with data dependent expiration using a step value
US7840530B2 (en) * 2007-09-18 2010-11-23 Microsoft Corporation Parallel nested transactions in transactional memory
US8156314B2 (en) * 2007-10-25 2012-04-10 Advanced Micro Devices, Inc. Incremental state updates
US8055879B2 (en) * 2007-12-13 2011-11-08 International Business Machines Corporation Tracking network contention
US8706982B2 (en) * 2007-12-30 2014-04-22 Intel Corporation Mechanisms for strong atomicity in a transactional memory system
US7904668B2 (en) * 2007-12-31 2011-03-08 Oracle America, Inc. Optimistic semi-static transactional memory implementations
US20090187906A1 (en) * 2008-01-23 2009-07-23 Sun Microsystems, Inc. Semi-ordered transactions
CN104123239B (en) * 2008-01-31 2017-07-21 Oracle International Corporation System and method for transactional cache
GB2457341B (en) * 2008-02-14 2010-07-21 Transitive Ltd Multiprocessor computing system with multi-mode memory consistency protection
US9128750B1 (en) * 2008-03-03 2015-09-08 Parakinetics Inc. System and method for supporting multi-threaded transactions
US8200947B1 (en) * 2008-03-24 2012-06-12 Nvidia Corporation Systems and methods for voting among parallel threads
US9225545B2 (en) 2008-04-01 2015-12-29 International Business Machines Corporation Determining a path for network traffic between nodes in a parallel computer
US8930644B2 (en) * 2008-05-02 2015-01-06 Xilinx, Inc. Configurable transactional memory for synchronizing transactions
US9367363B2 (en) * 2008-05-12 2016-06-14 Oracle America, Inc. System and method for integrating best effort hardware mechanisms for supporting transactional memory
US8533663B2 (en) * 2008-05-12 2013-09-10 Oracle America, Inc. System and method for utilizing available best effort hardware mechanisms for supporting transactional memory
US8139488B2 (en) * 2008-05-30 2012-03-20 Cisco Technology, Inc. Cooperative flow locks distributed among multiple components
US8589358B2 (en) * 2008-06-26 2013-11-19 Emc Corporation Mechanisms to share attributes between objects
US9047139B2 (en) * 2008-06-27 2015-06-02 Microsoft Technology Licensing, Llc Primitives for software transactional memory
US20100017581A1 (en) * 2008-07-18 2010-01-21 Microsoft Corporation Low overhead atomic memory operations
US8479166B2 (en) * 2008-08-25 2013-07-02 International Business Machines Corporation Detecting locking discipline violations on shared resources
US20100057740A1 (en) * 2008-08-26 2010-03-04 Yang Ni Accelerating a quiescence process of transactional memory
US7941616B2 (en) * 2008-10-21 2011-05-10 Microsoft Corporation System to reduce interference in concurrent programs
US8789057B2 (en) * 2008-12-03 2014-07-22 Oracle America, Inc. System and method for reducing serialization in transactional memory using gang release of blocked threads
US8914620B2 (en) * 2008-12-29 2014-12-16 Oracle America, Inc. Method and system for reducing abort rates in speculative lock elision using contention management mechanisms
US8103838B2 (en) * 2009-01-08 2012-01-24 Oracle America, Inc. System and method for transactional locking using reader-lists
US8627292B2 (en) * 2009-02-13 2014-01-07 Microsoft Corporation STM with global version overflow handling
US8688921B2 (en) * 2009-03-03 2014-04-01 Microsoft Corporation STM with multiple global version counters
US9418175B2 (en) * 2009-03-31 2016-08-16 Microsoft Technology Licensing, Llc Enumeration of a concurrent data structure
US20120110291A1 (en) * 2009-04-06 2012-05-03 Kaminario Technologies Ltd. System and method for i/o command management
US8413136B2 (en) 2009-05-08 2013-04-02 Microsoft Corporation Application virtualization
US8495103B2 (en) * 2009-05-29 2013-07-23 Red Hat, Inc. Method and apparatus for determining how to transform applications into transactional applications
US8429606B2 (en) 2009-05-29 2013-04-23 Red Hat, Inc. Transactional object container
US8973004B2 (en) * 2009-06-26 2015-03-03 Oracle America, Inc. Transactional locking with read-write locks in transactional memory systems
KR101370314B1 (en) * 2009-06-26 2014-03-05 Intel Corporation Optimizations for an unbounded transactional memory (utm) system
US8302105B2 (en) * 2009-06-26 2012-10-30 Oracle America, Inc. Bulk synchronization in transactional memory systems
US8542247B1 (en) 2009-07-17 2013-09-24 Nvidia Corporation Cull before vertex attribute fetch and vertex lighting
US8564616B1 (en) 2009-07-17 2013-10-22 Nvidia Corporation Cull before vertex attribute fetch and vertex lighting
JP4917138B2 (en) * 2009-10-07 2012-04-18 International Business Machines Corporation Optimal object placement device, optimal object placement method, and optimal object placement program
US8976195B1 (en) 2009-10-14 2015-03-10 Nvidia Corporation Generating clip state for a batch of vertices
US8384736B1 (en) 2009-10-14 2013-02-26 Nvidia Corporation Generating clip state for a batch of vertices
US8595446B2 (en) * 2009-11-25 2013-11-26 Oracle America, Inc. System and method for performing dynamic mixed mode read validation in a software transactional memory
US9529839B2 (en) * 2009-12-07 2016-12-27 International Business Machines Corporation Applying limited-size hardware transactional memory to arbitrarily large data structure
US8375175B2 (en) * 2009-12-09 2013-02-12 Oracle America, Inc. Fast and efficient reacquisition of locks for transactional memory systems
US8396831B2 (en) * 2009-12-18 2013-03-12 Microsoft Corporation Optimistic serializable snapshot isolation
US8356007B2 (en) 2010-10-20 2013-01-15 Microsoft Corporation Distributed transaction management for database systems with multiversioning
US8645650B2 (en) * 2010-01-29 2014-02-04 Red Hat, Inc. Augmented advisory lock mechanism for tightly-coupled clusters
US8943502B2 (en) * 2010-03-15 2015-01-27 International Business Machines Corporation Retooling lock interfaces for using a dual mode reader writer lock
US8402227B2 (en) * 2010-03-31 2013-03-19 Oracle International Corporation System and method for committing results of a software transaction using a hardware transaction
US9965387B1 (en) * 2010-07-09 2018-05-08 Cypress Semiconductor Corporation Memory devices having embedded hardware acceleration and corresponding methods
US8782147B2 (en) * 2010-09-09 2014-07-15 Red Hat, Inc. Concurrent delivery for messages from a same sender
US8949453B2 (en) 2010-11-30 2015-02-03 International Business Machines Corporation Data communications in a parallel active messaging interface of a parallel computer
US8788474B2 (en) * 2010-12-17 2014-07-22 Red Hat, Inc. Inode event notification for cluster file systems
WO2012117389A1 (en) * 2011-02-28 2012-09-07 DSP Group Ltd. A method and an apparatus for coherency control
US8898390B2 (en) 2011-03-08 2014-11-25 Intel Corporation Scheduling workloads based on cache asymmetry
JP5218585B2 (en) * 2011-03-15 2013-06-26 Omron Corporation Control device and system program
US8484235B2 (en) 2011-03-28 2013-07-09 International Business Machines Corporation Dynamically switching the serialization method of a data structure
US8924930B2 (en) 2011-06-28 2014-12-30 Microsoft Corporation Virtual machine image lineage
US8949328B2 (en) 2011-07-13 2015-02-03 International Business Machines Corporation Performing collective operations in a distributed processing system
US10048990B2 (en) * 2011-11-19 2018-08-14 International Business Machines Corporation Parallel access of partially locked content of input file
BR112014014414A2 (en) * 2011-12-14 2017-06-13 Optis Cellular Tech Llc Temporary storage resource management method and telecommunication equipment
JP5861445B2 (en) * 2011-12-21 2016-02-16 Fuji Xerox Co., Ltd. Information processing apparatus and information processing program
US8930962B2 (en) 2012-02-22 2015-01-06 International Business Machines Corporation Processing unexpected messages at a compute node of a parallel computer
WO2013175858A1 (en) * 2012-05-23 2013-11-28 NEC Corporation Lock management system, lock management method, and lock management program
US8688661B2 (en) 2012-06-15 2014-04-01 International Business Machines Corporation Transactional processing
US8966324B2 (en) 2012-06-15 2015-02-24 International Business Machines Corporation Transactional execution branch indications
US10437602B2 (en) 2012-06-15 2019-10-08 International Business Machines Corporation Program interruption filtering in transactional execution
US20130339680A1 (en) * 2012-06-15 2013-12-19 International Business Machines Corporation Nontransactional store instruction
US9448796B2 (en) 2012-06-15 2016-09-20 International Business Machines Corporation Restricted instructions in transactional execution
US9336046B2 (en) 2012-06-15 2016-05-10 International Business Machines Corporation Transaction abort processing
US9367323B2 (en) 2012-06-15 2016-06-14 International Business Machines Corporation Processor assist facility
US8880959B2 (en) 2012-06-15 2014-11-04 International Business Machines Corporation Transaction diagnostic block
US9772854B2 (en) 2012-06-15 2017-09-26 International Business Machines Corporation Selectively controlling instruction execution in transactional processing
US8682877B2 (en) 2012-06-15 2014-03-25 International Business Machines Corporation Constrained transaction execution
US9317460B2 (en) 2012-06-15 2016-04-19 International Business Machines Corporation Program event recording within a transactional environment
US9384004B2 (en) 2012-06-15 2016-07-05 International Business Machines Corporation Randomized testing within transactional execution
US9436477B2 (en) 2012-06-15 2016-09-06 International Business Machines Corporation Transaction abort instruction
US9740549B2 (en) 2012-06-15 2017-08-22 International Business Machines Corporation Facilitating transaction completion subsequent to repeated aborts of the transaction
US9348642B2 (en) 2012-06-15 2016-05-24 International Business Machines Corporation Transaction begin/end instructions
US9361115B2 (en) 2012-06-15 2016-06-07 International Business Machines Corporation Saving/restoring selected registers in transactional processing
US9442737B2 (en) 2012-06-15 2016-09-13 International Business Machines Corporation Restricting processing within a processor to facilitate transaction completion
US9298632B2 (en) * 2012-06-28 2016-03-29 Intel Corporation Hybrid cache state and filter tracking of memory operations during a transaction
US9542237B2 (en) * 2012-09-04 2017-01-10 Red Hat Israel, Ltd. Shared locking for storage centric exclusive locks
US9547594B2 (en) 2013-03-15 2017-01-17 Intel Corporation Instructions to mark beginning and end of non transactional code region requiring write back to persistent storage
US9384257B2 (en) 2013-06-24 2016-07-05 International Business Machines Corporation Providing multiple concurrent transactions on a single database schema using a single concurrent transaction database infrastructure
US20150074219A1 (en) * 2013-07-12 2015-03-12 Brocade Communications Systems, Inc. High availability networking using transactional memory
WO2015009275A1 (en) * 2013-07-15 2015-01-22 Intel Corporation Improved transactional memory management techniques
US9817771B2 (en) 2013-08-20 2017-11-14 Synopsys, Inc. Guarded memory access in a multi-thread safe system level modeling simulation
US9588801B2 (en) 2013-09-11 2017-03-07 Intel Corporation Apparatus and method for improved lock elision techniques
US9292444B2 (en) 2013-09-26 2016-03-22 International Business Machines Corporation Multi-granular cache management in multi-processor computing environments
US9298623B2 (en) 2013-09-26 2016-03-29 Globalfoundries Inc. Identifying high-conflict cache lines in transactional memory computing environments
US9086974B2 (en) 2013-09-26 2015-07-21 International Business Machines Corporation Centralized management of high-contention cache lines in multi-processor computing environments
US9329890B2 (en) 2013-09-26 2016-05-03 Globalfoundries Inc. Managing high-coherence-miss cache lines in multi-processor computing environments
US9298626B2 (en) 2013-09-26 2016-03-29 Globalfoundries Inc. Managing high-conflict cache lines in transactional memory computing environments
US9367504B2 (en) 2013-12-20 2016-06-14 International Business Machines Corporation Coherency overcommit
US9971627B2 (en) 2014-03-26 2018-05-15 Intel Corporation Enabling maximum concurrency in a hybrid transactional memory system
US10013351B2 (en) 2014-06-27 2018-07-03 International Business Machines Corporation Transactional execution processor having a co-processor accelerator, both sharing a higher level cache
US9477481B2 (en) 2014-06-27 2016-10-25 International Business Machines Corporation Accurate tracking of transactional read and write sets with speculation
US9772944B2 (en) 2014-06-27 2017-09-26 International Business Machines Corporation Transactional execution in a multi-processor environment that monitors memory conflicts in a shared cache
US9703718B2 (en) 2014-06-27 2017-07-11 International Business Machines Corporation Managing read tags in a transactional memory
US9740614B2 (en) 2014-06-27 2017-08-22 International Business Machines Corporation Processor directly storing address range of co-processor memory accesses in a transactional memory where co-processor supplements functions of the processor
US9658961B2 (en) 2014-06-27 2017-05-23 International Business Machines Corporation Speculation control for improving transaction success rate, and instruction therefor
US10114752B2 (en) 2014-06-27 2018-10-30 International Business Machines Corporation Detecting cache conflicts by utilizing logical address comparisons in a transactional memory
US9720837B2 (en) 2014-06-27 2017-08-01 International Business Machines Corporation Allowing non-cacheable loads within a transaction
US10025715B2 (en) 2014-06-27 2018-07-17 International Business Machines Corporation Conditional inclusion of data in a transactional memory read set
US10073784B2 (en) 2014-06-27 2018-09-11 International Business Machines Corporation Memory performance when speculation control is enabled, and instruction therefor
GB2533415B (en) * 2014-12-19 2022-01-19 Advanced Risc Mach Ltd Apparatus with at least one resource having thread mode and transaction mode, and method
CN107466399B (en) * 2015-05-01 2020-12-04 Hewlett Packard Enterprise Development LP System and method for providing throttled data memory access
US10067960B2 (en) 2015-06-04 2018-09-04 Microsoft Technology Licensing, Llc Controlling atomic updates of indexes using hardware transactional memory
US9858189B2 (en) * 2015-06-24 2018-01-02 International Business Machines Corporation Hybrid tracking of transaction read and write sets
US9760494B2 (en) 2015-06-24 2017-09-12 International Business Machines Corporation Hybrid tracking of transaction read and write sets
US9811392B2 (en) * 2015-11-24 2017-11-07 Microsoft Technology Licensing, Llc Precondition exclusivity mapping of tasks to computational locations
US10733091B2 (en) 2016-05-03 2020-08-04 International Business Machines Corporation Read and write sets for ranges of instructions of transactions
US10042761B2 (en) * 2016-05-03 2018-08-07 International Business Machines Corporation Read and write sets for transactions of a multithreaded computing environment
US11120002B2 (en) * 2016-07-20 2021-09-14 Verizon Media Inc. Method and system for concurrent database operation
US10698802B1 (en) * 2018-03-21 2020-06-30 Cadence Design Systems, Inc. Method and system for generating a validation test
US10931450B1 (en) * 2018-04-27 2021-02-23 Pure Storage, Inc. Distributed, lock-free 2-phase commit of secret shares using multiple stateless controllers
CN110162488B (en) * 2018-11-15 2022-02-11 Shenzhen Lexin Software Technology Co., Ltd. Cache consistency checking method, device, server and storage medium
US11086779B2 (en) * 2019-11-11 2021-08-10 Vmware, Inc. System and method of a highly concurrent cache replacement algorithm
CN111831557B (en) * 2020-06-19 2023-10-20 Beijing Huasan Communication Technology Co., Ltd. Deadlock detection method and device
CN112596877A (en) * 2020-12-18 2021-04-02 Shenzhen TCL New Technology Co., Ltd. Method, device, and system for using global variables, and computer-readable storage medium

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6678772B2 (en) 2000-12-19 2004-01-13 International Business Machines Corporation Adaptive reader-writer lock

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5301290A (en) * 1990-03-14 1994-04-05 International Business Machines Corporation Method for minimizing lock processing while ensuring consistency among pages common to local processor caches and a shared external store
US5596754A (en) * 1992-10-29 1997-01-21 Digital Equipment Corporation Method for performing private lock management
US5649200A (en) * 1993-01-08 1997-07-15 Atria Software, Inc. Dynamic rule-based version control system
US6101590A (en) * 1995-10-10 2000-08-08 Micro Unity Systems Engineering, Inc. Virtual memory system with local and global virtual address translation
US5987506A (en) * 1996-11-22 1999-11-16 Mangosoft Corporation Remote access and geographically distributed computers in a globally addressable storage environment
US6148377A (en) * 1996-11-22 2000-11-14 Mangosoft Corporation Shared memory computer networks
US5956731A (en) * 1997-04-23 1999-09-21 Oracle Corporation Sharing snapshots for consistent reads
US5950199A (en) * 1997-07-11 1999-09-07 International Business Machines Corporation Parallel file system and method for granting byte range tokens
US6393437B1 (en) * 1998-01-27 2002-05-21 Microsoft Corporation Web developer isolation techniques
US6757893B1 (en) * 1999-12-17 2004-06-29 Canon Kabushiki Kaisha Version control system for software code
US20040227531A1 (en) * 2000-05-30 2004-11-18 Keizo Yamada Semiconductor device test method and semiconductor device tester
US6826570B1 (en) * 2000-07-18 2004-11-30 International Business Machines Corporation Dynamically switching between different types of concurrency control techniques to provide an adaptive access strategy for a parallel file system
US6810470B1 (en) * 2000-08-14 2004-10-26 Ati Technologies, Inc. Memory request interlock
US7313794B1 (en) * 2003-01-30 2007-12-25 Xilinx, Inc. Method and apparatus for synchronization of shared memory in a multiprocessor system
US20050038961A1 (en) * 2003-08-11 2005-02-17 Chao-Wu Chen Cache and memory architecture for fast program space access
US7467378B1 (en) * 2004-02-09 2008-12-16 Symantec Corporation System state rollback after modification failure
US20060106996A1 (en) * 2004-11-15 2006-05-18 Ahmad Said A Updating data shared among systems
US20060161919A1 (en) * 2004-12-23 2006-07-20 Onufryk Peter Z Implementation of load linked and store conditional operations
US20060236039A1 (en) * 2005-04-19 2006-10-19 International Business Machines Corporation Method and apparatus for synchronizing shared data between components in a group
US7536517B2 (en) * 2005-07-29 2009-05-19 Microsoft Corporation Direct-update software transactional memory
US7395263B2 (en) * 2005-10-12 2008-07-01 International Business Machines Corporation Realtime-safe read copy update with lock-free readers
US20070124546A1 (en) * 2005-11-29 2007-05-31 Anton Blanchard Automatic yielding on lock contention for a multi-threaded processor

Cited By (129)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080005498A1 (en) * 2006-06-09 2008-01-03 Sun Microsystems, Inc. Method and system for enabling a synchronization-free and parallel commit phase
US7797329B2 (en) * 2006-06-09 2010-09-14 Oracle America Inc. Method and system for enabling a synchronization-free and parallel commit phase
US20070299864A1 (en) * 2006-06-24 2007-12-27 Mark Strachan Object storage subsystem computer program
US20080098374A1 (en) * 2006-09-29 2008-04-24 Ali-Reza Adl-Tabatabai Method and apparatus for performing dynamic optimization for software transactional memory
US7913236B2 (en) * 2006-09-29 2011-03-22 Intel Corporation Method and apparatus for performing dynamic optimization for software transactional memory
US20080092140A1 (en) * 2006-10-03 2008-04-17 Doninger Cheryl G Systems and methods for executing a computer program in a multi-processor environment
US7979858B2 (en) * 2006-10-03 2011-07-12 Sas Institute Inc. Systems and methods for executing a computer program that executes multiple processes in a multi-processor environment
US8838908B2 (en) * 2007-06-27 2014-09-16 Intel Corporation Using ephemeral stores for fine-grained conflict detection in a hardware accelerated STM
US9280397B2 (en) 2007-06-27 2016-03-08 Intel Corporation Using buffered stores or monitoring to filter redundant transactional accesses and mechanisms for mapping data to buffered metadata
US20110145516A1 (en) * 2007-06-27 2011-06-16 Ali-Reza Adl-Tabatabai Using buffered stores or monitoring to filter redundant transactional accesses and mechanisms for mapping data to buffered metadata
US20090031310A1 (en) * 2007-07-27 2009-01-29 Yosef Lev System and Method for Executing Nested Atomic Blocks Using Split Hardware Transactions
US7516366B2 (en) * 2007-07-27 2009-04-07 Sun Microsystems, Inc. System and method for executing nested atomic blocks using split hardware transactions
US7516365B2 (en) * 2007-07-27 2009-04-07 Sun Microsystems, Inc. System and method for split hardware transactions
US20090031309A1 (en) * 2007-07-27 2009-01-29 Yosef Lev System and Method for Split Hardware Transactions
US20130018860A1 (en) * 2007-09-18 2013-01-17 Microsoft Corporation Parallel nested transactions in transactional memory
US9411635B2 (en) * 2007-09-18 2016-08-09 Microsoft Technology Licensing, Llc Parallel nested transactions in transactional memory
US8375062B2 (en) * 2007-11-19 2013-02-12 Oracle America, Inc. Simple optimistic skiplist
US20090132563A1 (en) * 2007-11-19 2009-05-21 Sun Microsystems, Inc. Simple optimistic skiplist
US20090133032A1 (en) * 2007-11-21 2009-05-21 Stuart David Biles Contention management for a hardware transactional memory
US9513959B2 (en) * 2007-11-21 2016-12-06 Arm Limited Contention management for a hardware transactional memory
US20090172306A1 (en) * 2007-12-31 2009-07-02 Nussbaum Daniel S System and Method for Supporting Phased Transactional Memory Modes
US20090183159A1 (en) * 2008-01-11 2009-07-16 Michael Maged M Managing concurrent transactions using bloom filters
US10699714B2 (en) 2008-07-02 2020-06-30 Google Llc Speech recognition with parallel recognition tasks
US10049672B2 (en) 2008-07-02 2018-08-14 Google Llc Speech recognition with parallel recognition tasks
US9373329B2 (en) 2008-07-02 2016-06-21 Google Inc. Speech recognition with parallel recognition tasks
US8364481B2 (en) * 2008-07-02 2013-01-29 Google Inc. Speech recognition with parallel recognition tasks
US20100004930A1 (en) * 2008-07-02 2010-01-07 Brian Strope Speech Recognition with Parallel Recognition Tasks
US11527248B2 (en) 2008-07-02 2022-12-13 Google Llc Speech recognition with parallel recognition tasks
US20100100689A1 (en) * 2008-10-20 2010-04-22 Microsoft Corporation Transaction processing in transactional memory
US20100100885A1 (en) * 2008-10-20 2010-04-22 Microsoft Corporation Transaction processing for side-effecting actions in transactional memory
US8001548B2 (en) 2008-10-20 2011-08-16 Microsoft Corporation Transaction processing for side-effecting actions in transactional memory
US8166481B2 (en) 2008-10-20 2012-04-24 Microsoft Corporation Transaction processing in transactional memory
US20150248310A1 (en) * 2008-12-29 2015-09-03 Oracle International Corporation Method and System for Inter-Thread Communication Using Processor Messaging
US9021502B2 (en) * 2008-12-29 2015-04-28 Oracle America Inc. Method and system for inter-thread communication using processor messaging
US20100169895A1 (en) * 2008-12-29 2010-07-01 David Dice Method and System for Inter-Thread Communication Using Processor Messaging
US10776154B2 (en) * 2008-12-29 2020-09-15 Oracle America, Inc. Method and system for inter-thread communication using processor messaging
US20100228929A1 (en) * 2009-03-09 2010-09-09 Microsoft Corporation Expedited completion of a transaction in stm
US9519524B2 (en) * 2009-04-22 2016-12-13 Microsoft Technology Licensing, Llc Providing lock-based access to nodes in a concurrent linked list
US20120254139A1 (en) * 2009-04-22 2012-10-04 Microsoft Corporation Providing lock-based access to nodes in a concurrent linked list
US20170235780A1 (en) * 2009-04-22 2017-08-17 Microsoft Technology Licensing, Llc Providing lock-based access to nodes in a concurrent linked list
US8370577B2 (en) 2009-06-26 2013-02-05 Microsoft Corporation Metaphysically addressed cache metadata
US20100332721A1 (en) * 2009-06-26 2010-12-30 Microsoft Corporation Operating system virtual memory management for hardware transactional memory
US9767027B2 (en) 2009-06-26 2017-09-19 Microsoft Technology Licensing, Llc Private memory regions and coherency optimization by controlling snoop traffic volume in multi-level cache hierarchy
US8812796B2 (en) 2009-06-26 2014-08-19 Microsoft Corporation Private memory regions and coherence optimizations
US20100332807A1 (en) * 2009-06-26 2010-12-30 Microsoft Corporation Performing escape actions in transactions
US20100332808A1 (en) * 2009-06-26 2010-12-30 Microsoft Corporation Minimizing code duplication in an unbounded transactional memory system
US20100332771A1 (en) * 2009-06-26 2010-12-30 Microsoft Corporation Private memory regions and coherence optimizations
US8688951B2 (en) 2009-06-26 2014-04-01 Microsoft Corporation Operating system virtual memory management for hardware transactional memory
US8250331B2 (en) 2009-06-26 2012-08-21 Microsoft Corporation Operating system virtual memory management for hardware transactional memory
US8489864B2 (en) 2009-06-26 2013-07-16 Microsoft Corporation Performing escape actions in transactions
US8356166B2 (en) 2009-06-26 2013-01-15 Microsoft Corporation Minimizing code duplication in an unbounded transactional memory system by using mode agnostic transactional read and write barriers
US8161247B2 (en) 2009-06-26 2012-04-17 Microsoft Corporation Wait loss synchronization
US8484438B2 (en) * 2009-06-29 2013-07-09 Oracle America, Inc. Hierarchical bloom filters for facilitating concurrency control
US20100332765A1 (en) * 2009-06-29 2010-12-30 Sun Microsystems, Inc. Hierarchical bloom filters for facilitating concurrency control
US20100332538A1 (en) * 2009-06-30 2010-12-30 Microsoft Corporation Hardware accelerated transactional memory system with open nested transactions
US8229907B2 (en) 2009-06-30 2012-07-24 Microsoft Corporation Hardware accelerated transactional memory system with open nested transactions
US20120117317A1 (en) * 2009-08-20 2012-05-10 Rambus Inc. Atomic memory device
US11204863B2 (en) 2009-08-20 2021-12-21 Rambus Inc. Memory component that performs data write from pre-programmed register
US11748252B2 (en) 2009-08-20 2023-09-05 Rambus Inc. Data write from pre-programmed register
US9898400B2 (en) 2009-08-20 2018-02-20 Rambus Inc. Single command, multiple column-operation memory device
US11720485B2 (en) 2009-08-20 2023-08-08 Rambus Inc. DRAM with command-differentiated storage of internally and externally sourced data
US10552310B2 (en) 2009-08-20 2020-02-04 Rambus Inc. Single command, multiple column-operation memory device
US9658953B2 (en) 2009-08-20 2017-05-23 Rambus Inc. Single command, multiple column-operation memory device
US9658880B2 (en) 2009-12-15 2017-05-23 Microsoft Technology Licensing, Llc Efficient garbage collection and exception handling in a hardware accelerated transactional memory system
US8533440B2 (en) 2009-12-15 2013-09-10 Microsoft Corporation Accelerating parallel transactions using cache resident transactions
US8539465B2 (en) 2009-12-15 2013-09-17 Microsoft Corporation Accelerating unbounded memory transactions using nested cache resident transactions
US20110145498A1 (en) * 2009-12-15 2011-06-16 Microsoft Corporation Instrumentation of hardware assisted transactional memory system
US9092253B2 (en) 2009-12-15 2015-07-28 Microsoft Technology Licensing, Llc Instrumentation of hardware assisted transactional memory system
US8402218B2 (en) 2009-12-15 2013-03-19 Microsoft Corporation Efficient garbage collection and exception handling in a hardware accelerated transactional memory system
US20110145553A1 (en) * 2009-12-15 2011-06-16 Microsoft Corporation Accelerating parallel transactions using cache resident transactions
US20110145304A1 (en) * 2009-12-15 2011-06-16 Microsoft Corporation Efficient garbage collection and exception handling in a hardware accelerated transactional memory system
US20110246428A1 (en) * 2010-04-01 2011-10-06 Research In Motion Limited Method for communicating device management data changes
US9467338B2 (en) * 2010-04-01 2016-10-11 Blackberry Limited Method for communicating device management data changes
US20160147827A1 (en) * 2010-04-08 2016-05-26 Microsoft Technology Licensing, Llc In-memory database system
US10296615B2 (en) * 2010-04-08 2019-05-21 Microsoft Technology Licensing, Llc In-memory database system
US9830350B2 (en) * 2010-04-08 2017-11-28 Microsoft Technology Licensing, Llc In-memory database system
US10055449B2 (en) * 2010-04-08 2018-08-21 Microsoft Technology Licensing, Llc In-memory database system
US11048691B2 (en) * 2010-04-08 2021-06-29 Microsoft Technology Licensing, Llc In-memory database system
US8555161B2 (en) 2010-05-27 2013-10-08 Microsoft Corporation Concurrent editing of a document by multiple clients
US9652440B2 (en) 2010-05-27 2017-05-16 Microsoft Technology Licensing, Llc Concurrent utilization of a document by multiple threads
US8402061B1 (en) 2010-08-27 2013-03-19 Amazon Technologies, Inc. Tiered middleware framework for data storage
US8688666B1 (en) 2010-08-27 2014-04-01 Amazon Technologies, Inc. Multi-blob consistency for atomic data transactions
US8510344B1 (en) 2010-08-27 2013-08-13 Amazon Technologies, Inc. Optimistically consistent arbitrary data blob transactions
US8856089B1 (en) * 2010-08-27 2014-10-07 Amazon Technologies, Inc. Sub-containment concurrency for hierarchical data containers
US8510304B1 (en) 2010-08-27 2013-08-13 Amazon Technologies, Inc. Transactionally consistent indexing for data blobs
US8412691B2 (en) 2010-09-10 2013-04-02 International Business Machines Corporation Maintenance and access of a linked list
US8621161B1 (en) 2010-09-23 2013-12-31 Amazon Technologies, Inc. Moving data between data stores
US9542164B1 (en) * 2011-03-02 2017-01-10 The Mathworks, Inc. Managing an application variable using variable attributes
US10705806B1 (en) 2011-03-02 2020-07-07 The Mathworks, Inc. Managing an application variable using variable attributes
US9069893B2 (en) * 2011-03-23 2015-06-30 International Business Machines Corporation Automatic verification of determinism for parallel programs
US20120246662A1 (en) * 2011-03-23 2012-09-27 Martin Vechev Automatic Verification of Determinism for Parallel Programs
US9043363B2 (en) * 2011-06-03 2015-05-26 Oracle International Corporation System and method for performing memory management using hardware transactions
US20120310987A1 (en) * 2011-06-03 2012-12-06 Aleksandar Dragojevic System and Method for Performing Memory Management Using Hardware Transactions
US20150113535A1 (en) * 2012-05-31 2015-04-23 Hitachi, Ltd. Parallel data processing system, computer, and parallel data processing method
US9841989B2 (en) * 2012-05-31 2017-12-12 Hitachi, Ltd. Parallel data processing system, computer, and parallel data processing method
US8862561B1 (en) * 2012-08-30 2014-10-14 Google Inc. Detecting read/write conflicts
US20150277993A1 (en) * 2012-12-14 2015-10-01 Huawei Technologies Co., Ltd. Task Processing Method and Virtual Machine
US9996401B2 (en) * 2012-12-14 2018-06-12 Huawei Technologies Co., Ltd. Task processing method and virtual machine
US20140289739A1 (en) * 2013-03-20 2014-09-25 Hewlett-Packard Development Company, L.P. Allocating and sharing a data object among program instances
US20170161102A1 (en) * 2014-07-10 2017-06-08 Oracle International Corporation Process scheduling and execution in distributed computing environments
US9804887B2 (en) * 2014-07-10 2017-10-31 Oracle International Corporation Process scheduling and execution in distributed computing environments
US20160011912A1 (en) * 2014-07-10 2016-01-14 Oracle International Corporation Process scheduling and execution in distributed computing environments
US9600327B2 (en) * 2014-07-10 2017-03-21 Oracle International Corporation Process scheduling and execution in distributed computing environments
US20160098294A1 (en) * 2014-10-01 2016-04-07 Red Hat, Inc. Execution of a method at a cluster of nodes
US10489213B2 (en) * 2014-10-01 2019-11-26 Red Hat, Inc. Execution of a method at a cluster of nodes
US9767014B2 (en) 2015-07-15 2017-09-19 Innovium, Inc. System and method for implementing distributed-linked lists for network devices
US9841913B2 (en) 2015-07-15 2017-12-12 Innovium, Inc. System and method for enabling high read rates to data element lists
US9753660B2 (en) 2015-07-15 2017-09-05 Innovium, Inc. System and method for implementing hierarchical distributed-linked lists for network devices
US10740006B2 (en) 2015-07-15 2020-08-11 Innovium, Inc. System and method for enabling high read rates to data element lists
US9690507B2 (en) 2015-07-15 2017-06-27 Innovium, Inc. System and method for enabling high read rates to data element lists
US9785367B2 (en) 2015-07-15 2017-10-10 Innovium, Inc. System and method for enabling high read rates to data element lists
US10055153B2 (en) 2015-07-15 2018-08-21 Innovium, Inc. Implementing hierarchical distributed-linked lists for network devices
WO2017053828A1 (en) * 2015-09-23 2017-03-30 Hanan Potash Computing device with frames/bins structure, mentor layer and plural operand processing
US20170083237A1 (en) * 2015-09-23 2017-03-23 Hanan Potash Computing device with frames/bins structure, mentor layer and plural operand processing
US10061511B2 (en) * 2015-09-23 2018-08-28 Hanan Potash Computing device with frames/bins structure, mentor layer and plural operand processing
US9977693B2 (en) 2015-09-23 2018-05-22 Hanan Potash Processor that uses plural form information
US10140122B2 (en) 2015-09-23 2018-11-27 Hanan Potash Computer processor with operand/variable-mapped namespace
US10095641B2 (en) 2015-09-23 2018-10-09 Hanan Potash Processor with frames/bins structure in local high speed memory
US10067878B2 (en) 2015-09-23 2018-09-04 Hanan Potash Processor with logical mentor
US10140021B2 (en) * 2015-12-23 2018-11-27 Netapp, Inc. Adaptive data-partitioning model that responds to observed workload
US11093286B2 (en) 2016-04-26 2021-08-17 Hanan Potash Computing device with resource manager and civilware tier
US10467198B2 (en) * 2016-09-15 2019-11-05 Oracle International Corporation Network partition tolerance in a high available centralized VCS implementation
US11334530B2 (en) * 2016-09-15 2022-05-17 Oracle International Corporation Network partition tolerance in a high available centralized VCS implementation
US10528479B2 (en) * 2017-06-02 2020-01-07 Huawei Technologies Co., Ltd. Global variable migration via virtual memory overlay technique for multi-version asynchronous dynamic software update
US10430187B2 (en) * 2017-08-15 2019-10-01 Oracle International Corporation Persistent transactional memory metadata-based buffer caches
US10884741B2 (en) 2017-08-15 2021-01-05 Oracle International Corporation Persistent transactional memory metadata-based buffer caches
CN107844385A (en) * 2017-11-08 2018-03-27 北京潘达互娱科技有限公司 A kind of variable read-write method and device based on shared drive
US11409559B2 (en) * 2019-10-24 2022-08-09 EMC IP Holding Company, LLC System and method for weak lock allowing force preemption by high priority thread
US11593275B2 (en) 2021-06-01 2023-02-28 International Business Machines Corporation Operating system deactivation of storage block write protection absent quiescing of processors

Also Published As

Publication number Publication date
US20070198519A1 (en) 2007-08-23
US20070198781A1 (en) 2007-08-23
US7496716B2 (en) 2009-02-24
US7669015B2 (en) 2010-02-23
US8065499B2 (en) 2011-11-22
US20070198792A1 (en) 2007-08-23
US20070198978A1 (en) 2007-08-23

Similar Documents

Publication Publication Date Title
US20070198979A1 (en) Methods and apparatus to implement parallel transactions
US8028133B2 (en) Globally incremented variable or clock based methods and apparatus to implement parallel transactions
Dice et al. Understanding tradeoffs in software transactional memory
Dice et al. Transactional locking II
Harris et al. Language support for lightweight transactions
US7792805B2 (en) Fine-locked transactional memory
Harris et al. Transactional memory
Dice et al. What really makes transactions faster?
Saha et al. McRT-STM: a high performance software transactional memory system for a multi-core runtime
Harris et al. Transactional memory: An overview
AU2010337319B2 (en) Performing mode switching in an unbounded transactional memory (UTM) system
Shriraman et al. An integrated hardware-software approach to flexible transactional memory
US9529715B2 (en) Hybrid hardware and software implementation of transactional memory access
Makreshanski et al. To lock, swap, or elide: On the interplay of hardware transactional memory and lock-free indexing
US9280397B2 (en) Using buffered stores or monitoring to filter redundant transactional accesses and mechanisms for mapping data to buffered metadata
Guerraoui et al. On obstruction-free transactions
Harris et al. Revocable locks for non-blocking programming
Grahn Transactional memory
Duan et al. Asymmetric memory fences: Optimizing both performance and implementability
Howard Extending relativistic programming to multiple writers
Fatourou et al. Algorithmic techniques in stm design
Moore et al. Log-based transactional memory
Guerraoui et al. Transactional memory: Glimmer of a theory
Shahid et al. Hardware transactional memories: A survey
David Universally Scalable Concurrent Data Structures

Legal Events

Date Code Title Description
AS Assignment

Owner name: SUN MICROSYSTEMS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DICE, DAVID;SHAVIT, NIR N.;REEL/FRAME:018017/0723;SIGNING DATES FROM 20060620 TO 20060621

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION