WO2000036491A2

WO2000036491A2 - Programming system and thread synchronization mechanisms for the development of selectively sequential and multithreaded computer programs

Info

Publication number: WO2000036491A2
Application number: PCT/US1999/030274
Authority: WO
Inventors: John Thornley; K. Mani Chandy; Hiroshi Ishii
Original assignee: California Institute Of Technology
Priority date: 1998-12-17
Filing date: 1999-12-15
Publication date: 2000-06-22
Also published as: WO2000036491A9; WO2000036491A3; AU2590700A

Abstract

A structured multithreaded programming system is described for integrated use with existing and new programming languages and systems. The structured multithreaded programming system enables programs to be developed which include both multithreaded and multithreadable code constructs. The multithreaded code constructs require explicitly concurrent execution. The multithreadable code constructs can be executed either sequentially or concurrently, at the selection of the programmer or computer user (401). When executed concurrently, the different threads of execution in a multithreaded program developed with this system can be synchronized using innovative synchronization objects (404). One type of synchronization object is a special type of counter (400), which can be constrained to be monotonically increasing in value. Another related type of synchronization object is a special type of flag, which can be constrained to have its value set monotonically.

Description

PROGRAMMING SYSTEM AND THREAD SYNCHRONIZATION MECHANISMS FOR THE DEVELOPMENT OF SELECTIVELY SEQUENTIAL AND MULTITHREADED COMPUTER PROGRAMS

The present application claims priority under 35 U.S.C. 119(e) from provisional application number 60/112,817 filed December 17, 1998.

Background

Many computer programs are computationally intensive, meaning that they require large amounts of computing power. As a consequence, these programs may execute more slowly than the computer user desires, even on the fastest computers. One way of increasing the execution speed of a computationally intensive computer program is to divide the program into multiple units, or loci, of concurrent execution. These units of execution are known as "threads". A program with multiple threads of execution is known as a "multithreaded program". A program with only a single thread of execution is known as a "sequential program". The threads that make up a multithreaded program may be executed concurrently on multiple computer processors, allowing many operations in the program to be carried out simultaneously, thereby speeding up program execution.

In a multithreaded program, the program or operating system must control the access of threads to data objects in the program, in order to prevent the multiple threads from concurrently accessing the same data object in an undesirable manner. If multiple threads modify the same data object concurrently, or read and modify the same data object concurrently, the resulting state of the program is extremely difficult to determine. Developing a multithreaded program is significantly more difficult than developing a sequential program because of the problems of (1) expressing the division of a program into multiple threads and (2) structuring and controlling the access of those threads to data objects.

Summary

The present application teaches a structured thread ("Sthread") system with thread synchronization and production mechanisms. Another aspect produces multithreadable code.

The multithreadable code can be annotated using information indicative of its multithreadability. The multithreadable code constructs are code constructs that can be executed in a multithreaded manner, or equivalently in a sequential manner. Multithreadable code constructs may be expressed by annotating sequential code constructs to indicate that their multithreaded execution is equivalent to sequential execution. The multithreadable code can be used along with multithreaded code. Specific instances of multithreadable constructs, namely a multithreadable block and a multithreadable for loop, are disclosed.

The second aspect of the system is the integration of multithreadable code constructs with traditional explicitly multithreaded code constructs. Explicitly multithreaded code constructs must always be executed in a multithreaded manner. Examples of explicitly multithreaded code constructs include multithreaded block constructs, multithreaded for loop constructs, and library-based thread creation functions. Multithreadable code constructs and explicitly multithreaded code constructs may be intermingled within a program as required, with well-defined meaning. According to another aspect, a special counter called an "s-counter" is used as a thread synchronization mechanism. Special "s-flags" can also be used for thread synchronization, and flag synchronization is also described herein. Yet another aspect is the implementation of the programming system within an existing compiler environment using a special pre-processing system.

The embodiments of the invention describe additional details, including the following:

The s-counter synchronizes the access of threads to shared data objects. The mechanisms use "monotonic" synchronization objects, with operations that can be constrained to only move the value of the object in one direction. Monotonic synchronization objects can be used to synchronize the access of threads to shared data objects in multithreadable code constructs in a manner that guarantees the equivalence of sequential and multithreaded execution. Specific instances of monotonic synchronization objects are disclosed, namely a form of counter called an "s-counter" and a form of flag called an "s-flag". The s-counter is a particularly powerful thread synchronization mechanism in many contexts, with its use in multithreadable code constructs being one example.

The application describes implementation of the multithreaded programming system within an existing program development and compilation environment using a special source-to-source preprocessing system and high-level thread library. This allows the system to be transparently and seamlessly integrated with existing programming systems such as Microsoft Visual Studio for the Microsoft Windows family of operating systems, the GNU program development tools for Linux and other versions of the Unix operating system, or on any version of the Java programming language, for example.

Brief Description Of The Drawings

These and other aspects will now be described in detail with respect to the accompanying drawings, wherein:

Figure 1 is a process flowchart showing a prior art method for compiling multithreaded code;

Figure 2 shows a computer system and its thread allocation system;

Figure 3 shows a flowchart of defining an s-counter; and

Figure 4 shows a flowchart of Sthreads execution of a program.

Detailed Description

FIG. 1 is a process flowchart showing a prior art method for compiling multithreaded code. Source code text 300 including multithreaded code constructs is processed by a conventional compiler 302. The compiler communicates with a linker 304 which links pre-existing routines from a library 306 with the output of the compiler to create an executable module 308.

Existing operating systems, including the WIN32 API, often provide a general purpose thread library which may allow carrying out defined tasks like these. For example, a first thread may be defined for operating the CD ROM, and another for the modem.

Windows NT WIN32 thread creation is unstructured. A thread is created by passing a function pointer and an argument pointer to a CreateThread call. The new thread then executes the given function with the given argument. There is no specific relationship between the created thread and the creating thread: the two threads are effectively asynchronous. One thread, for example, can arbitrarily suspend, resume or terminate the execution of another thread. This is not a problem for unrelated tasks like the CD/modem tasks noted above. However, when two parts of a program are to be executed as threads, the synchronization operations are often complex and error prone. Unpredictable interactions among the multiple threads can induce problems including race conditions and deadlock. Effectively, the user is left with the daunting task of using these thread libraries in a way that does not cause these problems.

The present application discloses a specific embodiment operating using the Windows NT (TM) system. It should be understood, however, that this system is portable across many platforms and that the same concept described herein can be used in those systems, including Linux, and any other operating system.

While a process has its own address space, a thread is often simply a program counter and stack pointer. A process may have many threads but all the threads share the same address space. Figures 2 and 3 show this operation in a computer system. Figure 2 shows a computer system, with four processors 200, 202, 204, 206. The processors can be in a multiple processor system as shown. The pool 199 of processors is associated with an operating system 210, a user interface 215, auxiliary hardware 220 (e.g. memory, chipsets, etc.), a display 225 and other computer components.

The operating system 210 includes multiple threads 212, 214, and others. Each thread is resident on the stack within the heap. The threads are associated with processors, which execute the threads. Figure 2 shows the pool of threads on the left and the pool of processors on the right. All of the dynamically created threads are peers. Of course there can be many more threads than processors. The operating system controls the threads to dynamically switch between the processors.

The present application defines an entirely new way of creating threads and of handling the synchronization among them. The system uses a new way of compiling code based on multithreadable code, either alone, or in conjunction with multithreaded code.

The operating system or programming language has a higher level system that uses special constructs called "equivalence annotations". A lower level function call based system is used with special objects. Those special objects can synchronize among the threads in a way that prevents the objects within the threads from having ambiguous states. Many of these systems are based on the concept of equivalence annotations. Equivalence annotations can take many forms: pragmas, special keywords, special kinds of comments, special characters, textual modifications (such as boldfacing, underlining, or italics), and others. They could be part of the program text, or in a separate file, i.e., an extra file that contains nothing but the annotations. The pragma can have meaning to a compiler. Pragmas often follow a specified syntax, but usually convey nonessential information that is intended to help the compiler to optimize the program.

The present embodiment uses these pragmas as special equivalence annotations. Pragmas are convenient for annotations since many programming languages already provide pragmas for other purposes. While a pragma is described as being used as the preferred annotation of the present application, the program can certainly be annotated in other manners. For example, Java does not support pragmas, so a special kind of comment line could be used. The equivalence annotations described throughout this specification should be understood to be interpretable in this way.

The MULTITHREADABLE equivalence annotation can be a pragma when embodied in the C programming language. This indicates that a block or loop can be executed in a multithreaded manner. This means that there is no timing dependent nondeterminacy, and the system can execute the instructions in a multithreaded manner. The MULTITHREADED equivalence annotation means that a block or loop must be executed in a multithreaded manner. The multithreaded execution need not be equivalent to sequential execution. Lock synchronization can be used to introduce nondeterminacy if desired.

The equivalence annotation becomes part of the operating system. Special, monotonically increasing and otherwise constrained S-COUNTERS, and similarly constrained S-FLAGS are operated to synchronize the access of threads to shared memory in order to prevent unwanted interference.

A special SYNCHRONIZATION COUNTER, or S-COUNTER, is defined as an object with three basic attributes. The S-COUNTER has a non-negative integer value. The object only allows an INCREMENT operation and a CHECK operation. An initial value of the S-COUNTER object is set to zero. An INCREMENT operation atomically increases the value of the counter by a specified amount. The CHECK operation suspends the calling thread until the value of the counter becomes greater than or equal to a specified level.

The multithreaded programming system has a higher level notation that includes annotation objects in the program code. Using the example of the C language, this can be described as "Multithreaded C". A lower level structured thread library is described as "Sthreads". The special annotation objects are transformed into special Sthreads calls by the Sthreads annotation objects pre-processor. The multithreaded model uses the thread synchronization/annotation objects disclosed above to synchronize among the threads.

Threads can be created in different ways. A first thread creation construct is the multithreaded block. This is indicated by the MULTITHREADED keyword placed immediately before an ordinary C block:

multithreaded {
    statement
    statement
}

This notation specifies that the statements of the block should be executed as asynchronous threads. This is a conventional way of referring to these threads. For example, the operating system could create threads to read from CD, and threads to read from tape. The threads are executed and proceed concurrently. They all share the same address space as the parent program. Execution does not continue past the multithreaded block until all the threads have individually terminated. It is typically illegal for the program to contain any kind of jump between the individual statements of the block, from inside the block to outside the block, or from outside the block to inside the block.
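One way a two-statement multithreaded block might be lowered to library calls is sketched below. POSIX threads stand in for the Sthreads library, whose calls are not listed in this description, and the names `run_block`, `stmt_a`, `stmt_b`, `result_a`, and `result_b` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical results written by the two statements of the block. */
int result_a = 0;
int result_b = 0;

/* Each statement of the block becomes the body of one thread. */
static void *stmt_a(void *arg) { (void)arg; result_a = 1; return NULL; }
static void *stmt_b(void *arg) { (void)arg; result_b = 2; return NULL; }

/* Runs both statements as threads and returns only after both have
   terminated, mirroring the rule that execution does not continue past
   the block until all of its threads have finished.
   Returns 0 on success, -1 if a thread could not be created. */
int run_block(void)
{
    pthread_t ta, tb;
    int rc;

    if (pthread_create(&ta, NULL, stmt_a, NULL) != 0)
        return -1;
    if (pthread_create(&tb, NULL, stmt_b, NULL) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    rc = pthread_join(ta, NULL);
    assert(rc == 0);
    rc = pthread_join(tb, NULL);
    assert(rc == 0);
    (void)rc;
    return 0;
}
```

Because both joins precede the return, the caller can read `result_a` and `result_b` safely once `run_block` has returned.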

A second thread creation construct is the multithreaded for-loop, indicated by the MULTITHREADED keyword placed immediately before an ordinary for-loop:

multithreaded for (i = expression; i comparison expression; i = i + expression)
    statement

This notation specifies that the iterations of the loop should be executed as asynchronous threads. The threads all share the same address space as the parent program. Each thread, however, has a local copy of the loop control variable with a different value from the iteration range. The iteration scheme can be restricted to a single control variable and expressions that are not modified within the loop body. Execution does not continue past the multithreaded for-loop until all the threads have individually terminated. It is illegal for the program to contain any kind of jump from inside the loop to outside the loop or from outside the loop to inside the loop. In essence, a multithreaded for-loop is a quantified form of multithreaded block.
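The per-thread copy of the control variable and the wait at the end of the loop can be illustrated with a minimal sketch. It again assumes POSIX threads in place of Sthreads and creates one thread per iteration; `run_loop`, `loop_body`, `squares`, and `NUM_ITER` are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_ITER 4   /* hypothetical iteration count */

int squares[NUM_ITER];

/* Loop body: reads its own private copy of the control variable, so no
   iteration can observe another iteration's value of i. */
static void *loop_body(void *arg)
{
    int i = *(int *)arg;
    squares[i] = i * i;
    return NULL;
}

/* Sketch of:
   multithreaded for (i = 0; i < NUM_ITER; i++) squares[i] = i * i;
   Returns 0 on success, -1 if a thread could not be created. */
int run_loop(void)
{
    pthread_t threads[NUM_ITER];
    int index[NUM_ITER];          /* one private copy of i per thread */
    int created = 0, status = 0;

    for (int i = 0; i < NUM_ITER; i++) {
        index[i] = i;
        if (pthread_create(&threads[i], NULL, loop_body, &index[i]) != 0) {
            status = -1;
            break;
        }
        created++;
    }
    /* Execution does not continue past the loop until every thread ends. */
    for (int i = 0; i < created; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return status;
}
```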

Multithreaded and ordinary blocks and for-loops can be arbitrarily nested. Traditional approaches have often been categorized as either being explicitly multithreaded or implicitly multithreaded. With explicit multithreading, the programmer expresses exactly how the operations of the program are executed by threads. Implicit multithreading is carried out when the programmer writes an ordinary sequential program. The programming system, e.g. the compiler, determines how the operations can be executed by separate threads.

The present application goes beyond the multithreaded concepts described above into a concept of multithreadable code constructs. The multithreadable construct can be executed according to a specified sequential operational semantics. Most commonly, the construct would be executed sequentially. An alternative, however, allows the multithreadable code construct to be executed according to multithreaded operational semantics.

Rules are defined that constrain the multithreaded execution such that its result is equivalent to sequential execution.

As disclosed herein, the multithreadable code construct is formed of:

(i) a syntactic description of the form of the construct,

(ii) a sequential operational semantics, defining how to execute the construct sequentially,

(iii) a multithreaded operational semantics, defining how to execute the construct by a set of threads, and

(iv) a set of implicit or explicit programming rules that are sufficient to ensure the equivalence of sequential and multithreaded execution of the construct.

The MULTITHREADABLE pragma becomes an assertion by the programmer that the block or for loop can be executed in a multithreaded manner without changing the results of the program. The MULTITHREADABLE pragma can be applied to blocks and for loops in which the statements or iterations are independent of each other. The multithreaded execution is equivalent to sequential execution in this case. It is not a directive that the block or for loop must be executed in a multithreaded manner.

As a simple example, consider the following program to sum the elements of a two-dimensional array:

void SumElements (float A[N][N], float *sum, int numThreads)
{
    int i;
    float rowSum[N];
    #pragma multithreadable mapping (blocked (numThreads))
    for (i = 0; i < N; i++) {
        int j;
        rowSum[i] = 0.0;
        for (j = 0; j < N; j++)
            rowSum[i] = rowSum[i] + A[i][j];
    }
    *sum = 0.0;
    for (i = 0; i < N; i++)
        *sum = *sum + rowSum[i];
}

Multithreaded execution of the for loop is equivalent to sequential execution because the iterations all modify different rowSum[i] and j variables. The arguments following the pragma indicate that multithreaded execution should assign iterations to numThreads different threads using a blocked mapping. There is a rich set of options that control the mapping of iterations to threads.
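The description does not spell out how a blocked mapping divides the iterations. A minimal sketch, assuming that "blocked" means contiguous ranges of nearly equal size, is given below; the function name `blocked_range` and the choice to give the extra iterations to the lowest-numbered threads are assumptions made for the illustration.

```c
#include <assert.h>

/* Computes the half-open iteration range [*lo, *hi) given to thread t
   when n iterations are split into numThreads contiguous blocks.
   The first (n % numThreads) threads receive one extra iteration.
   This particular split is an assumption, not taken from the source. */
void blocked_range(int n, int numThreads, int t, int *lo, int *hi)
{
    assert(n >= 0 && numThreads > 0 && t >= 0 && t < numThreads);
    int base = n / numThreads;    /* iterations every thread receives */
    int extra = n % numThreads;   /* leftover iterations to spread out */
    *lo = t * base + (t < extra ? t : extra);
    *hi = *lo + base + (t < extra ? 1 : 0);
}
```

With n = 10 and numThreads = 4, the four threads would receive the ranges [0, 3), [3, 6), [6, 8), and [8, 10).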

Therefore, the Multithreaded C preprocessor has two modes: sequential mode in which the MULTITHREADABLE pragma is ignored, and multithreaded mode in which the MULTITHREADABLE pragma is transformed into Sthreads calls. Programs can be developed, tested, and debugged in sequential mode, then executed in multithreaded mode for performance. In addition, performance analysis and tuning can often be performed in sequential mode.

Determinacy of results is an important consequence of the equivalence of multithreaded and sequential execution. If sequential execution is deterministic (which is usually the case), multithreaded execution must also be deterministic. Determinacy is usually desirable, since program development and debugging can be difficult when different runs produce different results. In other multithreaded programming systems, determinacy is difficult to ensure. For example, locks, semaphores, and many-to-one message passing almost always introduce race conditions and hence nondeterminacy. However, nondeterminacy is important for efficiency in some algorithms, e.g., branch-and-bound algorithms.

Multithreaded and multithreadable code constructs are integrated in this system. The programming system incorporates both explicitly multithreaded constructs, which must be executed according to the multithreaded semantics, and multithreadable constructs, which may be selectively executed according to their sequential or multithreaded semantics. The multithreaded constructs are generally used to express multithreaded algorithms that have no sequential equivalent. This can include controlling different hardware devices that have no integration with one another, or controlling different windows simultaneously in a graphical user interface.

Multithreadable constructs are used to express the opportunity to use multithreading to speed up the execution of a computationally-intensive algorithm, by using multiple threads on multiple processors.

By using both multithreaded and multithreadable constructs, the operating system can use one thread to control each window with multithreaded constructs, while the output to each window is computed faster by multiple threads using multithreadable constructs.

As described above, the synchronization can be carried out by an S-COUNTER or an S-FLAG. Each is defined to have certain constraints.

An S-COUNTER, defined in the context of the C programming language, is diagrammed in Figure 3. It can be defined as a type definition and a set of interface functions. The counters can equally be encapsulated as a class in an object-oriented language such as C++ or Java. The definition of the fundamental programming interface for S-COUNTERS is as follows:

typedef counter type definition Counter;

int InitializeCounter (Counter *c);
/* Initializes value(c) to zero. */
/* Must be called only once, before all other operations on counter c. */

int FinalizeCounter (Counter *c);
/* Must be called only once, after all other operations on c. */

int CheckCounter (Counter *c, unsigned int level);
/* Suspends until value(c) >= level. */

int IncrementCounter (Counter *c, unsigned int amount);
/* Increases value(c) by amount. */

An S-COUNTER object c implicitly has a nonnegative integer attribute value(c), which can only be accessed through the interface functions. The INITIALIZE function at 300 initializes value(c) to zero or some initial value. Importantly, the counter is monotonic, as illustrated in 310. No decrement function is defined. Its value monotonically increases.

Any attempt to CHECK the counter, shown as step 320, suspends the calling thread at 325. This prevents a condition which can catch or miss some action occurring during the check operation. Each S-COUNTER maintains a dynamic list of thread suspension queues 330, with one queue for each value on which at least one Check operation is suspended.

CHECK compares value(c) to level and suspends until value(c) becomes greater than or equal to level. This is generically shown as AWAKE in step 340. INCREMENT at 310 atomically increases value(c) by amount, thereby reawakening all CHECK operations suspended on values less than or equal to the new value(c).

All the functions can return an error code. Possible error conditions include invalid arguments, operations on an uninitialized counter, and counter overflow. The type definition described above is carefully selected to remove the possibility of race conditions occurring on counter synchronization. There is no DECREASE operation. Therefore, the value of an S-COUNTER is monotonically increasing. There is no possibility of a CHECK operation missing an INCREMENT operation since check suspends the thread. There is no PROBE or nonblocking CHECK operation. It is recognized by the inventor that any instantaneous value may depend on the relative timing of the individual threads. Therefore, no operation can be based on the instantaneous value of an S-COUNTER.
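A minimal sketch of how this interface might be realized on top of POSIX threads follows. It is an illustration under stated assumptions rather than the Sthreads implementation: the `Counter` structure layout is invented here, the error code -1 is arbitrary, and a single condition variable with a broadcast stands in for the per-level suspension queues described above.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>

/* Assumed representation: the source leaves the counter type unspecified. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;   /* signalled whenever value grows */
    unsigned int    value;     /* only ever increases */
} Counter;

int InitializeCounter(Counter *c)
{
    if (c == NULL) return -1;
    if (pthread_mutex_init(&c->lock, NULL) != 0) return -1;
    if (pthread_cond_init(&c->changed, NULL) != 0) {
        pthread_mutex_destroy(&c->lock);
        return -1;
    }
    c->value = 0;
    return 0;
}

int FinalizeCounter(Counter *c)
{
    if (c == NULL) return -1;
    pthread_cond_destroy(&c->changed);
    pthread_mutex_destroy(&c->lock);
    return 0;
}

/* Suspends the caller until value(c) >= level. */
int CheckCounter(Counter *c, unsigned int level)
{
    if (c == NULL) return -1;
    pthread_mutex_lock(&c->lock);
    while (c->value < level)
        pthread_cond_wait(&c->changed, &c->lock);
    pthread_mutex_unlock(&c->lock);
    return 0;
}

/* Atomically increases value(c) by amount and wakes suspended checkers.
   Returns -1 without changing the counter if the addition would overflow. */
int IncrementCounter(Counter *c, unsigned int amount)
{
    if (c == NULL) return -1;
    pthread_mutex_lock(&c->lock);
    if (amount > UINT_MAX - c->value) {
        pthread_mutex_unlock(&c->lock);
        return -1;
    }
    c->value += amount;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
    return 0;
}
```

The sketch deliberately offers no function that returns the current value, in keeping with the absence of a PROBE or nonblocking CHECK operation. Broadcasting on every increment is simpler than keeping one queue per awaited level, at the cost of waking threads whose level has not yet been reached.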

A RESET operation can also be used to efficiently reuse counters between different phases of a program. Alternatively, the old counters can be deleted and recreated as new counters. RESET simply resets value(c) back to zero. However, to avoid the possibility of race conditions, RESET must not be called concurrently with any other operation on the same counter. RESET ends the process, and is not intended as a means of synchronization between threads.

Another thread synchronization object is a special flag, called an S-FLAG. S-FLAGS, like S-COUNTERS, have restricted allowed operations within the multithreadable code concept. S-FLAGS support SET and CHECK operations. Initially, an S-FLAG is not set. A SET operation on an S-FLAG atomically sets the flag. A CHECK operation on a flag suspends until the flag is set. Once an S-FLAG is set, it remains set. Flags and counters are intended to provide deterministic synchronization within multithreadable constructs, as previously described.
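The description gives no programming interface for S-FLAGS. A minimal sketch, assuming an interface that parallels the counter functions and using POSIX threads underneath, could look as follows; the `Flag` type and the names `InitializeFlag`, `SetFlag`, `CheckFlag`, and `FinalizeFlag` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Assumed representation; the source does not give a flag type. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  raised;
    bool            is_set;   /* moves from false to true only */
} Flag;

int InitializeFlag(Flag *f)
{
    if (f == NULL) return -1;
    if (pthread_mutex_init(&f->lock, NULL) != 0) return -1;
    if (pthread_cond_init(&f->raised, NULL) != 0) {
        pthread_mutex_destroy(&f->lock);
        return -1;
    }
    f->is_set = false;
    return 0;
}

/* Atomically sets the flag; setting an already set flag has no effect. */
int SetFlag(Flag *f)
{
    if (f == NULL) return -1;
    pthread_mutex_lock(&f->lock);
    f->is_set = true;
    pthread_cond_broadcast(&f->raised);
    pthread_mutex_unlock(&f->lock);
    return 0;
}

/* Suspends the caller until the flag is set. */
int CheckFlag(Flag *f)
{
    if (f == NULL) return -1;
    pthread_mutex_lock(&f->lock);
    while (!f->is_set)
        pthread_cond_wait(&f->raised, &f->lock);
    pthread_mutex_unlock(&f->lock);
    return 0;
}

int FinalizeFlag(Flag *f)
{
    if (f == NULL) return -1;
    pthread_cond_destroy(&f->raised);
    pthread_mutex_destroy(&f->lock);
    return 0;
}
```

As with the counter, no operation clears the flag or reads it without blocking, so a thread can never miss a SET.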

In summary of the above, an S-COUNTER object has the following operations (expressed in the C programming language) :

Initialize (Counter *c)
Finalize (Counter *c)
Increment (Counter *c, unsigned int amount)
Check (Counter *c, unsigned int value)
Reset (Counter *c)

The Initialize operation initializes the Counter object and sets its value to zero. The Finalize operation destroys the Counter object. An Increment operation increases the value of the Counter object by amount. A Check operation suspends the calling thread until the value of the Counter object is at least value. A Reset operation resets the value of the Counter object to zero. In the following simple example, a "producer thread" produces items and writes them to a buffer. A group of one or more concurrently executed "consumer threads" each independently reads the items from the buffer. The key synchronization issue is to prevent the consumer threads from reading items from the buffer that have not yet been written by the producer thread. The following program fragment gives implementations of the producer thread and consumer threads (in the C programming language) using a counter for synchronization.

Counter count;
Item buffer [NUM_ITEMS];

void ProducerThread (int blockSize)
{
    int index = 0, c = 0;
    while (index < NUM_ITEMS) {
        buffer [index] = Produce ();
        index = index + 1;
        c = c + 1;
        if (c == blockSize) {
            /* Publish a complete block of items. */
            Increment (&count, blockSize);
            c = 0;
        }
    }
    /* Publish a final partial block, if any. */
    if (c > 0)
        Increment (&count, c);
}

void ConsumerThread (int blockSize)
{
    int index = 0, c = blockSize;
    while (index < NUM_ITEMS) {
        if (c == blockSize) {
            /* Wait for the next block, never past the last item. */
            int level = index + blockSize;
            if (level > NUM_ITEMS)
                level = NUM_ITEMS;
            Check (&count, level);
            c = 0;
        }
        Consume (buffer [index]);
        index = index + 1;
        c = c + 1;
    }
}

After writing a block of items to the buffer, the producer thread increments the S-COUNTER. Before reading a block of items from the buffer, a consumer thread checks the counter. If the next block of items has not yet been written to the buffer, the consumer thread suspends until enough items have been written. The program does not require that the producer and consumer threads all use the same blockSize values. The monotonicity of counters helps guarantee deterministic synchronization and the equivalence of multithreaded and sequential execution.

If shared variables are guarded against concurrent operations, a program that uses only counter synchronization can produce deterministic results on all executions. Moreover, if sequential execution of the program (i.e., execution ignoring the MULTITHREADED keyword) does not deadlock, multithreaded execution is guaranteed not to deadlock and to produce the same results as sequential execution. These properties are extremely useful in the testing and debugging of multithreaded programs.

Even in the absence of concurrent operations on shared variables, traditional synchronization mechanisms can introduce nondeterminacy into a program through timing dependent race conditions between threads. For example, consider the following program that uses a lock:

multithreaded {
    { AcquireLock (&xLock); x = x+1; ReleaseLock (&xLock); }
    { AcquireLock (&xLock); x = x*2; ReleaseLock (&xLock); }
}

Even though there are no concurrent operations on x, the resulting value of x is nondeterministic because of the race condition on the order in which the two threads acquire the lock. In contrast, because counters are monotonic, once a synchronization condition is enabled it remains enabled, and there is no possibility of a race condition to catch or miss a particular counter value. For example, consider the following program that uses a counter:

multithreaded {
    { CheckCounter (&xCount, 0); x = x+1; IncrementCounter (&xCount, 1); }
    { CheckCounter (&xCount, 1); x = x*2; IncrementCounter (&xCount, 1); }
}

The resulting value of x is deterministic, because the CheckCounter operations will succeed in the same order in all executions, and therefore the operations on x will occur in the same order. Moreover, since sequential execution does not deadlock, multithreaded execution cannot deadlock and will always produce the same results as sequential execution.

Programs that use only counter synchronization can still be erroneously nondeterministic if they do not guard against concurrent access to shared variables. For example, consider the following program using a counter:

multithreaded {
    { CheckCounter (&xCount, 0); x = x+1; IncrementCounter (&xCount, 1); }
    { CheckCounter (&xCount, 0); x = x*2; IncrementCounter (&xCount, 1); }
}

The result of the program is nondeterministic because of the possibility of concurrent execution of the operations on x. The nondeterminacy is caused by concurrent access to a shared variable, not by a synchronization race condition.
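Returning to the lock program and the first counter program of this passage, the effect of ordering can be worked out directly. The sketch below evaluates the two orders in which x = x+1 and x = x*2 can run; the function names and the starting value of 1 used in the note that follows are arbitrary choices for the illustration.

```c
#include <assert.h>

/* Order enforced by the first counter program: x = x+1 runs first,
   then x = x*2. The lock program may also produce this order. */
int add_then_double(int x)
{
    x = x + 1;
    x = x * 2;
    return x;
}

/* The other order, which only the lock program can produce. */
int double_then_add(int x)
{
    x = x * 2;
    x = x + 1;
    return x;
}
```

Starting from x = 1, the first order yields 4 and the second yields 3. The lock program can therefore finish with either value, while the first counter program always finishes with 4.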

As a simple example, consider the following program to sum the elements of a two-dimensional array:

void SumElements (float A[N][N], float *sum, int numThreads)
{
    int i;
    SthreadCounter counter;
    SthreadCounterInitialize (&counter);
    *sum = 0.0;
    #pragma multithreadable mapping (blocked (numThreads))
    for (i = 0; i < N; i++) {
        int j;
        float rowSum;
        rowSum = 0.0;
        for (j = 0; j < N; j++)
            rowSum = rowSum + A[i][j];
        SthreadCounterCheck (&counter, i);
        *sum = *sum + rowSum;
        SthreadCounterIncrement (&counter, 1);
    }
    SthreadCounterFinalize (&counter);
}

Without the counter operations, multithreaded execution of the for loop would not be equivalent to sequential execution, because the iterations all modify the same *sum variable. However, the counter operations ensure that multithreaded execution is equivalent to sequential execution. In sequential execution, the iterations are executed in increasing order and the SthreadCounterCheck operations succeed without suspending. In multithreaded execution, the counter operations ensure that the operations on *sum occur atomically and in the same order as in sequential execution. Iteration i suspends at the SthreadCounterCheck operation until iteration i - 1 has executed the SthreadCounterIncrement operation.

Conditions are defined to prevent concurrent access to shared variables using counters. Essentially, each pair of operations on a shared variable must be separated by a transitive chain of counter operations. If these conditions can be shown to hold in any one execution of the program, they must hold in all executions of the program. Therefore, if sequential execution satisfies the conditions, multithreaded execution is also guaranteed to satisfy the conditions, and hence to produce the same results as sequential execution. This result forms the basis of a powerful methodology for developing multithreaded programs using sequential reasoning, testing, and debugging techniques.

All the programs using counters so far discussed satisfy the conditions on shared variables, therefore are guaranteed to be deterministic. In addition, the program examples described herein have equivalent multithreaded and sequential execution. The cost of increased determinacy is decreased concurrency. Synchronization using counters provides an effective means of controlling this tradeoff between determinacy and concurrency.

Counters can also be used as a stronger form of lock synchronization, providing sequential ordering in addition to mutual exclusion on a critical section. With the traditional implementation of mutual exclusion using a pair of lock operations, the order in which threads enter the critical section is nondeterministic. This is desirable in terms of maximizing concurrency, but is undesirable in terms of reasoning, testing, and debugging, and simply might not satisfy the desired program specification. Replacing the pair of lock operations with a pair of counter operations can guarantee deterministic results, at the cost of decreased opportunities for concurrency.

Consider the computation of a result object formed by accumulating a series of independent subresults that are computed concurrently. For example, the result object could be a linked list and the accumulate operation could be an append, or the result object could be a summation and the accumulate operation could be an addition. Mutual exclusion is required to prevent interference between multiple concurrent accumulate operations on the result object. The following program implements the computation with one thread computing each subresult, and a pair of lock operations to provide mutual exclusion:

CompositeItem result;
Lock resultLock;

InitializeLock(&resultLock);
multithreaded for (i = 0; i < N; i++) {
    SingleItem subresult = compute(i);
    AcquireLock(&resultLock);
    accumulate(&result, subresult);
    ReleaseLock(&resultLock);
}
FinalizeLock(&resultLock);

Only one thread can hold resultLock at any given time, thereby ensuring mutual exclusion of the accumulate operations. However, if the accumulate operation is not associative and determinacy of results is desired, some other mechanism is required to ensure sequential (or at least deterministic) ordering, in addition to mutual exclusion. For example, neither appending an item to a linked list nor floating-point addition is an associative operation. With both these examples, the above program may produce different results on repeated executions. The following program implements the computation with the pair of lock operations replaced with a pair of counter operations, to provide both mutual exclusion and sequential ordering:

CompositeItem result;
Counter resultCount;

InitializeCounter(&resultCount);
multithreaded for (i = 0; i < N; i++) {
    SingleItem subresult = compute(i);
    CheckCounter(&resultCount, i);
    accumulate(&result, subresult);
    IncrementCounter(&resultCount, 1);
}
FinalizeCounter(&resultCount);

As with the lock program, only one accumulate operation can execute at any given time. However, the accumulate operations are now additionally constrained to execute in sequential order. A resultCount value of i indicates that thread i-1 has completed its accumulate operation.

The counter program has greater determinacy at the cost of less concurrency. With the lock program, an ACCUMULATE operation can execute concurrently with compute operations in all other threads. With the counter program, an ACCUMULATE operation can only execute concurrently with compute operations in higher numbered threads.

The optimal tradeoff between determinacy and concurrency has to be made on a case by case basis. Counters are a powerful mechanism for providing sequential ordering on top of mutual exclusion in the many cases where determinacy is important and the performance consequences of less concurrency are not great.
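The order sensitivity of floating-point accumulation noted above can be seen in a small sequential sketch. The function name accumulate_in_order and the order array are hypothetical; the array simply stands in for the order in which threads happen to enter the critical section.

```c
#include <assert.h>

/* Accumulate n values in the order given by order[], mimicking the order
   in which threads happen to reach the accumulate operation. */
double accumulate_in_order(const double *values, const int *order, int n) {
    double result = 0.0;
    assert(n >= 0);
    for (int k = 0; k < n; k++)
        result = result + values[order[k]];
    return result;
}
```

With the values 1e100, -1e100, and 1.0, accumulating in the order 0, 1, 2 yields 1.0, while the order 1, 2, 0 yields 0.0, because adding 1.0 to -1e100 is lost to rounding. A lock permits either order; a counter check on the iteration index fixes the first.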

The Sthreads Interface

The code produced according to the present application can be expressed using the Multithreaded C pragma notation. As described previously, there is a direct correspondence between the pragma notation for thread creation and the Sthreads library functions that support thread creation. As a simple example, the following is a program implemented using Sthreads:

typedef struct {
    float (*A)[N];
    float *sum;
    SthreadCounter *counter;
} LoopArgs;

void LoopBody(int i, int notused1, int notused2, LoopArgs *args)
{
    int j;
    float rowSum;
    rowSum = 0.0;
    for (j = 0; j < N; j++)
        rowSum = rowSum + (args->A)[i][j];
    SthreadCounterCheck(args->counter, i);
    *(args->sum) = *(args->sum) + rowSum;
    SthreadCounterIncrement(args->counter, 1);
}

void SumElements(float A[N][N], float *sum, int numThreads)
{
    int i;
    SthreadCounter counter;
    LoopArgs args;

    SthreadCounterInitialize(&counter);
    args.A = A;
    args.sum = sum;
    args.counter = &counter;
    SthreadRegularForLoop(
        (void (*)(int, int, int, void *)) LoopBody,
        (void *) &args,
        0, STHREAD_CONDITION_LT, N, 1,
        1, STHREAD_MAPPING_BLOCKED, numThreads,
        STHREAD_PRIORITY_PARENT, STHREAD_STACK_SIZE_DEFAULT);
    SthreadCounterFinalize(&counter);
}

Although this program is syntactically more complicated than the Multithreaded C version, it is considerably less complicated than the same program expressed using Windows NT threads. The mechanics of creating threads, assigning iterations to threads, and waiting for thread termination are handled within the Sthreads library call.

The Sthreads multithreaded programming system is implemented as a transparent add-on to an existing program development system, e.g., a compiler or interpreter, or other program development environment. The notation and implementation allow multithreaded and multithreadable code constructs to be directly translated into a high-level structured thread library. This translation is implemented as a preprocessor that can be transparently called prior to the standard compilation phase in an existing program development system. For example, when integrated with the Microsoft Visual C++ programming system, the standard CL (Compiler-Linker) is replaced by a special Sthreads tool that calls the Sthreads preprocessor on program files, then calls the standard (renamed) Visual C++ CL.

Integration of Sthreads with existing programming systems gives programmers the power of multithreading without requiring them to adopt a new programming system. They can use their standard editor, debugger, compiler, etc., and simply add Sthreads to the system. It also means that Sthreads piggybacks on the quality of code generation and error analysis of the underlying development system.

Preprocessing has previously been used for many kinds of program "source-to-source" transformations. Sthreads, in contrast, implements a full-fledged, sophisticated multithreaded programming system by using a preprocessor integrated with a standard program development environment.

One implementation has been created in the ANSI C language, thereby defining a "Multithreaded C" language. A structured thread library (Sthreads) is called by programs written in the language. In both Multithreaded C and Sthreads, thread creation constructs are multithreaded variants of sequential "block" (i.e., sections of program code distinctly defined by conventional program constructs) and "for loop" constructs. In the Multithreaded C implementation, these constructs are supported as pragma annotations to a sequential program. With Sthreads, exactly the same constructs are supported as library calls. At both levels, synchronization objects and operations are supported as Sthreads library calls.

In this embodiment, the Sthreads library for Windows NT can be implemented as a very thin layer on top of the Win32 thread API. As a consequence, there is essentially no performance overhead associated with using Sthreads or Multithreaded C, as compared to using Win32 threads directly. Multithreaded C is implemented as a portable source-to-source preprocessor that directly transforms annotated blocks and for loops into equivalent calls to the Sthreads library. The programmer has the option of either using the pragma annotations and preprocessor or making Sthreads calls directly. The Sthreads library and Multithreaded C preprocessor are integrated with Microsoft Developer Studio Visual C++. Building a project preferably automatically invokes the preprocessor where necessary and links with the Sthreads library.

Multithreadable blocks and for loops are implemented as a sequence of CreateThread calls followed by a WaitForSingleObject call on an event. Terminating threads perform an InterlockedDecrement call on a counter, and the last thread to terminate performs a SetEvent call on the event. Flags are implemented directly as Win32 events. Counters are implemented as linked lists of Win32 events, with one event for every value on which some thread is waiting. Locks are implemented directly as Win32 critical sections. Barriers are implemented as a pair of Win32 events and a Win32 critical section.
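The termination scheme described above, in which the last thread to finish signals the waiting parent, can be sketched in portable C11. This is an illustration only: the JoinState type and its functions are hypothetical names, C11 atomics are swapped in for the Win32 interlocked operations, and a plain flag stands in for the Win32 event.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical join state shared by the threads of one block or loop. */
typedef struct {
    atomic_int  remaining;  /* threads that have not yet terminated */
    atomic_bool done;       /* stands in for the event the parent waits on */
} JoinState;

void join_state_init(JoinState *s, int numThreads) {
    assert(numThreads > 0);
    atomic_init(&s->remaining, numThreads);
    atomic_init(&s->done, false);
}

/* Called by each thread as it terminates. Returns true only for the last
   thread, which is the one that signals the parent. */
bool join_state_thread_exit(JoinState *s) {
    /* atomic_fetch_sub returns the value held before the subtraction. */
    if (atomic_fetch_sub(&s->remaining, 1) == 1) {
        atomic_store(&s->done, true);
        return true;
    }
    return false;
}
```

The parent needs to wait on only one object regardless of how many threads it created, because exactly one terminating thread observes the count reaching zero.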

An important issue for multithreading arises when considering multiple processors. Modern hardware and operating systems support multiprocessor systems. Programs on multiprocessor systems, however, have often simply executed on one processor while leaving the others unused. With multithreading as described here, the different threads can actually be executed on different processors.

In operation, when a multithreading indicator (such as a "compile as multithreaded" flag/button/environment-variable) is set, both multithreadable and multithreaded blocks/loops are compiled to multithreaded code. When the multithreading indicator is not set, the multithreaded blocks/loops are compiled to multithreaded code and the multithreadable blocks/loops are compiled into ordinary sequential code. This allows a programmer to mix constructs that only have a multithreaded meaning (e.g., real-time control and systems programming uses of threads) with constructs that can be compiled into threads for multiprocessor performance or compiled into equivalent sequential code when developing and debugging. The invention allows a program to run as fast as a sequential program on one processor, but significantly faster on multiprocessors, without recompilation, relinking, or reconfiguration. The invention thus allows a program to adapt dynamically to changing resources. Use of monotonic flags and monotonic counters makes embodiments of the invention reliable and timely.

The mapping of statements/iterations onto threads is relatively simple. One thread is used for each statement/chunk, or for a small number of statements/chunks. A typical for loop may have thousands or millions of iterations, and the overhead associated with assigning units of work to threads is significant. The present application therefore assigns the iterations in contiguous "chunks". A significant unit of work should be performed by each chunk. For example:

#pragma multithreadable chunksize(1000), mapping(blocked(T))
for (i = 0; i < N; i++)
    A[i] = B[i];

The interaction of chunksize and mapping is shown in the following example:

#pragma multithreadable chunksize(2), mapping(blocked(4))
for (i = 50; i >= 10; i = i - 2)
    DoSomething(i);
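The arithmetic behind this example can be sketched as follows. The text does not spell out how chunks are divided among threads, so the division used here is an assumption: thread t receives the contiguous chunks from t*chunks/threads up to (t+1)*chunks/threads, the same formula the shortest-path examples in this description use to divide rows. All function names are hypothetical.

```c
#include <assert.h>

/* Iterations of: for (i = first; i >= last; i = i - step), with step > 0. */
int iteration_count(int first, int last, int step) {
    assert(step > 0);
    return first < last ? 0 : (first - last) / step + 1;
}

/* Contiguous chunks of chunkSize iterations; the final chunk may be short. */
int chunk_count(int iterations, int chunkSize) {
    assert(chunkSize > 0);
    return (iterations + chunkSize - 1) / chunkSize;
}

/* ASSUMPTION: blocked mapping gives thread t the chunks in the range
   [t*chunks/threads, (t+1)*chunks/threads). */
int first_chunk_of_thread(int t, int chunks, int threads) {
    assert(threads > 0 && t >= 0 && t <= threads);
    return t * chunks / threads;
}

/* Value of the loop variable at the start of a given chunk. */
int first_value_of_chunk(int chunk, int chunkSize, int first, int step) {
    return first - chunk * chunkSize * step;
}
```

For the loop above this gives 21 iterations in 11 chunks. Under the assumed mapping, the four threads start at chunks 0, 2, 5, and 8, that is, at i = 50, 42, 30, and 18.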

The complete Sthreads library includes the following operations:

Processor Management: SthreadsGetNumProcessorsPresent, SthreadsSetNumProcessorsUsed, SthreadsGetProcessorsPresent, SthreadsSetProcessorsUsed.

Thread Creation: SthreadsBlock, SthreadsRegularForLoop.

Thread Scheduling: SthreadsGetCurrentPriority, SthreadsSetCurrentPriority.

Flags: SthreadsFlagInitialize, SthreadsFlagFinalize, SthreadsFlagSet, SthreadsFlagCheck, SthreadsFlagReset.

Counters: SthreadsCounterInitialize, SthreadsCounterFinalize, SthreadsCounterIncrement, SthreadsCounterCheck, SthreadsCounterReset.

Locks: SthreadsLockInitialize, SthreadsLockFinalize, SthreadsLockAcquire, SthreadsLockRelease.

Barriers: SthreadsBarrierInitialize, SthreadsBarrierFinalize, SthreadsBarrierPass, SthreadsBarrierReset.

Examples of computer program code implementing each of these constructs are set forth in the appendix.

FIG. 4 is a process flowchart showing a method for compiling multithreadable code in accordance with one embodiment of the invention. The computer program source code text 400 includes annotations defining multithreadable code constructs (and, optionally, multithreaded code constructs) and any necessary processor management, thread creation, and synchronization constructs (such as monotonic flags and counters). If a multithreading indicator is set 401, the source code text 400 is processed by a pre-processor 402 that parses the source into an expanded computer program text. The expanded computer program text includes inserted calls to an Sthreads library 406 to invoke multithreaded program operations wherever a source code annotation called for multithreadable functionality. A conventional compiler 408 then compiles the expanded program text and communicates with a linker 410, which links pre-existing routines from the Sthreads library 406 with the output of the compiler to create an executable module 412.

If the multithreading indicator is not set 401, the original computer program source code text 400 is compiled and linked in conventional fashion, with each section of multithreadable code constructs compiled as sequentially executing code. Annotations not recognized by the compiler 408 are ignored. A convenient implementation shortcut that permits ready use of conventional compilers and linkers is to rename a pre-existing compiler-linker executable file to a new name, and assign the old name of the compiler-linker executable file to the pre-processor. The pre-processor then can call the compiler-linker executable file when needed by invoking the new name.

Synchronization Using Locks

Locks are provided to express nondeterministic synchronization, usually mutual exclusion, within multithreaded blocks and for loops. Sthread locks support the usual Acquire and Release operations. The order in which concurrent Acquire operations succeed is nondeterministic. Therefore, there is very little use for locks within multithreadable blocks and for loops. As a simple example, consider the following program to sum the elements of a two-dimensional array:

void SumElements(float A[N][N], float *sum, int numThreads)
{
    int i;
    SthreadLock lock;

    SthreadLockInitialize(&lock);
    #pragma multithreaded mapping(blocked(numThreads))
    for (i = 0; i < N; i++) {
        int j;
        float rowSum;
        rowSum = 0.0;
        for (j = 0; j < N; j++)
            rowSum = rowSum + A[i][j];
        SthreadLockAcquire(&lock);
        *sum = *sum + rowSum;
        SthreadLockRelease(&lock);
    }
    SthreadLockFinalize(&lock);
}

Like the flag operations in the program, the lock operations in this program ensure that the operations on *sum occur atomically. However, unlike the flag operations, the lock operations do not ensure that the operations on *sum occur in the same order as in sequential execution, or even in the same order each time the program is executed. Therefore, since floating-point addition is not associative, the program may produce different results each time it is executed. However, because execution order is less restricted, this program allows more concurrency than the program described above. This is an example of the commonly occurring tradeoff between determinacy and efficiency.

Synchronization Using Barriers

Sthread barriers are provided to express collective synchronization of a group of threads in cases when thread termination and recreation is too expensive. The barriers described herein support the usual Pass operation. All the threads in a group must enter the Pass operation before any thread in the group is allowed to leave the Pass operation. In current systems, the cost of N threads executing a Pass operation is less than the cost of creating and terminating N threads. Therefore, a typical use of barriers is to replace a sequence of multithreadable loops with a single multithreaded loop containing a sequence of barrier Pass operations. However, with modern lightweight thread systems such as Windows NT, we are discovering that barriers are required for efficiency in very few circumstances. A number of examples are described herein.

Trivial Example: Independent Iterations

float ArraySum(float A[N][N])
/* Sums the elements of a 2-dimensional array. */
{
    float sum, rowSum[N];
    int i;

    sum = 0.0;
    #pragma multithreadable
    for (i = 0; i < N; i++) {
        int j;
        rowSum[i] = 0.0;
        for (j = 0; j < N; j++)
            rowSum[i] = rowSum[i] + A[i][j];
    }
    for (i = 0; i < N; i++)
        sum = sum + rowSum[i];
    return sum;
}

A more difficult example is shown in the following.

Incorrect Example: Nondeterminacy

float ArraySum(float A[N][N])
/* Sums the elements of a 2-dimensional array. */
{
    float sum;
    int i;
    SthreadsLock sumLock;

    SthreadsLockInitialize(&sumLock);
    sum = 0.0;
    #pragma multithreadable
    for (i = 0; i < N; i++) {
        int j;
        float rowSum = 0.0;
        for (j = 0; j < N; j++)
            rowSum = rowSum + A[i][j];
        SthreadsLockAcquire(&sumLock);
        sum = sum + rowSum;
        SthreadsLockRelease(&sumLock);
    }
    SthreadsLockFinalize(&sumLock);
    return sum;
}

float ArraySum(float A[N][N])
/* Sums the elements of a 2-dimensional array. */
{
    float sum;
    int i;
    SthreadsCounter sumCount;

    SthreadsCounterInitialize(&sumCount);
    sum = 0.0;
    #pragma multithreadable
    for (i = 0; i < N; i++) {
        int j;
        float rowSum = 0.0;
        for (j = 0; j < N; j++)
            rowSum = rowSum + A[i][j];
        SthreadsCounterCheck(&sumCount, i);
        sum = sum + rowSum;
        SthreadsCounterIncrement(&sumCount, 1);
    }
    SthreadsCounterFinalize(&sumCount);
    return sum;
}

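As can be seen, in the lock version the iterations cannot correctly be executed as separate threads of a multithreadable loop, because the order of the additions to sum is nondeterministic. In the counter version, however, the counter operations force the additions to occur in sequential order, which makes the result deterministic and therefore allows the loop to be multithreaded.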

Single-Writer Multiple-Reader Broadcast

Counters can be used to provide elegant, flexible, and efficient dataflow synchronization between a single writer and multiple readers of a sequence of items written to an array. In this synchronization pattern, reading an item does not remove it from the sequence; each reader independently reads the entire shared array. Because a counter has multiple thread suspension queues, a single counter object can be used to synchronize the writer thread and any number of completely independent reader threads, with each thread potentially having a different granularity of synchronization. The writer thread incrementing the counter broadcasts the availability of data to the entire set of reader threads. The following program demonstrates the single-writer multiple-reader broadcast pattern with synchronization on every item:

void Writer(Item *data, int n, Counter *dataCount)
{
    int i;
    for (i = 0; i < n; i++) {
        data[i] = GenerateItem(i);
        IncrementCounter(dataCount, 1);
    }
}

void Reader(Item *data, int n, Counter *dataCount)
{
    int i;
    for (i = 0; i < n; i++) {
        CheckCounter(dataCount, i + 1);
        UseItem(data[i]);
    }
}

Item data[N];
Counter dataCount;
int r;

InitializeCounter(&dataCount);
multithreaded {
    Writer(data, N, &dataCount);
    multithreaded for (r = 0; r < numReaders; r++)
        Reader(data, N, &dataCount);
}
FinalizeCounter(&dataCount);

One Writer thread and an arbitrary number of Reader threads are executed concurrently, with communication through the shared data array, and synchronization through the dataCount counter. At any point, some Reader threads may be suspended in their CheckCounter operation, waiting for the Writer thread to increment dataCount, while other Reader threads may be reading data items that have previously been written. The Reader threads execute independently of each other and do not synchronize their actions in any manner. The synchronization pattern is strictly a one-to-many broadcast from the Writer thread to the Reader threads.

Synchronization on every item that is written and read may be too expensive if the time taken to generate and use an item is too small. The single-writer multiple-reader broadcast pattern can be generalized to allow the writer and each reader thread to synchronize on a block of items instead of on individual items. The following program adds an individual granularity of blocked synchronization to the writer and each reader thread:

void Writer(Item *data, int n, Counter *dataCount, int blockSize)
{
    int i;
    for (i = 0; i < n; i++) {
        data[i] = GenerateItem(i);
        if ((i + 1) % blockSize == 0)
            IncrementCounter(dataCount, blockSize);
    }
    IncrementCounter(dataCount, n - (n/blockSize)*blockSize);
}

void Reader(Item *data, int n, Counter *dataCount, int blockSize)
{
    int i;
    for (i = 0; i < n; i++) {
        if (i % blockSize == 0)
            CheckCounter(dataCount, min(i + blockSize, n));
        UseItem(data[i]);
    }
}

The Writer and Reader threads now increment and check the dataCount counter in multiples of blockSize and write and read the data array in blocks of items. There is no requirement that blockSize be the same in all threads. Different threads can be passed different blockSize values based on their individual performance characteristics and requirements. This pattern is now extremely flexible and easily adaptable with regard to practical performance tuning.

The single-writer multiple-reader broadcast pattern is a dataflow synchronization pattern that occurs in many diverse applications of threads to multiprocessing. For example, in the Paraffins Problem, an array of molecules of a certain size can be generated by one thread and concurrently read by other threads that in turn generate arrays of larger molecules. The pattern is very different from, for instance, the multiple-writers multiple-readers bounded-buffer problem, which is elegantly solved using semaphores. Just as counters are not well suited to implementing bounded buffers, semaphores and other traditional synchronization mechanisms are not well suited to implementing the single-writer multiple-reader broadcast pattern.
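The claim that the writer and each reader may use different block sizes can be checked with a small sequential model of the counter levels in the blocked program above. The function names are hypothetical, and the model only tracks counter values; it does not create threads.

```c
#include <assert.h>

/* Counter value after the writer loop has generated `written` of n items.
   Increments occur only at block boundaries; written == n models the state
   after Writer returns, including its final top-up increment. */
int writer_counter_value(int written, int n, int blockSize) {
    assert(blockSize > 0 && written >= 0 && written <= n);
    if (written == n)
        return n;
    return (written / blockSize) * blockSize;
}

/* Level a reader with the given block size waits for before using item i. */
int reader_check_level(int i, int n, int blockSize) {
    assert(blockSize > 0 && i >= 0 && i < n);
    int level = (i / blockSize) * blockSize + blockSize;
    return level < n ? level : n;
}

/* Nonzero if a reader may proceed past its check at the counter value. */
int reader_may_use(int i, int n, int readerBlock, int counterValue) {
    return counterValue >= reader_check_level(i, n, readerBlock);
}
```

With n = 10, a writer block size of 4, and a reader block size of 3, the reader's first check (level 3) is satisfied once the writer's first block raises the counter to 4, and its last check (level 10) is satisfied only by the writer's final increment. The block sizes need not divide each other because a check waits for "at least" a level rather than an exact value.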

Another Example Application: Aircraft Route Optimization

The Aircraft Route Optimization Problem is part of the U.S. Air Force Rome Laboratory C3I Parallel Benchmark Suite. For this application, we achieved better performance using Sthreads on a quad-processor Pentium Pro system running Windows NT than the best reported results for message-passing programs running on expensive Cray and SGI supercomputers with up to 64 processors. The flexibility of shared-memory, lightweight multithreading, and sequential development methods allowed us to develop a much more sophisticated and efficient algorithm than would be possible on a message-passing supercomputer.

The C3I Parallel Benchmark Suite

The U.S. Air Force Rome Laboratory C3I Parallel Benchmark Suite consists of eight problems chosen to represent the essential elements of real C3I (Command, Control, Communication, and Intelligence) applications. Each problem consists of the following:

A problem description giving the inputs and required outputs.

An efficient sequential program (written in C) to solve the problem.

The benchmark input data.

A correctness test for the benchmark output data.

For some of the problems, a parallel message-passing program is also provided. Rome Laboratory maintains a publicly accessible database of reported performance results.

The C3I Parallel Benchmark Suite provides a good framework for evaluating our structured multithreaded programming system. The problems are computationally intensive and involve a variety of complex algorithms and data structures. The sequential program provides us with a good starting point and a fair basis for performance comparison. The performance database allows us to compare our results with those of other researchers. For these reasons, we are developing multithreaded solutions to several of the C3I Parallel Benchmark Suite problems.

The task in the Aircraft Route Optimization Problem is to find the lowest-risk path for an aircraft from an origin point to a set of destination points in the airspace over an uneven terrain. The risk associated with each transition in the airspace is determined by its proximity to a set of threats. The problem involves realistic constraints on aircraft speed and maneuverability. The aircraft is also constrained to fly above the underlying terrain and beneath a given ceiling altitude.

The problem is essentially the single-source, multiple-destination shortest path problem with a large, sparsely connected graph. The airspace for the benchmark is 100 km by 100 km in area and 10 km in altitude, discretized at 1 km intervals. The 100,000 positions in space correspond to 2,600,000 nodes in the graph, since each position can be reached from 26 different directions. Because of aircraft speed and maneuverability constraints, each node is connected to only nine or ten geographically adjacent nodes. Therefore, the graph consists of approximately 2.6 million nodes and 26 million edges.

The sequential algorithm to solve the Aircraft Route Optimization Problem is based on a queue of nodes. Initially the queue is empty except for the origin node. At each step, one node is removed from the queue. Valid transitions from this source node to all adjacent destination nodes are considered. For each destination node, if the path to the node via the source node is shorter than the current shortest path to the node, the path to the node is updated and the node added to the queue. The algorithm continues until the queue is empty, at which stage the shortest paths to all reachable nodes have been computed.
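The queue-based algorithm just described can be sketched on a small graph. This is a minimal illustration, not the benchmark program: the names are hypothetical, the graph is a tiny cost matrix with non-negative costs, and a plain first-in first-out queue is used for brevity, which affects only how much work is redone and not the final result.

```c
#include <assert.h>
#include <limits.h>

#define ROUTE_NODES   5
#define ROUTE_NO_EDGE INT_MAX   /* marks a missing transition */

/* Computes the shortest path cost from origin to every node. A node is
   re-queued whenever a shorter path to it is found. */
void route_shortest_paths(int cost[ROUTE_NODES][ROUTE_NODES], int origin,
                          int dist[ROUTE_NODES]) {
    int queue[ROUTE_NODES];            /* circular buffer of queued nodes */
    int inQueue[ROUTE_NODES] = {0};    /* each node is queued at most once */
    int head = 0, count = 0;

    assert(origin >= 0 && origin < ROUTE_NODES);
    for (int v = 0; v < ROUTE_NODES; v++)
        dist[v] = INT_MAX;
    dist[origin] = 0;
    queue[0] = origin;
    inQueue[origin] = 1;
    count = 1;

    while (count > 0) {
        int src = queue[head];
        head = (head + 1) % ROUTE_NODES;
        count--;
        inQueue[src] = 0;
        for (int dst = 0; dst < ROUTE_NODES; dst++) {
            if (cost[src][dst] == ROUTE_NO_EDGE)
                continue;
            int candidate = dist[src] + cost[src][dst];
            if (candidate < dist[dst]) {
                dist[dst] = candidate;
                if (!inQueue[dst]) {
                    queue[(head + count) % ROUTE_NODES] = dst;
                    inQueue[dst] = 1;
                    count++;
                }
            }
        }
    }
}
```

On a graph with transitions 0 to 1 (cost 4), 0 to 2 (cost 1), 2 to 1 (cost 2), 1 to 3 (cost 5), and 2 to 3 (cost 8), node 1 is first reached at cost 4 and later improved to 3, so it is expanded twice. This is the kind of discarded work that an ordered queue reduces.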

The queue is ordered on path length so that shorter paths are expanded before longer paths. This has a significant effect on performance. Without ordering, longer paths are expanded, then discarded when shorter paths to the same points are expanded later in the computation. However, whether the queue is ordered, partially ordered, or unordered does not affect the results of the algorithm.

The most straightforward approach to obtaining parallelism in the Aircraft Route Optimization Problem is to geographically partition the airspace into blocks, with one thread (or process) responsible for each block. Each thread runs the sequential algorithm on its own block using its own local queue and periodically exchanges boundary values with neighboring blocks. This approach is particularly appealing on distributed-memory, message-passing platforms, because memory can be permanently distributed according to the blocking pattern. If the threads execute a reasonably large number of iterations between boundary exchanges, good load balance can be achieved.

The problem with this algorithm is that, as the number of blocks/threads is increased, the total amount of computation also increases. Therefore, any speedup is based on an increasingly inefficient underlying algorithm. At any time, the local queues in most blocks contain paths that are too long and are irrelevant to the actual shortest paths. The processors are kept busy performing computation that is later discarded. At any given time, it is only productive to work on an irregular and unpredictable subset of the graph. However, irregular and adaptive blocking schemes do not solve the problem, since there is usually equal work available in all blocks. The issue is the distinction between productive and unproductive work.

Our solution is to statically partition the airspace into a large number of blocks and to use a much smaller number of threads. A measure of the average path length is maintained with each local queue. At each step, the blocks with local queues containing the shortest paths are assigned to the threads. Therefore, the subset of blocks that are active and the assignment of blocks to threads change dynamically throughout program execution. This algorithm takes advantage of the symmetric multiprocessing model, in which all threads can access the entire memory space with uniform cost. It also takes advantage of the lightweight multithreading model to achieve good load balance, since the workload within each thread at each step is highly variable.

The ability to develop, test, and debug using sequential methods was crucial in the development of this sophisticated multithreaded algorithm. The entire program was tested and debugged in sequential mode before multithreaded execution was attempted. In particular, development of the complex boundary exchange and queue update algorithms would have been considerably more difficult in multithreaded mode.

The ability to analyze and tune performance using sequential methods was also very important. Good performance depended on exposing enough parallelism without significantly increasing the total amount of computation. We determined efficient values for the number of blocks, the number of threads, and the number of iterations between boundary exchanges by measuring computation times and operation counts of the multithreaded program running in sequential mode. This detailed analysis would have been very difficult to perform in multithreaded mode. We avoided memory contention in multithreaded mode by avoiding cache misses in sequential mode. The analysis of memory access patterns in sequential mode is much simpler than in multithreaded mode.

All Pairs Shortest Paths Example

This example describes the algorithmic and performance advantages of counter synchronization. In the example, a counter is used as a less restrictive, and consequently more efficient, replacement for a barrier. The example program is a multithreaded solution to the all-pairs shortest-path problem using the Floyd-Warshall algorithm. Using traditional synchronization mechanisms, this problem can be solved using one barrier or, more efficiently, an array of condition variables. We show how the efficient solution can be implemented using a single counter instead of an array of condition variables. We give timing measurements comparing the performance of the barrier, condition variable, and counter algorithms.

The all-pairs shortest-path problem takes as input the edge-weight matrix of a weighted directed graph, and returns the matrix of shortest-length paths between all pairs of vertices in the graph. The graph is required to have no cycles of negative length, and the weight of the edge from a vertex to itself is required to be zero.

The following program solves the all-pairs shortest-path problem using the sequential Floyd-Warshall algorithm:

void ShortestPaths1(int edge[N][N], int path[N][N])

int k, i, j;

Initially, path[i][j] is assigned edge[i][j], for all i and j. (For brevity, we use a notational shorthand for array assignment.) After the kth iteration, path[i][j] is the shortest path from vertex i to vertex j with intermediate vertices only in vertices 0 to k. Therefore, after N iterations, path[i][j] is the shortest path from vertex i to vertex j with no restrictions on the intermediate vertices.
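The iteration described above can be sketched as a self-contained sequential routine. This is an illustrative reconstruction based on the description and on the multithreaded variants in this document, not the original listing. The names are hypothetical, the array-assignment shorthand is written out as loops, and a large finite sentinel (an assumption) represents a missing edge so that sums cannot overflow.

```c
#include <assert.h>

#define FW_VERTICES 4
#define FW_NO_PATH  1000000   /* ASSUMPTION: sentinel for "no edge" */

/* Sequential Floyd-Warshall: after iteration k of the outer loop,
   path[i][j] is the shortest length using only intermediate vertices
   0..k. */
void floyd_warshall_sequential(int edge[FW_VERTICES][FW_VERTICES],
                               int path[FW_VERTICES][FW_VERTICES]) {
    /* path[0..N-1][0..N-1] = edge[0..N-1][0..N-1] */
    for (int i = 0; i < FW_VERTICES; i++)
        for (int j = 0; j < FW_VERTICES; j++) {
            assert(edge[i][j] >= 0 && edge[i][j] <= FW_NO_PATH);
            path[i][j] = edge[i][j];
        }

    for (int k = 0; k < FW_VERTICES; k++)
        for (int i = 0; i < FW_VERTICES; i++)
            for (int j = 0; j < FW_VERTICES; j++) {
                int newPath = path[i][k] + path[k][j];
                if (newPath < path[i][j])
                    path[i][j] = newPath;
            }
}
```

The sketch restricts edge weights to non-negative values for simplicity, which is stricter than the no-negative-cycle requirement stated above. Row k and column k are not changed during iteration k, which is the property the multithreaded versions rely on.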

The following program solves the all-pairs shortest path problem using a multithreaded version of the Floyd-Warshall algorithm, with a barrier for thread synchronization:

void ShortestPaths2(int edge[N][N], int path[N][N], int numThreads)
{
    int t;
    Barrier b;
    path[0..N-1][0..N-1] = edge[0..N-1][0..N-1];
    InitializeBarrier(&b, numThreads);
    multithreaded for (t = 0; t < numThreads; t++) {
        int k, i, j;
        for (k = 0; k < N; k++) {
            for (i = t*N/numThreads; i < (t+1)*N/numThreads; i++)
                for (j = 0; j < N; j++) {
                    int newPath = path[i][k] + path[k][j];
                    if (newPath < path[i][j]) path[i][j] = newPath;
                }
            PassBarrier(&b);
        }
    }
    FinalizeBarrier(&b);
}

The multithreaded outer loop creates numThreads threads. Each thread executes the N iterations of the Floyd-Warshall algorithm on a subset of the rows of the path matrix. To keep the iterations synchronized, the threads pass through an N-way barrier at the end of each iteration. There are no sharing violations on the concurrent accesses to path across the threads, because the algorithm will never assign to path[i][k] or path[k][j] during iteration k.

The barrier algorithm successfully divides the work among an arbitrary number of threads. However, in requiring that all threads complete each iteration before any thread begins the next iteration, the algorithm does not express the full opportunities for concurrency inherent in the data dependencies. As a consequence, the program is less than optimally efficient. N-way synchronization at the barrier is a bottleneck that creates delays on entry and exit, and processor load imbalance can occur if all threads do not reach the barrier simultaneously.

A More Efficient Multithreaded Solution Using Condition Variable Synchronization

The following program solves the all-pairs shortest path problem using a more efficient multithreaded version of the Floyd-Warshall algorithm, with an array of N condition variables for thread synchronization:

void ShortestPaths3(int edge[N][N], int path[N][N], int numThreads)
{
    int k, t;
    Condition kDone[N];
    int kRow[N][N];
    path[0..N-1][0..N-1] = edge[0..N-1][0..N-1];
    for (k = 0; k < N; k++) InitializeCondition(&kDone[k]);
    kRow[0] = path[0][0..N-1];
    SetCondition(&kDone[0]);
    multithreaded for (t = 0; t < numThreads; t++) {
        int k, i, j;
        for (k = 0; k < N; k++) {
            CheckCondition(&kDone[k]);
            for (i = t*N/numThreads; i < (t+1)*N/numThreads; i++) {
                for (j = 0; j < N; j++) {
                    int newPath = path[i][k] + kRow[k][j];
                    if (newPath < path[i][j]) path[i][j] = newPath;
                }
                if (i == k+1) {
                    kRow[k+1][0..N-1] = path[k+1][0..N-1];
                    SetCondition(&kDone[k+1]);
                }
            }
        }
    }
    for (k = 0; k < N; k++) FinalizeCondition(&kDone[k]);
}

As with the barrier algorithm, each thread executes the N iterations of the Floyd-Warshall algorithm on a subset of the rows of the path matrix. However, each thread can individually continue with its next iteration as soon as the necessary data is available, instead of waiting for the previous iteration to complete in all the other threads. Condition variable kDone[k] is set when row k of the path matrix has been computed in iteration k-1. Each thread waits on kDone[k] before executing iteration k. To avoid sharing violations, row k of the path matrix computed in iteration k-1 is stored in kRow[k].

The condition variable algorithm avoids the inefficiencies associated with barrier synchronization. Threads synchronize individually, rather than in an N-way bottleneck, and faster threads can execute many iterations ahead of slower threads. Potentially, the N threads can be executing in up to N different iterations. One extra cost of this algorithm is the storage for the kRow matrix. However, the most significant extra cost is allocation of N condition variables.

The following program solves the all-pairs shortest path problem using the efficient multithreaded version of the Floyd-Warshall algorithm, with a single counter for thread synchronization in place of N condition variables:

void ShortestPaths3(int edge[N][N], int path[N][N], int numThreads)
{
    int k, t;
    Counter kCount;
    int kRow[N][N];
    path[0..N-1][0..N-1] = edge[0..N-1][0..N-1];
    InitializeCounter(&kCount);
    kRow[0] = path[0][0..N-1];
    multithreaded for (t = 0; t < numThreads; t++) {
        int k, i, j;
        for (k = 0; k < N; k++) {
            CheckCounter(&kCount, k);
            for (i = t*N/numThreads; i < (t+1)*N/numThreads; i++) {
                for (j = 0; j < N; j++) {
                    int newPath = path[i][k] + kRow[k][j];
                    if (newPath < path[i][j]) path[i][j] = newPath;
                }
                if (i == k+1) {
                    kRow[k+1][0..N-1] = path[k+1][0..N-1];
                    IncrementCounter(&kCount, 1);
                }
            }
        }
    }
    FinalizeCounter(&kCount);
}

Operations on N different values of the single counter replace operations on N different elements of the array of condition variables. The algorithm has the same performance advantages over the barrier algorithm, without the cost of statically allocating and maintaining N synchronization objects. Internally, the counter may create synchronization objects for the distinct counter values on which threads are suspended. However, in practice, the number of these objects in existence at any given time is likely to be a small fraction of N.
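The counter operations used above can be sketched on top of POSIX threads. This is an illustrative sketch only, not the Sthreads implementation: it uses one mutex and one condition variable with a broadcast on every increment, which is simpler than keeping a separate synchronization object per awaited value as described above. The type MonoCounter and the functions mono_init, mono_increment, mono_check and mono_finalize are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical monotonic counter: the count only ever increases. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    unsigned int    count;
} MonoCounter;

static void mono_init(MonoCounter *c)
{
    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->changed, NULL);
    c->count = 0;
}

/* Add amount to the count and wake all waiters so each re-tests its own level. */
static void mono_increment(MonoCounter *c, unsigned int amount)
{
    pthread_mutex_lock(&c->lock);
    c->count += amount;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
}

/* Suspend the caller until count >= level; returns the count that was observed. */
static unsigned int mono_check(MonoCounter *c, unsigned int level)
{
    unsigned int seen;
    pthread_mutex_lock(&c->lock);
    while (c->count < level)
        pthread_cond_wait(&c->changed, &c->lock);
    seen = c->count;
    pthread_mutex_unlock(&c->lock);
    return seen;
}

static void mono_finalize(MonoCounter *c)
{
    pthread_cond_destroy(&c->changed);
    pthread_mutex_destroy(&c->lock);
}
```

Because the count never decreases, a check that has once succeeded for a given level would succeed again at any later time, which is what lets a single counter stand in for the N condition variables.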

Three Synchronization Patterns Example

Three examples of practical synchronization patterns are described that can be expressed more elegantly (and often more efficiently) using counters than with traditional synchronization mechanisms. For each of these synchronization patterns, a small example program is provided to demonstrate the pattern, together with a description of the importance of the pattern to real problems. This is far from an exhaustive list of patterns to which counters can usefully be applied. Counters are equally applicable to many other situations, particularly dataflow-style synchronization patterns arising in the application of threads to multiprocessing.

Counters can often be used to replace traditional barrier synchronization with a less restrictive form of "ragged" barrier. With a ragged barrier, each thread waits at the barrier point only until its own individual data dependencies have been satisfied, instead of until the data dependencies of all threads have been satisfied, as with a traditional barrier. We have already given one example of this pattern above, with the multithreaded Floyd-Warshall algorithm to solve the all-pairs shortest-path problem. In this section, we give another, more straightforward example, based on boundary exchange in a time-stepped simulation.

Consider a time-stepped simulation of a one-dimensional object subdivided into N cells. The state of internal cell i at time t is a function of the states of cells i-1, i, and i+1 at time t-1. The states of the leftmost and rightmost cells remain constant over time. An example is simulation of heat transfer along a metal rod. Similar boundary exchange requirements occur in most multithreaded simulations of physical systems in one or more dimensions. These requirements are traditionally satisfied using barrier synchronization.
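As a point of reference for the multithreaded versions, the simulation just described can be written sequentially. The sketch below is an assumption-laden illustration: the update rule f (a three-cell average, a crude heat-diffusion step), the size N = 5 and the function name simulate are all hypothetical. It uses a second array so that every cell at time t is computed only from states at time t-1.

```c
#include <assert.h>
#include <string.h>

#define N 5 /* hypothetical number of cells */

/* Hypothetical update rule: average of the cell and its two neighbors. */
static float f(float left, float self, float right)
{
    return (left + self + right) / 3.0f;
}

/* Advance the cell states by numSteps time steps.
   The leftmost and rightmost cells keep their initial values. */
static void simulate(float state[N], int numSteps)
{
    float next[N];
    int i, t;
    assert(numSteps >= 0);
    for (t = 1; t <= numSteps; t++) {
        next[0] = state[0];
        next[N - 1] = state[N - 1];
        for (i = 1; i < N - 1; i++)
            next[i] = f(state[i - 1], state[i], state[i + 1]);
        memcpy(state, next, sizeof next); /* publish time step t */
    }
}
```

The multithreaded programs obtain the same separation between reading time t-1 and writing time t by synchronizing twice per time step instead of copying the whole array.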

The following program implements the simulation using one thread for each cell, with traditional barrier synchronization between threads before cell state exchanges and updates at each time step:

float state[N];
Barrier b;
state[0..N-1] = initial cell states;
InitializeBarrier(&b, N-2);
multithreaded for (i = 1; i < N-1; i++) {
    float leftState, rightState;
    for (t = 1; t <= numSteps; t++) {
        PassBarrier(&b);
        leftState = state[i-1];
        rightState = state[i+1];
        PassBarrier(&b);
        state[i] = f(leftState, state[i], rightState);
    }
}
FinalizeBarrier(&b);

All threads synchronize at the barrier twice every time step: once before exchanging cell states, and again before updating cell states. However, complete barrier synchronization between all threads is unnecessarily restrictive. The conditions for safely exchanging and updating the cell states involve dependencies between pairs of neighboring cells, not across all cells. As a consequence of using barriers, the performance of the program is potentially subject to synchronization bottleneck and load imbalance problems.

The following program implements the same simulation using an array of counters to provide ragged barrier synchronization between threads:

float state[N];
Counter c[N];
state[0..N-1] = initial cell states;
for (i = 0; i < N; i++) InitializeCounter(&c[i]);
IncrementCounter(&c[0], 2*numSteps);
IncrementCounter(&c[N-1], 2*numSteps);
multithreaded for (i = 1; i < N-1; i++) {
    float leftState, rightState, myState = state[i];
    for (t = 1; t <= numSteps; t++) {
        CheckCounter(&c[i-1], 2*t-2);
        leftState = state[i-1];
        CheckCounter(&c[i+1], 2*t-2);
        rightState = state[i+1];
        IncrementCounter(&c[i], 1);
        myState = f(leftState, myState, rightState);
        CheckCounter(&c[i-1], 2*t-1);
        CheckCounter(&c[i+1], 2*t-1);
        state[i] = myState;
        IncrementCounter(&c[i], 1);
    }
}
for (i = 0; i < N; i++) FinalizeCounter(&c[i]);

As with the traditional barrier algorithm, the threads synchronize every time step before exchanging cell states, and again before updating cell states. However, the synchronization is between pairs of neighboring threads via an array of counters. c[i] = 2*t-1 indicates that thread i has finished reading both neighboring cell states in time step t, and c[i] = 2*t indicates that thread i has completed time step t.
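The counter-value convention can be made explicit with a few helper functions. This is only an illustrative sketch; the names reads_done, step_done, level_before_read and level_before_write are hypothetical and do not appear in the programs above.

```c
#include <assert.h>

/* Counter value meaning "finished reading both neighbors in time step t". */
static unsigned int reads_done(unsigned int t)
{
    assert(t >= 1);
    return 2 * t - 1;
}

/* Counter value meaning "finished writing own state in time step t";
   step_done(0) == 0 is the freshly initialized counter. */
static unsigned int step_done(unsigned int t)
{
    return 2 * t;
}

/* Level a thread must observe on a neighbor's counter before reading that
   neighbor's state in time step t: the neighbor has completed step t-1. */
static unsigned int level_before_read(unsigned int t)
{
    assert(t >= 1);
    return step_done(t - 1);
}

/* Level a thread must observe on a neighbor's counter before overwriting its
   own state in time step t: the neighbor has finished reading in step t. */
static unsigned int level_before_write(unsigned int t)
{
    return reads_done(t);
}
```

Under this convention, presetting the two boundary counters to 2*numSteps, as the program does, makes every check against a boundary cell succeed immediately in every time step.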

Pairwise synchronization removes the synchronization bottleneck of a traditional barrier and reduces load imbalance by allowing some threads to execute ahead of other threads. The barrier could be made even more ragged using separate counters to synchronize with left and right neighbors.

The major cost in the implementation of ragged barriers using counters is the need for N counter objects instead of one barrier object. However, the number of counters needed is proportional to the number of threads, not to the problem size. This cost is unlikely to be a practical problem on modern computer systems.

The present application can be used in any multithreaded programming system, on single-processor or multiprocessor computers. Example multithreaded programming systems include Windows NT, UNIX/Pthreads and Java.

Examples other than those discussed above can of course be used. While the three examples discussed above are computationally intensive, other computationally intensive applications include volume rendering, terrain masking, threat analysis, protein folding, and molecular dynamics simulation. As can be seen from the above, the system of the present application offers significant advantages.

Although only a few embodiments have been disclosed in detail above, other modifications are possible, and would be understood by those having ordinary skill in the art reading the application. For example, although this application has only described certain operating systems which are capable of handling multiple threads, it should be understood that other operating systems could be provided. A non-exhaustive list of operating systems includes Windows NT, Windows 2000, Java, UNIX, Linux or any other type of system.

All such modifications are intended to be encompassed within the following claims, in which:

#ifndef STHREADS_H
#define STHREADS_H

#ifndef _WIN32
#error ERROR: Win32 sthreads.h included in non-Win32 program.
#endif

#ifndef _MT
#error ERROR: Sthreads program must be linked with multithreaded libraries.
#endif

#ifdef __cplusplus
extern "C" {
#endif

/*------------------------------------------------------------------------*/
/* Sthreads: A Structured Thread Library for Shared-Memory Multiprocessing */
/* Version 1.0 for Windows NT */
/* */
/* Author: John Thornley, Computer Science Dept., Caltech. */
/* Date: September 1998. */
/*------------------------------------------------------------------------*/

/*------------------------------------------------------------------------*/
/* Error codes */
/*------------------------------------------------------------------------*/

#define STHREADS_ERROR_NONE 0
#define STHREADS_ERROR_INPUTVALUE 1
#define STHREADS_ERROR_MEMORYALLOC 2
#define STHREADS_ERROR_THREADCREATE 3
#define STHREADS_ERROR_SYNCCREATE 4
#define STHREADS_ERROR_INITIALIZED 5
#define STHREADS_ERROR_UNINITIALIZED 6
#define STHREADS_ERROR_FINALIZED 7
#define STHREADS_ERROR_INUSE 8
#define STHREADS_ERROR_LOCKHELD 9
#define STHREADS_ERROR_LOCKNOTHELD 10
#define STHREADS_ERROR_COUNTEROVERFLOW 11
#define STHREADS_ERROR_UNSPECIFIED 12

/* Requirements: */
/* - STHREADS_ERROR_NONE == 0. */
/* - STHREADS_ERROR_INPUTVALUE > STHREADS_ERROR_NONE. */
/* - STHREADS_ERROR_MEMORYALLOC > STHREADS_ERROR_INPUTVALUE. */
/* - STHREADS_ERROR_THREADCREATE > STHREADS_ERROR_MEMORYALLOC. */
/* - STHREADS_ERROR_SYNCCREATE > STHREADS_ERROR_THREADCREATE. */
/* - STHREADS_ERROR_INITIALIZED > STHREADS_ERROR_SYNCCREATE. */
/* - STHREADS_ERROR_UNINITIALIZED > STHREADS_ERROR_INITIALIZED. */
/* - STHREADS_ERROR_FINALIZED > STHREADS_ERROR_UNINITIALIZED. */
/* - STHREADS_ERROR_INUSE > STHREADS_ERROR_FINALIZED. */
/* - STHREADS_ERROR_LOCKHELD > STHREADS_ERROR_INUSE. */
/* - STHREADS_ERROR_LOCKNOTHELD > STHREADS_ERROR_LOCKHELD. */
/* - STHREADS_ERROR_COUNTEROVERFLOW > STHREADS_ERROR_LOCKNOTHELD. */
/* - STHREADS_ERROR_UNSPECIFIED > STHREADS_ERROR_COUNTEROVERFLOW. */
/* - STHREADS_ERROR_UNSPECIFIED < INT_MAX. */

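/*------------------------------------------------------------------------*/
/* Error string maximum length */
/*------------------------------------------------------------------------*/

#define STHREADS_ERROR_STRING_MAX 100

/* Requirements: */
/* - STHREADS_ERROR_STRING_MAX >= 1. */
/* - STHREADS_ERROR_STRING_MAX <= INT_MAX. */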

/*------------------------------------------------------------------------*/
/* Processors */
/*------------------------------------------------------------------------*/

#define STHREADS_PROCESSORS_MAX 32
#define STHREADS_PROCESSOR_YES 1000
#define STHREADS_PROCESSOR_NO 1001

/* Requirements: */
/* - STHREADS_PROCESSORS_MAX >= 1. */
/* - STHREADS_PROCESSORS_MAX <= INT_MAX. */
/* - STHREADS_PROCESSOR_YES >= INT_MIN. */
/* - STHREADS_PROCESSOR_YES <= INT_MAX. */
/* - STHREADS_PROCESSOR_NO >= INT_MIN. */
/* - STHREADS_PROCESSOR_NO <= INT_MAX. */
/* - STHREADS_PROCESSOR_YES != STHREADS_PROCESSOR_NO. */
/* Definitions: */
/* - ValidProcessorStatus(p) = */
/*     p == STHREADS_PROCESSOR_PRESENT || */
/*     p == STHREADS_PROCESSOR_NOT_PRESENT. */

/*------------------------------------------------------------------------*/
/* Mappings of statements/iterations to threads */
/*------------------------------------------------------------------------*/

#define STHREADS_MAPPING_SIMPLE 3000
#define STHREADS_MAPPING_DYNAMIC 3001
#define STHREADS_MAPPING_BLOCKED 3002
#define STHREADS_MAPPING_INTERLEAVED 3003

/* Requirements: */
/* - STHREADS_MAPPING_SIMPLE > 0. */
/* - STHREADS_MAPPING_DYNAMIC == STHREADS_MAPPING_SIMPLE + 1. */
/* - STHREADS_MAPPING_BLOCKED == STHREADS_MAPPING_DYNAMIC + 1. */
/* - STHREADS_MAPPING_INTERLEAVED == STHREADS_MAPPING_BLOCKED + 1. */
/* - STHREADS_MAPPING_INTERLEAVED < INT_MAX. */
/* Definitions: */
/* - ValidMapping(m) = */
/*     m == STHREADS_MAPPING_SIMPLE || */
/*     m == STHREADS_MAPPING_DYNAMIC || */
/*     m == STHREADS_MAPPING_BLOCKED || */
/*     m == STHREADS_MAPPING_INTERLEAVED. */

/*------------------------------------------------------------------------*/
/* Conditions testable in regular for loop control */
/*------------------------------------------------------------------------*/

#define STHREADS_CONDITION_LT 4000
#define STHREADS_CONDITION_LE 4001
#define STHREADS_CONDITION_GT 4002
#define STHREADS_CONDITION_GE 4003

/* Requirements: */
/* - STHREADS_CONDITION_LT > 0. */
/* - STHREADS_CONDITION_LE == STHREADS_CONDITION_LT + 1. */
/* - STHREADS_CONDITION_GT == STHREADS_CONDITION_LE + 1. */
/* - STHREADS_CONDITION_GE == STHREADS_CONDITION_GT + 1. */
/* - STHREADS_CONDITION_GE < INT_MAX. */
/* Definitions: */
/* - ValidCondition(c) = */
/*     c == STHREADS_CONDITION_LT || */
/*     c == STHREADS_CONDITION_LE || */
/*     c == STHREADS_CONDITION_GT || */
/*     c == STHREADS_CONDITION_GE. */

/*------------------------------------------------------------------------*/
/* Stack sizes (in bytes) */
/*------------------------------------------------------------------------*/

#define STHREADS_STACK_SIZE_MINIMUM 16384
#define STHREADS_STACK_SIZE_DEFAULT 262144

/* Requirements: */
/* - STHREADS_STACK_SIZE_MINIMUM >= */
/* - STHREADS_STACK_SIZE_DEFAULT >= STHREADS_STACK_SIZE_MINIMUM. */
/* - STHREADS_STACK_SIZE_DEFAULT <= UINT_MAX. */
/* Definitions: */
/* - ValidStackSize(s) = */
/*     s >= STHREADS_STACK_SIZE_MINIMUM. */

/*------------------------------------------------------------------------*/
/* Priorities */
/*------------------------------------------------------------------------*/

#define STHREADS_PRIORITY_LOWEST -2
#define STHREADS_PRIORITY_HIGHEST +2
#define STHREADS_PRIORITY_PARENT 10000 /* Inherit priority of parent thread */

/* Requirements: */
/* - STHREADS_PRIORITY_LOWEST > INT_MIN. */
/* - STHREADS_PRIORITY_HIGHEST >= STHREADS_PRIORITY_LOWEST. */
/* - STHREADS_PRIORITY_HIGHEST < INT_MAX. */
/* - STHREADS_PRIORITY_PARENT < STHREADS_PRIORITY_LOWEST || */
/*   STHREADS_PRIORITY_PARENT > STHREADS_PRIORITY_HIGHEST. */
/* Definitions: */
/* - ValidPriority(p) = */
/*     STHREADS_PRIORITY_LOWEST <= p && p <= STHREADS_PRIORITY_HIGHEST. */

/*------------------------------------------------------------------------*/
/* Print error message to string */
/*------------------------------------------------------------------------*/

void SthreadsWriteErrorMessage(int errorCode, char errorString[]);
/* Input Arguments: */
/* - errorCode : error code returned by an Sthreads function call. */
/* Output Arguments: */
/* - errorString : error message as a char string. */
/* Preconditions: */
/* - errorString != NULL && */
/*   errorString is a string of at least STHREADS_ERROR_STRING_MAX chars. */
/* Postconditions: */
/* - errorString is '\0' terminated string of chars in the range ' '..'~'. */
/* - 1 <= strlen(errorString) < STHREADS_ERROR_STRING_MAX. */
/* Atomicity: */
/* - Atomic with respect to all operations. */

/*------------------------------------------------------------------------*/
/* Handle errors. */
/*------------------------------------------------------------------------*/

void SthreadsErrorHandler(int errorCode);
/* Input Arguments: */
/* - errorCode : error code returned by an Sthreads function call. */
/* Operation: */
/* - error handler function is called with errorCode as argument. */
/* Default Error Handler Function: */
/* - Displays error message and terminates normal program execution. */
/* Atomicity: */
/* - Not atomic with respect to SthreadsSetErrorHandler operations. */
/* - Atomic with respect to all other operations. */

/*------------------------------------------------------------------------*/
/* Set error handler function. */
/*------------------------------------------------------------------------*/

int SthreadsSetErrorHandler(void (*errorHandler)(int errorCode));
/* Input Arguments: */
/* - errorHandler : function to handle errors. */
/* Preconditions: */
/* - errorHandler == NULL || */
/*   errorHandler is valid void (*)(int) function. */
/* Postconditions: */
/* - if (errorHandler == NULL) */
/*     error handler function is set to default error handler function. */
/* - if (errorHandler != NULL) */
/*     error handler function is set to errorHandler. */
/* Atomicity: */
/* - Not atomic with respect to */
/*   SthreadsHandleError and SthreadsSetErrorHandler operations. */
/* - Atomic with respect to all other operations. */

/*------------------------------------------------------------------------*/
/* Control the processors used by program execution. */
/*------------------------------------------------------------------------*/

int SthreadsGetSystemProcessors(int processor[]);
/* Output Arguments: */
/* - processor : processors that exist on the system. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - processor != NULL && */
/*   processor is an array of at least STHREADS_PROCESSORS_MAX ints. */
/* Postconditions: */
/* - forall (p = 0; p < STHREADS_PROCESSORS_MAX; p++) */
/*     ValidProcessorStatus(processor[p]) && */
/*     (if (processor[p] == STHREADS_PROCESSOR_YES) */
/*        a processor numbered p exists on the system) && */
/*     (if (processor[p] == STHREADS_PROCESSOR_NO) */
/*        a processor numbered p does not exist on the system). */
/* Atomicity: */
/* - Atomic with respect to all operations. */

int SthreadsSetProgramProcessors(int processor[]);

/* Input Arguments: */
/* - processor : processors on which the threads of the program may execute. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - processor != NULL && */
/*   processor is an array of at least STHREADS_PROCESSORS_MAX ints. */
/* - forall (p = 0; p < STHREADS_PROCESSORS_MAX; p++) */
/*     ValidProcessorStatus(processor[p]) && */
/*     if (processor[p] == STHREADS_PROCESSOR_YES) */
/*       a processor numbered p exists on the system. */
/* - exists (p = 0; p < STHREADS_PROCESSORS_MAX; p++) */
/*     processor[p] == STHREADS_PROCESSOR_YES. */
/* Atomicity: */
/* - Must be called when program execution consists of a single thread. */

int SthreadsGetProgramProcessors(int processor[]);

/* Output Arguments: */
/* - processor : processors on which the program may execute. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - processor != NULL && */
/*   processor is an array of at least STHREADS_PROCESSORS_MAX ints. */
/* Postconditions: */
/* - forall (p = 0; p < STHREADS_PROCESSORS_MAX; p++) */
/*     ValidProcessorStatus(processor[p]) && */
/*     (if (processor[p] == STHREADS_PROCESSOR_YES) */
/*        the program may execute on processor number p) && */
/*     (if (processor[p] == STHREADS_PROCESSOR_NO) */
/*        the program may not execute on processor number p). */
/* Atomicity: */
/* - Not atomic with respect to */
/*   SetProgramProcessors and SetNumProgramProcessors operations. */
/* - Atomic with respect to all other operations. */

int SthreadsSetThreadProcessors(int processor[]);

/* Input Arguments: */
/* - processor : processors on which the thread may execute. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - processor != NULL && */
/*   processor is an array of at least STHREADS_PROCESSORS_MAX ints. */
/* - forall (p = 0; p < STHREADS_PROCESSORS_MAX; p++) */
/*     ValidProcessorStatus(processor[p]) && */
/*     if (processor[p] == STHREADS_PROCESSOR_YES) */
/*       the program may execute on processor number p. */
/* - exists (p = 0; p < STHREADS_PROCESSORS_MAX; p++) */
/*     processor[p] == STHREADS_PROCESSOR_YES. */
/* Atomicity: */
/* - Not atomic with respect to */
/*   SetProgramProcessors and SetNumProgramProcessors operations. */
/* - Atomic with respect to all other operations. */

int SthreadsGetNumSystemProcessors(int *numProcessors);

/* Output Arguments: */
/* - numProcessors : number of processors that exist on the system. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - numProcessors != NULL && numProcessors points to a valid int variable. */
/* Postconditions: */
/* - *numProcessors == number of processors that exist on the system. */
/* Atomicity: */
/* - Atomic with respect to all operations. */

int SthreadsSetNumProgramProcessors(int numProcessors);

/* Input Arguments: */
/* - numProcessors : number of processors on which the threads of the program */
/*   may execute. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - numProcessors >= 1. */
/* - numProcessors <= number of processors that exist on the system. */
/* Atomicity: */
/* - Must be called when program execution consists of a single thread. */

/*------------------------------------------------------------------------*/
/* Multithreaded block */
/*------------------------------------------------------------------------*/

int SthreadsBlock(
    int numStatements,
    void (*statement[])(void *args),
    void *args,
    int mapping,
    int numThreads,
    int priority,
    unsigned int stackSize);
/* Input Arguments: */
/* - numStatements : number of statements in block. */
/* - statement : functions representing statements. */
/* - args : pointer to arguments of the statements. */
/* - mapping : mapping of statements onto threads. */
/* - numThreads : number of threads. */
/* - priority : priority of threads. */
/* - stackSize : stack size of threads. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - numStatements >= 0. */
/* - statement != NULL && */
/*   statement is an array of at least numStatements functions. */
/* - forall (s = 0; s < numStatements; s++) */
/*     statement[s] != NULL && */
/*     statement[s] is a valid void (*)(void *) function. */
/* - ValidMapping(mapping). */
/* - if (mapping != STHREADS_MAPPING_SIMPLE) */
/*     (numThreads > 0) || (numThreads == 0 && numStatements == 0). */
/* - ValidPriority(priority) || priority == STHREADS_PRIORITY_PARENT. */
/* - ValidStackSize(stackSize). */
/* Atomicity: */
/* - Atomic with respect to all operations. */

/*------------------------------------------------------------------------*/
/* Multithreaded regular for loop */
/*------------------------------------------------------------------------*/

int SthreadsRegularForLoop(
    void (*chunk)(int initial, int bound, int step, void *args),
    void *args,
    int initial,
    int condition,
    int bound,
    int step,
    int chunkSize,
    int mapping,
    int numThreads,
    int priority,
    unsigned int stackSize);
/* Input Arguments: */
/* - chunk : function to execute iterations of loop body. */
/* - args : pointer to arguments of loop body. */
/* - initial : initial value of control variable. */
/* - condition : condition between control variable and bound value. */
/* - bound : bound value of control variable. */
/* - step : step value of control variable. */
/* - chunkSize : number of iterations per chunk. */
/* - mapping : mapping of chunks onto threads. */
/* - numThreads : number of threads. */
/* - priority : priority of threads. */
/* - stackSize : stack size of threads. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - chunk != NULL && */
/*   chunk is a valid void (*)(int, int, int, void *) function. */
/* - ValidCondition(condition). */
/* - !InfiniteRange(initial, condition, bound, step). */
/* - (chunkSize > 0) || */
/*   (chunkSize == 0 && NullRange(initial, condition, bound, step)). */
/* - ValidMapping(mapping). */
/* - if (mapping != STHREADS_MAPPING_SIMPLE) */
/*     (numThreads > 0) || */
/*     (numThreads == 0 && NullRange(initial, condition, bound, step)). */
/* - ValidPriority(priority) || priority == STHREADS_PRIORITY_PARENT. */
/* - ValidStackSize(stackSize). */
/* Definitions: */
/* - InfiniteRange(initial, condition, bound, step) = */
/*     (condition == STHREADS_CONDITION_LT && */
/*      initial < bound && step <= 0) || */
/*     (condition == STHREADS_CONDITION_LE && */
/*      initial <= bound && step <= 0) || */
/*     (condition == STHREADS_CONDITION_GT && */
/*      initial > bound && step >= 0) || */
/*     (condition == STHREADS_CONDITION_GE && */
/*      initial >= bound && step >= 0). */
/* - NullRange(initial, condition, bound, step) = */
/*     (condition == STHREADS_CONDITION_LT && initial >= bound) || */
/*     (condition == STHREADS_CONDITION_LE && initial > bound) || */
/*     (condition == STHREADS_CONDITION_GT && initial <= bound) || */
/*     (condition == STHREADS_CONDITION_GE && initial < bound). */
/* Atomicity: */
/* - Atomic with respect to all operations. */

/*------------------------------------------------------------------------*/
/* Flags */
/*------------------------------------------------------------------------*/

typedef struct { unsigned char value[16]; } SthreadsFlag;

int SthreadsFlagInitialize(SthreadsFlag *flag);
/* Input-Output Arguments: */
/* - flag : flag variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - flag != NULL && flag points to a valid flag variable. */
/* - !Initialized(flag). */
/* Atomicity: */
/* - Not atomic with respect to all other operations on flag. */
/* - Atomic with respect to all other operations. */

int SthreadsFlagFinalize(SthreadsFlag *flag);
/* Input-Output Arguments: */
/* - flag : flag variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - flag != NULL && flag points to a valid flag variable. */
/* - Initialized(flag) && !Finalized(flag). */
/* - NumWaiting(flag) == 0. */
/* Atomicity: */
/* - Not atomic with respect to all other operations on flag. */
/* - Atomic with respect to all other operations. */

int SthreadsFlagSet(SthreadsFlag *flag);
/* Input-Output Arguments: */
/* - flag : flag variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - flag != NULL && flag points to a valid flag variable. */
/* - Initialized(flag) && !Finalized(flag). */
/* Atomicity: */
/* - Atomic with respect to Set and Check operations on flag. */
/* - Not atomic with respect to other operations on flag. */
/* - Atomic with respect to all other operations. */

int SthreadsFlagCheck(SthreadsFlag *flag);
/* Input-Output Arguments: */
/* - flag : flag variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - flag != NULL && flag points to a valid flag variable. */
/* - Initialized(flag) && !Finalized(flag). */
/* Atomicity: */
/* - Atomic with respect to Set and Check operations on flag. */
/* - Not atomic with respect to other operations on flag. */
/* - Atomic with respect to all other operations. */

int SthreadsFlagReset(SthreadsFlag *flag);
/* Input-Output Arguments: */
/* - flag : flag variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - flag != NULL && flag points to a valid flag variable. */
/* - Initialized(flag) && !Finalized(flag). */
/* - NumWaiting(flag) == 0. */
/* Atomicity: */
/* - Not atomic with respect to other operations on flag. */
/* - Atomic with respect to all other operations. */

/*------------------------------------------------------------------------*/
/* Counters */
/*------------------------------------------------------------------------*/

typedef struct { unsigned char value[40]; } SthreadsCounter;

int SthreadsCounterInitialize(SthreadsCounter *counter);
/* Input-Output Arguments: */
/* - counter : pointer to counter variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - counter != NULL && counter points to a valid counter variable. */
/* - !Initialized(counter). */
/* Atomicity: */
/* - Not atomic with respect to all other operations on counter. */
/* - Atomic with respect to all other operations. */

int SthreadsCounterFinalize(SthreadsCounter *counter);
/* Input-Output Arguments: */
/* - counter : pointer to counter variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - counter != NULL && counter points to a valid counter variable. */
/* - Initialized(counter) && !Finalized(counter). */
/* - NumWaiting(counter) == 0. */
/* Atomicity: */
/* - Not atomic with respect to all other operations on counter. */
/* - Atomic with respect to all other operations. */

int SthreadsCounterIncrement(SthreadsCounter *counter, unsigned int amount);
/* Input-Output Arguments: */
/* - counter : pointer to counter variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - counter != NULL && counter points to a valid counter variable. */
/* - Initialized(counter) && !Finalized(counter). */
/* - Count(counter) <= UINT_MAX - amount. */
/* Atomicity: */
/* - Atomic with respect to Increment and Check operations on counter. */
/* - Not atomic with respect to other operations on counter. */
/* - Atomic with respect to all other operations. */

int SthreadsCounterCheck(SthreadsCounter *counter, unsigned int value);
/* Input-Output Arguments: */
/* - counter : pointer to counter variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - counter != NULL && counter points to a valid counter variable. */
/* - Initialized(counter) && !Finalized(counter). */
/* Atomicity: */
/* - Atomic with respect to Increment and Check operations on counter. */
/* - Not atomic with respect to other operations on counter. */
/* - Atomic with respect to all other operations. */

int SthreadsCounterReset(SthreadsCounter *counter);
/* Input-Output Arguments: */
/* - counter : pointer to counter variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - counter != NULL && counter points to a valid counter variable. */
/* - Initialized(counter) && !Finalized(counter). */
/* - NumWaiting(counter) == 0. */
/* Atomicity: */
/* - Not atomic with respect to all other operations on counter. */
/* - Atomic with respect to all other operations. */

/*------------------------------------------------------------------------*/
/* Locks */
/*------------------------------------------------------------------------*/

typedef struct { unsigned char value[36]; } SthreadsLock;

int SthreadsLockInitialize(SthreadsLock *lock);
/* Input-Output Arguments: */
/* - lock : pointer to lock variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - lock != NULL && lock points to a valid lock variable. */
/* - !Initialized(lock). */
/* Atomicity: */
/* - Not atomic with respect to all other operations on lock. */
/* - Atomic with respect to all other operations. */

int SthreadsLockFinalize(SthreadsLock *lock);
/* Input-Output Arguments: */
/* - lock : pointer to lock variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - lock != NULL && lock points to a valid lock variable. */
/* - Initialized(lock) && !Finalized(lock). */
/* - !AnyThreadHolds(lock). */
/* Atomicity: */
/* - Not atomic with respect to all other operations on lock. */
/* - Atomic with respect to all other operations. */

int SthreadsLockAcquire(SthreadsLock *lock);
/* Input-Output Arguments: */
/* - lock : pointer to lock variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - lock != NULL && lock points to a valid lock variable. */
/* - Initialized(lock) && !Finalized(lock). */
/* - !ThisThreadHolds(lock). */
/* Atomicity: */
/* - Atomic with respect to Acquire and Release operations on lock. */
/* - Not atomic with respect to other operations on lock. */
/* - Atomic with respect to all other operations. */

int SthreadsLockRelease(SthreadsLock *lock);
/* Input-Output Arguments: */
/* - lock : pointer to lock variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - lock != NULL && lock points to a valid lock variable. */
/* - Initialized(lock) && !Finalized(lock). */
/* - ThisThreadHolds(lock). */
/* Atomicity: */
/* - Atomic with respect to Acquire and Release operations on lock. */
/* - Not atomic with respect to other operations on lock. */
/* - Atomic with respect to all other operations. */

/*------------------------------------------------------------------------*/
/* Barriers */
/*------------------------------------------------------------------------*/

typedef struct { unsigned char value[52]; } SthreadsBarrier;

int SthreadsBarrierInitialize(SthreadsBarrier *barrier, int numThreads);
/* Input-Output Arguments: */
/* - barrier : pointer to barrier variable. */
/* - numThreads : number of threads that cross barrier in each pass. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - barrier != NULL && barrier points to a valid barrier variable. */
/* - !Initialized(barrier). */
/* - numThreads >= 1. */
/* Atomicity: */
/* - Not atomic with respect to all other operations on barrier. */
/* - Atomic with respect to all other operations. */

int SthreadsBarrierFinalize(SthreadsBarrier *barrier);
/* Input-Output Arguments: */
/* - barrier : pointer to barrier variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - barrier != NULL && barrier points to a valid barrier variable. */
/* - Initialized(barrier) && !Finalized(barrier). */
/* - NumWaiting(barrier) == 0. */
/* Atomicity: */
/* - Not atomic with respect to all other operations on barrier. */
/* - Atomic with respect to all other operations. */

int SthreadsBarrierPass(SthreadsBarrier *barrier);
/* Input-Output Arguments: */
/* - barrier : pointer to barrier variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - barrier != NULL && barrier points to a valid barrier variable. */
/* - Initialized(barrier) && !Finalized(barrier). */
/* Atomicity: */
/* - Atomic with respect to Pass operations on barrier. */
/* - Not atomic with respect to other operations on barrier. */
/* - Atomic with respect to all other operations. */

int SthreadsBarrierReset(SthreadsBarrier *barrier, int numThreads);
/* Input-Output Arguments: */
/* - barrier : pointer to barrier variable. */
/* Function Return: */
/* - error code. */
/* Preconditions: */
/* - barrier != NULL && barrier points to a valid barrier variable. */
/* - Initialized(barrier) && !Finalized(barrier). */
/* - NumWaiting(barrier) == 0. */
/* - numThreads >= 1. */
/* Atomicity: */
/* - Not atomic with respect to all other operations on barrier. */
/* - Atomic with respect to all other operations. */

/_* »_/

/* Priorities */

/* ,. i t SthreadsGetCurrentPrioπty (mt 'priority)

/* Output Arguments */

/* - priority scheduling priority of calling thread */

/* Function Return */

/* - error code */

/* Preconditions */

/* - priority != NULL && priority points to a valid int variable */

/* Postconditions */

/* - *priority == scheduling priority of calling thread */

/* Atomicity */

/* - Atomic with respect to all operations */

int SthreadsSetCurrentPriority(int priority);

/* Input Arguments */

/* - priority scheduling priority for calling thread */

/* Function Return */

/* - error code */

/* Preconditions */

/* - ValidPriority(priority) */

/* Atomicity */

/* - Atomic with respect to all operations */

#ifdef __cplusplus
}
#endif

#endif /* !STHREADS_H */

/*--------------------------------------------------------------------------*/

/* Sthreads: A Structured Thread Library for Shared-Memory Multiprocessing */

/* Version 1.0 for Windows NT */

/* */

/* Author: John Thornley, Computer Science Dept., Caltech. */

/* Date: September 1998. */

/*--------------------------------------------------------------------------*/

/* */

/* THINGS TO DO */

/* */

/* - Change names of CHECK tests, e.g., to CHECKNOTINITIALIZED */

/* - Make Finalize operations set Initialized and Finalized flags to false */

/* - Counter for dynamic for loop should be unsigned int */

/* - Declarations of thread functions should be compatible with */

/*   Win32 prototype, see page 25 */

/* - Implement special case of BarrierPass when numThreads == 1 */

/* - Implement flags like counters for efficiency when flag is set */

/* - Change priority low and high to THREAD_PRIORITY_IDLE and _TIME_CRITICAL */

/* */

/*--------------------------------------------------------------------------*/

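#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <limits.h>
#include <windows.h>
#include "sthreads.h"

/*--------------------------------------------------------------------------*/
/* Bool type definition */
/*--------------------------------------------------------------------------*/

typedef int bool;
#define false 0
#define true 1

/*--------------------------------------------------------------------------*/
/* Miscellaneous utility definitions */
/*--------------------------------------------------------------------------*/

#define MIN(x, y) ((x) < (y) ? (x) : (y))
#define MAX(x, y) ((x) > (y) ? (x) : (y))

/*--------------------------------------------------------------------------*/
/* Verify requirements, beliefs, and checks */
/*--------------------------------------------------------------------------*/

#define require(condition) assert(condition) /* require this input condition */
#define believe(condition) assert(condition) /* believe this must be true */
#define check(condition) assert(condition)   /* check this is true */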

/*--------------------------------------------------------------------------*/
/* Check for error conditions */
/*--------------------------------------------------------------------------*/

#define CHECKINPUTVALUE(condition) \
    if (!(condition)) { return STHREADS_ERROR_INPUTVALUE; }

#define CHECKMEMORYALLOC(condition) \
    if (!(condition)) { return STHREADS_ERROR_MEMORYALLOC; }

#define CHECKTHREADCREATE(condition) \
    if (!(condition)) { return STHREADS_ERROR_THREADCREATE; }

#define CHECKSYNCCREATE(condition) \
    if (!(condition)) { return STHREADS_ERROR_SYNCCREATE; }

#define CHECKINITIALIZED(condition) \
    if (!(condition)) { return STHREADS_ERROR_INITIALIZED; }

#define CHECKUNINITIALIZED(condition) \
    if (!(condition)) { return STHREADS_ERROR_UNINITIALIZED; }

#define CHECKFINALIZED(condition) \
    if (!(condition)) { return STHREADS_ERROR_FINALIZED; }

#define CHECKINUSE(condition) \
    if (!(condition)) { return STHREADS_ERROR_INUSE; }

#define CHECKLOCKHELD(condition) \
    if (!(condition)) { return STHREADS_ERROR_LOCKHELD; }

#define CHECKLOCKNOTHELD(condition) \
    if (!(condition)) { return STHREADS_ERROR_LOCKNOTHELD; }

#define CHECKCOUNTEROVERFLOW(condition) \
    if (!(condition)) { return STHREADS_ERROR_COUNTEROVERFLOW; }

#define CHECKOTHER(condition) \
    if (!(condition)) { return STHREADS_ERROR_UNSPECIFIED; }

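/*--------------------------------------------------------------------------*/
/* Is processor status value valid? */
/*--------------------------------------------------------------------------*/

static bool ValidProcessorStatus(int p)
{
    return p == STHREADS_PROCESSOR_YES || p == STHREADS_PROCESSOR_NO;
}

/*--------------------------------------------------------------------------*/
/* Is mapping value valid? */
/*--------------------------------------------------------------------------*/

static bool ValidMapping(int m)
{
    return
        m == STHREADS_MAPPING_SIMPLE ||
        m == STHREADS_MAPPING_DYNAMIC ||
        m == STHREADS_MAPPING_BLOCKED ||
        m == STHREADS_MAPPING_INTERLEAVED;
}

/*--------------------------------------------------------------------------*/
/* Is condition value valid? */
/*--------------------------------------------------------------------------*/

static bool ValidCondition(int c)
{
    return
        c == STHREADS_CONDITION_LT ||
        c == STHREADS_CONDITION_LE ||
        c == STHREADS_CONDITION_GT ||
        c == STHREADS_CONDITION_GE;
}

/*--------------------------------------------------------------------------*/
/* Is stack-size value valid? */
/*--------------------------------------------------------------------------*/

static bool ValidStackSize(unsigned int s)
{
    return s >= STHREADS_STACK_SIZE_MINIMUM;
}

/*--------------------------------------------------------------------------*/
/* Is priority value valid? */
/*--------------------------------------------------------------------------*/

static bool ValidPriority(int p)
{
    return STHREADS_PRIORITY_LOWEST <= p && p <= STHREADS_PRIORITY_HIGHEST;
}

/*--------------------------------------------------------------------------*/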

/* Print error message to string */
/*--------------------------------------------------------------------------*/

void SthreadsWriteErrorMessage(int errorCode, char errorString[])
{
    switch (errorCode) {
    case STHREADS_ERROR_NONE:
        sprintf(errorString, "no error");
        break;
    case STHREADS_ERROR_INPUTVALUE:
        sprintf(errorString, "input value precondition violation");
        break;
    case STHREADS_ERROR_MEMORYALLOC:
        sprintf(errorString, "memory allocation failure");
        break;
    case STHREADS_ERROR_THREADCREATE:
        sprintf(errorString, "system thread creation failure");
        break;
    case STHREADS_ERROR_SYNCCREATE:
        sprintf(errorString, "system synchronization creation failure");
        break;
    case STHREADS_ERROR_INITIALIZED:
        sprintf(errorString, "initialization on previously initialized object");
        break;
    case STHREADS_ERROR_UNINITIALIZED:
        sprintf(errorString, "operation on uninitialized object");
        break;
    case STHREADS_ERROR_FINALIZED:
        sprintf(errorString, "operation on finalized object");
        break;
    case STHREADS_ERROR_INUSE:
        sprintf(errorString, "finalization/reset on in-use object");
        break;
    case STHREADS_ERROR_LOCKNOTHELD:
        sprintf(errorString, "release on lock not held");
        break;
    case STHREADS_ERROR_COUNTEROVERFLOW:
        sprintf(errorString, "counter overflow");
        break;
    case STHREADS_ERROR_UNSPECIFIED:
        sprintf(errorString, "unspecified error");
        break;
    default:
        sprintf(errorString, ">>>> unknown error code <<<<");
        break;
    }
}

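/*--------------------------------------------------------------------------*/
/* Default error handler function: */
/* displays error message and terminates normal program execution */
/*--------------------------------------------------------------------------*/

static void DefaultErrorHandler(int errorCode)
{
    char errorString[STHREADS_ERROR_STRING_MAX];

    if (errorCode != STHREADS_ERROR_NONE) {
        SthreadsWriteErrorMessage(errorCode, errorString);
        fprintf(stderr, "\n%s\n", errorString);
        exit(EXIT_FAILURE);
    }
}

/*--------------------------------------------------------------------------*/
/* Error handler function. */
/*--------------------------------------------------------------------------*/

static void (*errorHandlerFunction)(int errorCode) = DefaultErrorHandler;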

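/*--------------------------------------------------------------------------*/
/* Handle errors. */
/*--------------------------------------------------------------------------*/

#define UNLOCKED 0
#define LOCKED 1

static LONG lock = UNLOCKED;

void SthreadsErrorHandler(int errorCode)
{
    /* Spin until this thread acquires the lock */
    while (InterlockedExchange((LPLONG) &lock, LOCKED) != UNLOCKED);
    (*errorHandlerFunction)(errorCode);
    InterlockedExchange((LPLONG) &lock, UNLOCKED);
}

#undef UNLOCKED
#undef LOCKED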
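The error handler above serializes concurrent calls with a spin lock built on an atomic exchange. The following is a minimal portable sketch of the same idea, assuming a C11 compiler with `<stdatomic.h>` in place of the Win32 `InterlockedExchange` call. The names `spin_lock`, `spin_unlock`, and `guarded_increment` are hypothetical and are not part of Sthreads.

```c
#include <assert.h>
#include <stdatomic.h>

enum { UNLOCKED = 0, LOCKED = 1 };

/* Spin until this caller is the one that changes the flag from
   UNLOCKED to LOCKED. */
static void spin_lock(atomic_int *lock)
{
    while (atomic_exchange(lock, LOCKED) != UNLOCKED) {
        /* busy-wait: another thread currently holds the lock */
    }
}

/* Release the lock so that one spinning caller can proceed. */
static void spin_unlock(atomic_int *lock)
{
    atomic_exchange(lock, UNLOCKED);
}

/* Example critical section (hypothetical): increments *value while
   holding the lock and returns the new value. */
static int guarded_increment(atomic_int *lock, int *value)
{
    int result;

    spin_lock(lock);
    assert(atomic_load(lock) == LOCKED);
    *value = *value + 1;
    result = *value;
    spin_unlock(lock);
    return result;
}
```

A spin lock of this kind is reasonable only when the protected section is short and rarely contended, which is plausibly the case for an error path.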

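/*--------------------------------------------------------------------------*/
/* Set error handler function. */
/*--------------------------------------------------------------------------*/

int SthreadsSetErrorHandler(void (*errorHandler)(int errorCode))
{
    if (errorHandler == NULL)
        errorHandlerFunction = DefaultErrorHandler;
    else
        errorHandlerFunction = errorHandler;
    return STHREADS_ERROR_NONE;
}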

/*--------------------------------------------------------------------------*/
/* Control the processors used by program execution. */
/*--------------------------------------------------------------------------*/

int SthreadsGetSystemProcessors(int processor[])
{
    DWORD processAffinity, systemAffinity, processorBit;
    int p;

    require(STHREADS_PROCESSORS_MAX == 32);
    GetProcessAffinityMask(
        GetCurrentProcess(),
        (LPDWORD) &processAffinity, (LPDWORD) &systemAffinity);
    CHECKINPUTVALUE(processor != NULL);
    processorBit = (DWORD) 1;
    for (p = 0; p < STHREADS_PROCESSORS_MAX; p++) {
        if (systemAffinity & processorBit)
            processor[p] = STHREADS_PROCESSOR_YES;
        else
            processor[p] = STHREADS_PROCESSOR_NO;
        processorBit = processorBit << 1;
    }
    return STHREADS_ERROR_NONE;
}
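The processor functions in this part of the listing translate between a 32-bit affinity mask and an array of per-processor status values. A small portable sketch of that translation follows, with `uint32_t` standing in for the Win32 `DWORD`. The macro and function names (`PROCESSORS_MAX`, `PROCESSOR_YES`, `PROCESSOR_NO`, `mask_to_array`, `array_to_mask`) are hypothetical stand-ins, and the actual values of the `STHREADS_PROCESSOR_*` constants are not assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROCESSORS_MAX 32  /* stand-in for STHREADS_PROCESSORS_MAX */
#define PROCESSOR_NO 0     /* stand-in status codes (values assumed) */
#define PROCESSOR_YES 1

/* Expand a bit mask into one status entry per processor:
   bit p set in mask means processor[p] is PROCESSOR_YES. */
static void mask_to_array(uint32_t mask, int processor[])
{
    uint32_t bit = 1;
    int p;

    assert(processor != NULL);
    for (p = 0; p < PROCESSORS_MAX; p++) {
        processor[p] = (mask & bit) ? PROCESSOR_YES : PROCESSOR_NO;
        bit = bit << 1;
    }
}

/* Collapse the status array back into a bit mask. */
static uint32_t array_to_mask(const int processor[])
{
    uint32_t mask = 0, bit = 1;
    int p;

    assert(processor != NULL);
    for (p = 0; p < PROCESSORS_MAX; p++) {
        if (processor[p] == PROCESSOR_YES)
            mask = mask | bit;
        bit = bit << 1;
    }
    return mask;
}
```

For example, the mask `0x5` corresponds to processors 0 and 2 being available and all others unavailable.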

/*--------------------------------------------------------------------------*/

int SthreadsSetProgramProcessors(int processor[])
{
    DWORD processAffinity, systemAffinity, processorBit;
    int p;

    require(STHREADS_PROCESSORS_MAX == 32);
    GetProcessAffinityMask(
        GetCurrentProcess(),
        (LPDWORD) &processAffinity, (LPDWORD) &systemAffinity);
    CHECKINPUTVALUE(processor != NULL);
    processorBit = (DWORD) 1;
    for (p = 0; p < STHREADS_PROCESSORS_MAX; p++) {
        CHECKINPUTVALUE(ValidProcessorStatus(processor[p]));
        if (processor[p] == STHREADS_PROCESSOR_YES)
            CHECKINPUTVALUE(systemAffinity & processorBit);
        processorBit = processorBit << 1;
    }
    /* At least one processor must be requested */
    for (p = 0; p < STHREADS_PROCESSORS_MAX; p++)
        if (processor[p] == STHREADS_PROCESSOR_YES) break;
    CHECKINPUTVALUE(p < STHREADS_PROCESSORS_MAX);
    processAffinity = (DWORD) 0;
    processorBit = (DWORD) 1;
    for (p = 0; p < STHREADS_PROCESSORS_MAX; p++) {
        if (processor[p] == STHREADS_PROCESSOR_YES)
            processAffinity = processAffinity | processorBit;
        processorBit = processorBit << 1;
    }
    SetProcessAffinityMask(GetCurrentProcess(), processAffinity);
    SetThreadAffinityMask(GetCurrentThread(), processAffinity);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsGetProgramProcessors(int processor[])
{
    DWORD processAffinity, systemAffinity, processorBit;
    int p;

    require(STHREADS_PROCESSORS_MAX == 32);
    GetProcessAffinityMask(
        GetCurrentProcess(),
        (LPDWORD) &processAffinity, (LPDWORD) &systemAffinity);
    CHECKINPUTVALUE(processor != NULL);
    processorBit = (DWORD) 1;
    for (p = 0; p < STHREADS_PROCESSORS_MAX; p++) {
        if (processAffinity & processorBit)
            processor[p] = STHREADS_PROCESSOR_YES;
        else
            processor[p] = STHREADS_PROCESSOR_NO;
        processorBit = processorBit << 1;
    }
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsSetThreadProcessors(int processor[])
{
    DWORD threadAffinity, processAffinity, systemAffinity, processorBit;
    int p;

    require(STHREADS_PROCESSORS_MAX == 32);
    GetProcessAffinityMask(
        GetCurrentProcess(),
        (LPDWORD) &processAffinity, (LPDWORD) &systemAffinity);
    CHECKINPUTVALUE(processor != NULL);
    processorBit = (DWORD) 1;
    for (p = 0; p < STHREADS_PROCESSORS_MAX; p++) {
        CHECKINPUTVALUE(ValidProcessorStatus(processor[p]));
        if (processor[p] == STHREADS_PROCESSOR_YES)
            CHECKINPUTVALUE(processAffinity & processorBit);
        processorBit = processorBit << 1;
    }
    /* At least one processor must be requested */
    for (p = 0; p < STHREADS_PROCESSORS_MAX; p++)
        if (processor[p] == STHREADS_PROCESSOR_YES) break;
    CHECKINPUTVALUE(p < STHREADS_PROCESSORS_MAX);
    threadAffinity = (DWORD) 0;
    processorBit = (DWORD) 1;
    for (p = 0; p < STHREADS_PROCESSORS_MAX; p++) {
        if (processor[p] == STHREADS_PROCESSOR_YES)
            threadAffinity = threadAffinity | processorBit;
        processorBit = processorBit << 1;
    }
    SetThreadAffinityMask(GetCurrentThread(), threadAffinity);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsGetNumSystemProcessors(int *numProcessors)
{
    DWORD processAffinity, systemAffinity, processorBit;
    int p, count;

    require(STHREADS_PROCESSORS_MAX == 32);
    GetProcessAffinityMask(
        GetCurrentProcess(),
        (LPDWORD) &processAffinity, (LPDWORD) &systemAffinity);
    CHECKINPUTVALUE(numProcessors != NULL);
    count = 0;
    processorBit = (DWORD) 1;
    for (p = 0; p < STHREADS_PROCESSORS_MAX; p++) {
        if (systemAffinity & processorBit) count = count + 1;
        processorBit = processorBit << 1;
    }
    *numProcessors = count;
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsSetNumProgramProcessors(int numProcessors)
{
    DWORD processAffinity, systemAffinity, processorBit;
    int p, numSystemProcessors;

    require(STHREADS_PROCESSORS_MAX == 32);
    GetProcessAffinityMask(
        GetCurrentProcess(),
        (LPDWORD) &processAffinity, (LPDWORD) &systemAffinity);
    CHECKINPUTVALUE(numProcessors >= 1);
    numSystemProcessors = 0;
    processorBit = (DWORD) 1;
    for (p = 0; p < STHREADS_PROCESSORS_MAX; p++) {
        if (systemAffinity & processorBit)
            numSystemProcessors = numSystemProcessors + 1;
        processorBit = processorBit << 1;
    }
    CHECKINPUTVALUE(numProcessors <= numSystemProcessors);
    processAffinity = (DWORD) 0;
    processorBit = (DWORD) 1;
    /* Take the first numProcessors processors available to the system */
    for (p = 0; p < STHREADS_PROCESSORS_MAX && numProcessors > 0; p++) {
        if (systemAffinity & processorBit) {
            processAffinity = processAffinity | processorBit;
            numProcessors = numProcessors - 1;
        }
        processorBit = processorBit << 1;
    }
    believe(numProcessors == 0);
    SetProcessAffinityMask(GetCurrentProcess(), processAffinity);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

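/* Arguments for multithreaded block thread */
/*--------------------------------------------------------------------------*/

typedef struct {
    int numStatements;
    void (**statement)(void *args);
    void *args;
    int first, last, step;
    int *counter;
    LPCRITICAL_SECTION counterLock;
    LPLONG threadCount;
    HANDLE threadsFinished;
} MTBargs;

/*--------------------------------------------------------------------------*/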

/* Simple multithreaded block thread */
/*--------------------------------------------------------------------------*/

static void SMTBthread(MTBargs *args)
{
    BOOL returnOK;

    require(args != NULL);
    require(args->numStatements > 0);
    require(args->statement != NULL);
    require(0 <= args->first && args->first < args->numStatements);
    require(args->statement[args->first] != NULL);
    (*args->statement[args->first])(args->args);
    /* The last thread to finish signals the parent */
    if (InterlockedDecrement(args->threadCount) == 0) {
        returnOK = SetEvent(args->threadsFinished);
        check(returnOK);
    }
}

/*--------------------------------------------------------------------------*/
/* Dynamic multithreaded block thread */
/*--------------------------------------------------------------------------*/

static void DMTBthread(MTBargs *args)
{
    int s;
    bool finished;
    BOOL returnOK;

    require(args != NULL);
    require(args->numStatements > 0);
    require(args->statement != NULL);
    require(0 <= args->first && args->first < args->numStatements);
    require(args->counter != NULL);
    require(args->counterLock != NULL);
    s = args->first;
    while (true) {
        require(args->statement[s] != NULL);
        (*args->statement[s])(args->args);
        /* Claim the next unexecuted statement, if any remain */
        EnterCriticalSection(args->counterLock);
        finished = (*args->counter == args->numStatements - 1);
        if (!finished) {
            *args->counter = *args->counter + 1;
            s = *args->counter;
        }
        LeaveCriticalSection(args->counterLock);
        if (finished) break;
    }
    if (InterlockedDecrement(args->threadCount) == 0) {
        returnOK = SetEvent(args->threadsFinished);
        check(returnOK);
    }
}

/*--------------------------------------------------------------------------*/
/* Blocked and interleaved multithreaded block thread */
/*--------------------------------------------------------------------------*/

static void BIMTBthread(MTBargs *args)
{
    int s;
    BOOL returnOK;

    require(args != NULL);
    require(args->numStatements > 0);
    require(args->statement != NULL);
    require(0 <= args->last && args->last < args->numStatements);
    require(0 <= args->first && args->first <= args->last);
    require(args->step > 0);
    require((args->last - args->first)%args->step == 0);
    s = args->first;
    while (true) {
        require(args->statement[s] != NULL);
        (*args->statement[s])(args->args);
        if (s == args->last) break;
        believe(args->last - s >= args->step);
        s = s + args->step;
    }
    if (InterlockedDecrement(args->threadCount) == 0) {
        returnOK = SetEvent(args->threadsFinished);
        check(returnOK);
    }
}

/*--------------------------------------------------------------------------*/
/* Multithreaded block */
/*--------------------------------------------------------------------------*/

int SthreadsBlock(
    int numStatements,
    void (*statement[])(void *args),
    void *args,
    int mapping,
    int numThreads,
    int priority,
    unsigned int stackSize)
{
    HANDLE *thread;
    MTBargs *threadArgs;
    LONG threadCount;
    HANDLE threadsFinished;
    HANDLE parentThread;
    int parentPriority;
    void (*threadStart)(MTBargs *args);
    int s, t;
    DWORD threadID;
    int counter;
    CRITICAL_SECTION counterLock;
    int blockFirst, blockSize, blockRemainder;
    BOOL returnOK;
    DWORD returnCode;

    CHECKINPUTVALUE(numStatements >= 0);
    CHECKINPUTVALUE(statement != NULL);
    for (s = 0; s < numStatements; s++)
        CHECKINPUTVALUE(statement[s] != NULL);
    CHECKINPUTVALUE(ValidMapping(mapping));
    if (mapping != STHREADS_MAPPING_SIMPLE)
        CHECKINPUTVALUE((numThreads > 0) ||
                        (numThreads == 0 && numStatements == 0));
    CHECKINPUTVALUE(
        ValidPriority(priority) || priority == STHREADS_PRIORITY_PARENT);
    CHECKINPUTVALUE(ValidStackSize(stackSize));
    if (numStatements == 0) return STHREADS_ERROR_NONE;
    if (mapping == STHREADS_MAPPING_SIMPLE) numThreads = numStatements;
    if (numThreads > numStatements) numThreads = numStatements;
    if (numThreads == 1) mapping = STHREADS_MAPPING_BLOCKED;
    if (numThreads == numStatements) mapping = STHREADS_MAPPING_SIMPLE;

    CHECKMEMORYALLOC(numThreads <= INT_MAX/sizeof(HANDLE));
    thread = (HANDLE *) malloc(numThreads*sizeof(HANDLE));
    CHECKMEMORYALLOC(thread != NULL);
    CHECKMEMORYALLOC(numThreads <= INT_MAX/sizeof(MTBargs));
    threadArgs = (MTBargs *) malloc(numThreads*sizeof(MTBargs));
    CHECKMEMORYALLOC(threadArgs != NULL);

    parentThread = GetCurrentThread();
    believe(parentThread != NULL);
    parentPriority = GetThreadPriority(parentThread);
    believe(parentPriority != THREAD_PRIORITY_ERROR_RETURN);
    believe(ValidPriority(parentPriority));
    if (priority != STHREADS_PRIORITY_PARENT) {
        returnOK = SetThreadPriority(parentThread, priority);
        believe(returnOK);
    }

    switch (mapping) {
    case STHREADS_MAPPING_SIMPLE:
        threadStart = SMTBthread;
        break;
    case STHREADS_MAPPING_DYNAMIC:
        counter = numThreads - 1;
        InitializeCriticalSection(&counterLock);
        threadStart = DMTBthread;
        break;
    case STHREADS_MAPPING_BLOCKED:
        blockFirst = 0;
        blockSize = numStatements/numThreads;
        blockRemainder = numStatements%numThreads;
        threadStart = BIMTBthread;
        break;
    case STHREADS_MAPPING_INTERLEAVED:
        blockSize = numStatements/numThreads;
        blockRemainder = numStatements%numThreads;
        threadStart = BIMTBthread;
        break;
    default:
        assert(false);
    }

    threadCount = numThreads;
    threadsFinished = CreateEvent(NULL, TRUE, FALSE, NULL);
    CHECKSYNCCREATE(threadsFinished != NULL);
    for (t = 0; t < numThreads; t++) {
        threadArgs[t].numStatements = numStatements;
        threadArgs[t].statement = statement;
        threadArgs[t].args = args;
        threadArgs[t].threadCount = (LPLONG) &threadCount;
        threadArgs[t].threadsFinished = threadsFinished;
        switch (mapping) {
        case STHREADS_MAPPING_SIMPLE:
            threadArgs[t].first = t;
            break;
        case STHREADS_MAPPING_DYNAMIC:
            threadArgs[t].first = t;
            threadArgs[t].counter = &counter;
            threadArgs[t].counterLock = &counterLock;
            break;
        case STHREADS_MAPPING_BLOCKED:
            threadArgs[t].first = blockFirst;
            threadArgs[t].last = blockFirst + (blockSize - 1);
            threadArgs[t].step = 1;
            if (blockRemainder > 0) {
                threadArgs[t].last = threadArgs[t].last + 1;
                blockRemainder = blockRemainder - 1;
            }
            blockFirst = threadArgs[t].last + 1;
            break;
        case STHREADS_MAPPING_INTERLEAVED:
            threadArgs[t].first = t;
            threadArgs[t].last = blockSize*numThreads + t;
            threadArgs[t].step = numThreads;
            if (blockRemainder == 0)
                threadArgs[t].last = threadArgs[t].last - numThreads;
            else
                blockRemainder = blockRemainder - 1;
            break;
        default:
            believe(false);
        }
        thread[t] = CreateThread(
            NULL, stackSize,
            (LPTHREAD_START_ROUTINE) threadStart,
            (LPVOID) &threadArgs[t], CREATE_SUSPENDED, &threadID);
        CHECKTHREADCREATE(thread[t] != NULL);
        if (priority == STHREADS_PRIORITY_PARENT)
            returnOK = SetThreadPriority(thread[t], parentPriority);
        else
            returnOK = SetThreadPriority(thread[t], priority);
        CHECKTHREADCREATE(returnOK);
        returnCode = ResumeThread(thread[t]);
        CHECKTHREADCREATE(returnCode == 1);
    }
    if (priority != STHREADS_PRIORITY_PARENT) {
        returnOK = SetThreadPriority(parentThread, parentPriority);
        believe(returnOK);
    }

    returnCode = WaitForSingleObject(threadsFinished, INFINITE);
    CHECKOTHER(returnCode != WAIT_FAILED);
    returnOK = CloseHandle(threadsFinished);
    CHECKOTHER(returnOK == TRUE);
    for (t = 0; t < numThreads; t++) {
        returnOK = CloseHandle(thread[t]);
        CHECKOTHER(returnOK == TRUE);
    }
    if (mapping == STHREADS_MAPPING_DYNAMIC)
        DeleteCriticalSection(&counterLock);
    free(thread);
    free(threadArgs);
    return STHREADS_ERROR_NONE;
}
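The blocked and interleaved mappings above differ only in the `first`, `last`, and `step` values handed to each thread. The sketch below computes those values in closed form for a given thread number, whereas the listing derives them incrementally inside its thread-creation loop. The `Slice` type and both function names are hypothetical.

```c
#include <assert.h>

/* Statements assigned to one thread: first, first + step, ..., last. */
typedef struct { int first, last, step; } Slice;

/* Blocked mapping: each thread gets a contiguous run of statements.
   The first (numStatements % numThreads) threads get one extra. */
static Slice blocked_slice(int numStatements, int numThreads, int t)
{
    Slice s;
    int blockSize, blockRemainder;

    assert(0 < numThreads && numThreads <= numStatements);
    assert(0 <= t && t < numThreads);
    blockSize = numStatements / numThreads;
    blockRemainder = numStatements % numThreads;
    s.first = t * blockSize + (t < blockRemainder ? t : blockRemainder);
    s.last = s.first + (blockSize - 1) + (t < blockRemainder ? 1 : 0);
    s.step = 1;
    return s;
}

/* Interleaved mapping: thread t gets statements t, t + numThreads, ... */
static Slice interleaved_slice(int numStatements, int numThreads, int t)
{
    Slice s;

    assert(0 < numThreads && numThreads <= numStatements);
    assert(0 <= t && t < numThreads);
    s.first = t;
    s.step = numThreads;
    /* Largest index of the form t + k*numThreads below numStatements */
    s.last = t + ((numStatements - 1 - t) / numThreads) * numThreads;
    return s;
}
```

For 10 statements on 3 threads, the blocked mapping assigns statements 0..3, 4..6, and 7..9, while the interleaved mapping assigns {0, 3, 6, 9}, {1, 4, 7}, and {2, 5, 8}.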

/*--------------------------------------------------------------------------*/
/* Is regular for loop range infinite? */
/*--------------------------------------------------------------------------*/

static bool InfiniteRange(int initial, int condition, int bound, int step)
{
    require(ValidCondition(condition));
    switch (condition) {
    case STHREADS_CONDITION_LT:
        return initial < bound && step <= 0;
    case STHREADS_CONDITION_LE:
        return initial <= bound && step <= 0;
    case STHREADS_CONDITION_GT:
        return initial > bound && step >= 0;
    case STHREADS_CONDITION_GE:
        return initial >= bound && step >= 0;
    default:
        believe(false);
        return false; /* This return should never be executed */
    }
}

/*--------------------------------------------------------------------------*/
/* Is regular for loop range null? */
/*--------------------------------------------------------------------------*/

static bool NullRange(int initial, int condition, int bound, int step)
{
    require(ValidCondition(condition));
    switch (condition) {
    case STHREADS_CONDITION_LT:
        return initial >= bound;
    case STHREADS_CONDITION_LE:
        return initial > bound;
    case STHREADS_CONDITION_GT:
        return initial <= bound;
    case STHREADS_CONDITION_GE:
        return initial < bound;
    default:
        believe(false);
        return false; /* This return should never be executed */
    }
}

/*--------------------------------------------------------------------------*/
/* Arithmetic operations on signed and unsigned integers */
/*--------------------------------------------------------------------------*/

static unsigned int DIFF(int high, int low)
{
    require(low <= high);
    return (unsigned int) (high - low);
}

static int ADD(int base, unsigned int offset)
{
    require(offset <= DIFF(INT_MAX, base));
    return base + (int) offset;
}

static int SUBTRACT(int base, unsigned int offset)
{
    require(offset <= DIFF(base, INT_MIN));
    return base - (int) offset;
}

/*--------------------------------------------------------------------------*/
/* Split range 0 .. rangeLast into chunks numbered 0 .. chunkLast. */
/* Return the first and last indices of chunk c. */
/*--------------------------------------------------------------------------*/

static void SPLIT(
    unsigned int rangeLast, unsigned int chunkLast, unsigned int c,
    unsigned int *first, unsigned int *last)
{
    unsigned int smallerChunkSize;
    unsigned int numLargerChunks;

    require(chunkLast <= rangeLast);
    require(c <= chunkLast);
    require(first != NULL && last != NULL);
    if (chunkLast == 0) {
        *first = 0;
        *last = rangeLast;
    }
    else if (chunkLast == rangeLast) {
        *first = c;
        *last = c;
    }
    else {
        smallerChunkSize = (rangeLast - chunkLast)/(chunkLast + 1) + 1;
        numLargerChunks = (rangeLast - chunkLast)%(chunkLast + 1);
        *first = c*smallerChunkSize + MIN(c, numLargerChunks);
        *last = *first + (smallerChunkSize - 1);
        if (c < numLargerChunks) *last = *last + 1;
    }
}
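A worked example may help when reading SPLIT. With N = rangeLast + 1 items and K = chunkLast + 1 chunks, the expression `(rangeLast - chunkLast)/(chunkLast + 1) + 1` equals floor(N/K) and the remainder expression equals N mod K, so the first N mod K chunks each hold one extra item. The sketch below restates the routine as a standalone function under the hypothetical name `split_range`, with plain `assert` in place of the listing's `require` macro.

```c
#include <assert.h>
#include <stddef.h>

/* Split indices 0..rangeLast into chunks 0..chunkLast and return the
   first and last index of chunk c. Larger chunks come first. */
static void split_range(
    unsigned int rangeLast, unsigned int chunkLast, unsigned int c,
    unsigned int *first, unsigned int *last)
{
    unsigned int smaller, numLarger;

    assert(chunkLast <= rangeLast);
    assert(c <= chunkLast);
    assert(first != NULL && last != NULL);
    if (chunkLast == 0) {               /* one chunk: the whole range */
        *first = 0;
        *last = rangeLast;
    } else if (chunkLast == rangeLast) { /* one item per chunk */
        *first = c;
        *last = c;
    } else {
        /* floor(N/K) and N mod K, written to avoid computing N itself */
        smaller = (rangeLast - chunkLast) / (chunkLast + 1) + 1;
        numLarger = (rangeLast - chunkLast) % (chunkLast + 1);
        *first = c * smaller + (c < numLarger ? c : numLarger);
        *last = *first + (smaller - 1);
        if (c < numLarger)
            *last = *last + 1;
    }
}
```

Splitting indices 0..9 into 3 chunks yields 0..3, 4..6, and 7..9.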

/*--------------------------------------------------------------------------*/
/* Last iteration number in regular for loop range */
/* (iterations numbered 0, 1, 2, ...) */
/*--------------------------------------------------------------------------*/

static unsigned int LAST_ITERATION_NUM(
    int initial, int condition, int bound, int step)
{
    require(ValidCondition(condition));
    require(!InfiniteRange(initial, condition, bound, step));
    require(!NullRange(initial, condition, bound, step));
    switch (condition) {
    case STHREADS_CONDITION_LT:
        believe(initial < bound && step > 0);
        return DIFF(bound - 1, initial)/((unsigned int) step);
    case STHREADS_CONDITION_LE:
        believe(initial <= bound && step > 0);
        return DIFF(bound, initial)/((unsigned int) step);
    case STHREADS_CONDITION_GT:
        believe(initial > bound && step < 0);
        return DIFF(initial, bound + 1)/((unsigned int) -step);
    case STHREADS_CONDITION_GE:
        believe(initial >= bound && step < 0);
        return DIFF(initial, bound)/((unsigned int) -step);
    default:
        assert(false);
        return false; /* This return should never be executed */
    }
}

/*--------------------------------------------------------------------------*/
/* Last chunk number in regular for loop range (chunks numbered 0, 1, 2, ...) */
/*--------------------------------------------------------------------------*/

static unsigned int LAST_CHUNK_NUM(
    int initial, int condition, int bound, int step, int chunkSize)
{
    require(ValidCondition(condition));
    require(!InfiniteRange(initial, condition, bound, step));
    require(!NullRange(initial, condition, bound, step));
    require(chunkSize >= 1);
    return
        LAST_ITERATION_NUM(initial, condition, bound, step)/
        ((unsigned int) chunkSize);
}
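The two functions above reduce a loop header to a zero-based iteration count and chunk count. A standalone sketch follows, under assumptions that differ slightly from the listing: the `COND_*` constants are hypothetical stand-ins for the `STHREADS_CONDITION_*` values, and the difference and negation are computed in unsigned arithmetic so that extreme operands such as `INT_MAX` and `INT_MIN` do not overflow a signed subtraction.

```c
#include <assert.h>
#include <limits.h>

enum { COND_LT, COND_LE, COND_GT, COND_GE }; /* hypothetical stand-ins */

/* high - low as an unsigned value, for low <= high. */
static unsigned int diff_u(int high, int low)
{
    assert(low <= high);
    return (unsigned int) high - (unsigned int) low;
}

/* Number of the last iteration (first iteration is number 0) of
   for (i = initial; i <condition> bound; i += step), assuming the
   range is neither empty nor infinite. */
static unsigned int last_iteration_num(
    int initial, int condition, int bound, int step)
{
    switch (condition) {
    case COND_LT:
        assert(initial < bound && step > 0);
        return diff_u(bound - 1, initial) / (unsigned int) step;
    case COND_LE:
        assert(initial <= bound && step > 0);
        return diff_u(bound, initial) / (unsigned int) step;
    case COND_GT:
        assert(initial > bound && step < 0);
        return diff_u(initial, bound + 1) / (0u - (unsigned int) step);
    case COND_GE:
        assert(initial >= bound && step < 0);
        return diff_u(initial, bound) / (0u - (unsigned int) step);
    default:
        assert(0);
        return 0;
    }
}

/* Number of the last chunk when iterations are grouped chunkSize at a time. */
static unsigned int last_chunk_num(
    int initial, int condition, int bound, int step, int chunkSize)
{
    assert(chunkSize >= 1);
    return last_iteration_num(initial, condition, bound, step)
        / (unsigned int) chunkSize;
}
```

For the loop `for (i = 0; i < 10; i += 3)` the iterations are i = 0, 3, 6, 9, so the last iteration number is 3, and with a chunk size of 2 the last chunk number is 1.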

/*--------------------------------------------------------------------------*/
/* Control value on ith iteration of regular for loop range (i = 0, 1, 2, ...) */
/*--------------------------------------------------------------------------*/

static int ControlValue(unsigned int i, int initial, int step)
{
    require(step != 0);
    if (step > 0)
        return ADD(initial, i*((unsigned int) step));
    else
        return SUBTRACT(initial, i*((unsigned int) -step));
}

/*--------------------------------------------------------------------------*/
/* Does control value lie inside regular for loop range? */
/*--------------------------------------------------------------------------*/

static bool InRange(
    int controlValue, int initial, int condition, int bound, int step)
{
    require(ValidCondition(condition));
    require(!InfiniteRange(initial, condition, bound, step));
    require(!NullRange(initial, condition, bound, step));
    switch (condition) {
    case STHREADS_CONDITION_LT:
        believe(step > 0);
        return initial <= controlValue && controlValue < bound;
    case STHREADS_CONDITION_LE:
        believe(step > 0);
        return initial <= controlValue && controlValue <= bound;
    case STHREADS_CONDITION_GT:
        believe(step < 0);
        return initial >= controlValue && controlValue > bound;
    case STHREADS_CONDITION_GE:
        believe(step < 0);
        return initial >= controlValue && controlValue >= bound;
    default:
        believe(false);
        return false; /* This return should never be executed. */
    }
}

/*--------------------------------------------------------------------------*/
/* Execute cth chunk of regular for loop range (c = 0, 1, 2, ...) */
/*--------------------------------------------------------------------------*/

static void ExecuteChunk(
    int initial, int condition, int bound, int step, int chunkSize,
    unsigned int c,
    void (*chunk)(int, int, int, void *), void *args)
{
    unsigned int iFirst, iLast;
    int chunkInitial, chunkLast, chunkBound;

    require(ValidCondition(condition));
    require(!InfiniteRange(initial, condition, bound, step));
    require(!NullRange(initial, condition, bound, step));
    require(chunkSize >= 1);
    require(c <= LAST_CHUNK_NUM(initial, condition, bound, step, chunkSize));
    require(chunk != NULL);
    SPLIT(
        LAST_ITERATION_NUM(initial, condition, bound, step),
        LAST_CHUNK_NUM(initial, condition, bound, step, chunkSize),
        c,
        &iFirst, &iLast);
    believe(0 <= iFirst);
    believe(iFirst <= iLast);
    believe(iLast <= LAST_ITERATION_NUM(initial, condition, bound, step));
    chunkInitial = ControlValue(iFirst, initial, step);
    believe(InRange(chunkInitial, initial, condition, bound, step));
    chunkLast = ControlValue(iLast, initial, step);
    believe(InRange(chunkLast, initial, condition, bound, step));
    /* Convert the last control value into a bound for the same condition */
    switch (condition) {
    case STHREADS_CONDITION_LT:
        chunkBound = chunkLast + 1;
        break;
    case STHREADS_CONDITION_LE:
        chunkBound = chunkLast;
        break;
    case STHREADS_CONDITION_GT:
        chunkBound = chunkLast - 1;
        break;
    case STHREADS_CONDITION_GE:
        chunkBound = chunkLast;
        break;
    default:
        believe(false);
    }
    (*chunk)(chunkInitial, chunkBound, step, args);
}

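/*--------------------------------------------------------------------------*/
/* Arguments for multithreaded regular for loop thread */
/*--------------------------------------------------------------------------*/

typedef struct {
    void (*chunk)(int initial, int bound, int step, void *args);
    void *args;
    int initial, condition, bound, step;
    int chunkSize;
    unsigned int chunkFirst, chunkLast, chunkStep;
    unsigned int *counter;
    LPCRITICAL_SECTION counterLock;
    LPLONG threadCount;
    HANDLE threadsFinished;
} MTRFLargs;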

/*--------------------------------------------------------------------------*/
/* Simple multithreaded regular for loop thread */
/*--------------------------------------------------------------------------*/

static void SMTRFLthread(MTRFLargs *args)
{
    BOOL returnOK;

    require(args != NULL);
    require(args->chunk != NULL);
    require(ValidCondition(args->condition));
    require(!InfiniteRange(
        args->initial, args->condition, args->bound, args->step));
    require(!NullRange(
        args->initial, args->condition, args->bound, args->step));
    require(args->chunkSize >= 1);
    require(args->chunkFirst <= LAST_CHUNK_NUM(
        args->initial, args->condition, args->bound, args->step,
        args->chunkSize));
    ExecuteChunk(
        args->initial, args->condition, args->bound, args->step,
        args->chunkSize, args->chunkFirst, args->chunk, args->args);
    if (InterlockedDecrement(args->threadCount) == 0) {
        returnOK = SetEvent(args->threadsFinished);
        check(returnOK);
    }
}

/*--------------------------------------------------------------------------*/
/* Dynamic multithreaded regular for loop thread */
/*--------------------------------------------------------------------------*/

static void DMTRFLthread(MTRFLargs *args)
{
    unsigned int c, last_c;
    bool finished;
    BOOL returnOK;

    require(args != NULL);
    require(args->chunk != NULL);
    require(ValidCondition(args->condition));
    require(!InfiniteRange(
        args->initial, args->condition, args->bound, args->step));
    require(!NullRange(
        args->initial, args->condition, args->bound, args->step));
    require(args->chunkSize >= 1);
    require(args->chunkFirst <= LAST_CHUNK_NUM(
        args->initial, args->condition, args->bound, args->step,
        args->chunkSize));
    require(args->counter != NULL);
    require(args->counterLock != NULL);
    c = args->chunkFirst;
    last_c = LAST_CHUNK_NUM(
        args->initial, args->condition, args->bound, args->step,
        args->chunkSize);
    while (true) {
        ExecuteChunk(
            args->initial, args->condition, args->bound, args->step,
            args->chunkSize, c, args->chunk, args->args);
        /* Claim the next unexecuted chunk, if any remain */
        EnterCriticalSection(args->counterLock);
        finished = (*args->counter == last_c);
        if (!finished) {
            *args->counter = *args->counter + 1;
            c = *args->counter;
        }
        LeaveCriticalSection(args->counterLock);
        if (finished) break;
    }
    if (InterlockedDecrement(args->threadCount) == 0) {
        returnOK = SetEvent(args->threadsFinished);
        check(returnOK);
    }
}

/*--------------------------------------------------------------------------*/
/* Blocked and interleaved multithreaded regular for loop thread */
/*--------------------------------------------------------------------------*/

static void BIMTRFLthread(MTRFLargs *args)
{
    unsigned int c;
    BOOL returnOK;

    require(args != NULL);
    require(args->chunk != NULL);
    require(ValidCondition(args->condition));
    require(!InfiniteRange(
        args->initial, args->condition, args->bound, args->step));
    require(!NullRange(
        args->initial, args->condition, args->bound, args->step));
    require(args->chunkSize >= 1);
    require(args->chunkFirst <= args->chunkLast);
    require(args->chunkLast <= LAST_CHUNK_NUM(
        args->initial, args->condition, args->bound, args->step,
        args->chunkSize));
    require((args->chunkLast - args->chunkFirst)%args->chunkStep == 0);
    c = args->chunkFirst;
    while (true) {
        ExecuteChunk(
            args->initial, args->condition, args->bound, args->step,
            args->chunkSize, c, args->chunk, args->args);
        if (c == args->chunkLast) break;
        believe(args->chunkLast - c >= args->chunkStep);
        c = c + args->chunkStep;
    }
    if (InterlockedDecrement(args->threadCount) == 0) {
        returnOK = SetEvent(args->threadsFinished);
        check(returnOK);
    }
}

/*

/* Multithreaded regular for loop «/

/_* mt SthreadsRegularForLoop( void Cchunk) (int initial, int bound, mt step, void *args), void 'args, mt initial, int condition, mt bound, int step, mt chunkSize, int mapping, mt numThreads, int priority, unsigned int stackSize) { unsigned int lastChunkNum;

HANDLE "thread;

MTRFLargs 'threadArgs;

LONG threadCount;

HANDLE threadsFinished;

HANDLE parentThread,- int parentPriority; void (*thread_start) (MTRFLargs *args); int t;

DWORD threadID; int counter;

CRITICAL_SECTION counterLock; unsigned int blockFirst, blockSize, blockRemainder ;

BOOL returnOK;

DWORD returnCode;

CHECKINPUTVALUE (chunk != NULL); CHECKINPUTVALUE (ValidCondition (condition) ) ;

CHECKINPUTVALUE! ! InfiniteRange (initial, condition, bound, step)); CHECKINPUTVALUE! (chunkSize > 0) || (chunkSize == 0 &&

NullRange (initial, condition, bound, step))); CHECKINPUTVALUE (ValidMapping (mapping) ) ; if (mapping != ΞTHREADS_MAPPING_SIMPLE) CHECKINPUTVALUE! (numThreads > 0) || (numThreads == 0 &&

NullRange (initial , condition, bound, step))); CHECKINPUTVALUE (

ValidPriority (priority) || priority == ΞTHREADS_PRIORITY_PARENT) CHECKINPUTVALUE (ValidStackΞize (StackSize) ) ; if (NullRange (initial, condition, bound, step)) return STHREADΞ_ERROR_NONE; lastChunkNum = LAST_CHUNK_NU ( initial, condition, bound, step, chunkSize); CHECKMEMORYALLOC (! (mapping == STHREADS_MAPPING_SIMPLE && lastChunkNum >= INT_MAX) ) ^■ if (mapping == STHREADS_MAPPING_SIMPLE) numThreads = (int) (lastChunkNum + 1); if ((unsigned int) (numThreads - 1) > lastChunkNum) numThreads = (int) (lastChunkNum + 1); if (numThreads == 1) mapping = STHREADΞ_MAPPING_INTERLEAVED ; if ((unsigned int) (numThreads - 1) == lastChunkNum) mapping = STHREADΞ_MAPPING_ΞIMPLE;

CHECKMEMORYALLOCInumThreadε <= INT_MAX/sizeof (HANDLE) ) ; thread = (HANDLE *) malloc (numThreads'sizeof (HANDLE) ) ; CHECKMEMORYALLOC ( thread != NULL) ;

    CHECKMEMORYALLOC(numThreads <= INT_MAX/sizeof(MTRFLargs));
    threadArgs = (MTRFLargs *) malloc(numThreads*sizeof(MTRFLargs));
    CHECKMEMORYALLOC(threadArgs != NULL);
    parentThread = GetCurrentThread();
    believe(parentThread != NULL);
    parentPriority = GetThreadPriority(parentThread);
    believe(parentPriority != THREAD_PRIORITY_ERROR_RETURN);
    believe(ValidPriority(parentPriority));
    if (priority != STHREADS_PRIORITY_PARENT) {
        returnOK = SetThreadPriority(parentThread, priority);
        believe(returnOK);
    }
    switch (mapping) {
    case STHREADS_MAPPING_SIMPLE:
        thread_start = SMTRFLthread;
        break;
    case STHREADS_MAPPING_DYNAMIC:
        counter = numThreads - 1;
        InitializeCriticalSection(&counterLock);
        thread_start = DMTRFLthread;
        break;
    case STHREADS_MAPPING_BLOCKED:
        blockFirst = 0;
        blockSize =
            (lastChunkNum - (((unsigned int) numThreads) - 1)) /
            ((unsigned int) numThreads) + 1;
        blockRemainder =
            (lastChunkNum - (((unsigned int) numThreads) - 1)) %
            ((unsigned int) numThreads);
        thread_start = BIMTRFLthread;
        break;
    case STHREADS_MAPPING_INTERLEAVED:
        blockSize =
            (lastChunkNum - (((unsigned int) numThreads) - 1)) /
            ((unsigned int) numThreads) + 1;
        blockRemainder =
            (lastChunkNum - (((unsigned int) numThreads) - 1)) %
            ((unsigned int) numThreads);
        thread_start = BIMTRFLthread;
        break;
    default:
        assert(false);
    }
    threadCount = numThreads;
    threadsFinished = CreateEvent(NULL, TRUE, FALSE, NULL);
    CHECKSYNCCREATE(threadsFinished != NULL);
    for (t = 0; t < numThreads; t++) {
        threadArgs[t].chunk = chunk;
        threadArgs[t].args = args;
        threadArgs[t].initial = initial;
        threadArgs[t].condition = condition;
        threadArgs[t].bound = bound;
        threadArgs[t].step = step;
        threadArgs[t].chunkSize = chunkSize;
        threadArgs[t].threadCount = (LPLONG) &threadCount;
        threadArgs[t].threadsFinished = threadsFinished;
        switch (mapping) {
        case STHREADS_MAPPING_SIMPLE:
            threadArgs[t].chunkFirst = t;
            break;
        case STHREADS_MAPPING_DYNAMIC:
            threadArgs[t].chunkFirst = t;
            threadArgs[t].counter = &counter;
            threadArgs[t].counterLock = &counterLock;
            break;
        case STHREADS_MAPPING_BLOCKED:
            threadArgs[t].chunkFirst = blockFirst;
            threadArgs[t].chunkLast = blockFirst + (blockSize - 1);
            threadArgs[t].chunkStep = 1;
            if (blockRemainder > 0) {
                threadArgs[t].chunkLast = threadArgs[t].chunkLast + 1;
                blockRemainder = blockRemainder - 1;
            }
            blockFirst = threadArgs[t].chunkLast + 1;
            break;
        case STHREADS_MAPPING_INTERLEAVED:
            threadArgs[t].chunkFirst = t;
            threadArgs[t].chunkLast =
                blockSize*((unsigned int) numThreads) + t;
            threadArgs[t].chunkStep = (unsigned int) numThreads;
            if (blockRemainder == 0)
                threadArgs[t].chunkLast =
                    threadArgs[t].chunkLast - ((unsigned int) numThreads);
            else
                blockRemainder = blockRemainder - 1;
            break;
        default:
            believe(false);
        }
        thread[t] = CreateThread(NULL, stackSize,
                                 (LPTHREAD_START_ROUTINE) thread_start,
                                 (LPVOID) &threadArgs[t],
                                 CREATE_SUSPENDED, &threadID);
        CHECKTHREADCREATE(thread[t] != NULL);
        if (priority == STHREADS_PRIORITY_PARENT)
            SetThreadPriority(thread[t], parentPriority);
        else
            SetThreadPriority(thread[t], priority);
        ResumeThread(thread[t]);
    }
    if (priority != STHREADS_PRIORITY_PARENT) {
        SetThreadPriority(parentThread, parentPriority);
        believe(returnOK);
    }
    returnCode = WaitForSingleObject(threadsFinished, INFINITE);
    CHECKOTHER(returnCode != WAIT_FAILED);
    returnOK = CloseHandle(threadsFinished);
    CHECKOTHER(returnOK == TRUE);
    for (t = 0; t < numThreads; t++) {
        returnOK = CloseHandle(thread[t]);
        CHECKOTHER(returnOK == TRUE);
    }
    if (mapping == STHREADS_MAPPING_DYNAMIC)
        DeleteCriticalSection(&counterLock);
    free(thread);
    free(threadArgs);
    return STHREADS_ERROR_NONE;
}
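The `STHREADS_MAPPING_BLOCKED` and `STHREADS_MAPPING_INTERLEAVED` cases in the listing above both divide the chunks as evenly as possible among the threads, giving one extra chunk to each of the first few threads when the division leaves a remainder. The following is a minimal standalone sketch of that arithmetic, not the library's code. The names `ChunkRange`, `blocked_range`, and `interleaved_range` are hypothetical, and the sketch takes the total number of chunks directly rather than `lastChunkNum`.

```c
#include <assert.h>

/* Hypothetical helper type: one thread's chunks are
   first, first + step, ..., up to and including last.
   count == 0 means the thread has no chunks (first and last are unused). */
typedef struct {
    unsigned int first, last, step, count;
} ChunkRange;

/* Blocked mapping: each thread gets one contiguous run of chunks; the
   first (numChunks % numThreads) threads each get one extra chunk. */
ChunkRange blocked_range(unsigned int numChunks, unsigned int numThreads,
                         unsigned int t)
{
    ChunkRange r = {0, 0, 1, 0};
    unsigned int base, rem;

    assert(numThreads > 0 && t < numThreads);
    base = numChunks / numThreads;
    rem = numChunks % numThreads;
    r.count = base + (t < rem ? 1u : 0u);
    r.first = t * base + (t < rem ? t : rem);
    if (r.count > 0)
        r.last = r.first + r.count - 1;
    return r;
}

/* Interleaved mapping: thread t takes chunks t, t + numThreads, ... */
ChunkRange interleaved_range(unsigned int numChunks, unsigned int numThreads,
                             unsigned int t)
{
    ChunkRange r = {0, 0, 1, 0};
    unsigned int base, rem;

    assert(numThreads > 0 && t < numThreads);
    base = numChunks / numThreads;
    rem = numChunks % numThreads;
    r.step = numThreads;
    r.count = base + (t < rem ? 1u : 0u);
    r.first = t;
    if (r.count > 0)
        r.last = t + (r.count - 1) * numThreads;
    return r;
}
```

With 10 chunks and 4 threads, the blocked mapping yields the runs 0..2, 3..5, 6..7 and 8..9, while the interleaved mapping gives thread 0 the chunks 0, 4, 8 and thread 3 the chunks 3, 7.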

/*--------------------------------------------------------------------------*/
/* Multithreaded nested regular for loop (for future release) */
/*--------------------------------------------------------------------------*/

int SthreadsNestedRegularForLoop(
    int nesting,
    void (*chunk)(int first[], int last[], int step[], void *args),
    void *args,
    int initial[], int condition[], int bound[], int step[],
    int chunkSize[],
    int mapping[], int numThreads[],
    int priority, unsigned int stackSize)

/* Arguments: */
/* - nesting: degree of nesting. */
/* - chunk: function to execute chunk of iterations of loop body. */
/* - args: pointer to arguments of loop body. */
/* - initial: initial value of control variable at each nesting level. */
/* - condition: condition between control variable and bound value */
/*   at each nesting level. */
/* - bound: bound value of control variable at each nesting level. */
/* - step: step value of control variable at each nesting level. */
/* - chunkSize: number of iterations per chunk at each nesting level. */
/* - mapping: mapping of chunks onto threads at each nesting level. */
/* - numThreads: number of threads at each nesting level. */
/* - priority: priority of threads. */
/* - stackSize: stack size of threads. */
/* Returns: */
/* - error code. */
/* Requirements: */
/* - nesting >= 1. */
/* - chunk != NULL && */
/*   chunk is a valid void (*)(int *, int *, int *, void *) function. */
/* - initial != NULL && */
/*   initial is an array of at least nesting ints. */
/* - condition != NULL && */
/*   condition is an array of at least nesting ints. */
/* - forall (i = 0; i < nesting; i++) ValidCondition(condition[i]). */
/* - bound != NULL && */
/*   bound is an array of at least nesting ints. */
/* - step != NULL && */
/*   step is an array of at least nesting ints. */
/* - forall (i = 0; i < nesting; i++) */
/*       !InfiniteRange(initial[i], condition[i], bound[i], step[i]) || */
/*       exists (j = 0; j < i; j++) */
/*           NullRange(initial[j], condition[j], bound[j], step[j]). */
/* - forall (i = 0; i < nesting; i++) */
/*       (chunkSize[i] > 0) || */
/*       (chunkSize[i] == 0 && */
/*        NullRange(initial[i], condition[i], bound[i], step[i])). */
/* - forall (i = 0; i < nesting; i++) ValidMapping(mapping[i]). */
/* - forall (i = 0; i < nesting; i++) */
/*       mapping[i] != STHREADS_MAPPING_SIMPLE => */
/*           (numThreads[i] > 0) || */
/*           (numThreads[i] == 0 && */
/*            NullRange(initial[i], condition[i], bound[i], step[i])). */
/* - ValidPriority(priority) || priority == STHREADS_PRIORITY_PARENT. */
/* - ValidStackSize(stackSize). */

{
    int i;

    CHECKINPUTVALUE(nesting >= 1);
    CHECKINPUTVALUE(chunk != NULL);
    CHECKINPUTVALUE(initial != NULL);
    CHECKINPUTVALUE(condition != NULL);
    for (i = 0; i < nesting; i++)
        CHECKINPUTVALUE(ValidCondition(condition[i]));
    CHECKINPUTVALUE(bound != NULL);
    CHECKINPUTVALUE(step != NULL);
    for (i = 0; i < nesting; i++) {
        if (NullRange(initial[i], condition[i], bound[i], step[i]))
            break;
        CHECKINPUTVALUE(
            !InfiniteRange(initial[i], condition[i], bound[i], step[i]));
    }
    for (i = 0; i < nesting; i++)
        CHECKINPUTVALUE(
            (chunkSize[i] > 0) ||
            (chunkSize[i] == 0 &&
             NullRange(initial[i], condition[i], bound[i], step[i])));
    for (i = 0; i < nesting; i++)
        CHECKINPUTVALUE(ValidMapping(mapping[i]));
    for (i = 0; i < nesting; i++)
        if (mapping[i] != STHREADS_MAPPING_SIMPLE)
            CHECKINPUTVALUE(
                (numThreads[i] > 0) ||
                (numThreads[i] == 0 &&
                 NullRange(initial[i], condition[i], bound[i], step[i])));
    CHECKINPUTVALUE(
        ValidPriority(priority) || priority == STHREADS_PRIORITY_PARENT);
    CHECKINPUTVALUE(ValidStackSize(stackSize));
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/
/* Multithreaded general for loop (for future release) */
/*--------------------------------------------------------------------------*/

int SthreadsGeneralForLoop(
    void (*body)(void *control, void *args),
    size_t controlSize,
    void *args,
    int (*test)(void *args),
    void (*increment)(void *args),
    void (*copy)(void *control, void *args),
    int mapping, int numThreads,
    int priority, unsigned int stackSize)

/* Arguments: */
/* - body: function to execute one iteration of loop body. */
/* - controlSize: size (as returned by sizeof) of control variables. */
/* - args: pointer to arguments of loop. */
/* - test: function to test loop termination condition. */
/* - increment: function to increment control variables within arguments. */
/* - copy: function to copy control variables from arguments. */
/* - mapping: mapping of iterations onto threads. */
/* - numThreads: number of threads. */
/* - priority: priority of threads. */
/* - stackSize: stack size of threads. */
/* Returns: */
/* - error code. */
/* Requirements: */
/* - body != NULL && */
/*   body is a valid void (*)(void *, void *) function. */
/* - test != NULL && */
/*   test is a valid int (*)(void *) function. */
/* - increment != NULL && */
/*   increment is a valid void (*)(void *) function. */
/* - copy != NULL && */
/*   copy is a valid void (*)(void *, void *) function. */
/* - mapping == STHREADS_MAPPING_SIMPLE || */
/*   mapping == STHREADS_MAPPING_DYNAMIC. */
/* - mapping != STHREADS_MAPPING_SIMPLE => */
/*       (numThreads > 0) || (numThreads == 0 && !test(args)). */
/* - ValidPriority(priority) || priority == STHREADS_PRIORITY_PARENT. */
/* - ValidStackSize(stackSize). */

{
    CHECKINPUTVALUE(body != NULL);
    CHECKINPUTVALUE(test != NULL);
    CHECKINPUTVALUE(increment != NULL);
    CHECKINPUTVALUE(copy != NULL);
    CHECKINPUTVALUE(
        mapping == STHREADS_MAPPING_SIMPLE ||
        mapping == STHREADS_MAPPING_DYNAMIC);
    if (mapping != STHREADS_MAPPING_SIMPLE)
        CHECKINPUTVALUE(
            (numThreads > 0) || (numThreads == 0 && !test(args)));
    CHECKINPUTVALUE(
        ValidPriority(priority) || priority == STHREADS_PRIORITY_PARENT);
    CHECKINPUTVALUE(ValidStackSize(stackSize));
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/
/* Synchronization object status constants */
/*--------------------------------------------------------------------------*/

#define INITIALIZED 123456
#define FINALIZED 654321

/*--------------------------------------------------------------------------*/
/* Flags */
/*--------------------------------------------------------------------------*/

typedef struct {
    int initialized, finalized;
    LONG numWaiting;
    HANDLE signal;
} PrivateFlag;

#define PRIVATE(flagPtr) ((PrivateFlag *) (flagPtr))

/*--------------------------------------------------------------------------*/

int SthreadsFlagInitialize(SthreadsFlag *flag)
{
    CHECKINPUTVALUE(flag != NULL);
    PRIVATE(flag)->initialized = INITIALIZED;
    PRIVATE(flag)->finalized = -FINALIZED;
    PRIVATE(flag)->numWaiting = 0;
    PRIVATE(flag)->signal = CreateEvent(NULL, TRUE, FALSE, NULL);
    CHECKSYNCCREATE(PRIVATE(flag)->signal != NULL);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsFlagFinalize(SthreadsFlag *flag)
{
    BOOL returnOK;

    CHECKINPUTVALUE(flag != NULL);
    CHECKUNINITIALIZED(PRIVATE(flag)->initialized == INITIALIZED);
    CHECKFINALIZED(PRIVATE(flag)->finalized == -FINALIZED);
    CHECKINUSE(PRIVATE(flag)->numWaiting == 0);
    PRIVATE(flag)->finalized = FINALIZED;
    returnOK = CloseHandle(PRIVATE(flag)->signal);
    CHECKOTHER(returnOK == TRUE);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsFlagSet(SthreadsFlag *flag)
{
    BOOL returnOK;

    CHECKINPUTVALUE(flag != NULL);
    CHECKUNINITIALIZED(PRIVATE(flag)->initialized == INITIALIZED);
    CHECKFINALIZED(PRIVATE(flag)->finalized == -FINALIZED);
    returnOK = SetEvent(PRIVATE(flag)->signal);
    CHECKOTHER(returnOK);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsFlagCheck(SthreadsFlag *flag)
{
    DWORD returnCode;

    CHECKINPUTVALUE(flag != NULL);
    CHECKUNINITIALIZED(PRIVATE(flag)->initialized == INITIALIZED);
    CHECKFINALIZED(PRIVATE(flag)->finalized == -FINALIZED);
    InterlockedIncrement(&PRIVATE(flag)->numWaiting);
    returnCode = WaitForSingleObject(PRIVATE(flag)->signal, INFINITE);
    CHECKOTHER(returnCode != WAIT_FAILED);
    InterlockedDecrement(&PRIVATE(flag)->numWaiting);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsFlagReset(SthreadsFlag *flag)
{
    BOOL returnOK;

    CHECKINPUTVALUE(flag != NULL);
    CHECKUNINITIALIZED(PRIVATE(flag)->initialized == INITIALIZED);
    CHECKFINALIZED(PRIVATE(flag)->finalized == -FINALIZED);
    CHECKINUSE(PRIVATE(flag)->numWaiting == 0);
    PRIVATE(flag)->numWaiting = 0;
    returnOK = ResetEvent(PRIVATE(flag)->signal);
    CHECKOTHER(returnOK);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

#undef PRIVATE

/*--------------------------------------------------------------------------*/
/* Counters */
/*--------------------------------------------------------------------------*/

typedef struct node *link;

typedef struct node {
    unsigned int value;
    int numWaiting;
    HANDLE signal;
    link next;
} node;

typedef struct {
    int initialized, finalized;
    unsigned int count;
    link waitingList;
    CRITICAL_SECTION lock;
} PrivateCounter;

#define PRIVATE(counterPtr) ((PrivateCounter *) (counterPtr))

/*--------------------------------------------------------------------------*/

int SthreadsCounterInitialize(SthreadsCounter *counter)
{
    link startSentinel, endSentinel;

    CHECKINPUTVALUE(counter != NULL);
    PRIVATE(counter)->initialized = INITIALIZED;
    PRIVATE(counter)->finalized = -FINALIZED;
    PRIVATE(counter)->count = 0;
    startSentinel = (link) malloc(sizeof(node));
    CHECKMEMORYALLOC(startSentinel != NULL);
    endSentinel = (link) malloc(sizeof(node));
    CHECKMEMORYALLOC(endSentinel != NULL);
    startSentinel->signal = NULL;
    startSentinel->next = endSentinel;
    startSentinel->numWaiting = 0;
    endSentinel->signal = NULL;
    endSentinel->next = NULL;
    endSentinel->numWaiting = 0;
    PRIVATE(counter)->waitingList = startSentinel;
    InitializeCriticalSection((LPCRITICAL_SECTION) &PRIVATE(counter)->lock);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsCounterFinalize(SthreadsCounter *counter)
{
    link p, next;
    BOOL returnOK;

    CHECKINPUTVALUE(counter != NULL);
    CHECKUNINITIALIZED(PRIVATE(counter)->initialized == INITIALIZED);
    CHECKFINALIZED(PRIVATE(counter)->finalized == -FINALIZED);
    CHECKINUSE(PRIVATE(counter)->waitingList->next->next == NULL);
    PRIVATE(counter)->finalized = FINALIZED;
    p = PRIVATE(counter)->waitingList;
    next = p->next;
    free(p);
    p = next;
    while (p->next != NULL) {
        returnOK = CloseHandle(p->signal);
        CHECKOTHER(returnOK == TRUE);
        next = p->next;
        free(p);
        p = next;
    }
    free(p);
    DeleteCriticalSection((LPCRITICAL_SECTION) &PRIVATE(counter)->lock);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsCounterIncrement(SthreadsCounter *counter, unsigned int amount)
{
    link start, p;
    BOOL returnOK;

    CHECKINPUTVALUE(counter != NULL);
    CHECKUNINITIALIZED(PRIVATE(counter)->initialized == INITIALIZED);
    CHECKFINALIZED(PRIVATE(counter)->finalized == -FINALIZED);
    CHECKCOUNTEROVERFLOW(PRIVATE(counter)->count <= UINT_MAX - amount);
    EnterCriticalSection((LPCRITICAL_SECTION) &PRIVATE(counter)->lock);
    PRIVATE(counter)->count = PRIVATE(counter)->count + amount;
    start = PRIVATE(counter)->waitingList;
    p = start->next;
    while (p->next != NULL && p->value <= PRIVATE(counter)->count) {
        returnOK = SetEvent(p->signal);
        CHECKOTHER(returnOK);
        start->next = p->next;
        p = start->next;
    }
    LeaveCriticalSection((LPCRITICAL_SECTION) &PRIVATE(counter)->lock);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsCounterCheck(SthreadsCounter *counter, unsigned int value)
{
    link prev, p;
    link waitingNode;
    BOOL returnOK;
    DWORD returnCode;

    CHECKINPUTVALUE(counter != NULL);
    CHECKUNINITIALIZED(PRIVATE(counter)->initialized == INITIALIZED);
    CHECKFINALIZED(PRIVATE(counter)->finalized == -FINALIZED);
    EnterCriticalSection((LPCRITICAL_SECTION) &PRIVATE(counter)->lock);
    if (PRIVATE(counter)->count >= value)
        LeaveCriticalSection((LPCRITICAL_SECTION) &PRIVATE(counter)->lock);
    else {
        prev = PRIVATE(counter)->waitingList;
        p = prev->next;
        while (p->next != NULL && p->value < value) {
            prev = p;
            p = p->next;
        }
        if (p->value == value) {
            waitingNode = p;
            waitingNode->numWaiting = waitingNode->numWaiting + 1;
        }
        else {
            waitingNode = (link) malloc(sizeof(node));
            waitingNode->value = value;
            waitingNode->signal = CreateEvent(NULL, TRUE, FALSE, NULL);
            waitingNode->next = p;
            waitingNode->numWaiting = 1;
            prev->next = waitingNode;
        }
        LeaveCriticalSection((LPCRITICAL_SECTION) &PRIVATE(counter)->lock);
        returnCode = WaitForSingleObject(waitingNode->signal, INFINITE);
        CHECKOTHER(returnCode != WAIT_FAILED);
        EnterCriticalSection((LPCRITICAL_SECTION) &PRIVATE(counter)->lock);
        waitingNode->numWaiting = waitingNode->numWaiting - 1;
        if (waitingNode->numWaiting == 0) {
            returnOK = CloseHandle(waitingNode->signal);
            CHECKOTHER(returnOK == TRUE);
            free(waitingNode);
        }
        LeaveCriticalSection((LPCRITICAL_SECTION) &PRIVATE(counter)->lock);
    }
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsCounterReset(SthreadsCounter *counter)
{
    link p, q;
    BOOL returnOK;

    CHECKINPUTVALUE(counter != NULL);
    CHECKUNINITIALIZED(PRIVATE(counter)->initialized == INITIALIZED);
    CHECKFINALIZED(PRIVATE(counter)->finalized == -FINALIZED);
    CHECKINUSE(PRIVATE(counter)->waitingList->next->next == NULL);
    PRIVATE(counter)->count = 0;
    p = PRIVATE(counter)->waitingList;
    q = p->next;
    while (q->next != NULL) {
        p->next = q->next;
        returnOK = CloseHandle(q->signal);
        CHECKOTHER(returnOK == TRUE);
        free(q);
        q = p->next;
    }
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

#undef PRIVATE
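The counter functions above keep their waiters in a singly linked list, bracketed by two sentinel nodes and ordered by the value being waited for, so that threads checking the same value share one node and an increment only has to walk the front of the list. The following single-threaded sketch isolates that data structure. It is an illustration under stated assumptions rather than the listing's code: the names `WaitNode`, `WaitList` and the `waitlist_*` functions are hypothetical, the Win32 event handle and critical section are left out, released nodes are freed immediately (the listing leaves that to the last waiter to wake), and the end sentinel is excluded explicitly when comparing values.

```c
#include <stdlib.h>

typedef struct WaitNode {
    unsigned int value;        /* counter value the waiters are waiting for */
    int numWaiting;            /* number of waiters sharing this node */
    struct WaitNode *next;
} WaitNode;

/* Start sentinel -> nodes in strictly increasing value order -> end sentinel. */
typedef struct {
    WaitNode *start;
} WaitList;

int waitlist_init(WaitList *list)
{
    WaitNode *start = malloc(sizeof *start);
    WaitNode *end = malloc(sizeof *end);

    if (start == NULL || end == NULL) {
        free(start);
        free(end);
        return -1;
    }
    end->value = 0;   end->numWaiting = 0;   end->next = NULL;
    start->value = 0; start->numWaiting = 0; start->next = end;
    list->start = start;
    return 0;
}

/* Register one waiter for value; returns its node, or NULL if out of memory. */
WaitNode *waitlist_add(WaitList *list, unsigned int value)
{
    WaitNode *prev = list->start, *p = prev->next, *n;

    while (p->next != NULL && p->value < value) {
        prev = p;
        p = p->next;
    }
    if (p->next != NULL && p->value == value) {   /* share an existing node */
        p->numWaiting++;
        return p;
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = value;
    n->numWaiting = 1;
    n->next = p;
    prev->next = n;
    return n;
}

/* Unlink every node whose value is <= count; returns the waiters released. */
int waitlist_release(WaitList *list, unsigned int count)
{
    WaitNode *start = list->start, *p = start->next;
    int released = 0;

    while (p->next != NULL && p->value <= count) {
        released += p->numWaiting;
        start->next = p->next;
        free(p);
        p = start->next;
    }
    return released;
}

void waitlist_destroy(WaitList *list)
{
    WaitNode *p = list->start, *next;

    while (p != NULL) {
        next = p->next;
        free(p);
        p = next;
    }
    list->start = NULL;
}

/* Waiters register for 5, 3, 5 and 9; returns how many are released
   when the counter reaches count (or -1 on allocation failure). */
int waitlist_demo(unsigned int count)
{
    static const unsigned int wanted[] = {5, 3, 5, 9};
    WaitList list;
    int i, released;

    if (waitlist_init(&list) != 0)
        return -1;
    for (i = 0; i < 4; i++) {
        if (waitlist_add(&list, wanted[i]) == NULL) {
            waitlist_destroy(&list);
            return -1;
        }
    }
    released = waitlist_release(&list, count);
    waitlist_destroy(&list);
    return released;
}
```

Because the list is sorted, `waitlist_release` stops at the first node whose value has not yet been reached, which is the same early exit taken by the `while` loop in `SthreadsCounterIncrement`.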

typedef struct {
    int initialized, finalized;
    HANDLE holder;
    CRITICAL_SECTION lock;
} PrivateLock;

#define PRIVATE(lockPtr) ((PrivateLock *) (lockPtr))

/*--------------------------------------------------------------------------*/

int SthreadsLockInitialize(SthreadsLock *lock)
{
    CHECKINPUTVALUE(lock != NULL);
    PRIVATE(lock)->initialized = INITIALIZED;
    PRIVATE(lock)->finalized = -FINALIZED;
    PRIVATE(lock)->holder = NULL;
    InitializeCriticalSection((LPCRITICAL_SECTION) &PRIVATE(lock)->lock);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsLockFinalize(SthreadsLock *lock)
{
    CHECKINPUTVALUE(lock != NULL);
    CHECKUNINITIALIZED(PRIVATE(lock)->initialized == INITIALIZED);
    CHECKFINALIZED(PRIVATE(lock)->finalized == -FINALIZED);
    CHECKINUSE(PRIVATE(lock)->holder == NULL);
    PRIVATE(lock)->finalized = FINALIZED;
    DeleteCriticalSection((LPCRITICAL_SECTION) &PRIVATE(lock)->lock);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsLockAcquire(SthreadsLock *lock)
{
    HANDLE thisThread;

    thisThread = GetCurrentThread();
    believe(thisThread != NULL);

int SthreadsBarrierFinalize(SthreadsBarrier *barrier)
{
    BOOL returnOK;

    CHECKINPUTVALUE(barrier != NULL);
    CHECKUNINITIALIZED(PRIVATE(barrier)->initialized == INITIALIZED);
    CHECKFINALIZED(PRIVATE(barrier)->finalized == -FINALIZED);
    CHECKINUSE(PRIVATE(barrier)->numWaiting == 0);
    PRIVATE(barrier)->finalized = FINALIZED;
    returnOK = CloseHandle(PRIVATE(barrier)->gate[0]);
    CHECKOTHER(returnOK == TRUE);
    returnOK = CloseHandle(PRIVATE(barrier)->gate[1]);
    CHECKOTHER(returnOK == TRUE);
    DeleteCriticalSection((LPCRITICAL_SECTION) &PRIVATE(barrier)->lock);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsBarrierPass(SthreadsBarrier *barrier)
{
    int currentGate, nextGate;
    BOOL returnOK;
    DWORD returnCode;

    CHECKINPUTVALUE(barrier != NULL);
    CHECKUNINITIALIZED(PRIVATE(barrier)->initialized == INITIALIZED);
    CHECKFINALIZED(PRIVATE(barrier)->finalized == -FINALIZED);
    EnterCriticalSection((LPCRITICAL_SECTION) &PRIVATE(barrier)->lock);
    currentGate = PRIVATE(barrier)->currentGate;
    PRIVATE(barrier)->numWaiting = PRIVATE(barrier)->numWaiting + 1;
    if (PRIVATE(barrier)->numWaiting == PRIVATE(barrier)->numThreads) {
        nextGate = (currentGate + 1)%2;
        returnOK = ResetEvent(PRIVATE(barrier)->gate[nextGate]);
        CHECKOTHER(returnOK);
        PRIVATE(barrier)->numWaiting = 0;
        returnOK = SetEvent(PRIVATE(barrier)->gate[currentGate]);
        CHECKOTHER(returnOK);
        PRIVATE(barrier)->currentGate = nextGate;
        LeaveCriticalSection((LPCRITICAL_SECTION) &PRIVATE(barrier)->lock);
    }
    else {
        LeaveCriticalSection((LPCRITICAL_SECTION) &PRIVATE(barrier)->lock);
        returnCode = WaitForSingleObject(
            PRIVATE(barrier)->gate[currentGate], INFINITE);
        CHECKOTHER(returnCode != WAIT_FAILED);
    }
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsBarrierReset(SthreadsBarrier *barrier, int numThreads)
{
    BOOL returnOK;

    CHECKINPUTVALUE(barrier != NULL);
    CHECKUNINITIALIZED(PRIVATE(barrier)->initialized == INITIALIZED);
    CHECKFINALIZED(PRIVATE(barrier)->finalized == -FINALIZED);
    CHECKINUSE(PRIVATE(barrier)->numWaiting == 0);
    CHECKINPUTVALUE(numThreads >= 1);
    PRIVATE(barrier)->numThreads = numThreads;
    PRIVATE(barrier)->numWaiting = 0;
    returnOK = ResetEvent(PRIVATE(barrier)->gate[0]);
    CHECKOTHER(returnOK);
    returnOK = SetEvent(PRIVATE(barrier)->gate[1]);
    CHECKOTHER(returnOK);
    PRIVATE(barrier)->currentGate = 0;
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

#undef PRIVATE
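`SthreadsBarrierPass` alternates between two gates so that a thread racing ahead into the next barrier episode cannot slip through the gate that was just opened for the previous one. The sketch below shows the same alternation, but it swaps in POSIX threads (a mutex and one condition variable) for the Win32 manual-reset events and critical section used in the listing. It is an assumption-laden illustration, not the library's implementation, and the names `Barrier`, `barrier_init`, `barrier_pass`, `barrier_destroy` and `barrier_demo` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t gateOpened;   /* broadcast whenever currentGate flips */
    int numThreads;              /* threads needed to open the gate */
    int numWaiting;              /* threads that have arrived so far */
    int currentGate;             /* 0 or 1, flipped once per episode */
} Barrier;

int barrier_init(Barrier *b, int numThreads)
{
    assert(b != NULL && numThreads >= 1);
    if (pthread_mutex_init(&b->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&b->gateOpened, NULL) != 0) {
        pthread_mutex_destroy(&b->lock);
        return -1;
    }
    b->numThreads = numThreads;
    b->numWaiting = 0;
    b->currentGate = 0;
    return 0;
}

/* Blocks until numThreads threads have called barrier_pass. */
void barrier_pass(Barrier *b)
{
    int myGate;

    pthread_mutex_lock(&b->lock);
    myGate = b->currentGate;
    b->numWaiting++;
    if (b->numWaiting == b->numThreads) {
        /* Last arrival: start a new episode and release the others. */
        b->numWaiting = 0;
        b->currentGate = (myGate + 1) % 2;
        pthread_cond_broadcast(&b->gateOpened);
    } else {
        /* Wait until the gate this thread arrived at has been passed. */
        while (b->currentGate == myGate)
            pthread_cond_wait(&b->gateOpened, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

void barrier_destroy(Barrier *b)
{
    pthread_cond_destroy(&b->gateOpened);
    pthread_mutex_destroy(&b->lock);
}

/* Single-thread walk-through: with numThreads == 1 every pass is the
   "last arrival", so the gate flips on each call. Returns the gate
   index after the given number of passes, or -1 on failure. */
int barrier_demo(int passes)
{
    Barrier b;
    int i, gate;

    if (barrier_init(&b, 1) != 0)
        return -1;
    for (i = 0; i < passes; i++)
        barrier_pass(&b);
    gate = b.currentGate;
    barrier_destroy(&b);
    return gate;
}
```

In real use, each of the `numThreads` worker threads would call `barrier_pass` on a shared `Barrier`. A waiter cannot miss its release, because the gate can only flip a second time after every thread, including that waiter, has arrived at the next episode.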

/*--------------------------------------------------------------------------*/
/* Priorities */
/*--------------------------------------------------------------------------*/

int SthreadsGetCurrentPriority(int *priority)
{
    HANDLE currentThread;
    int currentPriority;

    CHECKINPUTVALUE(priority != NULL);
    currentThread = GetCurrentThread();
    believe(currentThread != NULL);
    currentPriority = GetThreadPriority(currentThread);
    believe(currentPriority != THREAD_PRIORITY_ERROR_RETURN);
    *priority = currentPriority;
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

int SthreadsSetCurrentPriority(int priority)
{
    HANDLE currentThread;
    BOOL returnOK;

    CHECKINPUTVALUE(ValidPriority(priority));
    currentThread = GetCurrentThread();
    believe(currentThread != NULL);
    returnOK = SetThreadPriority(currentThread, priority);
    believe(returnOK);
    return STHREADS_ERROR_NONE;
}

/*--------------------------------------------------------------------------*/

Claims

What is claimed is:

1. A method of synchronizing threads in a multiple thread system, comprising: defining an entity which maintains a count of values which increases the value maintained by the object; and defining a check operation for said element in which, during the checking operation, a calling thread is suspended, and the check is suspended until the value maintained by the entity has reached or exceeded a given value.

2. A method as in claim 1 wherein said entity is allowed only to increment between allowable values, and not to decrement its value.

3. A method as in claim 1 wherein said entity is a counter that is only allowed to include integers.

4. A method as in claim 3 wherein an initial value of the counter is zero.

5. A method as in claim 1, wherein said entity is a flag.

6. An apparatus comprising a machine-readable storage medium having executable instructions for managing threads in a multithreaded system, the instructions enabling the machine to: define an entity which maintains a count of values and which is allowed to increment between allowable values; determine a request for value of the element from a calling thread; and establish a check operation for said element in which said calling thread is suspended until the entity reaches a predetermined value.

7. An apparatus as in claim 6, wherein said entity is a monotonically increasing counter.

8. An apparatus as in claim 6, wherein said entity is a flag.

9. An apparatus as in claim 6 wherein said system has a plurality of processors therein, wherein each of said processors is running at least one different one of said threads.

10. A method as in claim 1, further comprising defining an error for an operation that decreases the value maintained by the object to occur concurrently with any check operation on the object.

11. A method as in claim 1, wherein the value maintained by the object is a numeric value and the increment operation increases the value by a numeric amount.

12. A method as in claim 1, wherein the value maintained by the object is a Boolean value or a binary value and the increment operation is a "set" operation that changes the value from one state to the other state.

13. A method as in claim 2, wherein the value maintained by the object is a Boolean value or a binary value and the increment operation is a "set" operation that changes the value from one state to the other state.

14. A method as in claim 12, further comprising establishing an error for an increment operation on the object to occur more than once.

15. A method of defining program code, comprising: determining different parts of a program which can be executed either sequentially, or in multithreaded parallel by different threads, and which has equivalent results when executed in said sequential or multithreaded parallel; and defining said different parts as being multithreadable.

16. A method as in claim 15 wherein said determining is based on a set of conditions that are sufficient to ensure the equivalence of sequential and multithreaded execution of a program construct.

17. A method as in claim 15 wherein said different parts are defined as being multithreadable using an equivalence annotation within the program code.

18. A method as in claim 17 wherein said annotation is a pragma.

19. A method as in claim 17 wherein said annotation is a code comment.

20. A method as in claim 15 further comprising, within said code, multithreaded constructs, in addition to said multithreadable parts.

21. A method as in claim 15 wherein said multithreadable parts include information which, if executed as threads, will include the same result as if executed sequentially.

22. A method as in claim 15 wherein said part is a multithreadable block of information.

23. A method as in claim 22 wherein said part is a multithreadable FOR loop.

24. A method as in claim 15 further comprising synchronizing threads using a monotonically increasing counter.

25. A method as in claim 15 further comprising synchronizing threads using a flag.

26. A method as in claim 16, wherein the equivalence annotation includes a new or existing keyword or reserved word in the program.

27. A method as in claim 16, wherein the equivalence annotation takes the form of a character formatting in the program, which can be such as boldface, italics, underlining, or other formatting.

28. A method as in claim 16, wherein the equivalence annotation takes the form of a special character sequence in the program.

29. A method as in claim 16, wherein the equivalence annotation is contained in a file or other entity separate from the program.

30. A method as in claim 16, wherein the sequential interpretation of the execution of the block construct is that statements are executed one at a time in their textual order, and the multithreaded interpretation of the execution of the block construct is that statements are partitioned among a set of threads and executed concurrently by those threads.

31. A method as in claim 16 further comprising using monotonic thread synchronization to synchronize actions among threads.

32. A method as in claim 15 wherein: explicitly multithreaded program constructs are always executed according to a multithreaded interpretation; multithreadable program constructs are either executed according to a multithreaded interpretation or executed according to a sequential interpretation; and sequential or multithreaded execution of multithreadable program constructs is at user selection.

33. A method as in claim 32, wherein the sequential or multithreaded execution of multithreadable program constructs is signalled by a pragma in the program.

34. A method as in claim 32, wherein the method for selecting sequential or multithreaded execution of multithreadable code constructs is a variable that is dependent on the value of a variable defined in the program or in the environment of the program.

35. A method of claim 32 wherein said multiple threaded construct is a block or for loop.

36. A method of coding a program, comprising: defining a first portion of code which must always be executed according to multithreaded semantics, as a multithreaded portion of code; defining a second portion of code, within the same program as said first portion of code, which may be selectively executed according to either sequential or multithreaded techniques, as a multithreadable code construct; and allowing a program development system to develop said multithreadable code construct as either a sequential or multithreaded construct.

37. A method as in claim 36, wherein said program development system includes a compiler.

38. A method as in claim 36 wherein said multithreaded construct defines an operation which has no sequential equivalent.

39. A method as in claim 38 wherein said multithreaded construct is control of multiple windows in a graphical system.

40. A method as in claim 38 wherein said multithreaded construct is control of different operations of a computer.

41. A method as in claim 37 wherein said operation is executed on a multiple processor system, and different parts of said operation are executed on different ones of the processors.

42. A method as in claim 37 wherein said multithreadable constructs include a synchronization mechanism.

43. A method as in claim 42 wherein said synchronization mechanism is a monotonically increasing counter.

44. A method as in claim 43 wherein said synchronization mechanism is a special flag.

45. A method of integrating a structured multithreading program development system with a standard program development system, comprising: detecting program elements which include a specified annotation; calling a special program development system element which includes a processor that modifies based on the annotation to form a preprocessed file; and calling the standard program development system to compile the preprocessed file.

46. A method of operating a program language, comprising: defining equivalence annotations within the programming language which indicate to a program development system of the programming language information about sequential execution of said statement; and developing the programs as a sequential execution or as a substantially simultaneous execution based on contents of the equivalence annotations.

47. A method as in claim 46 wherein the equivalence annotation indicates that the statements are multithreadable.

48. A method as in claim 46 wherein the equivalence annotation indicates that the statements are either multithreaded or multithreadable.

49. A method as in claim 48 wherein said multithreaded statements must be executed in a multithreaded manner.

50. A method as in claim 48 wherein said multithreadable annotations indicate that the statements can be executed in either multithreaded or sequential manner.

51. A method as in claim 46 wherein said equivalence annotation is a pragma.

52. A method as in claim 46 wherein said equivalence annotation is a specially-defined comment line.

53. A method as in claim 47 further comprising synchronizing access of threads to shared memory using a specially defined synchronization element.

54. A method as in claim 53 wherein said synchronization element is a synchronization counter.

55. A method as in claim 54 wherein said synchronization counter is monotonically increasing, cannot be decreased, and prevents thread operation during its check operation.

56. A method as in claim 53 wherein said synchronization element is a synchronization flag.

57. A method as in claim 56 wherein said synchronization counter is monotonically increasing, cannot be decreased, and prevents thread operation during its check operation.

58. A method as in claim 54 wherein said synchronization counter includes a check operation, wherein said check operation suspends a calling thread.

59. A method as in claim 58 further comprising maintaining a list of suspended threads.

60. A method of modifying an existing program development system and environment, comprising: detecting which components of a program contain multithreadable program constructs or explicitly multithreaded program constructs; transforming the components of the program that contain multithreadable program constructs or explicitly multithreaded program constructs into equivalent multithreaded components in a form that can be directly translated or executed by the existing program development system; and invoking the existing program development system to translate or execute the transformed components of the program.

61. A method as in claim 60, wherein said indicating comprises giving distinctive names to said component.

62. A method as in claim 59, wherein the transforming of the components of the program that contain multithreadable program constructs or explicitly multithreaded program constructs is by source-to-source program preprocessing.

63. A method as in claim 61, wherein the result of the source-to-source program preprocessing is a program component that incorporates thread library calls representing the transformed multithreadable program constructs or explicitly multithreaded program constructs.

64. A method as in claim 63, wherein the thread library is a thread library designed in part or whole for the purpose of representing the transformed multithreadable program constructs or explicitly multithreaded program constructs.

65. A method as in claim 63, wherein the thread library is an existing thread library or a thread library designed for another purpose.

66. A method as in claim 61, wherein the result of the source-to-source program preprocessing is a program component that incorporates standard multithreaded program constructs supported by the existing programming system.

67. A method as in claim 59, further comprising renaming the standard compiler-linker and the standard compiler-linker name is used for a program component transformation tool that subsequently invokes the renamed standard compiler-linker.

68. A method as in claim 59, wherein the operating system is Linux or another variant of the Unix operating system and the existing program development environment is the GNU C or C++ compiler or any other C or C++ compiler that operates under the given variant of the Unix operating system.

69. A method as in claim 59, wherein the existing programming language is a variant of the Java programming language and the thread library is the standard Java thread library.

70. A method of operating a program operation, comprising: defining a block of code which can be executed either sequentially or substantially simultaneously via separate loci of execution; running the program during a first mode in said sequential mode, and running the program during a second mode in said substantially simultaneous mode.

71. A method as in claim 70 wherein said definition is an equivalence annotation.

72. A method as in claim 71 wherein said equivalence annotation is a pragma.

73. A method as in claim 70 wherein, during said sequential execution, variables are shared.

74. A method as in claim 73 wherein said shared variables can be checked, and operation of check does not suspend operations of the program.

75. A method as in claim 70 wherein during said substantially simultaneous operations, variables are shared.

76. A method as in claim 70 further comprising debugging a program in said sequential mode and running a debugged program in said substantially simultaneous mode.

77. An object for synchronizing among multiple threads, comprising: a special object constrained to have (1) an integer attribute value, (2) an increment function, but no decrement function, and (3) a check function that suspends a calling thread.

78. A method as in claim 77 wherein said check function suspends a calling thread for a specified time.

79. An object as in claim 78 wherein said object includes a list of thread suspension queues.

80. An object as in claim 77 further comprising a reset function.

81. An object as in claim 77 wherein said object is a counter.

82. An object as in claim 77 wherein said object is a flag having only first and second values.

83. A method of integrating a thread management system with an existing program development system, comprising: first, running a pre-program development system that looks for special annotations which indicate multithreaded and multithreadable block of code; using said special layer as an initial linker; and then, passing the already linked program to the standard program development system.

84. A method as in claim 83 wherein said program is a C programming language.
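Claims 1 to 4 and 77 describe a synchronization object with an integer value that starts at zero, an increment operation but no decrement, and a check operation that suspends the caller until the value has reached or exceeded a given level. A minimal sketch of such an object follows. It is written against POSIX threads rather than the Win32 primitives of the source listing, and it wakes all waiters through a single condition variable instead of keeping a per-value waiting list; both are simplifying assumptions. The names `MonoCounter`, `mono_init`, `mono_increment`, `mono_check`, `mono_destroy` and `mono_demo` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t raised;     /* broadcast after every increment */
    unsigned int count;        /* only ever grows between resets */
} MonoCounter;

int mono_init(MonoCounter *c)
{
    assert(c != NULL);
    if (pthread_mutex_init(&c->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&c->raised, NULL) != 0) {
        pthread_mutex_destroy(&c->lock);
        return -1;
    }
    c->count = 0;
    return 0;
}

/* Adds amount to the counter; returns -1 (leaving the counter
   unchanged) if the addition would overflow, 0 otherwise. */
int mono_increment(MonoCounter *c, unsigned int amount)
{
    int ok;

    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    ok = (c->count <= UINT_MAX - amount);
    if (ok) {
        c->count += amount;
        pthread_cond_broadcast(&c->raised);
    }
    pthread_mutex_unlock(&c->lock);
    return ok ? 0 : -1;
}

/* Suspends the caller until the counter is at least value. */
void mono_check(MonoCounter *c, unsigned int value)
{
    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    while (c->count < value)
        pthread_cond_wait(&c->raised, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

void mono_destroy(MonoCounter *c)
{
    pthread_cond_destroy(&c->raised);
    pthread_mutex_destroy(&c->lock);
}

/* Single-thread walk-through: the checks return at once because the
   value has already been reached. Returns the final count, or -1. */
int mono_demo(void)
{
    MonoCounter c;
    int result;

    if (mono_init(&c) != 0)
        return -1;
    mono_increment(&c, 2);
    mono_increment(&c, 3);
    mono_check(&c, 4);                      /* 5 >= 4: no suspension */
    mono_check(&c, 5);                      /* 5 >= 5: no suspension */
    if (mono_increment(&c, UINT_MAX) == 0)  /* must be rejected */
        result = -1;
    else
        result = (int) c.count;
    mono_destroy(&c);
    return result;
}
```

Because the value never decreases, a check that has succeeded once stays satisfied, so the outcome of a check does not depend on how the threads happen to interleave. A flag in the sense of claims 5 and 12 behaves like the same object restricted to the values 0 and 1.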