WO1996023258A1

WO1996023258A1 - Tracking the state of transactions

Info

Publication number: WO1996023258A1
Application number: PCT/US1996/000666
Authority: WO
Inventors: Matthew C. Mccline; Srikanth Shoroff; James M. Lyon; William J. Carley; Charles S. Johnson; Sheldon J. Finkelstein
Original assignee: Tandem Computers Incorporated
Priority date: 1995-01-23
Filing date: 1996-01-17
Publication date: 1996-08-01
Also published as: EP0806008A1; EP0806008A4; JPH10512985A; CA2206302A1

Abstract

In one embodiment, the present invention provides a method of treating undo work for a transaction as it would treat original work. This method is used to ensure that log records have been flushed to disk (22, 24) and that lost log records, if there are any, are reliably detected. The method can also determine when a newly arrived request for update has arrived too late, such that aborted transactions are not operated upon. In another embodiment of the present invention, a transaction control block (40) 'fault in' is utilized. This feature deals with the ability to create a Transaction Control Block data structure in a CPU (12, 14), if such a structure does not already exist. A Transaction Control Block (40) provides a description for its associated transaction.

Description

TRACKING THE STATE OF TRANSACTIONS

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the xerographic reproduction by anyone of the patent document or the patent disclosure in exactly the form it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

The present invention is directed to data processing systems, and more particularly to parallel processing systems which track and control data related to on-line transaction processing in order to provide transaction management and to protect the integrity of the user data.

On-line transaction processing ("OLTP") has found a variety of commercial applications in today's industry such as, for example, assisting with financial transactions (e.g., coordinating information from bank ATMs), tracking data for the New York Stock Exchange, tracking billings for telephone companies, and tracking parts for manufacturing (e.g., automobile parts). Many of the commercial applications available for OLTP require elaborate protection of the integrity of user data along with continuous availability of the OLTP applications to end users. For example, ATMs for banks must have excellent integrity (i.e., make few, if any, errors), and ATMs must be available to users for extended periods of time. ATM users would not tolerate mistakes associated with their transactions (e.g., a $500.00 deposit not being credited to a user's account). Moreover, ATMs are often preferably available to users 24 hours a day, seven days a week.

To achieve continuous availability in OLTP applications, users must be able to rely on supporting system software. Parallel processing (processing which utilizes process pairs) assists in allowing OLTP to quickly handle numerous individual transactions or small tasks which are distributed among multiple processors. Other parallel processing applications include maintaining and accessing large databases for record-keeping and decision-making operations, and acting as media servers that provide an accessible store of information to many users. Parallel processing's particular advantage resides in the ability to handle large amounts of diverse data such as, for example, in decision-making operations which may require searches of diverse information that can be scattered among a number of storage devices. Furthermore, a parallel processor media server application could be in an interactive service environment such as "movies-on-demand," that will call upon the parallel processor to provide a vast number of customers with access to a large reservoir of motion pictures kept on retrievable memory (e.g., disk storage devices). This latter application may well require the parallel processor to simultaneously service multiple requests by locating, selecting, and retrieving the requested motion pictures, and then forwarding the selections to the requesting customers.

While parallel processing increases OLTP capabilities, a technique is needed for tracking and controlling data related to OLTP such that (1) transaction management is provided and (2) the integrity of the data is protected.

SUMMARY OF THE INVENTION

In the preferred embodiment, the Transaction Monitoring Facility (TMF) provides transaction management and protects the integrity of user data. The programmatic construct called a "transaction" is an explicitly delimited operation, or set of related operations, that changes the content of a database from one consistent state to another. The database operations within a transaction are treated as a single unit. Either all of the changes performed by the transaction are committed and made permanent, or none of the changes is made permanent (the transaction is aborted). If a failure occurs during the execution of a transaction, whatever partial changes were made to the database are undone automatically, thus leaving the database in a consistent state.

In one preferred embodiment, the present invention provides a method of treating undo work for a transaction as it would treat original work. In particular, the same methods are used to ensure that log records (or Audit Trail records) have been flushed to disk and that lost log records, if there are any, are reliably detected. The method can also determine when a newly arrived request for update has arrived too late, such that aborted transactions are not operated upon.

In another preferred embodiment of the invention, a transaction control block "fault-in" is utilized. This feature deals with the ability to create a Transaction Control Block (TCB) data structure in a CPU, if such a structure does not already exist. A TCB is a public container with private fields, and it provides a description for its associated transaction. The comments for each field specify which modules have access to which fields.

By utilizing the TCB data structure in this manner, system throughput improves more nearly linearly as CPUs are added, because a TCB is not allocated in each CPU at the beginning of a transaction. Instead, each CPU has a table containing the highest sequence number and the highest epoch number associated with that sequence number. When a TCB needs to be created in a CPU, the sequence number and epoch number in the table must have appropriate values to allow for the "faulting in" of the TCB. "Faulting in" occurs when a request arrives at a CPU in which no TCB exists for the transaction, so a TCB is created. If the values for a certain sequence number are not appropriate, the fault-in is rejected because it is "too late". This typically occurs if the transaction is in the process of aborting due to a failure in some other CPU, or if the transaction is so old that it has been gone from the system for a long period of time. Fault-in of old transactions can be requested if a server has the transaction as its current transaction and attempts to use it after the transaction has been fully aborted.

By including the above identified design features, TMF improves its availability, is easier to manage, and has enhanced performance.

These and other advantages will become apparent to those skilled in this art upon a reading of the following detailed description of the invention, which should be taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a simplified representation of a processor system with 2 to 16 processors;

Fig. 2 illustrates how several CPU's may operate on the same transaction;

Fig. 3 illustrates corresponding stacks which are located in the CPUs within the system;

Fig. 4 provides the sequence flow chart for the process that occurs when a request is received by a CPU;

Fig. 5 provides the sequence flow chart for the process that takes place at Phase 1 time;

Fig. 6 provides the sequence flow chart for the process that takes place at Phase 2 time; and

Fig. 7 is a graph which indicates how throughput in the present invention is improved as the number of CPUs in the system is increased.

DESCRIPTION OF PREFERRED EMBODIMENTS

Turning now to the figures, and for the moment principally Fig. 1, illustrated in simplified form is a processor system with 2 to 16 processors, designated generally with the reference numeral 10. As shown, the processor system 10 comprises a plurality of processors 12, 14,... (CPUs). Each CPU 12, 14,... contains its own RAM 22, 24,... The 2 to 16 CPUs are connected by two high speed buses 26 and 28, and each CPU 12, 14,... has its own input/output line 30, 32,... (I/O). I/O lines 30, 32,... connect CPUs 12, 14,... to disk controller 34 and communication controller 36.

Disk controller 34 translates between protocols from the I/O buses 30, 32,... to the actual electrical signals which are needed by the disks. In the preferred embodiment, one disk controller is provided for eight disks. Similarly, communication controller 36 translates between protocols from the I/O buses 30, 32,... to external communication lines and executes some of the communication protocol. Different types of communication controllers are used for handling different communication lines.

This processor system arrangement 10 uses a process pair scheme. There are two CPUs in a process pair: one is the primary CPU and the other is a standby CPU which is only utilized if the primary CPU is inoperable. The standby CPU is continually updated such that if an error occurs and the primary CPU cannot perform a transaction, the standby CPU will have access to any piece of the required information and, thus, will be able to independently complete the transaction.

This processor system 10 has many other features which apply to every transaction. For example, for each request that takes place in the system, a response must be sent to a transaction manager. In addition, if any mistake or failure takes place, system 10 returns to its prior state as though the transaction never started. Thus, part of a transaction or request may be "undone" if needed. When a transaction is handled by system 10, such a transaction may go through several of the 2 to 16 CPU's 12, 14,... If the operations associated with a transaction occur properly, then the transaction is committed and complete. If a mistake, failure, etc., takes place, system 10 returns to its prior state as though the transaction was never acted upon.

The information related to the transaction before and after it enters the system is recorded in the audit trail (which is sometimes referred to as the log).

Fig. 2 illustrates how several CPU's may operate on the same transaction. For example, a transaction may enter CPU 40, and CPUs 42 and 44 may perform changes 1 through 5, indicated by reference numeral 46. After all the operations related to a transaction have taken place, the changes 46 are sent to the audit trail, and if no problems have occurred, those changes are committed and the transaction is considered complete. In this scenario, there is no return to the prior state because no failure or other problems occurred. If any problem or error does occur, undos 1 through 5, indicated by numeral 48, are performed automatically in order to cause the transaction to return to its prior state. Undos are recorded in the audit trail in the same manner as recorded changes.

Table 1 provides a simple representation of how epoch numbers may be associated with changes and undos recorded in the audit trail. Epoch No. 0 refers to changes 1 through 5 which are recorded in the audit trail. Epoch No. 1 refers to undo numbers 1 through 5 which are also recorded in the audit trail. Similar to changes 46 in Fig. 2, undos 48 also need to be committed without problems for system 10 to return to its prior state. If the undos encounter a problem or error, an undo of the undos takes place.

TABLE 1

Epoch No.    Audit Trail
0            1, 2, 5, 3, 4
1            1, 2, 5, 3, 4

Table 2 provides a more complex representation of how epoch numbers may be associated with changes, undos, and undos of undos which are recorded in the audit trail. For example, Table 2 illustrates Epoch No. 0 referring to changes 1, 2 and 5 which were recorded in the audit trail without 3 and 4 because updates/changes 3 and 4 got lost. Thus, changes 3 and 4 are missing, and, therefore, Epoch No. 0 cannot be committed. Epoch No. 1 has only undos 2 and 5 because the update of undo 1 got lost. Because a problem or error occurred, Epoch No. 2 shows (1) that 2 and 5 are an undo of the undo which occurred in Epoch No. 1 and (2) that 1, 2 and 5 are undos of the changes which occurred in Epoch No. 0. Thus, nothing is lost in Epoch No. 2. When Epoch No. 2 is committed, system 10 will have returned to its prior state. The associated audit trail segment contains the results of the three epochs: changes 1, 2 and 5; undos 2 and 5; undos of undos 2 and 5; and undos 1, 2 and 5. The procedure illustrated in Table 2 continues until the transaction is completely undone. This ensures that either the transaction completely occurred or the system returned to its prior state. Moreover, because the undo is treated the same as a regular change for a transaction, multiple cycles of changes, undos, undos of undos, etc. can occur in a simple fashion until the transaction is completed or until the system is returned to its prior state. In a normally running system, epoch numbers rarely reach 2 and almost never reach 3.

TABLE 2

Epoch No.    Audit Trail
0            1, 2, 5
1            2, 5
2            2, 5, 1, 2, 5
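The bookkeeping behind Table 2 can be illustrated with a small sketch. It is only a model under stated assumptions: the `attempted` and `recorded` dictionaries and the helper names are hypothetical, and the sketch assumes that the set of records attempted in each epoch is known, which the description does not spell out. An epoch is treated as clean when every record attempted in it reached the audit trail.

```python
# Hypothetical model of Table 2. Each record is (kind, change number).
# "attempted" is an assumption for illustration; "recorded" mirrors Table 2.
attempted = {
    0: [("change", 1), ("change", 2), ("change", 3), ("change", 4), ("change", 5)],
    1: [("undo", 1), ("undo", 2), ("undo", 5)],
    2: [("undo-of-undo", 2), ("undo-of-undo", 5),
        ("undo", 1), ("undo", 2), ("undo", 5)],
}
recorded = {
    0: [("change", 1), ("change", 2), ("change", 5)],
    1: [("undo", 2), ("undo", 5)],
    2: [("undo-of-undo", 2), ("undo-of-undo", 5),
        ("undo", 1), ("undo", 2), ("undo", 5)],
}


def lost_records(epoch):
    """Records attempted in this epoch that never reached the audit trail."""
    return [rec for rec in attempted[epoch] if rec not in recorded[epoch]]


def first_clean_epoch():
    """Lowest epoch in which nothing was lost, or None if there is none."""
    for epoch in sorted(attempted):
        if not lost_records(epoch):
            return epoch
    return None


for epoch in sorted(attempted):
    print(epoch, lost_records(epoch))
print("first clean epoch:", first_clean_epoch())
```

In this model epochs 0 and 1 each report lost records, so neither can be committed, and epoch 2 is the first epoch in which nothing is lost, matching the narrative for Table 2.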

Fig. 3 illustrates corresponding stacks 60, 62, 64 and 66 which are located in all of the CPU's 70, 72, 74 and 76 of system 10. Each of stacks 60, 62, 64 and 66 contains multiple sections 80, 82, 84 and 86, and each of sections 80, 82, 84 and 86 contains multiple slots. For example, ten slots (or indexes) labeled 0-9 are shown in Fig. 3. Each slot stores a pointer to a transaction control block (TCB), which provides a description of the transaction, together with a sequence number and an epoch number. The sequence number is assigned to each incoming transaction sequentially as other transactions are completed. As suggested above, Epoch No. 0 refers to original work, Epoch No. 1 refers to the first undo attempt (which occurs if the transaction is aborted), Epoch No. 2 refers to a second undo attempt, and so forth.

When a CPU 70, 72, 74 or 76 receives a sequence number for a transaction, that sequence number is divided by the number of slots and the remainder is used to determine the slot number for that transaction. The slot referred to by the calculated remainder contains the pointer showing the sequence number, epoch number and what most recently happened, if anything, to the transaction which was most recently in that slot. If the pointer points to a sequence number which is lower than the present sequence number, then the present sequence number is inserted into the slot. If the pointer points to a higher sequence number, then the present transaction is outdated and discarded because it is "too late", as shown by the slot containing a higher sequence number. As stated above, this typically occurs if the transaction is in the process of aborting due to a failure in some other CPU, or if the transaction is so old that it has been gone from the system for a long period of time. In addition, in the preferred embodiment, the received sequence number is divided by 4096, and the remainder selects the slot.

Fig. 4 provides the sequence flow chart for the process that occurs when a request is received by a CPU. When a request enters block 90, the relevant slot is determined (e.g., the request's sequence number is divided by the number of slots and the remainder is taken). Decision block 92 then determines if the request is "too late". If the sequence number of the transaction in the slot is greater than the request's sequence number, the request is "too late". Likewise, if the sequence number of the transaction in the slot is equal to the sequence number of the request and the slot's epoch number is greater than the request's epoch number, then the request is "too late". A request is "too late" if its transaction has already been completed or aborted, or is in the process of being aborted. Thus, this decision making process ensures that stale requests will not be acted upon. If the request is not "too late", the activities shown in block 94 occur, such that the slot's sequence number is replaced by the request's sequence number and the slot's epoch number is replaced by the request's epoch number.
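The slot selection of block 90 and the "too late" decision of block 92 can be sketched as follows. This is a minimal illustration, not the implementation of the preferred embodiment: the names `Slot`, `slot_index` and `is_too_late` are hypothetical, and only the 4096-slot table size is taken from the description.

```python
from dataclasses import dataclass

SLOT_TABLE_SIZE = 4096  # divisor used in the preferred embodiment


@dataclass
class Slot:
    """Hypothetical slot holding only the fields needed for block 92."""
    sequence_number: int = 0
    epoch: int = 0


def slot_index(sequence_number: int, table_size: int = SLOT_TABLE_SIZE) -> int:
    """Block 90: the remainder of sequence number / number of slots."""
    return sequence_number % table_size


def is_too_late(slot: Slot, req_seq: int, req_epoch: int) -> bool:
    """Block 92: the slot already holds newer state than the request."""
    if slot.sequence_number > req_seq:
        return True
    return slot.sequence_number == req_seq and slot.epoch > req_epoch


slot = Slot(sequence_number=13, epoch=1)
print(slot_index(4099))          # 3
print(is_too_late(slot, 3, 0))   # True: slot holds a higher sequence number
print(is_too_late(slot, 13, 0))  # True: same sequence number, older epoch
print(is_too_late(slot, 13, 1))  # False
print(is_too_late(slot, 23, 0))  # False: request is newer than the slot
```

Comparing the sequence number first and the epoch second means a request for a later undo epoch of the same transaction is still accepted, while a request for an epoch that has already ended is rejected.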

After the activities of block 94 are complete, block 96 causes the information about the transaction to be updated. If the slot TCB pointer is null (or contains nothing), then the TCB is created such that a description of the transaction is provided and the TCB pointer becomes valid. If the slot TCB pointer is full because it contains a valid description of the transaction, then a description of the transaction has already been created (this transaction has been seen by this particular CPU before) and no additional information is needed for the transaction's description. After this initial process, the request can be acted upon. In summary, Fig. 4 has three outcomes. First, it is "too late" to do the request and the request is not acted upon. Second, it is not "too late" but a description of the transaction must be created before the request can be acted upon. Third, it is not "too late" to do the request and the information about the transaction already exists so the request can be acted upon.
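The three outcomes of Fig. 4 can be sketched in a few lines. The sketch below is an illustration under assumptions: `Tcb`, `Slot`, `handle_request` and the three result strings are hypothetical names, and the TCB is reduced to a placeholder carrying only a sequence number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tcb:
    """Hypothetical stand-in for a Transaction Control Block."""
    sequence_number: int


@dataclass
class Slot:
    tcb: Optional[Tcb] = None
    sequence_number: int = 0
    epoch: int = 0


def handle_request(slot: Slot, req_seq: int, req_epoch: int) -> str:
    # Outcome 1 (block 92): the request is stale and is not acted upon.
    if slot.sequence_number > req_seq or (
        slot.sequence_number == req_seq and slot.epoch > req_epoch
    ):
        return "TOOLATE"
    # Block 94: the slot takes on the request's sequence and epoch numbers.
    slot.sequence_number, slot.epoch = req_seq, req_epoch
    # Outcome 2 (block 96): no TCB exists yet, so one is "faulted in".
    if slot.tcb is None:
        slot.tcb = Tcb(req_seq)
        return "OK (TCB faulted in)"
    # Outcome 3: this CPU has already seen the transaction.
    return "OK (TCB already present)"


slot = Slot()
print(handle_request(slot, 3, 0))  # OK (TCB faulted in)
print(handle_request(slot, 3, 0))  # OK (TCB already present)
print(handle_request(slot, 2, 0))  # TOOLATE
```

Because the TCB is created only on the first accepted request, a CPU that never receives work for a transaction never allocates a TCB for it.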

Fig. 5 provides the sequence flow chart for the process that takes place at Phase 1 time. Phase 1 can be considered a "flush". Phase 1 processing occurs during the commit period (after all the requests have been finished) when all of the information is sent to the audit trail. Phase 1 also indicates the end of an epoch; thus, the epoch number is increased by one. When a Phase 1 occurs, information is sent to all CPU's 70, 72, 74 and 76 in system 10. In each CPU 70, 72, 74 and 76, the correct slot is found as shown in block 100 and the sequence number carried in that Phase 1 is placed in the slot such that the slot's sequence number is replaced with the Phase 1's sequence number. In addition, the slot's epoch number is replaced by the Phase 1's epoch number plus one. Therefore, the epoch number is increased by one after a Phase 1 has been sent to every CPU. This process is set forth in block 102.
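A Phase 1 handler along the lines of blocks 100 and 102 might look like the sketch below. The "OK", "ERROR" and "UNINVOLVED" replies follow the pseudo-code in the Appendix; the `Slot` class, the `handle_phase1` name and the `flush` callback (which stands in for writing the epoch's log records to disk) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Slot:
    tcb: Optional[object] = None
    sequence_number: int = 0
    epoch: int = 0


def handle_phase1(slot: Slot, req_seq: int, req_epoch: int,
                  flush: Callable[[int], bool]) -> str:
    """Reply to a Phase 1 and close the epoch in this CPU's slot."""
    if slot.sequence_number != req_seq or slot.tcb is None:
        result = "UNINVOLVED"  # this CPU did no work for the transaction
    else:
        # flush() is a hypothetical callback: True if the log records
        # for req_epoch reached disk, False if the flush failed.
        result = "OK" if flush(req_epoch) else "ERROR"
    # Block 102: record the sequence number and advance the epoch by one,
    # so later requests carrying the finished epoch are "too late".
    slot.sequence_number = req_seq
    slot.epoch = req_epoch + 1
    return result


involved = Slot(tcb=object(), sequence_number=3, epoch=0)
print(handle_phase1(involved, 3, 0, lambda epoch: True))  # OK
print(involved.epoch)                                      # 1

idle = Slot()
print(handle_phase1(idle, 13, 0, lambda epoch: True))     # UNINVOLVED
print(idle.sequence_number, idle.epoch)                    # 13 1
```

The slot is updated even in an uninvolved CPU, which is what allows that CPU to reject a straggling request for the finished epoch later on.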

While Phase 1 indicates the end of an epoch, Phase 2 indicates the end (or completion) of a transaction. Fig. 6 provides the sequence flow chart for the process that takes place at Phase 2 time. During the Phase 1 process, each CPU returns an indication of whether it contains a valid TCB. At Phase 2 time, only CPU's with a valid TCB (as determined during Phase 1) are acted upon. For example, if system 10 contained four CPU's and only CPU 1 of the four CPU's had a valid TCB, then only CPU 1 would be affected by the process set forth in Fig. 6. A TCB is valid if the TCB pointer does not equal null because a description of the transaction has already been created for that TCB. At Phase 2 time, the slot is found, as indicated in block 110, in each CPU with a valid TCB. The software then determines, as indicated by block 112, whether or not the TCB pointer equals null. If the TCB pointer does not equal null, then the TCB pointer is set to null, as indicated in block 114. If the TCB pointer already equals null, then the Phase 2 process is complete for that CPU.
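The Phase 2 step and the selection of its recipients can be sketched as follows. The names `phase2_targets` and `handle_phase2` are hypothetical; the rule that a Phase 2 goes to the CPUs which answered Phase 1 with "OK" or "ERROR" is taken from the pseudo-code in the Appendix, and the four-CPU example mirrors the one given above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Slot:
    tcb: Optional[object] = None
    sequence_number: int = 0
    epoch: int = 0


def phase2_targets(phase1_replies: Dict[int, str]) -> List[int]:
    """CPUs that held a valid TCB at Phase 1 time (replied OK or ERROR)."""
    return [cpu for cpu, reply in sorted(phase1_replies.items())
            if reply in ("OK", "ERROR")]


def handle_phase2(slot: Slot, req_seq: int) -> bool:
    """Blocks 112 and 114: drop the TCB if one is still attached."""
    if slot.sequence_number == req_seq and slot.tcb is not None:
        slot.tcb = None
        return True
    return False  # pointer already null: nothing left to do


replies = {0: "UNINVOLVED", 1: "OK", 2: "UNINVOLVED", 3: "UNINVOLVED"}
print(phase2_targets(replies))  # [1]

slot = Slot(tcb=object(), sequence_number=3, epoch=1)
print(handle_phase2(slot, 3))   # True
print(handle_phase2(slot, 3))   # False
```

Only the TCB pointer is cleared; the sequence and epoch numbers stay in the slot so that stale requests for the finished transaction can still be recognized.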

To further illustrate the process set forth in Figs. 4 through 6, Table 3 indicates a possible sequence of events which could occur during a transaction. Please recall that there is a separate TCB for each transaction, and the TCB pointer points to a description for each transaction. Table 3 provides an example of slot 3 in CPU 3, and sets forth the activities that occur within slot 3 of CPU 3, in (1) the TCB pointer, (2) the sequence number and (3) the epoch number for times 0 through 5. At time 0, which is before anything has occurred for this example transaction, (1) the TCB pointer is null because there is no description of the transaction; (2) the sequence number is 0; and (3) the epoch number is 0.

TABLE 3

Time    TCB ptr.    Seq. No.    Epoch No.
0       null        0           0
1       valid       3           0
2       valid       3           1
3       null        3           1
4       null        13          1
5       valid       23          0

At time 1, a request for "3.3.0" is received by CPU 3. This number, "3.3.0", represents "CPU number.sequence number.epoch number." Upon reception of request "3.3.0", the procedure set forth in Fig. 4 takes place. The correct slot is found, as shown in block 90, and the sequence number of the request is compared to the sequence number at time 0, as shown in block 92. In the present example, the Sequence No. 3 is greater than the slot's number 0; therefore, the request's sequence number and the request's epoch number replace the slot's sequence number and the slot's epoch number, as shown in block 94 and in Table 3 (at time 1). Next, the slot TCB pointer is checked, as indicated in block 96. Because the slot TCB pointer is null, a TCB is created and placed in the slot such that a valid TCB pointer is indicated at time 1 in Table 3.

At time 2 in Table 3, a Phase 1 occurs for Request No. 3.3.0. Now referring to Fig. 5, the correct slot is found, as indicated in block 100, in every CPU in system 10. The sequence number in the Phase 1 is used to replace the slot's sequence number, and the Phase 1's epoch number plus one replaces the slot's epoch number. This update is indicated at time 2 in Table 3.

At time 3, a Phase 2 for "3.3" is received. This number, "3.3", represents "CPU number.sequence number." Now turning to Fig. 6, the slot for all CPUs with a valid TCB (found during the Phase 1 of time 2) is determined as indicated by block 110. Next, the TCB pointer is examined, as indicated in block 112, and if the TCB pointer is not already set to null, then it is set to null, as indicated in block 114. This activity is indicated at time 3 in Table 3.

At time 4, a Phase 1 for "3.13.0" is received. Again, as set forth in Fig. 5, the sequence number in the slot is replaced by the Phase 1's sequence number, and the epoch number in the slot is replaced by the Phase 1's epoch number plus one, as indicated in block 102. This is indicated at time 4 in Table 3.

At time 5, a request is received for "3.23.0". Now turning to Fig. 4, the correct slot is found as shown in block 90. Next, because the request's sequence number is greater than the slot's sequence number, the slot's sequence number is replaced by the request's sequence number and the slot's epoch number is replaced by the request's epoch number, as indicated in blocks 92 and 94. The TCB is then created because the present TCB pointer is null, as indicated in block 96. This process results in time 5 having a valid TCB pointer, a sequence number of 23, and an epoch number of 0.

The CPU which begins a transaction also ends it by controlling when the Phase 1 and Phase 2 occur. In addition, the CPU which begins a transaction does not have to be explicitly told by the overall system 10 to start that transaction. The CPU works independently and begins the transaction on its own. Once the CPU which begins the transaction has the TCB, then every other CPU has access to that TCB. The beginning CPU creates its TCB first and destroys that TCB last. Moreover, the CPU which begins the transaction is indicated by the first number in the request (as described above). Thus, a full request number is set forth as: "beginning CPU number.TCB number.sequence number.epoch number." Each CPU in system 10 runs independently and is continuously updated. Finally, because Phase 2 only deletes the TCB, as indicated in Fig. 6, the other information (i.e., sequence number and epoch number) needs to be deleted and is deleted when a larger request sequence number or request epoch number is received in the pertinent slot, as set forth in block 92 of Fig. 4. This deletion (or update) occurs when the slot's sequence number is replaced by the request's sequence number and the slot's epoch number is replaced by the request's epoch number, as set forth in block 94 of Fig. 4.
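The sequence of Table 3 can be replayed with a short simulation of a single slot. This is a sketch under assumptions: the handler names are hypothetical, the TCB is represented by a placeholder string, and the ten-slot table of Fig. 3 is assumed so that sequence numbers 3, 13 and 23 all map to slot 3.

```python
from dataclasses import dataclass
from typing import Optional

NUM_SLOTS = 10  # Fig. 3 shows ten slots, labeled 0-9


@dataclass
class Slot:
    tcb: Optional[str] = None
    seq: int = 0
    epoch: int = 0

    def row(self):
        return ("null" if self.tcb is None else "valid", self.seq, self.epoch)


def request(slot, seq, epoch):          # Fig. 4
    if slot.seq > seq or (slot.seq == seq and slot.epoch > epoch):
        return "TOOLATE"
    slot.seq, slot.epoch = seq, epoch
    if slot.tcb is None:
        slot.tcb = f"TCB for sequence {seq}"  # placeholder description
    return "OK"


def phase1(slot, seq, epoch):           # Fig. 5
    slot.seq, slot.epoch = seq, epoch + 1


def phase2(slot, seq):                  # Fig. 6
    if slot.seq == seq and slot.tcb is not None:
        slot.tcb = None


slot = Slot()                           # time 0
rows = [slot.row()]
events = [
    (request, (3, 0)),    # time 1: request "3.3.0"
    (phase1, (3, 0)),     # time 2: Phase 1 for "3.3.0"
    (phase2, (3,)),       # time 3: Phase 2 for "3.3"
    (phase1, (13, 0)),    # time 4: Phase 1 for "3.13.0"
    (request, (23, 0)),   # time 5: request "3.23.0"
]
for handler, args in events:
    assert args[0] % NUM_SLOTS == 3     # every event lands in slot 3
    handler(slot, *args)
    rows.append(slot.row())

for time, row in enumerate(rows):
    print(time, *row)
```

The printed rows reproduce Table 3 line for line, including time 4, where a Phase 1 for a transaction this CPU never worked on still advances the slot's sequence and epoch numbers.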

This communication scheme provides lower communication costs along with improved throughput because the number of messages sent between CPU's is greatly reduced. This reduction in the number of messages occurs for several reasons. First, in Phase 2, messages are only sent to CPU's with a valid TCB, as described above. Second, the CPU's do not have to be warned about what transactions are coming because they independently begin transactions and independently act on requests. Not every CPU receives a message when a transaction begins because only the beginning CPU needs to know about a transaction to start that transaction. Moreover, in system 10, only the CPUs which are needed are used and updated during the beginning of a transaction and during Phase 2 when a transaction is completed. Thus, a TCB is not created in all CPU's; rather, TCB's are only created in the CPU's that need them. This allows the throughput of system 10 to improve more nearly linearly as CPUs are added.

Fig. 7 is a graph which indicates how throughput in the present invention is improved as the number of CPUs in system 10 is increased. Line 120 indicates the linear-ideal situation where throughput is not hindered by an increase in the number of CPU's. In prior art systems, messages must be sent to all CPU's for (1) the beginning of a transaction, (2) the Phase 1 or commit sequence, and (3) the Phase 2 or complete transaction. The prior art is indicated by line 122. In the present invention, messages are sent to all CPU's only during Phase 1. This improvement of the present invention's throughput, as the number of CPUs is increased, is indicated by line 124.

An example of pseudo-code is included in the attached Appendix to further provide an example of how to implement the above described functionality in the preferred embodiment.
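The difference between lines 122 and 124 of Fig. 7 can be made concrete with a rough per-transaction message count. The model below is a deliberate simplification and an assumption, not a measurement: it counts only the begin, Phase 1 and Phase 2 messages, ignores replies and the work requests themselves, and the function names are hypothetical.

```python
def control_messages_prior_art(num_cpus: int) -> int:
    """Begin, Phase 1 and Phase 2 are each sent to every CPU."""
    return 3 * num_cpus


def control_messages_described(num_cpus: int, involved_cpus: int) -> int:
    """No begin message; Phase 1 to every CPU; Phase 2 to involved CPUs only."""
    return num_cpus + involved_cpus


# Assume, for illustration, that each transaction touches two CPUs.
for n in (2, 4, 8, 16):
    print(n, control_messages_prior_art(n), control_messages_described(n, 2))
# 2 6 4
# 4 12 6
# 8 24 10
# 16 48 18
```

Under these assumptions the per-transaction overhead of the described scheme grows by one message per added CPU rather than three, which is consistent with line 124 lying closer to the linear ideal than line 122.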

While a full and complete disclosure of the invention has been provided herein above, it will be obvious to those skilled in the art that various modifications and changes may be made.

PSEUDO CODE for "Tracking the State of Transactions Subject to Distributed Control"

CONSTANT NumberOfCpus    /* number of CPUs in the system */
CONSTANT SlotTableSize   /* greater than the largest number */
                         /* of transactions that will be begun */
                         /* simultaneously by one CPU */

/* Global Variables within each CPU: */

VARIABLE NextSeqNum: Integer   /* sequence number to potentially */
                               /* assign to the next transaction */
                               /* begun in each CPU. */

VARIABLE SlotTable: Array [0..NumberOfCpus-1, 0..SlotTableSize-1] OF
RECORD
    SequenceNumber: Integer    /* The sequence number of the */
                               /* transaction that currently */
                               /* occupies, or most recently */
                               /* occupied this slot. */
    TcbPtr: POINTER TO Tcb_Struct;
                               /* Pointer to the TCB for the */
                               /* transaction that currently */
                               /* occupies this slot, or NIL if */
                               /* there is none. */
    Epoch: Integer             /* Current epoch number of the */
                               /* transaction that occupies this */
                               /* slot, or that most recently */
                               /* occupied this slot. */
END RECORD

At system initialization time:

BEGIN
    VARIABLE Cpu: Integer;
    VARIABLE Slot: Integer;

    NextSeqNum := 1;
    FOR Cpu := 0 TO NumberOfCpus-1 DO
        FOR Slot := 0 TO SlotTableSize-1 DO
            WITH SlotTable[Cpu,Slot] DO
            BEGIN
                SequenceNumber := 0;
                TcbPtr := NIL;
                Epoch := 0;
            END;
END;

When beginning a new transaction:

BEGIN
    VARIABLE Slot: Integer;

    /* Skip sequence numbers whose slot still holds a live TCB. */
    LOOP
        Slot := NextSeqNum MOD SlotTableSize;
        IF SlotTable[MyCpuNumber,Slot].TcbPtr = NIL THEN EXITLOOP;
        NextSeqNum := NextSeqNum + 1;
    ENDLOOP;

    Create new Tcb, using NextSeqNum as its Sequence Number.
    WITH SlotTable[MyCpuNumber,Slot] DO
    BEGIN
        SequenceNumber := NextSeqNum;
        TcbPtr := Pointer to newly created TCB.
        Epoch := 0;
    END;

    NextSeqNum := NextSeqNum + 1;
END;

When ending a transaction:

Input parameters: Transaction's Sequence Number.
Assumptions: Transaction was begun in the current CPU.

BEGIN
    VARIABLE Slot: Integer;
    Slot := SequenceNumber MOD SlotTableSize;

    WITH SlotTable[MyCpuNumber,Slot] DO
    BEGIN
        TcbPtr points to the TCB for this transaction;

        Send Phase1 Request, containing MyCpuNumber, SequenceNumber, and Epoch (= 0) to every CPU, including my own CPU;
        Await Responses from all CPUs. Each response is either "OK", "ERROR", or "UNINVOLVED";
        IF every response was "OK" or "UNINVOLVED" THEN
            Write "COMMIT" record to the log
        ELSE
        BEGIN
            LOOP
                Read log backwards, undoing every update performed by the transaction, until either all records are undone or an error occurs;
                Send Phase1 Request, containing MyCpuNumber, SequenceNumber, and Epoch (now > 0) to every CPU, including my own CPU;
                Await Responses from all CPUs. Each response is either "OK", "ERROR", or "UNINVOLVED";
                IF undo completed without error and every CPU replied "OK" or "UNINVOLVED" THEN EXITLOOP;
            ENDLOOP;
            Write "ABORTED" record to the log.
        END;

        Send Phase2 Request to every CPU that replied to a Phase1 request with "OK" or "ERROR", except my own.
        Await all responses.
        Send Phase2 Request to my own CPU.
        Await response.
    END;
END;

On receipt of a request to perform some database work for a transaction:

INPUT rCpu: Integer;             /* CPU in which the transaction */
                                 /* was begun. */
INPUT rSequenceNumber: Integer;  /* Sequence number of the */
                                 /* transaction, as assigned by */
                                 /* the CPU that began the */
                                 /* transaction. */
INPUT rEpoch: Integer;           /* The epoch of the transaction */
                                 /* of which this request is a */
                                 /* part. Zero for original */
                                 /* work, one for the first */
                                 /* attempt to undo the */
                                 /* transaction, etc. */
OUTPUT Result: "OK" or "TOOLATE" /* An indication of whether it */
                                 /* is too late to perform the */
                                 /* request. */
BEGIN
    VARIABLE Slot: Integer;

    Slot := rSequenceNumber MOD SlotTableSize;

    WITH SlotTable[rCpu,Slot] DO
    BEGIN
        IF SequenceNumber /*from SlotTable*/ > rSequenceNumber OR
           (SequenceNumber /*from SlotTable*/ = rSequenceNumber AND
            Epoch /*from SlotTable*/ > rEpoch) THEN
        BEGIN
            Result := "TOOLATE";
            RETURN;
        END;

        IF TcbPtr /*from SlotTable*/ = NIL THEN
        BEGIN
            Create new TCB for this transaction;
            TcbPtr := Pointer to newly created TCB;
        END;

        SequenceNumber := rSequenceNumber;
        Epoch := rEpoch;
        RETURN "OK";
    END;
END;

On receipt of a request to perform a Phase1 operation for a transaction:

INPUT rCpu: Integer;             /* CPU in which the transaction */
                                 /* was begun. */
INPUT rSequenceNumber: Integer;  /* Sequence number of the */
                                 /* transaction, as assigned by */
                                 /* the CPU that began the */
                                 /* transaction. */
INPUT rEpoch: Integer;           /* The epoch to be completed */
                                 /* for this transaction. Zero for */
                                 /* original work, one for the */
                                 /* first attempt to undo the */
                                 /* transaction, etc. */
OUTPUT Result: "OK" or "ERROR" or "UNINVOLVED"
                                 /* An indication of whether we: */
                                 /* (1) successfully flushed */
                                 /* our log records for the */
                                 /* transaction; (2) encountered */
                                 /* an error flushing log */
                                 /* records; or (3) were never */
                                 /* involved in the transaction. */
BEGIN
    VARIABLE Slot: Integer;

    Slot := rSequenceNumber MOD SlotTableSize;

    WITH SlotTable[rCpu,Slot] DO
    BEGIN
        IF SequenceNumber <> rSequenceNumber OR TcbPtr = NIL THEN
            Result := "UNINVOLVED"
        ELSE
        BEGIN
            Flush log records for epoch rEpoch;
            Result := "OK" or "ERROR";
        END;

        SequenceNumber := rSequenceNumber;
        Epoch := rEpoch + 1;
    END;

    RETURN Result;
END;

On receipt of a request to perform a Phase2 operation for a transaction:

INPUT rCPU: Integer; /* CPU in which the transaction */ /* was begun. */ INPUT rSequenceNumber: Integer:/* Sequence number of the */

/* transaction, as assigned by */

/* the CPU that began the */

/* transaction. */

BEGIN

VARIABLE Slot: Integer;

Slot := rSequenceNumber MOD SlotTableSize;

WITH SlotTable[rCpu,Slot] DO BEGIN

IF SequenceNumber = rSequenceNumber AND TcbPtr <> NIL THEN

BEGIN

Perform Cleanup for the Transaction as Necessary; Destroy the Tcb pointed to by TcbPtr; TcbPtr := NIL; END;

END;

END;
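The Phase2 handling may be sketched in the same illustrative manner. The names `Slot` and `phase2`, the `cleanup` callback, and the boolean return value are assumptions made for the example; the pseudocode above defines no output for Phase2.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

SLOT_TABLE_SIZE = 8  # assumed value; the source does not state a size


@dataclass
class Slot:
    sequence_number: int = -1   # assumed sentinel meaning "slot never used"
    epoch: int = 0
    tcb: Optional[dict] = None  # stand-in for a Transaction Control Block


def phase2(table: List[List[Slot]], r_cpu: int, r_seq: int,
           cleanup: Callable[[dict], None]) -> bool:
    """Destroy the TCB for a finished transaction; True if one was destroyed."""
    slot = table[r_cpu][r_seq % SLOT_TABLE_SIZE]

    if slot.sequence_number == r_seq and slot.tcb is not None:
        cleanup(slot.tcb)  # perform cleanup for the transaction as necessary
        slot.tcb = None    # destroy the TCB; the slot itself is retained
        return True

    # Either the slot belongs to another transaction or no TCB exists here.
    return False
```

In this sketch, as in the pseudocode, only the TCB is released. The sequence number and epoch remain in the slot, so a second Phase2 request for the same transaction finds no TCB and does nothing, and the slot still records how far the transaction had progressed.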

Claims

WHAT IS CLAIMED IS:

1. A method of tracking and controlling data within a system, comprising the steps of: operating on at least one request associated with a transaction, said transaction capable of altering said data in said system, said operating including an implementation of at least one change, and said change being made to said data; committing said change after said request is operated upon; performing at least one undo when required, said undo being required when said request includes an error, said undo being implemented in the same manner as said implementation of said change; and completing said transaction when said request is free of errors.

2. The method of claim 1, wherein said change associated with said transaction is recorded in an audit trail, and wherein any said undo associated with said transaction is recorded in said audit trail.

3. The method of claim 1, further comprising the steps of: updating said data only when said request is associated with a valid transaction; and operating on said request only when said request is associated with a valid transaction.

4. The method of claim 1, further comprising the step of: performing at least one additional undo on said undo when required, said additional undo on said undo being required when said undo includes an error, said additional undo on said undo being implemented in the same manner as said implementation of said change; wherein said system is returned to an original state when an error occurs; and wherein said transaction is completed when said request is error free.

5. A method of tracking and controlling data within a system, said system including at least two system CPUs, comprising the steps of: receiving a transaction in a starting CPU, said starting CPU being one of said at least two system CPUs; starting said system with said starting CPU when said transaction is received; generating at least one request from said transaction within said starting CPU, said request being associated with said transaction; and sending said request to working CPUs, said working CPUs being at least one of said at least two system CPUs; working CPUs operating on said request within said working CPUs.

6. The method of claim 5, further comprising the steps of: sending an order to commit said transaction, said order to commit being sent to said system CPUs, and said order to commit being sent after said request is operated upon; and sending an order to complete said transaction, said order to complete being sent to said working CPUs, and said order to complete being sent only when said request is properly committed and free of errors.

7. The method of claim 5, wherein said operating includes an implementation of at least one change.

8. The method of claim 7, further comprising the steps of: committing said change after said request is operated upon; performing at least one undo when required, said undo being required when said request includes an error, said undo being implemented in the same manner as said change; and completing said transaction only when said request is free of errors.

9. A method of tracking and controlling data within a system, said system including at least two system CPUs, comprising the steps of: receiving a transaction in one of said system CPUs, said transaction capable of altering said data in said system; generating at least one request, said request being associated with said transaction; operating on said request, said operating including an implementation of at least one change, said change being made by working CPUs, said working CPUs being at least one of said system CPUs; sending an order to commit said transaction, said order to commit being sent to said system CPUs, and said order to commit being sent after said request is operated upon; and sending an order to complete said transaction, said order to complete being sent to said working CPUs, and said order to complete being sent only when said request is properly committed and free of errors.

10. The method of claim 9, further comprising the steps of: performing at least one undo when said request includes an error, said undo being implemented in the same manner as said change.