WO1993018464A1 - Distributed processing system - Google Patents

Distributed processing system Download PDF

Info

Publication number
WO1993018464A1
Authority
WO
WIPO (PCT)
Prior art keywords
processor
network
request
processors
software
Prior art date
Application number
PCT/NZ1993/000013
Other languages
French (fr)
Inventor
Ronald John Youngs
Richard Paul Schneider
Jean Margaret Ellis
Ian Chester
Original Assignee
Ronald John Youngs
Richard Paul Schneider
Jean Margaret Ellis
Ian Chester
Priority date
Filing date
Publication date
Application filed by Ronald John Youngs, Richard Paul Schneider, Jean Margaret Ellis, Ian Chester filed Critical Ronald John Youngs
Priority to AU36506/93A priority Critical patent/AU3650693A/en
Publication of WO1993018464A1 publication Critical patent/WO1993018464A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass


Abstract

A computer system wherein tasks are distributed across a network of processors which may be of different vendors and/or architectures. Application tasks which are logically independent of each other are processed concurrently in parallel on a group of processors on the network. The system uses conventional network hardware and software (e.g. an industry standard local area network) to link the processors. The programs for performing each task are loaded on each processor in the group and data blocks to be processed are sent to an available processor in the group which performs the task and sends the results back.

Description

Distributed processing system.
TECHNICAL FIELD
This invention relates to a computer system which permits distributed and parallel processing of data using a multiplicity of processors interconnected by a network.
BACKGROUND ART
The problems which arise in executing a computation-intensive program on a single processor are well known. Because the processor must process sequentially, long (and expensive) processing times may result. Even with the fastest processors currently available (or affordable), processing time for certain transactions or certain applications is unacceptably long.
In order to overcome this problem the concept of parallel processing has been proposed. In parallel processing the data or the application program is fragmented so that data to be processed is allocated to a multiplicity of processors running the same application program or different processors are arranged to execute different parts of the application program. In the parallel processing systems hitherto proposed a number of hardware processors are tightly coupled to form a single unit. Either two or four conventional processors are used or alternatively a large number of special purpose processors (massively parallel) are used. An example of the latter is the INMOS T800 transputer.
A current disadvantage with massively parallel processing systems is the complexity involved in programming such a system.
It would be desirable if existing networks of conventional processors could be configured so that the processing of a single application is distributed over a number of processors on the network.
DISCLOSURE OF INVENTION
It is therefore an object of the present invention to provide a distributed processing system which meets the above desideratum. Accordingly in one aspect the invention consists in a parallel processing computer system which allows concurrent processing of logically independent operations comprising: a plurality of processors including network communicating hardware, a high bandwidth data communications network, each processor being connected to a node on said network, each processor programmed with network control software which enables said processors to exchange messages, a number of processors being programmed with at least the operations section of at least one common application program, one processor being programmed with the complete said at least one application program, a second processor is programmed with operations management software which causes said second processor to sequentially
(a) receive requests from said one processor for the execution of an operation in the application program,
(b) select an available one of said number of processors,
(c) route a said request to the selected processor for execution, (d) receive the result of the processed request from said selected processor, and (e) pass said result back to said one processor.
In a second aspect the invention consists in a parallel processing computer system which allows concurrent processing of logically independent operations wherein the system includes: a plurality of processors including network communicating hardware, a high bandwidth data communications network, each processor being connected to a node on said network, characterised in that each processor is programmed with network control software which enables said processors to exchange messages, a number of processors are programmed with at least the operations section of at least one common application program, one processor is programmed with the complete said at least one application program, a second processor is programmed with operations management software which causes said second processor to sequentially
(a) receive requests from said one processor for the execution of an operation in the application program, (b) select an available one of said number of processors,
(c) route a said request to the selected processor for execution,
(d) receive the result of the processed request from said selected processor, and (e) pass said result back to said one processor.
In a third aspect the invention consists in computer software for a network of interconnected processors and associated network operating system, which software enables sequential data transactions (requests) received by the network to be processed concurrently in accordance with an application program by two or more processors on the network, said software comprising: a Despatch Manager (DM) module and a Processing Element (PE) module, a first network processor (DM processor) being programmed with the DM module and at least one other network processor (PE processor) being programmed with a PE module and the application program; the DM module causing the DM processor to:
(1) periodically ascertain the identity of active PE processors on the network and to store their identities in a table,
(2) receive requests to be processed and store the received requests in a queue,
(3) allocate a unique identifier to each received request and store the identifiers in a log,
(4) select a PE processor from the table of processor identities according to predetermined criteria, (5) send to the selected PE processor a copy of a request to be processed from the queue,
(6) receive from each PE processor the result of a processed request,
(7) match that request against the log of request identifiers,
(8) remove the corresponding request from the queue, and (9) send the result to a storage element for further processing or output, the PE module causing the PE processor in which it is resident to:
(a) receive a request sent to it by the DM processor,
(b) process that request in accordance with the application program, and
(c) send to the DM processor the result of that process.
BRIEF DESCRIPTION OF DRAWINGS
Figure 1 shows a diagrammatic representation of a local area network configured according to the present invention,
Figure 2 is a diagram illustrating the function of a Despatch Manager, Figure 3 is a diagrammatic representation of communication protocol levels, Figure 4 is a block diagram of a Processing Element, Figure 5 shows a request message format, Figure 6 shows a reply message format, and
Figure 7 shows diagrammatically an application of the present distributed processing system in a telecommunications toll charging environment.
BEST MODE FOR CARRYING OUT THE INVENTION
The present invention enables the processors on a conventional network to process a single application in parallel. It does this by distributing operations or tasks to each available processor on the network. It is particularly suitable for applications whose tasks or sub-tasks are logically independent of each other. Since logically independent tasks can be performed concurrently, applications incorporating logically independent tasks are ideal for use in a parallel processing environment such as that which is provided by the present invention.
Most application programs comprise three sections (1) a user interface or I/O which deals with any interaction with the outside world, (2) control logic, namely the logic which the program follows and (3) operations, that is sub-routines or functions that perform specific tasks such as calculations or the processing of records. The present invention separates the operations section from the rest of an application program and places copies of the operations section on a number of processors on the network.
A typical network (in this case a LAN) is shown in Figure 1. An Ethernet cable 1 links a set of work stations 3 (only four of which are shown) with a server 2. The work stations and server could be for example Apple Macintoshes using the AppleTalk network operating system. Alternatively the work stations and server could be IBM compatible personal computers supported by a network operating system such as Novell. Each processor includes a LAN card 4 which in combination with the network operating system allows for communication between stations on the network. Whatever network hardware or software is used, the software provided as part of the present invention will interface with it. In broad terms the software of the present invention comprises two parts.
The first part is loaded on a chosen processor (3a in Figure 1) to make that processor function as a "Despatch Manager" (DM). It is the purpose of the Despatch Manager to receive data to be processed, to break the incoming data up, and to distribute the fragmented data to other processors on the network for execution. The second part of the software is loaded onto selected other workstations, 3b, on the network to cause those work stations to function as Processing Elements (PEs) which, in combination with the application software loaded on each such PE, will execute the transactions supplied to that processor by the DM and will return the results to the DM, which will control further processing or output those results.
In this specification each data record to be processed under the control of the Despatch Manager will be referred to as a "transaction" or a "request". Further, "Despatch Manager" and "Processing Element" refer to both the software and the processor executing that software.
Despatch Manager Operation
The purpose of a Despatch Manager (DM) 3a is to receive requests and send them on to a Processing Element (PE) 3b. The DM also supplies monitoring information to any requesting PE, thus allowing real-time viewing and analysis of the entire processing.
Referring to Figure 2, the Despatch Manager receives input from an On-line Transaction (OLT) source (which could be a Processing Element), stores and forwards each transaction to a Processing Element, and then stores the result. Input to the DM is in the form of a request which consists of a service and request code and its associated data. The request is encoded in the External Data Representation (XDR), RFC 1014, format for portability between differing machine architectures as is mentioned further below. Upon receipt of a request the DM assigns a unique ID to the request and records it to a log file on disk 5. This ID consists of a date/time "stamp", and a monotonically increasing number. By maintaining a unique ID, the DM and its associated PEs can determine when duplicate transactions are received.
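A minimal Python sketch of how the ID assignment and logging described above might be realised; the function names, timestamp layout and log-file format are illustrative assumptions, not taken from the patent.

```python
import itertools
import json
import time

_counter = itertools.count(1)   # monotonically increasing number (process-local)

def assign_request_id():
    """Build a unique ID from a date/time stamp plus a monotonic counter."""
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    return "%s-%08d" % (stamp, next(_counter))

def log_request(log_path, request_id, service_id, request_code, data):
    """Append the request to the DM's log file (disk 5 in Figure 2)."""
    entry = {"id": request_id, "service": service_id,
             "request": request_code, "data": data}
    with open(log_path, "a") as log:
        log.write(json.dumps(entry) + "\n")
```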
Output consists of sending a request to a PE and then logging the results. At predefined intervals, the DM will "checkpoint" itself and record summary information on the number and types of processed requests.
The DM maintains a LRU (least recently used) table of available PEs. When a request becomes computable, the DM selects the PE at the head of the list to process the request. This is the PE that was used the longest time ago. This algorithm has the added benefit of forcing the DM to communicate with each PE and thereby determining if the PE is still active. Thus, the DM can maintain a fairly accurate state of each PE without generating extra network traffic.
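A sketch, assuming Python, of the LRU table of available PEs: the PE at the head of the queue (the one used longest ago) is selected and rotated to the tail, so every active PE is contacted in turn. The class and method names are illustrative assumptions.

```python
from collections import deque

class PETable:
    """Least-recently-used table of available Processing Elements."""

    def __init__(self, pe_addresses):
        self._queue = deque(pe_addresses)   # head = PE used longest ago

    def register(self, pe):
        if pe not in self._queue:
            self._queue.append(pe)          # newly active PEs join the tail

    def remove(self, pe):
        try:
            self._queue.remove(pe)          # e.g. when a PE is declared dead
        except ValueError:
            pass

    def select(self):
        """Pick the head-of-list PE for the next request and move it to the tail."""
        pe = self._queue.popleft()
        self._queue.append(pe)
        return pe
```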
Protocol and Drivers
To enable conventional processors on a conventional network to function in accordance with the present invention, each processor must be provided with appropriate application program interface (API) software and driver software. This is illustrated in Figure 3 which shows a processor configured as a processing element capable of executing two different applications. Conventional network hardware 21 and associated network software 22 are provided, with the network hardware 21 connected to the network cable 23 which could be an Ethernet cable for example. Driver software 24 provides additional networking layers between network software 22 and the applications. The Driver provides the communications functions to enable the Processing Element to communicate with a Despatch Manager irrespective of the network operating software protocol (e.g. LocalTalk, NetBIOS, TCP/IP etc.). This allows the application program to be independent of the network protocols in use. Applications programs 25 and 26 each interface with the driver 24 through an Application Program Interface (API) which is software linked with the applications when they are compiled. The API provides for data conversion and for messaging between the Processing Elements and Despatch Manager as further described below.
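A sketch of the kind of transport-independent driver interface described above, assuming Python and showing only a TCP transport as one example; the patent's driver also covers LocalTalk, NetBIOS and other protocols, which are not shown. The framing (a 4-byte length prefix per message) is an assumption.

```python
import socket

class Driver:
    """Transport-agnostic message channel between a PE and a DM (TCP example)."""

    def __init__(self, dm_host, dm_port):
        self._sock = socket.create_connection((dm_host, dm_port))

    def send(self, payload: bytes) -> None:
        # Prefix each message with its length so the receiver can reframe it.
        self._sock.sendall(len(payload).to_bytes(4, "big") + payload)

    def receive(self) -> bytes:
        header = self._recv_exact(4)
        return self._recv_exact(int.from_bytes(header, "big"))

    def _recv_exact(self, n: int) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = self._sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("connection closed by peer")
            buf += chunk
        return buf
```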
Processing Elements
A processing element performs one or both of the following independent functions:
(1) any of the functions that the application programme would ordinarily perform in a stand-alone environment such as (a) user interface, (b) control logic and (c) having tasks performed by other programmes.
(2) accepting and executing tasks requested by other processing elements (PE's). The structure of a typical PE is shown in Figure 4. Block 31 shows two components of a conventional application programme including a user interface module 33 and programme logic and processing module 34. Block 32 includes software contained in the API. This software includes a service handler module 35, and services 36 and 37, each with three associated requests. Module 38 enables the PE to make requests as well as receive them.
The tasks a PE can perform are called "requests". Related requests are grouped together into "services". A PE may support a number of services. To have a task performed the following must first be selected: (1) a service; and (2) a request within that service.
To facilitate this, each service must be assigned a unique service ID by the application programmer. Within each service, each request is assigned a unique request ID. These IDs enable a PE or DM to distinguish between different services and requests. These IDs must be kept unique within the same network environment.
The service handler 35 receives and processes messages containing requests for tasks to be performed. The service ID and request ID contained in the message are used to decide which request should be invoked. By way of example, one or more PE's may be programmed as calculators and service 1 could be a calculation service with service 2 being a function service. The first request within service 1 could be "addition" while the second request could be "subtraction". The first request within service 2 could be a "square root function" while the second request could be a "square function".
The service handler must thus accept a request, decide which service it is, decide which request command it is, perform the request and return the result.
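A sketch of such a service handler for the calculator example, assuming Python; the numeric service and request IDs and the dispatch-table layout are illustrative assumptions, since the patent only requires that IDs be unique within the network.

```python
import math

CALCULATION_SERVICE, FUNCTION_SERVICE = 1, 2   # assumed IDs

HANDLERS = {
    (CALCULATION_SERVICE, 1): lambda a, b: a + b,      # addition
    (CALCULATION_SERVICE, 2): lambda a, b: a - b,      # subtraction
    (FUNCTION_SERVICE, 1): lambda x: math.sqrt(x),     # square root
    (FUNCTION_SERVICE, 2): lambda x: x * x,            # square
}

def service_handler(service_id, request_id, args):
    """Accept a request, decide which service and request it is,
    perform it, and return the result to be sent back to the DM."""
    try:
        handler = HANDLERS[(service_id, request_id)]
    except KeyError:
        raise ValueError("unknown service/request %s/%s" % (service_id, request_id))
    return handler(*args)

# Usage: service_handler(1, 1, (2, 3)) returns 5.
```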
Conversion of Data and Messaging
Not all processors (computers) represent data in the same way and, since a network utilising the present invention could be made up of different types of processors representing data in different ways, it is necessary to provide for conversion of data to a standard representation. It is therefore an important function of the API to carry out data conversion to and from a standard representation. In the present invention, the standard chosen is External Data Representation (XDR) which is a standard for the description of data at the byte level. XDR fits into the ISO presentation layer no. 6 and is roughly analogous in purpose to X.409, ISO Abstract Syntax Notation (ASN.1).
The API contains XDR format conversion functions for converting the following data types to and from XDR format: integers (signed, unsigned, long and short), boolean, char* (string) and opaque (block data). These functions are provided in a library forming part of the API.
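A minimal sketch of XDR conversion for two of the listed types, assuming Python and the RFC 1014 byte layout (32-bit big-endian integers; length-prefixed strings zero-padded to a 4-byte boundary). The function names are illustrative, not the patent's API.

```python
import struct

def xdr_int(value: int) -> bytes:
    """Signed 32-bit integer, big-endian, as in RFC 1014."""
    return struct.pack(">i", value)

def xdr_string(text: str) -> bytes:
    """Length-prefixed ASCII string, zero-padded to a 4-byte boundary."""
    raw = text.encode("ascii")
    pad = (-len(raw)) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * pad

def unpack_xdr_int(buf: bytes, offset: int = 0):
    """Decode one XDR integer; returns (value, next_offset)."""
    return struct.unpack_from(">i", buf, offset)[0], offset + 4
```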
PEs and DMs communicate by way of messages passed over the network to which the hardware in which they are resident is connected. As a pre-condition for message transmission, a "connection" must be established between a PE or a DM and another PE or DM. In a usual situation a PE will only need to connect to a DM. This connection is established automatically whenever a PE initialises.
A message can carry either a "request" or a "reply". The message process used in the present invention comprises four steps, as follows: (1) construction - a request is constructed and placed in the message;
(2) sending - the message is sent to the remote agent (eg a DM);
(3) receipt - the remote agent receives the message, takes any action based on the contents of the message and composes a reply; (4) reply - the reply is sent to the originating agent containing an acknowledgement of the message. Results may be included at this time.
The format of the request and reply fields is shown in Figures 5 and 6 respectively. All data in the message, including service ID, request ID and reply code, must be in XDR format.
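A sketch of how such message bodies might be assembled in XDR form, assuming Python. Figures 5 and 6 are not reproduced here, so the field ordering used below (call ID, service ID, request ID, data for a request; call ID, reply code, result for a reply) is an assumption.

```python
import struct

def _xdr_int(v: int) -> bytes:
    return struct.pack(">i", v)

def _xdr_string(s: str) -> bytes:
    raw = s.encode("ascii")
    return struct.pack(">I", len(raw)) + raw + b"\x00" * ((-len(raw)) % 4)

def build_request(call_id: int, service_id: int, request_id: int, data: str) -> bytes:
    """Request message with every field XDR-encoded (field order assumed)."""
    return _xdr_int(call_id) + _xdr_int(service_id) + _xdr_int(request_id) + _xdr_string(data)

def build_reply(call_id: int, reply_code: int, result: str) -> bytes:
    """Reply message acknowledging a request, optionally carrying a result."""
    return _xdr_int(call_id) + _xdr_int(reply_code) + _xdr_string(result)
```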
Messages to be sent from PEs will be constructed by application developers who will embed request and reply messages in the application programme. On the other hand, Despatch Manager messages will be included in the software of the present invention which would normally be provided to users. Similarly, a PE must be able to receive messages. When a message has been received through the API, service and request IDs are obtained from the relevant message fields and the appropriate service and request functions in the application programme are called to perform the required task.
Communication Fault Immunity
The present software is communication fault tolerant. In any communications environment, failures can and do occur between the communicating agents. Failures can be caused by lost messages, agents becoming locked-up (i.e. software failure) or "race" conditions.
If a processing element does not supply a result in a predetermined time (Processing Time), the DM will "tickle" the PE to determine if it is still active and processing the transaction. If no response is received in a given time the DM assumes that the PE is inactive or unavailable and reschedules the transaction to another PE.
If the dead PE somehow generates a result the DM must avoid duplicate recording of the result. It does so by maintaining a dead PE list, and any result from a PE in this list is ignored and the PE is commanded to re-initialise itself.
When the PE completes the initialisation process the DM will then and only then remove the PE from the dead list.
Interconnected networks and virtual packet switched networks can produce out of sequence errors, due to selecting alternate delivery paths and communications lag. It is possible that a PE could receive an "are you alive" tickle before receiving the transaction to process. The PE could conceivably re-initialise itself and then receive the old transaction. To resolve this dilemma, each transaction has a unique ID assigned to it. If the DM receives a result from a PE with the wrong call ID it will ignore the result and place the PE in the dead PE list. With the above recovery procedures, it is remotely possible that a PE could spend most of its time re-initialising and thereby degrade the overall performance of the system. The present software avoids this. When requested to re-initialise, each PE is caused to idle for a preset time to catch any orphaned requests. Further, each PE is caused to locally log the times it has re-initialised due to recoverable errors. If the frequency of this exceeds a given number the PE will assume that an abnormal situation exists and withdraw itself from the processing community until an operator forcibly restarts it.
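A sketch of the PE-side re-initialisation throttle described above, assuming Python; the idle period, the threshold and the counting window are assumed values, since the patent specifies only a "preset time" and a "given number".

```python
import time

REINIT_IDLE_SECONDS = 30      # preset idle to catch orphaned requests (assumed value)
MAX_REINITS = 5               # threshold before the PE withdraws (assumed value)
REINIT_WINDOW_SECONDS = 600   # window over which re-initialisations are counted (assumed)

class ReinitGuard:
    """Local log of re-initialisations due to recoverable errors."""

    def __init__(self):
        self._times = []

    def reinitialise(self):
        now = time.time()
        # Keep only recent re-initialisations, then record this one.
        self._times = [t for t in self._times if now - t < REINIT_WINDOW_SECONDS]
        self._times.append(now)
        if len(self._times) > MAX_REINITS:
            # Abnormal situation: withdraw until an operator forcibly restarts the PE.
            raise SystemExit("too many re-initialisations; withdrawing from processing")
        time.sleep(REINIT_IDLE_SECONDS)   # idle to catch any orphaned requests
        # ...re-establish the connection to the DM here...
```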
On the other hand if the DM detects an internal inconsistency it will log it and restart itself. The PE will then have to re-initialise to the DM, which most likely will have a different network address. If the PE was processing a transaction during the restart it will either be treated as dead by the DM when it sends the result and eventually re-initialise itself, or it will not receive a new transaction to process from the despatcher. In itself, receiving no new transaction from the DM is not an error; there could be no work to perform. What the PE does is tickle, i.e. send an "are you alive" message, to the DM if it has not heard from it in a given time. If the DM then does not reply, the PE re-initialises.
The preceding section discussed the algorithms used to avoid losing a request. The algorithm for determining when a communicating agent is not responding to a request will now be described. Essentially this determines the appropriate time-out value. The environment consists of processing elements, or nodes, connected by a physical wire to form a LAN. LANs, in turn, can be interconnected to form an inter-net. Inter-netting is typically accomplished by gateway nodes that participate in two or more LANs. A request can travel between different gateways at different times.
The algorithm that the present software employs makes the following assumptions: nodes connected to the same LAN have relatively stable transmission times, and nodes on different LANs have dynamic transmission times. The adaptive retry algorithm records the round trip time for each request. The average round trip time is maintained as a weighted average which is updated with each new round trip time.
The weighted function is defined as follows: Tw = (q * Tw) + ((1 - q) * Tn), where q is the constant weighting factor (0 ≤ q ≤ 1), Tw is the weighted average, and Tn is the new round trip time. A value close to 1 for q makes the weighted average immune to rapid fluctuations in the round-trip time, whereas a value close to 0 makes the function respond rapidly to changes in the round-trip time.
When a connection is first being established the transmitting, or client, node will assume that a request takes 10 seconds round trip for an intra-LAN connection and 1 minute for an inter-LAN connection. Once the initial request is completed the client node will then use the actual time as the seed value for Tw.
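A sketch of the adaptive time-out, assuming Python, combining the weighted-average update with the 10-second and 1-minute seed values given above; the weighting factor q = 0.8 and the retry multiplier are assumptions, as the patent does not fix them.

```python
INTRA_LAN_SEED = 10.0   # seconds, initial round-trip assumption within a LAN
INTER_LAN_SEED = 60.0   # seconds, initial assumption across LANs

class AdaptiveTimeout:
    """Maintains the weighted-average round-trip time Tw = q*Tw + (1 - q)*Tn."""

    def __init__(self, q=0.8, inter_lan=False):
        assert 0.0 <= q <= 1.0
        self.q = q
        self.tw = INTER_LAN_SEED if inter_lan else INTRA_LAN_SEED
        self._seeded = False

    def record(self, tn: float) -> None:
        if not self._seeded:
            self.tw = tn          # first completed request seeds the average
            self._seeded = True
        else:
            self.tw = self.q * self.tw + (1.0 - self.q) * tn

    def timeout(self, slack: float = 2.0) -> float:
        # Wait some multiple of the average round trip before retrying or tickling.
        return slack * self.tw
```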
It is not required, but most likely appropriate, that the server would then "ping" the client so that the server can seed itself. The primary advantage of the present software is that it enables conventional processors interconnected on a network under the control of conventional network operating system software to be harnessed to process data in parallel. For applications such as that described above, which are conventionally carried out on a single large processor, the reduction in processing time is extremely significant. A further advantage of the present software is that it is not necessary to have a network of computers dedicated to the particular application requiring parallel processing. The network can be an existing network with all processors being used for other applications such as word processing, spreadsheet work etc. The present software makes use of the idle time of each processor. Existing users are unaware even that the Despatch Manager has utilised their processor at various times to execute the application requiring parallel processing. Thus existing local area and wide area networks can be utilised for computation intensive applications without the need to invest in large mainframes which would otherwise be required.
Example Application
An example of the sort of application which is ideally handled by the present software is shown diagrammatically in Figure 7. In this case the transactions to be processed are telephone call records output from a telephone exchange 11. The transaction information is output from the exchange using the Signalling System No. 7 protocol. This constitutes the input to the Despatch Manager 12 of the present invention. In this case the transactions to be processed are essentially on-line transactions although there may be some "store and forward" function inherent in the exchange software. Typically the Despatch Manager in this application will receive 50 transactions per second. Each transaction record includes: the caller's telephone number, the number called, and the duration of the call.
The output required from the computer system illustrated is the charge to be made for each transaction. The application software therefore calculates the toll charge for each transaction. In order to make the required calculation each processor running the application software needs to access a look-up table containing the per minute tariffs for each charging step for each call time band.
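A sketch of the toll-charge request a PE might execute in this application, assuming Python; the charging steps, time bands and per-minute tariff values in the look-up table are invented for illustration and are not taken from the patent.

```python
# Illustrative tariff table: per-minute charge (in cents) keyed by
# (charging step, call time band).  Real steps, bands and rates are assumed.
TARIFF = {
    ("national", "day"): 45,
    ("national", "night"): 25,
    ("local", "day"): 5,
    ("local", "night"): 3,
}

def toll_charge(charging_step: str, time_band: str, duration_seconds: int) -> int:
    """Charge for one call record, in cents, rounded up to whole minutes."""
    minutes = -(-duration_seconds // 60)          # ceiling division
    return TARIFF[(charging_step, time_band)] * minutes

# Usage: toll_charge("national", "day", 125) returns 135 (3 minutes at 45 cents).
```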
As the transactions come in from exchange 11 they are queued on the DM 12 and sent individually to any work station on the network 13 which has been loaded with PE software and which is active and free for appropriation by the DM.
The Processing Elements perform their computation on the transaction received from the DM and return the result. The DM matches this result with the transaction log and then removes the transaction from the queue. The transaction result is stored on disk storage element SE 14 as another record in the output file.
In practice the output file is used as input for further processing. In the present application this further processing comprises sorting the transactions according to subscriber and the printing of a telephone toll invoice for each subscriber.
Optimally the Despatch Manager uses eight Processing Elements in this application.

Claims

CLAIMS:
1. A parallel processing computer system which allows concurrent processing of logically independent operations comprising: a plurality of processors including network communicating hardware, a high bandwidth data communications network, each processor being connected to a node on said network, each processor programmed with network control software which enables said processors to exchange messages, a number of processors being programmed with at least the operations section of at least one common application program, one processor being programmed with the complete said at least one application program, a second processor is programmed with operations management software which causes said second processor to sequentially
(a) receive requests from said one processor for the execution of an operation in the application program,
(b) select an available one of said number of processors,
(c) route a said request to the selected processor for execution, (d) receive the result of the processed request from said selected processor, and (e) pass said result back to said one processor.
2. A parallel processing computer system which allows concurrent processing of logically independent operations wherein the system includes: a plurality of processors including network communicating hardware, a high bandwidth data communications network, each processor being connected to a node on said network, characterised in that each processor is programmed with network control software which enables said processors to exchange messages, a number of processors are programmed with at least the operations section of at least one common application program, one processor is programmed with the complete said at least one application program, a second processor is programmed with operations management software which causes said second processor to sequentially (a) receive requests from said one processor for the execution of an operation in the application program,
(b) select an available one of said number of processors,
(c) route a said request to the selected processor for execution, (d) receive the result of the processed request from said selected processor, and (e) pass said result back to said one processor.
3. A parallel processing computer system according to either of claims 1 or 2 wherein said network control software comprises conventional network operation system software which allows messaging between processors and interface software which interfaces the application program or sections thereof with said conventional network operation system software.
4. A parallel processing computer system according to claim 3 wherein said interface software establishes the messaging protocol between each copy of the application program or section thereof and the operations management software.
5. A parallel processing computer system according to claim 4 wherein said interface software converts processor-specific data representations to a standard representation for messaging transmission and converts said standard data representations back to an appropriate processor-specific data representation for message reception.
6. Computer software for a network of interconnected processors and associated network operating system, which software enables sequential data transactions (requests) received by the network to be processed concurrently in accordance with an application program by two or more processors on the network, said software comprising: a Despatch Manager (DM) module and a Processing Element (PE) module, a first network processor (DM processor) being programmed with the DM module and at least one other network processor (PE processor) being programmed with a PE module and the application program; the DM module causing the DM processor to: (1) periodically ascertain the identity of active PE processors on the network and to store their identities in a table, (2) receive requests to be processed and store the received requests in a queue, (3) allocate a unique identifier to each received request and store the identifiers in a log,
(4) select a PE processor from the table of processor identities according to predetermined criteria, (5) send to the selected PE processor a copy of a request to be processed from the queue,
(6) receive from each PE processor the result of a processed request,
(7) match that request against the log of request identifiers,
(8) remove the corresponding request from the queue, and (9) send the result to a storage element for further processing or output, the PE module causing the PE processor in which it is resident to:
(a) receive a request sent to it by the DM processor,
(b) process that request in accordance with the application program, and
(c) send to the DM processor the result of that process.
PCT/NZ1993/000013 1992-03-09 1993-03-09 Distributed processing system WO1993018464A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU36506/93A AU3650693A (en) 1992-03-09 1993-03-09 Distributed processing system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NZ24190492 1992-03-09
NZ241904 1992-03-09

Publications (1)

Publication Number Publication Date
WO1993018464A1 true WO1993018464A1 (en) 1993-09-16

Family

ID=19923907

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NZ1993/000013 WO1993018464A1 (en) 1992-03-09 1993-03-09 Distributed processing system

Country Status (1)

Country Link
WO (1) WO1993018464A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3643227A (en) * 1969-09-15 1972-02-15 Fairchild Camera Instr Co Job flow and multiprocessor operation control system
EP0006216A1 (en) * 1978-06-15 1980-01-09 International Business Machines Corporation Improvements in digital data processing systems
EP0366581A2 (en) * 1988-10-24 1990-05-02 International Business Machines Corporation Method to provide concurrent execution of distributed application programs by a host computer and an intelligent work station on an sna network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IBM TECHNICAL DISCLOSURE BULLETIN. vol. 31, no. 2, July 1988, NEW YORK US pages 114 - 115 'Transaction processing system for the IBM PC' *
IBM TECHNICAL DISCLOSURE BULLETIN. vol. 33, no. 3B, August 1990, NEW YORK US pages 218 - 220 'Network-based compute servers' *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU693400B2 (en) * 1996-04-25 1998-06-25 Hitachi Limited Plant monitoring/controlling apparatus
EP0825532A1 (en) * 1996-08-14 1998-02-25 Siemens Nixdorf Informationssysteme AG Method of operation for a multiprocessor data processing system and multiprocessor data processing system operating according to said method
WO2003042820A2 (en) * 2001-11-15 2003-05-22 Evotec Oai Ag Method and device for processing data
WO2003042820A3 (en) * 2001-11-15 2003-12-31 Evotec Ag Method and device for processing data
US7424401B2 (en) 2001-11-15 2008-09-09 Evotec Oai Ag Method and device for data processing using agents between evaluating means and distributing means
WO2006123177A1 (en) * 2005-05-20 2006-11-23 Corporate Modelling Holdings Plc Data processing network
CN110366760A (en) * 2016-12-30 2019-10-22 纽斯高动力有限责任公司 Reactor protective system and method

Similar Documents

Publication Publication Date Title
US8032780B2 (en) Virtualization based high availability cluster system and method for managing failure in virtualization based high availability cluster system
US6393581B1 (en) Reliable time delay-constrained cluster computing
US6868442B1 (en) Methods and apparatus for processing administrative requests of a distributed network application executing in a clustered computing environment
US5883939A (en) Distributed architecture for an intelligent networking coprocessor
US6067634A (en) System and method for resource recovery in a distributed system
US7089318B2 (en) Multi-protocol communication subsystem controller
US7613801B2 (en) System and method for monitoring server performance using a server
US5889957A (en) Method and apparatus for context sensitive pathsend
US7581006B1 (en) Web service
US7518983B2 (en) Proxy response apparatus
CN100359508C (en) Merge protocol for schooling computer system
US20030126196A1 (en) System for optimizing the invocation of computer-based services deployed in a distributed computing environment
US20060212518A1 (en) Copying chat data from a chat session already active
JP2004519024A (en) System and method for managing a cluster containing multiple nodes
JPH11312153A (en) Method and device for managing work load between object servers
JPH11102299A (en) Method and system for highly reliable remote object reference management
JP4669601B2 (en) Network terminal device, network, and task distribution method
JPH07168774A (en) System and method for generation of first message of nonconnective session-directive protocol
WO1993018464A1 (en) Distributed processing system
US8077699B2 (en) Independent message stores and message transport agents
Friedman et al. Using group communication technology to implement a reliable and scalable distributed in coprocessor
JPH09293059A (en) Decentralized system and its operation management method
US20030204775A1 (en) Method for handling node failures and reloads in a fault tolerant clustered database supporting transaction registration and fault-in logic
JP2006526212A (en) Data collection in computer clusters
JP2001216174A (en) Application substituting method and storage medium in which application substitution program is stored

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AT AU BB BG BR CA CH CZ DE DK ES FI GB HU JP KP KR LK LU MG MN MW NL NO NZ PL PT RO RU SD SE SK UA US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA