WO2009037614A2 - Circuit with a plurality of processors connected to a plurality of memory circuits via a network - Google Patents

Circuit with a plurality of processors connected to a plurality of memory circuits via a network

Info

Publication number
WO2009037614A2
WO2009037614A2 (PCT/IB2008/053633)
Authority
WO
WIPO (PCT)
Prior art keywords
shared memory
data object
memory circuits
processors
data
Prior art date
Application number
PCT/IB2008/053633
Other languages
French (fr)
Other versions
WO2009037614A3 (en)
Inventor
Marco Jan Gerrit Bekooij
Jan Willem Van Den Brand
Original Assignee
Nxp B.V.
Priority date
Filing date
Publication date
Application filed by Nxp B.V. filed Critical Nxp B.V.
Publication of WO2009037614A2 publication Critical patent/WO2009037614A2/en
Publication of WO2009037614A3 publication Critical patent/WO2009037614A3/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00: Digital computers in general; Data processing equipment in general
    • G06F 15/76: Architectures of general purpose stored program computers
    • G06F 15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7839: Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G06F 15/7842: Architectures of general purpose stored program computers comprising a single central processing unit with memory on one IC chip (single chip microcontrollers)

Definitions

  • the invention relates to a multi-processor circuit with shared memory circuits.
  • a shared memory circuit may be used that may be accessed by cooperating programs from different processors. Special attention is needed for relative timing of the processors when this shared memory circuit comprises shared data that may be updated in the course of time. For example, when an instruction from one program reads from a storage location and an instruction from another program writes to the same location, the result of the read instruction will differ depending on the relative time of execution of the write instruction. Similarly, if two processors write to the same shared memory circuit location, the result will differ depending on the sequence of writing. To realize predictable results, it is necessary that a predictable sequence of access operations from different processors can be realized.
  • a circuit according to claim 1 is provided.
  • processors limit access to a data object during a program execution period to parts of a program between acquire and release instructions.
  • the acquire and release instructions set and clear a semaphore flag for the data object.
  • Write operations to the data object are implemented by distributing messages with write operation records to different shared memory circuits.
  • the semaphore flag is cleared after testing that all previous messages from the processor to all shared memory circuits that store data from the data object have been processed.
  • the processors are configured to test this in response to a single instruction, by using a detector circuit to monitor the number of unprocessed messages.
  • the result produced by the detector circuit may be read in response to the release instruction, before clearing the semaphore flag for the data object, or it may be read in response to a separate instruction, executed before the release instruction.
  • a credit based mechanism is used for monitoring ensured free buffer space for receiving messages from the processors in the shared memory circuits.
  • the information about the ensured free buffer space for different shared memory circuits may be combined to perform detection that all previous messages from the processor to all shared memory circuits that store data from the data object have been processed. Thus the detection can be performed with little overhead.
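The credit based mechanism of the two points above can be sketched in software. This is an illustrative model only; the class name `CreditChannel` and its methods are invented for the sketch, and do not appear in the patent:

```python
class CreditChannel:
    """Illustrative model of credit based flow control between a processor-side
    FIFO buffer and the receive buffer of one shared memory circuit."""

    def __init__(self, buffer_capacity):
        # Initially the ensured free buffer space equals the full
        # capacity of the receive buffer in the shared memory circuit.
        self.capacity = buffer_capacity
        self.credits = buffer_capacity

    def can_send(self):
        # A message may only be sent when free space is ensured.
        return self.credits > 0

    def send(self):
        # Sending a message consumes one credit (one buffer position).
        assert self.can_send(), "transmission held up: no ensured free space"
        self.credits -= 1

    def on_return_message(self, freed_positions):
        # The shared memory circuit reports positions it has freed by
        # writing buffered records into its main memory.
        self.credits += freed_positions

    def all_processed(self):
        # When the credit count is back at the maximum, every previous
        # message to this shared memory circuit has been processed.
        return self.credits == self.capacity
```

Combining `all_processed()` over the channels to all shared memory circuits that store parts of a data object gives exactly the low-overhead detection described above.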
  • a plurality of distributed data objects is stored.
  • respective semaphore flags are used for different data objects. For acquire and release instructions the data object to which they apply is indicated. Before clearing the semaphore flag for an indicated data object, message processing of previous messages to the different shared memory circuits is tested specifically for the indicated data object.
  • the processors each have first in first out buffers for respective ones of the shared memory circuits.
  • detection may be performed by detecting a status of transmissions from the first in first out buffers.
  • the data objects do not occupy the entire shared memory circuits.
  • other processors may still access the shared memory circuits that store parts of the data objects, in order to access data outside the data object. This is possible because the semaphore flag is provided specifically for the data object.
  • write operation records for the data object are buffered separately from accesses to other parts of the shared memory circuits. In this way it can be determined quickly when the release instruction can be executed.
  • Fig. 1 shows a multiprocessing system
  • Fig. 2 illustrates information flow
  • Fig. 3 shows a processor
  • Fig. 4 shows a shared memory circuit
  • Fig. 1 shows a multiprocessing circuit comprising a plurality of processors 10, a communication network 12 with an array of router circuits 120 (only some explicitly labeled) and shared memory circuits 14.
  • Processors 10 and shared memory circuits 14 are coupled to selected router circuits.
  • Processors 10 contain FIFO buffer circuits 100 and detector circuits 102.
  • One of the shared memory circuits contains a semaphore flag memory 140.
  • Each of the shared memory circuits 14 contains a main memory 142.
  • Communication network 12 comprises connections between neighboring router circuits 120.
  • Communication network 12 is a variable route network, that is, communication network 12 provides for selectable routes through router circuits 120 between stations such as processors 10 and shared memory circuit 14.
  • variable route communication network 12 makes it possible to communicate between a large number of stations without introducing an exceedingly large number of connections.
  • Any number of router circuits 120 and any connection pattern may be used. For the sake of clarity only a small communication network 12 is shown, wherein the router circuits 120 are connected in a two dimensional matrix pattern.
  • the variable route communication network 12 makes it possible to avoid frequent blocking of messages by other message traffic.
  • communication network 12 is configured to allow for different alternative routes through the router circuits 120 for any message transmission. The availability of alternative routes makes it possible to select any route that avoids router circuits 120 that are too busy at the time of message transmission.
  • Variable route communication networks that support such selected routing and methods of selecting routes through such networks are known per se. It suffices to note that the time needed for passing a message between stations such as processors 10 and shared memory circuit 14 may delay the message by a different amount of time (unpredictably for the processors 10), dependent on the number of router circuits 120 used along the route.
  • processors 10 of the multiprocessing circuit execute instructions from respective programs.
  • the execution comprises access to data in shared memory circuit 14 (writing and/or reading).
  • Fig.2 illustrates information flow involved in access to a data object that is stored in a distributed way in parts of shared memory circuits 14.
  • the actions are used to perform access according to what will be called an acquire/release protocol, wherein a processor 10 executes instructions to access shared memory circuit 14 only between executing an "acquire" instruction and a "release" instruction.
  • the acquire instruction is used to set a semaphore flag in flag memory 140 and the release instruction is used to clear this semaphore flag.
  • time is represented vertically, progressing downwards.
  • Vertical lines represent main memory 142, FIFO buffer circuits 100, detector circuits 102, processors 10 and flag memory 140. Access involves three stages 20, 22, 24.
  • a processor 10 executes the "acquire" instruction, which causes a read modify write transaction to be performed with flag memory 140.
  • This transaction involves a first transmission 200 via communication network 12 from processor 10 to flag memory 140 to request a read modify write action on the semaphore flag in flag memory 140 and a second transmission 202 back from flag memory 140 to the processor 10 to indicate the read result.
  • Execution of the acquire instruction completes when the result indicates that the flag was not in a set state. If the result indicates that the flag was in a set state, the read modify write transaction is repeated until the result indicates that the flag was not in a set state.
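The retry behaviour of the acquire instruction can be modelled as follows. `flag_memory` is a plain dictionary standing in for flag memory 140, and in real hardware the read-modify-write is a single atomic transaction over the network rather than two separate statements; the function name is invented for the sketch:

```python
def acquire(flag_memory, flag_id):
    """Illustrative acquire: repeat a read-modify-write (test-and-set)
    transaction on the semaphore flag until the old value was clear."""
    while True:
        old = flag_memory.get(flag_id, False)  # read leg of the transaction
        flag_memory[flag_id] = True            # modify-write leg (atomic in hardware)
        if not old:                            # flag was not in a set state
            return                             # acquire completes
        # otherwise another processor holds the data object: retry
```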
  • processor 10 that completed the acquire instruction executes instructions that lead to access to main memories 142.
  • processor 10 performs a transaction 220 to send data and addresses to the FIFO buffer circuit 100.
  • Different FIFO buffer circuits 100 are provided for respective shared memory circuits 14.
  • the FIFO buffer circuit 100 for the shared memory circuit that is addressed in turn performs a transaction 222 to send a message containing the data and the address to main memory 142 of the addressed shared memory circuit 14, via communication network 12.
  • In the shared memory circuit 14, the address and data are buffered and subsequently used to write the data at the address in the main memory 142.
  • processor 10 may perform any number of such transactions 220.
  • Each FIFO buffer circuit 100 maintains a count representing an ensured free buffer space in a corresponding FIFO buffer of its associated shared memory circuit 14.
  • FIFO buffer circuit 100 reduces this represented ensured free buffer space each time when it sends a message to shared memory circuit 14.
  • the associated shared memory circuit 14 sends messages back to the processor 10, each message indicating a number of buffer positions that it has freed by using the address and data to write into main memory 142.
  • a transmission 224 represents the transmission of such a message.
  • FIFO buffer circuits 100 receive these messages from their associated shared memory circuits 14 and increase the represented ensured free buffer space accordingly. Initially, each represented ensured free buffer space is set to the maximum buffer space available in the associated shared memory circuit 14.
  • FIFO buffer circuit 100 holds up transmission if the represented ensured free buffer space indicates that it is not ensured that sufficient free space is available.
  • Detector circuit 102 receives information about the represented ensured free buffer space from FIFO buffer circuits 100. Reception of this information is represented by a transaction 226. As shown, the information is received after an update of the represented ensured free buffer space. In practice, the represented ensured free buffer space may be monitored continuously, or sampled when needed by detector circuit 102.
  • After completing execution of the acquire instruction, processor 10 may perform any number of access operations, with FIFO buffer circuits 100 sending corresponding messages via communication network 12 and maintaining the represented ensured free buffer space. This continues until a third stage 24 of execution.
  • processor 10 executes the "release" instruction.
  • Execution of the release instruction comprises performing a read action 240 on detector circuit 102, the detector circuit 102 performing a return action 242 to provide a read result.
  • This read result represents information gathered by detector circuit 102 as to whether the represented ensured free buffer space values of all FIFO buffer circuits 100 in the processor involved with the access have their maximum (initial) value. If so, processor 10 completes the release instruction by sending a message 244 to flag memory 140 to clear the semaphore flag that was set by the acquire instruction.
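The release check can be sketched as below. The function and argument names are invented for illustration; the detector result is modelled as a test that every budget register has returned to its maximum value:

```python
def try_release(budget_registers, capacities, flag_memory, flag_id):
    """Illustrative release: the detector result is modelled as a check that
    every FIFO's ensured-free-space count is back at its maximum, meaning all
    previous write messages for the data object have been processed."""
    commit = all(budget_registers[m] == capacities[m] for m in budget_registers)
    if commit:
        flag_memory[flag_id] = False  # clear the semaphore flag
    return commit
```

A processor would repeat this check until it succeeds, then continue past the release instruction.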
  • This protocol may be used to ensure that only one processor at a time will have access to a data object that is stored in a distributed way in a plurality of main memories 142.
  • the protocol may be used to ensure that different ones of the processors 10 will have access to selected data objects in main memories 142, only one at a time for each data object.
  • the acquire and release instructions contain specifications of the data objects to which they relate. Access to a plurality of such data objects may be managed in parallel by providing a plurality of selectable semaphore flags in flag memory 140, each for a respective data object.
  • Different FIFO buffer circuits 100 may be provided in a processor 10 for different data objects in this case.
  • detector circuit 102 collects represented ensured free buffer space values to produce respective results for groups of FIFO buffer circuits 100 that write to the same data object.
  • different processors 10 may have access to different data objects in parallel. In an embodiment access to regions of main memories 142 outside the data objects may be allowed without such a protocol.
  • Each data object may be defined for example by a range of address values, or by a root pointer to the data object.
  • one or more bits of the write address may be used to indicate the data object, the remaining bits of the address indicating the shared memory circuit where the data is stored and the address within said shared memory circuit. This makes it easy to determine the data object involved in a write operation.
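A possible bit-field layout for such addresses might look as follows. The field widths are assumptions chosen purely for illustration, not values given in the patent:

```python
# Assumed field widths (illustrative only): a 16-bit address split into
# [object | memory | offset] fields, high bits to low bits.
OBJECT_BITS = 2   # selects one of up to 4 data objects
MEMORY_BITS = 3   # selects one of up to 8 shared memory circuits
OFFSET_BITS = 11  # address within the selected shared memory circuit

def split_address(addr):
    """Split a write address into (data object, shared memory circuit,
    address within that memory), using the assumed bit-field positions."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    memory = (addr >> OFFSET_BITS) & ((1 << MEMORY_BITS) - 1)
    obj = (addr >> (OFFSET_BITS + MEMORY_BITS)) & ((1 << OBJECT_BITS) - 1)
    return obj, memory, offset
```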
  • each data object may comprise a root address pointer, addresses of data in the data object being derived through the root pointer. In this case, a processor may need to indicate explicitly which data object it addresses, for example by providing additional information in access operations to identify the data object.
  • the acquire and release instruction are performed for the data object as a whole, i.e. in one action for all parts of the data object that are stored in different shared memory circuits 14.
  • a single semaphore flag is used to acquire all these parts.
  • respective different semaphore flags may be used for different parts of a data object that are stored in different shared memory circuits.
  • the acquire instruction involves setting all of these semaphore flags, repeating the setting of each flag until it has been brought from a reset state to a set state. Deadlocks may be avoided by accessing the respective semaphore flags in a predetermined sequence.
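Acquiring several semaphore flags in a predetermined sequence can be sketched as follows. The names are invented, sorting stands in for whatever fixed ordering the hardware uses, and the busy-wait loop models repeated read-modify-write transactions:

```python
def acquire_parts(flag_memory, flag_ids):
    """Illustrative multi-flag acquire: the flags for the parts of a data
    object are always taken in a predetermined (here: sorted) order, so two
    processors can never each hold a flag the other one is waiting for."""
    for fid in sorted(flag_ids):          # predetermined sequence avoids deadlock
        while flag_memory.get(fid, False):
            pass                          # flag set elsewhere: retry the transaction
        flag_memory[fid] = True           # set leg of the read-modify-write
```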
  • a single instruction from a program may be used for acquiring a data object and a single instruction from the program may be used for releasing the data object.
  • When the processor 10 encounters such an instruction in its program, it sets or clears the relevant semaphore flag or flags for a data object in response.
  • the processor 10 may also be configured to perform accompanying actions in response to these instructions, or the accompanying actions may be controlled by additional instructions.
  • the acquire and release program instructions may have an operand for identification of the data object, in which case the processor 10 uses this operand to select a semaphore flag when it encounters an acquire or release instruction in its program. Alternatively, the selection may be performed in response to an additional instruction, although this complicates the program.
  • the processor 10 may be configured to test the detection result in response to the release instruction, or alternatively one or more separate program instructions may be used to test the detection result before reaching the release instruction.
  • the entire function of the detector circuit 102 (testing the status of feedback information for a plurality of queues to different shared memory circuits 14) may be implemented using instructions that test queues individually. However, preferably the processing core is configured to return the test result for all queues in response to a single instruction.
  • a read transaction involves sending a message with a read address from a processor 10 to shared memory circuit 14 via communication network 12, reading data from main memory 142 at the read address and sending a return message with the read data from shared memory circuit 14 to the processor 10 via communication network 12.
  • Read transactions may be buffered as needed. However, because completion of read transactions may be confirmed by the reception of read data in the processor 10, no special mechanism is needed to detect completion of read operations.
  • Fig. 3 shows an embodiment of a processor 10, comprising a processing core 30, a program memory 31 for storing instructions for processor 10, a local data memory 32, a plurality of distributor circuits 33, a plurality of FIFO buffer circuits 34, a network interface circuit 36, detector circuits 38 and detection result registers 39. Only a few distributor circuits 33, FIFO buffer circuits 34, detector circuits 38 and detection result registers 39 are shown by way of example; a different number of such circuits may be present.
  • Processing core 30 is coupled to program memory 31, local data memory 32, detection result registers 39, distributor circuits 33 and network interface circuit 36.
  • Distributor circuits 33 have outputs coupled to FIFO buffer circuits 34.
  • FIFO buffer circuits 34 are coupled to network interface circuit 36, which has a connection 37 coupled to a router circuit in the communication network (not shown).
  • Detector circuits 38 are coupled to respective groups of FIFO buffer circuits 34 and to respective detection result registers 39.
  • FIFO buffer circuits may be provided for read instructions, coupled between processing core 30 and network interface circuit 36.
  • processing core 30 executes successive instructions from a program in program memory 31.
  • the instructions may involve reading and writing data in local data memory 32, performing read modify write operations and write operations to flag memory 140 (not shown), and writing data to distributor circuits 33.
  • data is provided to distributor circuits 33 in the form of a write operation record, containing the write data in combination with write addresses of the write data in shared memory circuit 14.
  • Processing core 30 may transact read modify write operations and write operations to flag memory 140 directly with network interface circuit 36, network interface circuit 36 transmitting and receiving the corresponding messages via communication network 12, the processing core 30 waiting for reception of the read data of the read modify write operations. Alternatively, additional buffers may be provided for these operations.
  • Each of distributor circuits 33 is associated with a different data object.
  • processing core 30 has respective outputs for different data objects, coupled to different distributor circuits 33.
  • processing core 30 may have a shared output for all data objects, the distributor circuits distinguishing between write operation records for different data objects on the basis of write addresses.
  • Distributor circuits 33 determine the shared memory circuits 14 associated with the write operation records and distribute the write operation records to different FIFO buffer circuits 34 accordingly.
  • each FIFO buffer circuit 34 is associated with a combination of a shared memory circuit and a data object, and the FIFO buffer circuit 34 buffers write operation records for that combination.
  • Network interface circuit 36 reads write operation records from FIFO buffer circuits 34, forms messages with the data and transmits the messages over communication network 12. Predetermined information about the shared memory circuits and/or data objects associated with the FIFO buffer circuits 34 may be stored in network interface circuit 36, for defining the destinations of the messages.
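The distribution of write operation records over per-memory FIFO buffers can be sketched like this. `memory_of_address` is a hypothetical stand-in for the address decoding performed by distributor circuits 33; the function name and record format are invented for the sketch:

```python
def distribute(write_records, memory_of_address):
    """Illustrative distributor: write operation records (address, data)
    are sorted into per-shared-memory FIFO queues, so the network interface
    can form one message stream per destination. Records for the same
    destination keep their original order, as in a hardware FIFO."""
    fifos = {}
    for addr, data in write_records:
        dest = memory_of_address(addr)            # which shared memory stores addr
        fifos.setdefault(dest, []).append((addr, data))
    return fifos
```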
  • FIFO buffer circuits 34 contain budget registers 340 (the budget register of only one FIFO buffer circuit 34 is shown separately) for storing free buffer space counts. Initially, the budget registers are preset to represent the maximum buffer capacity of the relevant buffer in the shared memory circuit 14 that is associated with the FIFO buffer circuit 34.
  • When a message with a write operation record is transmitted, the free space represented by the budget register of the FIFO buffer circuit 34 is decremented.
  • Network interface circuit 36 tests the content of the budget register of a FIFO buffer circuit 34 before transmitting a write operation record from that FIFO buffer circuit 34.
  • Network interface circuit 36 receives messages with information about freed buffer space for data objects in shared memory circuits 14. When network interface circuit 36 receives such a message from a shared memory circuit 14 via communication network 12, network interface circuit 36 takes data from the message and increments the space represented in the budget register of the FIFO buffer circuit 34 that is associated with the data object and shared memory circuit 14, in accordance with this information.
  • the free buffer space may be represented for example as a count that is initially set to a predetermined maximum value and decremented and incremented as messages are transmitted and messages with information about freed space are received. In this case, messages are held up if the count is zero. But many alternatives exist; for example a count may be used that is initially set to zero and incremented and decremented in response to transmission and reception, messages being held up when the count has reached a maximum.
  • Detector circuits 38 monitor the represented ensured free buffer space values in the budget registers. Detector circuits 38 are each associated with a respective data object and connected to the FIFO buffer circuits 34 for different shared memory circuits 14 for that data object. When any represented ensured free buffer space value in the FIFO buffer circuits 34 for the associated data object is below its maximum value, the detector circuit 38 clears a commit flag in a detection result register 39. When all represented ensured free buffer space values in the FIFO buffer circuits 34 for the associated data object are at their maximum value the detector circuit 38 sets the commit flag in the detection result register 39.
  • When processing core 30 executes a release instruction for a data object, processing core 30 first reads the detection result register 39 for the relevant data object. Processing core 30 repeats this, if necessary, until the commit flag in the detection result register 39 is found to be set. Subsequently, processing core 30 signals network interface circuit 36 to send a message that the semaphore flag for the data object in flag memory 140 must be cleared. After this, execution of the release instruction is completed and processing core 30 proceeds to execute subsequent instructions from its program.
  • the processor of Fig. 3 supports writing to a plurality of data objects that are stored distributed over a plurality of shared memory circuits.
  • a data object could be a linked list for example, with pointers that point from one memory location to another in different shared memory circuits, or a list of records, data representing an image, a video and/or audio stream etc.
  • FIFO buffer circuits 34 are provided to support distributed writing of the data objects.
  • Detector circuits 38 are provided for the data objects to support access control with the acquire/release protocol. It should be appreciated that various alternative implementations are possible for producing the same effect. For example, in an embodiment a processor may be allowed to acquire only one data object at a time. In this case, no distributor circuits 33 are needed and a single detector circuit 38 suffices. This embodiment still allows use of a plurality of data objects, in which case a processor may acquire a data object even when another processor has previously acquired another data object.
  • a single buffer may be used, from which write operation records for different data objects and different shared memory circuits may be taken.
  • In that case, one solution may be to wait until all data objects can be released.
  • one buffer may be used for each data object, containing write operation records for different shared memory circuits. This may have the effect that write operation records for a first shared memory circuit 14 will have to wait because there is no capacity for receiving a message in a second shared memory circuit 14, for which an earlier write operation record is in the buffer.
  • this can be overcome by searching for write operation records for a specific shared memory circuit 14 when capacity is available for that shared memory circuit, but this makes the network interface more complicated.
  • a simple handshake based mechanism may be used, wherein a message to a shared memory circuit 14 is transmitted only after reception of an acknowledgment that a previous message has been handled.
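Such a handshake mechanism amounts to stop-and-wait flow control with at most one outstanding message per destination, as in this sketch (names invented for illustration):

```python
class HandshakeChannel:
    """Illustrative stop-and-wait handshake to one shared memory circuit:
    at most one outstanding message; the next message may be sent only after
    an acknowledgment for the previous one has been received."""

    def __init__(self):
        self.outstanding = False

    def can_send(self):
        return not self.outstanding

    def send(self):
        assert self.can_send(), "held up: waiting for acknowledgment"
        self.outstanding = True

    def on_ack(self):
        # Acknowledgment received: the previous message has been handled.
        self.outstanding = False
```

Compared with budget based transmission this needs no counters, but it forces a full network round trip between successive messages to the same destination.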
  • detector circuits 38 then need only check whether FIFO buffer circuits 34 are empty and an acknowledgment has been received.
  • In budget based transmission, the ensured free space count can be used for detection.
  • Budget based transmission, i.e. the use of a budget count representing ensured free buffer space that may take the values zero, one, two and possibly higher, has the advantage that less waiting is needed before transmission.
  • the return messages from the shared memory circuit 14, which indicate freed buffer space, need not be transmitted for every processed message when the freed space value can take more than one value in a return message.
  • FIFO buffer circuits may structurally be located anywhere, for example as part of network interface circuit 36, as a location in a memory, such as local data memory 32 or in detector circuits 38.
  • FIFO buffer circuits and detection result registers 39 may be implemented using part of local data memory 32.
  • Fig. 4 shows an embodiment of a shared memory circuit 14, comprising a main memory 40, a flag memory 41, a main memory interface 42, a flag memory interface 43, a plurality of FIFO buffer circuits 44, flag FIFO circuits 45, and a network interface circuit 46 with a connection 47 to a router circuit in the communication network (not shown).
  • Main memory interface 42 is coupled between main memory 40 and FIFO buffer circuits 44.
  • Flag memory interface 43 is coupled between flag memory 41 and flag FIFO circuits 45.
  • Network interface circuit 46 is coupled between FIFO buffer circuits 44, flag FIFO circuits 45 and connection 47.
  • network interface circuit 46 receives messages from the communication network and places write operation records from the messages into the FIFO buffer circuits 44 or flag FIFO buffer circuits 45, selecting the FIFO buffer circuits 44 according to the sources of the messages and the data object involved and selecting the flag FIFO buffer circuits 45 according to destination.
  • Main memory interface 42 reads the write operation records from the FIFO buffer circuits 44.
  • Main memory interface 42 writes the write data from each write operation record into main memory 40 at an address indicated by the write address from the write operation record.
  • When main memory interface 42 has processed at least one write operation record from a FIFO buffer circuit 44, this is signaled to network interface circuit 46, which transmits return messages specifying amounts of space freed by processing write operation records.
  • Flag memory interface 43 reads write operation records from the flag FIFO buffer circuits 45.
  • When such a write operation record indicates that a read modify write operation is involved, flag memory interface 43 first performs a read operation at the write address of the write operation record and places a return record with the read flag data in a return flag FIFO buffer circuit 45, for transmission of a message with the read flag data to the source of the read modify write command. After reading the flag, if any, flag memory interface 43 writes the write data of the write operation record into flag memory 41.
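The behaviour of flag memory interface 43 on a single record can be sketched as follows. The record format `(address, data, is_rmw)` and the function name are assumptions made for the sketch:

```python
def handle_flag_record(flag_memory, record):
    """Illustrative flag memory interface: for a read-modify-write record the
    old flag value is read first (to be returned to the requester) and only
    then is the new value written; a plain write returns nothing."""
    addr, data, is_rmw = record
    old = flag_memory.get(addr, False) if is_rmw else None
    flag_memory[addr] = data  # write leg, performed after the optional read
    return old
```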
  • set and clear messages may be used, which specify the write data by means of a message type indication.
  • read FIFO buffers may be provided.
  • For read operations, main memory interface 42 reads data from main memory 40 and places it in the FIFO buffer circuits 44, selected according to the source of the read request.
  • Network interface circuit 46 reads the FIFO buffer circuits 44 with data from read operations, forms messages including this data and transmits messages with this data via the communication network.
  • flag memory 41 may be part of a station connected to communication network 12 that does not contain a main memory at all. Flags may be stored in one of the processors 10 for example. Also, different flags may be stored in different stations. Alternatively, flag memory 41 may be realized using a region of main memory 40, in which case the functions of flag memory interface 43 may be performed by main memory interface 42.
  • Some or all of the shared memory circuits may be combined into a single station with a processor 10.
  • a common network interface may be used for the main memory in that station and the processing core.
  • write operations in this station can be kept local, which may make it superfluous to test for completion of the local write operations in this station before completing execution of the release instruction.
  • Processors 10 may be provided with a cache memory for caching data from the shared memory circuits. In this case, FIFO buffer circuits 34 may be used for posting write operations that have been effected in the cache memory, in order to maintain memory consistency.
  • in an embodiment, the programs of processors 10 access data in the data objects only with instructions located between the acquire and release instructions for the data object.
  • access by a processor 10 outside such context may be permitted, for example if it is known that no write operations from other processors will occur within a sufficiently large time window to ensure that the data will remain stable.
  • a computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.

Abstract

A plurality of computer programs is executed concurrently with a plurality of processors (10) that are coupled to a plurality of shared memory circuits (14) via a communication network (12). A data object is stored distributed over the plurality of shared memory circuits (14), in respective memory portions of the shared memory circuits (14). Access to a data object by different ones of the processors (10) during a program execution period is made mutually exclusive by executing acquire and release instructions of a semaphore flag for the data object. Write operation records for the write instructions are buffered in buffers (34) in the processors. Messages comprising the write operation records from the buffers are transmitted via the communication network (12) to those of the shared memory circuits (14) that store the parts addressed by the write operations. Processing of the messages is signaled by the shared memory circuits (14) back to the processors (10). Each processor (10) verifies, prior to clearing the semaphore flag in response to the release instruction, that signals have been received indicating that all previous messages to all shared memories that store data from the data object have been processed.

Description

Circuit with a plurality of processors connected to a plurality of memory circuits via a network
FIELD OF THE INVENTION
The invention relates to a multi-processor circuit with shared memory circuits.
BACKGROUND OF THE INVENTION
US 6,338,095 describes a multiprocessor system wherein a communication network is used to transmit messages between processors. Data is sent between pairs of sending and receiving processors and written into a local memory of a receiving processor. Upon completion of reception the receiving processor writes back a completion flag into a specified location in the memory of the sending processor. This allows the sending processor to wait until the message has been received.
In a multiprocessor system a shared memory circuit may be used that may be accessed by cooperating programs from different processors. Special attention is needed for relative timing of the processors when this shared memory circuit comprises shared data that may be updated in the course of time. For example, when an instruction from one program reads from a storage location and an instruction from another program writes to the same location, the result of the read instruction will differ depending on the relative time of execution of the write instruction. Similarly, if two processors write to the same shared memory circuit location, the result will differ depending on the sequence of writing. To realize predictable results, it is necessary that a predictable access sequence of operations from different processors can be realized.
In the multiprocessor of US 6,338,095 an order of access to data is ensured because data is communicated by means of messages that are initiated by the processors. Thus, for example, one processor can ensure that another processor gets data from a specific memory location only after it has been updated, by delaying transmission of a message with the data until the data has been updated.
Proper relative timing becomes difficult when a plurality of processors have to access a shared, distributed data object with parts that are stored in portions of different shared memory circuits, via a communication network wherein different routes from a processor to the shared memory circuits can be allocated, dependent on network congestion. Such a communication network effectively makes it impossible to provide for a predetermined common notion of temporal sequence that applies consistently to all processors and shared memory circuits. Nor is there a one-to-one communication relation between processors, by which a lock step time sequence can be realized between processors. Instead each processor may access the data when its program calls for it. Each program may contain a large number of instructions that access the shared memory circuit. Ensuring that different shared memory circuit access instructions from different programs are executed in a predictable sequence may require considerable overhead.
SUMMARY OF THE INVENTION
Among others, it is an object to provide for a multi-processing circuit with shared memory circuits and a network for providing communication routes between processors and the shared memory circuit, wherein timing constraints can be imposed on the programs of the processors with little program overhead.
A circuit according to claim 1 is provided. Herein processors limit access to a data object during a program execution period to parts of a program between acquire and release instructions. The acquire and release instructions set and clear a semaphore flag for the data object. Write operations to the data object are implemented by distributing messages with write operation records to different shared memory circuits. The semaphore flag is cleared after testing that all previous messages from the processor to all shared memory circuits that store data from the data object have been processed. In an embodiment the processors are configured to test this in response to a single instruction, by using a detector circuit to monitor the number of unprocessed messages. The result produced by the detector circuit may be read in response to the release instruction, before clearing the semaphore flag for the data object, or it may be read in response to a separate instruction, executed before the release instruction. By providing for a processor with an instruction for this purpose, efficient programmable access to a distributed data object is ensured.
In an embodiment a credit based mechanism is used for monitoring ensured free buffer space for receiving messages from the processors in the shared memory circuits. The information about the ensured free buffer space for different shared memory circuits may be combined to perform detection that all previous messages from the processor to all shared memory circuits that store data from the data object have been processed. Thus the detection can be performed with little overhead. In an embodiment a plurality of distributed data objects is stored. In this embodiment respective semaphore flags are used for different data objects. For acquire and release instructions the data object to which they apply is indicated. Before clearing the semaphore flag for an indicated data object, message processing of previous messages to the different shared memory circuits is tested specifically for the indicated data object.
In an embodiment the processors each have first in first out buffers for respective ones of the shared memory circuits. In this case detection may be performed by detecting a status of transmissions from the first in first out buffers.
The data objects do not occupy the entire shared memory circuits. When a data object has been acquired by a processor, other processors may still access the shared memory circuits that store parts of the data objects, in order to access data outside the data object. This is possible because the semaphore flag is provided specifically for the data object. As a result there is no need to block all access to a plurality of shared memory circuits to ensure consistent access to a data object that is stored distributed over the plurality of shared memory circuits. In a further embodiment, write operation records for the data object are buffered separately from accesses to other parts of the shared memory circuits. In this way it can be determined quickly when the release instruction can be executed.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other objects and advantageous aspects will become apparent from a description of exemplary embodiments, using the following figures:
Fig. 1 shows a multiprocessing system
Fig. 2 illustrates information flow
Fig. 3 shows a processor
Fig. 4 shows a shared memory circuit
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
Fig. 1 shows a multiprocessing circuit comprising a plurality of processors 10, a communication network 12 with an array of router circuits 120 (only some explicitly labeled) and shared memory circuits 14. Processors 10 and shared memory circuits 14 are coupled to selected router circuits. Processors 10 contain FIFO buffer circuits 100 and detector circuits 102. One of the shared memory circuits contains a semaphore flag memory 140. Each of the shared memory circuits 14 contains a main memory 142. Communication network 12 comprises connections between neighboring router circuits 120. Communication network 12 is a variable route network, that is, communication network 12 provides for selectable routes through router circuits 120 between stations such as processors 10 and shared memory circuits 14. Such a variable route communication network 12 makes it possible to communicate between a large number of stations without introducing an exceedingly large number of connections. Any number of router circuits 120 and any connection pattern may be used. For the sake of clarity only a small communication network 12 is shown, wherein the router circuits 120 are connected in a two dimensional matrix pattern. Moreover, the variable route communication network 12 makes it possible to avoid frequent blocking of messages by other message traffic. For this purpose communication network 12 is configured to allow for different alternative routes through the router circuits 120 for any message transmission. The availability of alternative routes makes it possible to select a route that avoids router circuits 120 that are too busy at the time of message transmission. Variable route communication networks that support such selected routing and methods of selecting routes through such networks are known per se.
It suffices to note that passing a message between stations such as processors 10 and shared memory circuits 14 may delay the message by a varying amount of time (unpredictable for the processors 10), dependent on the number of router circuits 120 used along the route.
In operation, processors 10 of the multiprocessing circuit execute instructions from respective programs. The execution comprises access to data in shared memory circuits 14 (writing and/or reading). Fig. 2 illustrates the information flow involved in access to a data object that is stored in a distributed way in parts of shared memory circuits 14. The actions are used to perform access according to what will be called an acquire/release protocol, wherein a processor 10 executes instructions to access shared memory circuits 14 only between executing an "acquire" instruction and a "release" instruction. The acquire instruction is used to set a semaphore flag in flag memory 140 and the release instruction is used to clear this semaphore flag. In the figure time is represented vertically, progressing downwards. Vertical lines represent main memory 142, FIFO buffer circuits 100, detector circuits 102, processors 10 and flag memory 140. Access involves three stages 20, 22, 24. In a first stage 20 a processor 10 executes the "acquire" instruction, which causes a read modify write transaction to be performed with flag memory 140. This transaction involves a first transmission 200 via communication network 12 from processor 10 to flag memory 140 to request a read modify write action on the semaphore flag in flag memory 140 and a second transmission 202 back from flag memory 140 to the processor 10 to indicate the read result. Execution of the acquire instruction completes when the result indicates that the flag was not in a set state. If the result indicates that the flag was in a set state, the read modify write transaction is repeated until the result indicates that the flag was not in a set state.
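The acquire handshake described above, a read modify write transaction that is retried until the semaphore flag is found clear, can be sketched as follows. This is a minimal functional model with illustrative names; in the real circuit each read modify write is a pair of network transmissions (200, 202), modeled here as a direct call.

```python
class FlagMemory:
    """Functional model of flag memory 140: one semaphore flag per data object."""

    def __init__(self):
        self.flags = {}  # data object id -> True (set) / False (clear)

    def read_modify_write(self, obj_id):
        # Atomic read-modify-write: return the old flag value and set the flag.
        old = self.flags.get(obj_id, False)
        self.flags[obj_id] = True
        return old

    def clear(self, obj_id):
        self.flags[obj_id] = False


def acquire(flag_memory, obj_id):
    # Repeat the read-modify-write transaction until the read result
    # indicates that the flag was not in a set state.
    while flag_memory.read_modify_write(obj_id):
        pass  # another processor holds the object; retry
```

Because the read and the set are one atomic transaction at the flag memory, at most one processor can observe the flag as clear and thereby win the acquisition.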
In a second stage 22 the processor 10 that completed the acquire instruction executes instructions that lead to access to main memories 142. By way of example only write access is illustrated. In response to the instructions processor 10 performs a transaction 220 to send data and addresses to the FIFO buffer circuit 100. Different FIFO buffer circuits 100 are provided for respective shared memory circuits 14. The FIFO buffer circuit 100 for the shared memory circuit that is addressed in turn performs a transaction 222 to send a message containing the data and the address to main memory 142 of the addressed shared memory circuit 14, via communication network 12. In shared memory circuit 14 the address and data are buffered and subsequently used to write the data at the address in the main memory 142. In response to other instructions processor 10 may perform any number of such transactions 220.
Each FIFO buffer circuit 100 maintains a count representing an ensured free buffer space in a corresponding FIFO buffer of its associated shared memory circuit 14. FIFO buffer circuit 100 reduces this represented ensured free buffer space each time it sends a message to shared memory circuit 14. The associated shared memory circuit 14 sends messages back to the processor 10, each message indicating a number of buffer positions that it has freed by using the address and data to write into main memory 142. A transmission 224 represents the transmission of such a message. FIFO buffer circuits 100 receive these messages from their associated shared memory circuits 14 and increase the represented ensured free buffer space accordingly. Initially, each represented ensured free buffer space is set to the maximum buffer space available in the associated shared memory circuit 14. As will be appreciated this has the effect that the count represents buffer space that is ensured to be free, even if the buffer space that is actually free may be larger than the represented ensured free buffer space. FIFO buffer circuit 100 holds up transmission if the represented ensured free buffer space indicates that it is not ensured that sufficient free space is available.
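The credit bookkeeping performed by each FIFO buffer circuit 100 can be modeled as below. This is a sketch under the assumption that each message consumes one buffer position; class and method names are illustrative.

```python
class BudgetedFifo:
    """Sketch of the credit ('budget') bookkeeping in a FIFO buffer circuit 100.

    The represented ensured free buffer space starts at the maximum buffer
    capacity of the associated shared memory circuit; it is decremented for
    each transmitted message and incremented when the shared memory circuit
    reports freed buffer positions.
    """

    def __init__(self, max_space):
        self.max_space = max_space
        self.ensured_free = max_space  # initially the full capacity

    def can_send(self):
        # Transmission is held up when free space is not ensured.
        return self.ensured_free > 0

    def on_send(self):
        assert self.can_send()
        self.ensured_free -= 1

    def on_freed(self, positions):
        # Return message (transmission 224): `positions` buffer entries were
        # written into main memory and are free again.
        self.ensured_free += positions
        assert self.ensured_free <= self.max_space

    def all_processed(self):
        # All previous messages have been processed when the ensured free
        # space is back at its initial maximum value.
        return self.ensured_free == self.max_space
```

The counter is conservative: the actually free space may be larger (a return message may still be in flight), but never smaller, which is what makes it safe to transmit whenever the counter is nonzero.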
Detector circuit 102 receives information about the represented ensured free buffer space from FIFO buffer circuits 100. Reception of this information is represented by a transaction 226. As shown, the information is received after an update of the represented ensured free buffer space. In practice, the represented ensured free buffer space may be monitored continuously, or sampled when needed by detector circuit 102.
After completing execution of the acquire instruction processor 10 may perform any number of access operations, FIFO buffer circuits 100 sending corresponding messages via communication network 12 and maintaining the represented ensured free buffer space. This continues until a third stage 24 of execution.
In the third stage 24 of execution, processor 10 executes the "release" instruction. Execution of the release instruction comprises performing a read action 240 on detector circuit 102, the detector circuit 102 performing a return action 242 to provide a read result. This read result represents information gathered by detector circuit 102 whether the corresponding represented ensured free buffer space values for all FIFO buffer circuits 100 in the processor involved with access have their maximum (initial) value. If so, processor 10 completes the release instruction, by sending a message 244 to flag memory 140 to clear the semaphore flag that was set by the acquire instruction. This protocol may be used to ensure that only one processor at a time will have access to a data object that is stored in a distributed way in a plurality of main memories 142. In a further embodiment the protocol may be used to ensure that different ones of the processors 10 will have access to selected data objects in main memories 142, only one at a time for each data object. In this case, the acquire and release instructions contain specifications of the data objects to which they relate. Access to a plurality of such data objects may be managed in parallel by providing a plurality of selectable semaphore flags in flag memory 140, each for a respective data object. Different FIFO buffer circuits 100 may be provided in a processor 10 for different data objects in this case. In this case detector circuit 102 collects represented ensured free buffer space values to produce respective results for groups of FIFO buffer circuits 100 that write to the same data object. Thus different processors 10 may have access to different data objects in parallel. In an embodiment access to regions of main memories 142 outside the data objects may be allowed without such a protocol.
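The release step for one data object can then be sketched as a check that every involved buffer's ensured free space is back at its maximum before the semaphore flag is cleared. This is a non-blocking functional model with illustrative names; a real processor would repeat the check until it succeeds.

```python
def try_release(ensured_free_per_memory, max_space, flags, obj_id):
    """Sketch of the release instruction for one data object.

    `ensured_free_per_memory` holds the represented ensured free buffer
    space of each FIFO buffer circuit involved with the object (one per
    shared memory circuit storing a part of it). The semaphore flag may
    be cleared only when every value is back at its maximum (initial)
    value, i.e. all previous messages have been processed.
    Returns True when the release completed.
    """
    if any(v != max_space for v in ensured_free_per_memory):
        return False  # detector result not set yet; the processor retries
    flags[obj_id] = False  # message 244: clear the semaphore flag
    return True
```

For simplicity this assumes every shared memory buffer has the same capacity `max_space`; with differing capacities each value would be compared against its own maximum.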
Each data object may be defined for example by a range of address values, or by a root pointer to the data object. In an embodiment, one or more bits of the write address may be used to indicate the data object, remaining bits of the address indicating the shared memory circuit where the data is stored and the address within said shared memory. This makes it easy to determine the data object involved in a write operation. Alternatively, each data object may comprise a root address pointer, addresses of data in the data object being derived through the root pointer. In this case, a processor may need to indicate explicitly which data object it addresses, for example by providing additional information in access operations to identify the data object.
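The address-bit scheme can be illustrated as follows; the field widths and field order are arbitrary assumptions chosen for the example, not values from the description.

```python
# Illustrative field widths (assumptions, not from the description):
OBJECT_BITS = 2   # bits selecting the data object
MEMORY_BITS = 3   # bits selecting the shared memory circuit
OFFSET_BITS = 11  # address within that shared memory circuit


def decode_address(addr):
    """Split a write address into (data object, shared memory circuit, offset).

    One field of the address indicates the data object; the remaining bits
    indicate the shared memory circuit where the data is stored and the
    address within that shared memory.
    """
    offset = addr & ((1 << OFFSET_BITS) - 1)
    memory = (addr >> OFFSET_BITS) & ((1 << MEMORY_BITS) - 1)
    obj = (addr >> (OFFSET_BITS + MEMORY_BITS)) & ((1 << OBJECT_BITS) - 1)
    return obj, memory, offset
```

With such a layout, both the distributor (which needs the memory field) and the per-object bookkeeping (which needs the object field) can be driven by simple bit selection.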
A point to note is that the acquire and release instructions are performed for the data object as a whole, i.e. in one action for all parts of the data object that are stored in different shared memory circuits 14. A single semaphore flag is used to acquire all these parts. As an alternative respective different semaphore flags may be used for different parts of a data object that are stored in different shared memory circuits. In this case, the acquire instruction involves setting all of these semaphore flags, repeating the setting of each flag until it is set from a reset state to a set state. Deadlocks may be avoided by using a predetermined sequence of accessing respective semaphore flags. Use of a single semaphore flag for a data object that is distributed over a plurality of shared memory circuits 14 that are connected at different points to the communication network 12 has the advantage that a much more robust mechanism is realized that requires less overhead. A single instruction from a program may be used for acquiring a data object and a single instruction from the program may be used for releasing the data object. When the processor 10 encounters such an instruction in its program it sets or clears the relevant semaphore flag or flags for a data object in response. The processor 10 may also be configured to perform accompanying actions in response to these instructions, or the accompanying actions may be controlled by additional instructions. For example the acquire and release program instructions may have an operand for identification of the data object, in which case the processor 10 uses this operand to select a semaphore flag when it encounters an acquire or release instruction in its program. Alternatively, the selection may be performed in response to an additional instruction, although this complicates the program.
Similarly, the processor 10 may be configured to test the detection result in response to the release instruction, or alternatively one or more separate program instructions may be used to test the detection result before reaching the release instruction. In principle, the entire function of the detector circuit 102 (testing the status of feedback information for a plurality of queues to different shared memory circuits 14) may be implemented using instructions that test queues individually. However, preferably the processing core is configured to return the test result for all queues in response to a single instruction.
Although only write transactions have been described, it should be appreciated that read transactions from processors 10 to shared memory circuit 14 are also possible in second stage 22. A read transaction involves sending a message with a read address from a processor 10 to shared memory circuit 14 via communication network 12, reading data from main memory 142 at the read address and sending a return message with the read data from shared memory circuit 14 to the processor 10 via communication network 12. Read transactions may be buffered as needed. However, because completion of read transactions may be confirmed by the reception of read data in the processor 10, no special mechanism is needed to detect completion of read operations.
Fig. 3 shows an embodiment of a processor 10, comprising a processing core 30, a program memory 31 for storing instructions for processor 10, a local data memory 32, a plurality of distributor circuits 33, a plurality of FIFO buffer circuits 34, a network interface circuit 36, detector circuits 38 and detection result registers 39. Only a few distributor circuits 33, FIFO buffer circuits 34, detector circuits 38 and detection result registers 39 are shown by way of example; a different number of such circuits may be present. Processing core 30 is coupled to program memory 31, local data memory 32, detection result registers 39, distributor circuits 33 and network interface circuit 36. Distributor circuits 33 have outputs coupled to FIFO buffer circuits 34. FIFO buffer circuits 34 are coupled to network interface circuit 36, which has a connection 37 coupled to a router circuit in the communication network (not shown). Detector circuits 38 are coupled to respective groups of FIFO buffer circuits 34 and to respective detection result registers 39. In addition further FIFO buffer circuits (not shown) may be provided for read instructions, coupled between processing core 30 and network interface circuit 36.
In operation, processing core 30 executes successive instructions from a program in program memory 31. The instructions may involve reading and writing data in local data memory 32, performing read modify write operations and write operations to flag memory 140 (not shown) and writing data to distributor circuits 33. In an embodiment data is provided to distributor circuits 33 in the form of a write operation record, containing the write data in combination with write addresses of the write data in shared memory circuit 14. Processing core 30 may transact read modify write operations and write operations to flag memory 140 directly with network interface circuit 36, network interface circuit 36 transmitting and receiving the corresponding messages via communication network 12, the processing core 30 waiting for reception of the read data of the read modify write operations. Alternatively, additional buffers may be provided for these operations.
Each of distributor circuits 33 is associated with a different data object. For the sake of exposition, it is assumed that processing core 30 has respective outputs for different data objects, coupled to different distributor circuits 33. Alternatively, processing core 30 may have a shared output for all data objects, the distributor circuits distinguishing between write operation records for different data objects on the basis of write addresses.
Distributor circuits 33 determine the shared memory circuits 14 associated with the write operation records and distribute the write operation records to different FIFO buffer circuits 34 accordingly. Thus, each FIFO buffer circuit 34 is associated with a combination of a shared memory circuit and a data object, and the FIFO buffer circuit 34 buffers write operation records for that combination.
Network interface circuit 36 reads write operation records from FIFO buffer circuits 34, forms messages with the data and transmits the messages over communication network 12. Predetermined information about the shared memory circuits and/or data objects associated with the FIFO buffer circuits 34 may be stored in network interface circuit 36, for defining the destinations of the messages.
FIFO buffer circuits 34 contain budget registers 340 (the budget register of only one FIFO buffer circuit 34 is shown separately) for storing free buffer space counts. Initially, the budget registers are preset to represent the maximum buffer capacity of a relevant buffer in the shared memory circuit 14 that is associated with the FIFO buffer circuit 34. When network interface circuit 36 transmits a write operation record from a FIFO buffer circuit 34, the free space represented by the budget register of the FIFO buffer circuit 34 is decremented. Network interface circuit 36 tests the content of the budget register of a FIFO buffer circuit 34 before transmitting a write operation from that FIFO buffer circuit 34.
When the represented ensured free buffer space is too low to allow for reception of a message with a write operation record, transmission of that message is deferred.
Network interface circuit 36 receives messages with information about freed buffer space for data objects in shared memory circuits 14. When network interface circuit 36 receives such a message from a shared memory circuit 14 via communication network 12, network interface circuit 36 takes data from the message and increments the space represented in the budget register of the FIFO buffer circuit 34 that is associated with the data object and shared memory circuit 14, in accordance with this information.
The free buffer space may be represented for example as a count that is initially set to a predetermined maximum value and decremented and incremented as messages are transmitted and messages with information about freed space are received. In this case, messages are held up if the count is zero. But many alternatives exist; for example a count may be used that is initially set to zero and incremented and decremented in response to transmission and reception, messages being held up when the count has reached a maximum.
Detector circuits 38 monitor the represented ensured free buffer space values in the budget registers. Detector circuits 38 are each associated with a respective data object and connected to the FIFO buffer circuits 34 for different shared memory circuits 14 for that data object. When any represented ensured free buffer space value in the FIFO buffer circuits 34 for the associated data object is below its maximum value, the detector circuit 38 clears a commit flag in a detection result register 39. When all represented ensured free buffer space values in the FIFO buffer circuits 34 for the associated data object are at their maximum value the detector circuit 38 sets the commit flag in the detection result register 39.
When processing core 30 executes a release instruction for a data object, the processing core first reads the detection result register 39 for the relevant data object. Processing core 30 repeats this, if necessary, until the commit flag in the detection result register 39 is found to be set. Subsequently, processing core 30 signals network interface circuit 36 to send a message that the semaphore flag for the data object in flag memory 140 must be cleared. After this, execution of the release instruction is completed and processing core 30 proceeds to execute subsequent instructions from its program.
As will be appreciated the processor of Fig. 3 supports writing to a plurality of data objects that are stored distributed over a plurality of shared memory circuits. A data object could be a linked list for example, with pointers that point from one memory location to another in different shared memory circuits, or a list of records, data representing an image, a video and/or audio stream etc. FIFO buffer circuits 34 are provided to support distributed writing of the data objects. Detector circuits 38 are provided for the data objects to support access control with the acquire/release protocol. It should be appreciated that various alternative implementations are possible for producing the same effect. For example, in an embodiment a processor may be allowed to acquire only one data object at a time. In this case, no distributor circuits 33 are needed and a single detector circuit 38 suffices. This embodiment still allows use of a plurality of data objects, in which case a processor may acquire a data object even when another processor has previously acquired another data object.
As another example, instead of using different FIFO buffer circuits, a single buffer may be used, from which write operation records for different data objects and different shared memory circuits may be taken. However, when more than one data object can be acquired at the same time by one processor, this means that the implementation of detection may be more complicated when the FIFO buffer has to be scanned for write operation records for the same data object. One solution may be to wait until all data objects can be released. Alternatively, one buffer may be used for each data object, containing write operation records for different shared memory circuits. This may have the effect that write operation records for a first shared memory circuit 14 will have to wait because there is no capacity for receiving a message in a second shared memory circuit 14, for which an earlier write operation record is in the buffer. Alternatively, this can be overcome by searching for write operation records for a specific shared memory circuit 14 when capacity is available for that shared memory circuit, but this makes the network interface more complicated. As a further example, instead of budget based transmission, a simple handshake based mechanism may be used, wherein a message to a shared memory circuit 14 is transmitted only after reception of an acknowledgment that a previous message has been handled. In this case, detector circuits 38 need only look whether FIFO buffer circuits 34 are empty and an acknowledgment has been received. With budget based transmission the ensured free space can be used for detection. Budget based transmission (i.e. the use of a budget count, the represented ensured free buffer space, that may take the values zero, one, two and possibly higher) has the advantage that less waiting is needed before transmission.
Also, the return messages from the shared memory circuit 14, which indicate freed buffer space, need not be transmitted for every processed message, when a single return message can report more than one freed buffer position.
Furthermore, although the budget registers in FIFO buffer circuits were described, it should be appreciated that such registers may structurally be located anywhere, for example as part of network interface circuit 36, as a location in a memory, such as local data memory 32 or in detector circuits 38. Similarly, FIFO buffer circuits and detection result registers 39 may be implemented using part of local data memory 32.
Fig. 4 shows an embodiment of a shared memory circuit 14, comprising a main memory 40, a flag memory 41, a main memory interface 42, a flag memory interface 43, a plurality of FIFO buffer circuits 44, flag FIFO buffer circuits 45, and a network interface circuit 46 with a connection 47 to a router circuit in the communication network (not shown). Main memory interface 42 is coupled between main memory 40 and FIFO buffer circuits 44. Flag memory interface 43 is coupled between flag memory 41 and flag FIFO buffer circuits 45. Network interface circuit 46 is coupled between FIFO buffer circuits 44, flag FIFO buffer circuits 45 and connection 47. In operation network interface circuit 46 receives messages from the communication network and places write operation records from the messages into the FIFO buffer circuits 44 or flag FIFO buffer circuits 45, selecting the FIFO buffer circuits 44 according to the sources of the messages and the data object involved and selecting the flag FIFO buffer circuits 45 according to destination.
Main memory interface 42 reads the write operation records from the FIFO buffer circuits 44. Main memory interface 42 writes the write data from each write operation record into main memory 40 at an address indicated by the write address from the write operation record. When main memory interface 42 has processed at least one write operation record from a FIFO buffer circuit 44, this is signaled to network interface circuit 46, which transmits return messages specifying amounts of space freed by processing write operation records. Flag memory interface 43 reads write operation records from the flag FIFO buffer circuits 45. When such a write operation record indicates that a read modify write operation is involved, flag memory interface 43 first performs a read operation at the write address of the write operation record and places a return record with the read flag data in a return flag FIFO buffer circuit 45, for transmission of a message with the read flag data to the source of the read modify write command. After reading the flag, if at all, flag memory interface 43 writes the write data of the write operation record into flag memory 41. As an alternative, set and clear messages may be used, which specify the write data by means of a message type indication.
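The memory-side draining of one receive FIFO, together with generation of the return messages that report freed buffer space, can be sketched as follows. The batching parameter is an illustrative assumption, reflecting only that one return message may report several freed positions at once.

```python
def process_writes(fifo, main_memory, credit_batch=4):
    """Sketch of main memory interface 42 draining one receive FIFO.

    Writes each buffered record into main memory and accumulates the
    number of freed buffer positions, to be reported back to the sending
    processor in return messages. `credit_batch` is an illustrative
    parameter: freed positions are reported in groups of up to this size.
    """
    freed = 0
    return_messages = []  # each entry: number of freed positions reported
    while fifo:
        addr, data = fifo.pop(0)   # take the oldest write operation record
        main_memory[addr] = data   # write the data at the write address
        freed += 1
        if freed == credit_batch:
            return_messages.append(freed)
            freed = 0
    if freed:
        return_messages.append(freed)  # report any remaining freed positions
    return return_messages
```

The sum of the reported amounts always equals the number of processed records, so the processor-side budget registers return exactly to their initial maximum once everything has been processed.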
In addition, read FIFO buffers (not shown) may be provided. In the case of read messages, main memory interface 42 reads data from main memory 40 and places it in the FIFO buffer circuits 44, selected according to the source of the read message. Network interface circuit 46 reads the data from the read operations from the FIFO buffer circuits 44, forms messages including this data and transmits these messages via the communication network.
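The main-memory write path with its freed-space report messages, described above, can likewise be sketched in a few lines. This is an illustrative model under assumed names; the per-source bookkeeping shown here is one possible way to generate the return messages.

```python
from collections import deque

class SharedMemorySketch:
    """Hypothetical model of a shared memory circuit 14's write path:
    write operation records are drained from a FIFO into main memory,
    and a report message per source states how much buffer space was freed."""

    def __init__(self, size=64):
        self.main_memory = [0] * size
        self.write_fifo = deque()   # models a FIFO buffer circuit 44
        self.reports = []           # return messages to the writing processors

    def receive_write(self, source, address, data):
        """Network interface side: a write operation record arrives."""
        self.write_fifo.append((source, address, data))

    def drain(self):
        """Process all pending write records, then report freed space."""
        freed = {}
        while self.write_fifo:
            source, address, data = self.write_fifo.popleft()
            self.main_memory[address] = data
            freed[source] = freed.get(source, 0) + 1
        for source, count in freed.items():
            self.reports.append({"to": source, "records_freed": count})

mem = SharedMemorySketch()
mem.receive_write("P0", 10, 0xAB)
mem.receive_write("P0", 11, 0xCD)
mem.drain()
```

The reports are what a processor later uses to decide that all of its previous write messages to this shared memory have been processed.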
As may be appreciated, not all shared memory circuits 14 need contain a flag memory 41 with the associated flag memory interface 43 and flag FIFO buffer circuits 45. In fact, flag memory 41 may be part of a station connected to communication network 12 that does not contain a main memory at all. Flags may be stored in one of the processors 10, for example. Also, different flags may be stored in different stations. Alternatively, flag memory 41 may be realized using a region of main memory 40, in which case the functions of flag memory interface 43 may be performed by main memory interface 42.
Some or all of the shared memory circuits 14 may be combined into a single station with a processor 10. In this case a common network interface may be used for the main memory in that station and the processing core. Of course, in this case write operations in this station can be kept local, which may make it superfluous to test for completion of the local write operations in this station before completing execution of the release instruction.
Processors 10 may be provided with a cache memory for caching data from the shared memory circuits. In this case, FIFO buffer circuits 34 may be used for posting write operations that have been effected in the cache memory, in order to maintain memory consistency.
Preferably, instructions of the program of processors 10 access data in the data objects only between acquire and release instructions for the data object. However, in some cases, access by a processor 10 outside such a context may be permitted, for example if it is known that no write operations from other processors will occur within a sufficiently large time window to ensure that the data will remain stable.
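The acquire/release discipline, with the release delayed until every shared memory storing part of the data object has acknowledged all posted writes, can be sketched with per-memory credit counters (cf. the budget registers 340 recited in claim 2). The sketch is illustrative only; names and the fixed credit limit are assumptions.

```python
class ProcessorSketch:
    """Hedged sketch of the release-delay rule: the semaphore flag may
    be cleared only once every shared memory circuit storing part of
    the data object has reported processing of all posted writes."""

    def __init__(self, memories, max_credit=4):
        # One credit counter per shared memory circuit; a full counter
        # means no write records are outstanding at that memory.
        self.max_credit = max_credit
        self.credit = {m: max_credit for m in memories}

    def post_write(self, memory):
        assert self.credit[memory] > 0, "no ensured buffer space left"
        self.credit[memory] -= 1       # a write record now occupies buffer space

    def on_report(self, memory, freed):
        self.credit[memory] += freed   # shared memory reports freed space

    def may_release(self):
        # All previous writes have been processed once every counter is
        # back at the maximum available amount of buffer space.
        return all(c == self.max_credit for c in self.credit.values())

p = ProcessorSketch(["M0", "M1"])
p.post_write("M0")
assert not p.may_release()     # outstanding write: release must wait
p.on_report("M0", freed=1)
assert p.may_release()         # all writes acknowledged: flag may be cleared
```

Decrementing on transmission and incrementing on report messages makes the release test a purely local comparison, so no extra round trip over the network is needed at release time.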
Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.

Claims

CLAIMS:
1. A circuit comprising: a communication network (12) with respective access points, the network (12) being configured to provide communication routes between the access points; a plurality of processors (10) coupled to mutually different ones of the access points; a plurality of shared memory circuits (14) coupled to mutually different ones of the access points, for storing a data object distributed over the shared memory circuits (14), each shared memory circuit (14) being configured to receive write operation records from the processors (10) via the communication network, to perform write operations in a part of the data object in response to the write operation records and to transmit report messages back to the processors, indicating progress of processing of the write operation records; - a flag memory (140) for maintaining a semaphore flag to control access to the data object; each of the processors (10) being configured to gather information obtained from the report messages from all shared memory circuits (14) that store data from the data object; - each of the processors (10) comprising a processing core (30), configured to execute instructions from a program of the processor, including a release instruction for the data object, the processing core (30) being configured to cause the semaphore flag to be cleared in response to the release instruction and to delay said clearing of the semaphore flag in response to the release instruction until it is determined from said gathering that all previous messages from the processor (10) to all shared memory circuits that store data from the data object have been processed.
2. A circuit according to claim 1, wherein the processors (10) each comprise budget registers (340), for storing information representing ensured amounts of free buffer space for buffering the write operation records for the data object in respective ones of the shared memory circuits (14), each processor (10) being configured to decrement and increment the represented ensured amounts of free buffer space in association with transmission of the write operation records and reception of the report messages respectively, each processor (10) being configured to use a comparison of the represented ensured amounts of free buffer space for all respective ones of the shared memory circuits (14), each with a maximum available amount of free buffer space for the respective one of the shared memory circuits (14), for said determination whether all previous messages from the processor (10) to all shared memory circuits (14) that store data from the data object have been processed by the shared memory circuits (14).
3. A circuit according to claim 1, wherein the shared memory circuits (14) are configured to store a plurality of data objects, each distributed over the shared memory circuits (14), the flag memory (140) storing respective semaphore flags to control access to respective ones of the data objects, each release instruction being associated with an indication of an indicated data object to which the release instruction applies, the processing core (30) being configured to cause the semaphore flag for the indicated data object to be cleared in response to the release instruction and to delay clearing of the semaphore flag for the indicated data object until it is determined from said gathering that all previous messages associated with the indicated data object from the processor (10) to all shared memory circuits (14) that store data from the indicated data object have been processed.
4. A circuit according to claim 1, wherein each processor (10) comprises: - a plurality of first in first out buffers (34), each for buffering the write operation records for a respective one of the shared memory circuits (14) until transmission via the communication network to the associated shared memory circuit (14) is possible; a detector (38) for detecting a status of transmissions from the first in first out buffers (34), the processor (10) delaying clearing of the semaphore flag until the detector (38) indicates that for each first in first out buffer circuit (34) all previous messages from the first in first out buffer circuit (34) to all shared memory circuits (14) that store data from the data object have been processed.
5. A circuit according to claim 1, wherein at least one of the processors (10) is configured to access data outside the data object in at least one of the shared memory circuits (14), between execution of the acquire instruction and the release instructions by another one of the processors (10), and wherein the other one of the processors (10) is configured to send the write operation records to at least one of the shared memory circuits (14) between execution of the acquire instruction and the release instruction.
6. A method of executing a plurality of computer programs concurrently with a plurality of processors (10) that are coupled to a plurality of shared memory circuits (14) via a communication network (12), wherein access to a data object by different ones of the processors (10) during a program execution period is made mutually exclusive by executing acquire and release instructions for the data object, the method comprising: storing the data object distributed over the plurality of shared memory circuits (14), in respective memory portions of the shared memory circuits (14); - executing programs of instructions in the processors (10); each time when any one of the processors (10) encounters an acquire instruction for the data object, setting a semaphore flag in response to the acquire instruction; each time when any one of the processors (10) encounters a release instruction for the data object, clearing the semaphore flag in response to the release instruction, - executing instructions from the program to write to parts of the data object during the program execution period in each of the processors (10) only between execution of the acquire instruction and the release instruction in that processor (10); buffering the write operation records for the write instructions in buffers (34) in the processors; - transmitting messages via a communication network (12), comprising the write operation records from the buffers, to those of the shared memory circuits (14) where the parts are stored that are addressed by the write operations; signaling processing of the messages by the shared memory circuits (14) back to the processors (10); - detecting, in each processor (10), prior to clearing the semaphore flag in response to the release instruction, that signals have been received indicating that all previous messages to all shared memory circuits (14) that store data from the data object have been processed.
7. A method according to claim 6, comprising: storing, in each processor (10), information representing ensured amounts of free buffer space for buffering the write operation records for the data object in respective ones of the shared memory circuits (14); - decrementing the represented ensured amounts of free buffer space in association with transmission of the write operation records; incrementing the represented ensured amounts of free buffer space in association with response messages from the shared memory circuits (14), indicating how much buffer space has been freed; - comparing the represented ensured amounts of free buffer space for all respective ones of the shared memory circuits (14), each with a maximum available amount of free buffer space for the respective one of the shared memory circuits (14), using a result of said comparison to enable clearing of the semaphore flag in response to the release instruction.
8. A method according to claim 6, comprising: storing a plurality of data objects, each distributed over the shared memory circuits (14), storing respective semaphore flags to control access to respective ones of the data objects, associating indications of respective data objects with the release instructions to indicate the data object to which the release instruction applies; clearing the semaphore flag for a data object in response to a release instruction associated with the data object, clearing being delayed until said detecting shows, selectively for all previous messages with write operation records of write operations to parts of the indicated data object, that all previous messages from the processor to all shared memory circuits (14) that store data from the indicated data object have been processed.
9. A method according to claim 6, wherein each processor (10) comprises a plurality of first in first out buffers (34), each for buffering the write operation records for a respective one of the shared memory circuits until transmission via the communication network to the associated shared memory circuit is possible, the method comprising performing said detecting by detecting transmission status information for each of the first in first out buffers (34).
10. A method according to claim 6, comprising: - executing the acquire instruction and the release instruction with a first one of the processors (10); sending write operation records to at least one of the shared memory circuits (14) from the first one of the processors between execution of the acquire instruction and the release instruction; - accessing data outside the data object in at least one of the shared memory circuits (14) with at least a second one of the processors (10) between execution of the acquire instruction and the release instruction.
PCT/IB2008/053633 2007-09-18 2008-09-09 Circuit with a plurality of processors connected to a plurality of memory circuits via a network WO2009037614A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP07116624 2007-09-18
EP07116624.3 2007-09-18

Publications (2)

Publication Number Publication Date
WO2009037614A2 true WO2009037614A2 (en) 2009-03-26
WO2009037614A3 WO2009037614A3 (en) 2009-05-22

Family

ID=40382028

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2008/053633 WO2009037614A2 (en) 2007-09-18 2008-09-09 Circuit with a plurality of processors connected to a plurality of memory circuits via a network

Country Status (1)

Country Link
WO (1) WO2009037614A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3304297A1 (en) * 2015-06-04 2018-04-11 Siemens Aktiengesellschaft Method and system for clustering engineering data in a multidisciplinary engineering system
WO2022063185A1 (en) * 2020-09-27 2022-03-31 中兴通讯股份有限公司 Data acquisition method and system, data reporting method and system, chip, cpu, and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060230411A1 (en) * 2005-04-12 2006-10-12 Microsoft Corporation Resource accessing with locking

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MONCHIERO ET AL: "Efficient Synchronization for Embedded On-Chip Multiprocessors" IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 14, no. 10, 1 October 2006 (2006-10-01), pages 1049-1062, XP011142360 ISSN: 1063-8210 *
RADULESCU A ET AL: "An efficient on-chip NI offering guaranteed services, shared-memory abstraction, and flexible network configuration" IEEE TRANSACTIONS ON COMPUTER AIDED DESIGN OF INTEGRATEDCIRCUITS AND SYSTEMS, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 24, no. 1, 1 January 2005 (2005-01-01), pages 4-17, XP002343901 ISSN: 0278-0070 *

Also Published As

Publication number Publication date
WO2009037614A3 (en) 2009-05-22

Similar Documents

Publication Publication Date Title
JP3996454B2 (en) System and method for inter-domain low overhead message passing in a split server
US20060047849A1 (en) Apparatus and method for packet coalescing within interconnection network routers
JP3606551B2 (en) Data processing system, method and storage medium including interrupt architecture
US5890007A (en) Multi-cluster parallel processing computer system
US8155134B2 (en) System-on-chip communication manager
CN110741356A Relay-induced memory management in multiprocessor systems
US9274861B1 (en) Systems and methods for inter-process messaging
US7409481B2 (en) Data processing system, method and interconnect fabric supporting destination data tagging
US20050149665A1 (en) Scratchpad memory
US8102855B2 (en) Data processing system, method and interconnect fabric supporting concurrent operations of varying broadcast scope
CN111949568B (en) Message processing method, device and network chip
US10397144B2 (en) Receive buffer architecture method and apparatus
CN113794764A (en) Request processing method and medium for server cluster and electronic device
US7093037B2 (en) Generalized queue and specialized register configuration for coordinating communications between tightly coupled processors
US20190146845A1 (en) Lock Allocation Method and Apparatus, and Computing Device
US8924784B2 (en) Controller for managing a reset of a subset of threads in a multi-thread system
JP4489116B2 (en) Method and apparatus for synchronous non-buffer flow control of packets over ring interconnect
WO2009037614A2 (en) Circuit with a plurality of processors connected to a plurality of memory circuits via a network
US20190138630A1 (en) Techniques for implementing a split transaction coherency protocol in a data processing system
US7483428B2 (en) Data processing system, method and interconnect fabric supporting a node-only broadcast
US7257680B2 (en) Storage system including shared memory and plural disk drives, processors, and shared memory control units
US20080192761A1 (en) Data processing system, method and interconnect fabric having an address-based launch governor
EP1780976A1 (en) Methods and system to offload data processing tasks
US9338219B2 (en) Direct push operations and gather operations
CN115203210A (en) Hash table processing method, device and equipment and computer readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08807581

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08807581

Country of ref document: EP

Kind code of ref document: A2