EP0665503A2 - High-speed synchronization communication control mechanism for multi-processor system - Google Patents

Info

Publication number
EP0665503A2
EP0665503A2 EP95101163A EP95101163A EP0665503A2 EP 0665503 A2 EP0665503 A2 EP 0665503A2 EP 95101163 A EP95101163 A EP 95101163A EP 95101163 A EP95101163 A EP 95101163A EP 0665503 A2 EP0665503 A2 EP 0665503A2
Authority
EP
European Patent Office
Prior art keywords
communication register
communication
processor
modules
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP95101163A
Other languages
German (de)
French (fr)
Other versions
EP0665503A3 (en
Inventor
Masanobu C/O Nec Kofu Ltd. Inaba
Noriyuki C/O Nec Kofu Ltd. Ando
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Publication of EP0665503A2
Publication of EP0665503A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores

Definitions

  • the present invention generally relates to a synchronization communication mechanism, and more specifically to a synchronization communication control mechanism employed in a multi-processor system.
  • communication registers high-speed shared registers
  • This communication register is required to have a shorter access time than a storage unit, a relatively high throughput, or both.
  • the respective processors communicate through such a communication register, so that data processing speed can be increased. Because the synchronization control, the mutual exclusion control, and the communication control cannot themselves be fully parallelized in a multi-processor system, these controls can severely degrade the performance of the overall system as the degree of parallelism increases. As a consequence, the arrangement of the communication register has a considerable influence on the performance of the multi-processor system.
  • the barrier synchronization is an operation in which every processor waits in a barrier synchronization routine until all of the processors have executed that routine.
  • This barrier synchronization routine is shown in Fig. 9. It is assumed that the number of processors executing the barrier synchronization is stored as the initial value of the word #0 of the communication register, that a non-zero value is stored in the word #1 of the communication register, and that zero is stored in the scalar registers S0 and S1.
  • the value of the word #0 in the communication register is first saved to the scalar register S0, and is then decremented. Since the number of processors has been stored as the initial value of the word #0 of the communication register, the value of the word #0 becomes zero when all of the processors have entered this barrier routine. The processors other than the final processor to enter this barrier routine jump to loop 1, and wait in this loop until the final processor enters the routine. Whether a processor is the final processor can be judged by checking the value of the word #0 in the communication register, which has been read by the FDCR command. If the processor is the final processor, the zero value is written into the word #1 in the communication register, which announces the completion to the other processors.
  • after the processors other than the final processor have executed the FDCR command, they repeatedly execute the LCR command within loop 1 until the final processor sets the value of the word #1 to zero.
  • This repeated execution is referred to as a "spin lock". Since the spin lock is performed by all of the processors which have entered the routine, the accesses to the communication registers are concentrated, so that heavy access contention may occur. Because of this access contention, the FDCR command access executed by a processor that has newly entered the barrier synchronization routine is forced to wait. In the worst case, the waiting time may grow to a period determined by the number of processors that are spinning while waiting for the barrier synchronization.
  • each of these processors sequentially decrements the word #0, and thereafter each processor checks whether or not the operations of the other processors are completed.
  • 8 cycles are required to accomplish the synchronization.
  • (2xN) cycles are required for N processors. It should be noted that symbol "N" indicates an integer.
  • An object of the present invention is to solve the above-described problems, and thereby to enable the synchronization communication control via the communication register in the multi-processor system to be performed at high speed.
  • Another object of the present invention is to avoid contention among the processors when they refer to the communication registers.
  • a multi-processor system comprises N processors ("N" being an integer), a storage unit, a communication register unit, and an interconnection network for interconnecting said processors, said storage unit, and said communication register unit.
  • the communication register unit includes N communication register modules, each storing data having the same number of words; the communication register modules are controlled so that corresponding words hold the same contents, and each module is referred to by only one specific processor.
  • a multi-processor system includes n processors 100 for processing data, a storage unit 400 for storing the data, and a communication register unit 300 for synchronizing communications among the processors. These units are interconnected with each other via an interconnection network 200.
  • Each of these processors 100 has a single access port to the interconnection network 200.
  • the storage unit 400 has a single access port for the unit as a whole.
  • the communication register unit 300 is subdivided into n communication register modules 320.
  • A unique module number is attached to each of these communication register modules as an identifier. In this drawing, the module numbers are indicated by #1, #2, ---, #n, respectively.
  • Each of the communication register modules 320 has a single access port to the interconnection network.
  • the interconnection network 200 has n ports in total for the respective processors, n ports in total for the respective communication register modules 320, and a single access port for the main storage unit. Access paths are provided among the access ports, and access requests are transported through the access paths.
  • another multi-processor system may be arranged to employ multiple access ports and access paths in order to improve the access throughput. For example, n access paths may be established between the main storage unit and the interconnection network.
  • When the processor 100 accesses either the storage unit 400 or the communication register unit 300, this processor 100 produces a request packet and sends it out via the access path to the interconnection network 200.
  • the interconnection network 200 arbitrates the contention among a plurality of request packets transferred from a plurality of processors 100, routes the respective request packets to their destinations, namely the storage unit 400 and the communication register unit 300, and sends out the request packets through the respective access paths.
  • the request packet that has arrived at either the storage unit 400 or the communication register unit 300 causes a read access or a write access in the respective unit. In the case of a read access, the read data is returned via the interconnection network to the processor.
  • a format of a request packet transferred through the interconnection network 200 is constructed of an access type field 801 for indicating whether the storage unit 400 or the communication register unit 300 is accessed, a code field 802 for denoting whether the load access or a store access is made, an address field 803 for showing either the address of the storage unit 400 or the address of the communication register 300, and also a data field 804 for the write data.
  • the read data is held in the data field and returned via the interconnection network 200 to the processor 100.
  • although this interconnection network 200 may be built from various network arrangements, an arrangement is desired in which no blocking occurs when a request from one processor 100 to the communication register unit 300 and a request from another processor 100 to another communication register unit 300 arrive simultaneously at the access ports of the interconnection network 200.
  • a non-blocking crossbar switch is one such desirable arrangement.
  • each of the communication modules 320 within the communication register unit 300 includes a communication register memory 301 constructed of a plurality of words, a write register 302 for supplying the write data to the communication register memory 301, an address register 303 for supplying the address to the communication register memory 301, and a read register 304 for holding the data read out from communication register memory 301.
  • This communication register module 320 further includes a write enable register 305 for enabling the communication register memory 301 to write the data, a read enable register 306 for enabling the communication register memory 301 to read the data, a request packet control circuit 311 for taking apart the request packet sent from the interconnection network 200 and distributing its parts to the respective circuit units, a communication register control circuit 310 for controlling access to the communication register memory 301, and a reply packet control circuit 312 for producing a reply packet to the interconnection network 200.
  • addresses are allocated serially to the communication register memory 301, starting from address zero. An access to the communication register issued from the processor 100 designates an address, which determines the word position of the communication register to be accessed.
  • the contents of the data stored in the respective words of the communication register memory 301 may be arbitrarily determined.
  • when the communication register is used for achieving synchronization, either all bits of the word or some bits thereof may be used as a synchronizing flag. Alternatively, only the most significant bit (MSB) of the word may be used as the synchronizing flag, and the remaining bits thereof may be utilized as the storage data sent and received among the processors.
  • MSB most significant bit
  • the value of "1" is set to the write enable register 305, the address of the word to be written is set to the address register 303, and then the data to be written is set into the write register 302.
  • the value of the write register 302 is written into the word of the communication register memory 301 designated by the address register 303.
  • the value of 1 is set into the read enable register 306, and then the address of the word to be read is set into the address register 303.
  • the data is read out from the word of the communication register memory 301 designated by the address register 303, and thereafter held into the read register 304.
  • These registers provided around the communication register memory 301 are controlled by the communication register control circuit 310.
  • a request packet control circuit 311 controls the request packet arriving from the interconnection network 200. Upon receipt of the request packet from the interconnection network 200, the request packet control circuit 311 decodes the request code field 802 and judges whether a load access or a store access has been issued. The decoded result is sent to the communication register control circuit 310.
  • a reply packet control circuit 312 causes the data held in the read register to be stored into a data field 804 of a packet, thereby constituting this data as a reply packet to the interconnection network 200.
  • the data within the data field 804 is written into the word addressed by the address field 803 in the communication register memory 301. That is, the address of the communication register is entered into the address register 303 at the write timing. Also, the write data within the data field 804 is entered into the write register 302. At the same time, the content of the write enable register 305 is set to the value of "1", so that the write access is completed at the next timing.
  • the data is read out from the word addressed by the address field 803 in the communication register memory 301.
  • the communication register address in the address field 803 is entered into the address register 303, and at the same time, the content of the read enable register 306 is set to the value of "1" at the read timing.
  • the read data is held in the read register 304.
  • This data held by the read register 304 is stored into the data field 804 of the packet, and is constructed as the reply packet to the interconnection network. This reply packet is sent out to the interconnection network 200.
  • each of the communication register modules 320 employed in the communication register unit 300 is equally divided into n register module groups.
  • Each of these equally divided register module groups will be referred to as a "set."
  • a single set consists of "m" words.
  • each of these communication register modules 320 contains n sets of "m" words each.
  • mutually distinguishable set numbers are attached to the sets. In Fig. 2, the set numbers are indicated by %1, %2, ---, %n.
  • in each communication register module 320, the set whose set number is identical to the module number is called the "real set," and a communication register in this real set is referred to as a "real communication register."
  • The other (n-1) sets are called "copy sets," and a communication register within such a set is referred to as a "copy of a communication register."
  • the sets to which the same set number has been attached are controlled so as to store the same contents. For instance, in the communication register module #1, the set %1 corresponds to the real set, and the other sets correspond to the copy sets.
  • the interconnection network 200 controls this data writing operation.
  • the interconnection network 200 performs control such that the decrement process by the broadcast is given top priority.
  • the decrement is executed in the communication register module #1 at the first cycle, and the resulting value is written into the other communication register modules by the broadcast.
  • the decrement is executed in the communication register module #2 at the second cycle, and the resulting value is written into the other communication register modules by the broadcast.
  • a similar process operation is carried out with respect to the communication register module #3 at the third cycle and the communication register module #4 at the fourth cycle.
  • the checking process is performed at the respective communication register modules at the fifth cycle. As a result, it is confirmed that all of the processors have been synchronized with each other.
  • whereas the conventional multi-processor system requires a number of checking-phase steps proportional to the total number of processors, the checking phase of the multi-processor system according to this embodiment is completed in the number of steps needed for a single processor.
  • the communication register access is processed as follows: In the case of a read access, the read access operation is carried out on the communication register module having the same module number as that of the processor which has issued the read access. At this time, the communication register is accessed regardless of whether it belongs to the real set or a copy set.
  • the write data is written into the same address of each of the communication register modules 320.
  • the test process is performed on the communication register module that holds the real communication register corresponding to the address being tested.
  • the test result is returned to the processor.
  • the lock bit is written into the real communication register.
  • the lock bit is written via the interconnection network 200 into all of the communication register copies having the same address.
  • a communication register module 320 dedicated to each of the processors is employed.
  • the write request is broadcast via the interconnection network 200, so that the checking operation can be performed by the respective processors 100 at the same time, and the time required to carry out this checking operation can be shortened.
  • the multi-processor system of this second embodiment has an arrangement similar to that of the first embodiment, except for the internal arrangement of the communication register unit 300. That is, as represented in Fig. 6, the communication register unit 300 according to the second embodiment of the present invention includes n communication register modules 320, as in the above-described first embodiment. Unlike the first embodiment, however, this communication register unit 300 includes a network interface circuit 330 between these communication register modules 320 and the interconnection network 200.
  • the network interface circuit 330 provides an interface mechanism between the interconnection network 200 and each of the communication register modules 320.
  • the network interface circuit 330 passes the request sent from the interconnection network 200 through to the output port corresponding to the input port at which it arrived.
  • the network interface circuit 330 broadcasts the request to all the communication register modules 320.
  • the request format is transmitted to the respective communication register modules 320 without any modification.
  • the network interface circuit 330 routes this request to the communication register module that holds the real communication register corresponding to the address being tested. Furthermore, if the test & set access succeeds in acquiring the lock, the result is broadcast to all of the communication register modules 320.
  • each of the communication register modules 320 accesses the word of the communication register indicated in the address field 803 to execute the process indicated in the request code field 802.
  • when the decrement by the broadcasting operation and the checking process contend with each other in the respective communication register modules 320, the network interface circuit 330 performs control such that the broadcasting operation is carried out in preference to other accesses.
  • the communication register access request is processed as follows: In the case of a read access request, the read access operation is carried out on the communication register module having the same module number as that of the processor which has issued the access request. At this time, the communication register is accessed regardless of whether it belongs to the real set or a copy set.
  • the test process is performed on the communication register module that holds the real communication register corresponding to the address being tested.
  • the test result is returned to the processor.
  • the lock bit is written into the real communication register.
  • the lock bit is written via the network interface circuit 330 into all of the communication register copies having the same address.
  • a communication register module 320 dedicated to each of the processors is employed.
  • the write request is broadcast via the network interface circuit 330, so that the checking operation can be performed by the respective processors 100 at the same time, and the time required to carry out this checking operation can be shortened.
  • the multi-processor system of this third embodiment has an arrangement similar to that of the other embodiments, except for the internal arrangement of the communication register unit 300.
  • the communication register unit 300 includes n communication register modules 320, as in the above-described first and second embodiments. Unlike the other embodiments, however, this communication register unit 300 includes a communication register network 340 coupling the communication register modules 320 with each other.
  • the respective communication register modules 320 have two ports and can accept at most two accesses simultaneously.
  • the communication register network 340 controls this data writing operation.
  • the communication register network 340 performs control such that the broadcasting operation is carried out in preference to other accesses.
  • the access for the checking process is performed on the communication register module #1 from the second cycle onward, the decrement having been completed at the first cycle. Thereafter, when the decrements have been carried out in all of the communication register modules, the synchronization of all the processors is completed. As a consequence, the checking operations are carried out at the same time in all of the communication register modules at the fifth cycle, so that it can be confirmed that the synchronization of all of the processors has been completed.
  • the communication register access is processed as follows: In the case of a read access request, the read access operation is carried out on the communication register module having the same module number as that of the processor which has issued the read access. At this time, the communication register is accessed regardless of whether it belongs to the real set or a copy set.
  • the write data is written into the same address of each of the communication register modules 320.
  • the test process is performed on the communication register module that holds the real communication register corresponding to the address being tested.
  • the test result is returned to the processor.
  • the lock bit is written into the real communication register.
  • the lock bit is written via the communication register network 340 into all of the communication register copies at the same address.
  • a communication register module 320 dedicated to each of the processors is employed.
  • the write request is broadcast via the communication register network 340, so that the checking operation can be performed by the respective processors 100 at the same time, and the time required to carry out this checking operation can be shortened.
  • in the multi-processor system, it is possible to reduce buffering among the processors when the communication registers are referred to. As a consequence, the synchronization control, the mutual exclusion control, or the communication control executed through the communication registers can be performed at high speed.
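
The four-processor timeline described in the snippets above (one broadcast decrement per cycle in cycles one to four, then a simultaneous check in the fifth cycle) can be reproduced with a small Python model. This is only a sketch under stated assumptions: the function name `replicated_barrier_cycles` and the dictionary-per-module representation are hypothetical, each broadcast is assumed to reach every module within the cycle in which the decrement is executed, and priority arbitration between broadcasts and checks is not modeled.

```python
def replicated_barrier_cycles(n):
    """Count cycles for a barrier when each processor has its own register module.

    Assumption: one decrement is executed per cycle and its result is
    broadcast to all other modules within that same cycle.
    """
    # One module per processor; word 0 holds the processor count, word 1 a flag.
    modules = [{0: n, 1: 1} for _ in range(n)]
    cycle = 0

    # Decrement phase: processor p decrements word 0 in its own module,
    # and the new value is broadcast to every other module.
    for p in range(n):
        cycle += 1
        new_value = modules[p][0] - 1
        for module in modules:
            module[0] = new_value

    # Check phase: every processor reads only its own module, so all
    # checks fit into a single cycle.
    cycle += 1
    assert all(module[0] == 0 for module in modules)
    return cycle


print(replicated_barrier_cycles(4))
```

Under these assumptions the model needs N + 1 cycles for N processors, which matches the five cycles stated for four processors.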

Abstract

A multi-processor system includes n processors (100) ("n" being an integer) for processing data, a storage unit (400) for storing data, and a communication register unit (300) for synchronizing communication among the processors. These units are interconnected via an interconnection network (200). The communication register unit (300) is subdivided into n communication register modules (320) that store the same contents. The communication register modules (320) are referred to by the respective processors (100) in one-to-one correspondence, and can be referred to at the same time. When a write request is made to a certain communication register module (320), the content of this write request is broadcast to the other communication register modules.
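
The arrangement summarized in the abstract (one module per processor, local reads, broadcast writes) can be sketched in Python as follows. This is an illustrative model only: the class name `CommunicationRegisterUnit` and its methods are hypothetical, modules are indexed from 0 rather than #1, and the interconnection network and packet format are not modeled.

```python
class CommunicationRegisterUnit:
    """Toy model of n replicated communication register modules."""

    def __init__(self, n, words):
        # Every module holds the same number of words, initially zero.
        self.modules = [[0] * words for _ in range(n)]

    def read(self, processor, address):
        # A processor reads only the module assigned to it, so reads
        # issued by different processors never touch the same module.
        return self.modules[processor][address]

    def write(self, processor, address, value):
        # A write is broadcast so that all modules keep identical contents.
        # The `processor` argument only identifies the requester here.
        for module in self.modules:
            module[address] = value


unit = CommunicationRegisterUnit(n=4, words=8)
unit.write(processor=2, address=3, value=99)
print([unit.read(p, 3) for p in range(4)])
```

Keeping every module identical is what lets the checks issued by different processors proceed at the same time instead of queuing at one shared register.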

Description

  • The present invention generally relates to a synchronization communication mechanism, and more specifically to a synchronization communication control mechanism employed in a multi-processor system.
  • In some multi-processor systems, high-speed shared registers called "communication registers" are used to hold shared variables for executing synchronization controls, mutual exclusion controls, or communication controls among processors. Such a communication register is required to have a shorter access time than a storage unit, a relatively high throughput, or both. The respective processors communicate through such a communication register, so that data processing speed can be increased. Because the synchronization control, the mutual exclusion control, and the communication control cannot themselves be fully parallelized in a multi-processor system, these controls can severely degrade the performance of the overall system as the degree of parallelism increases. As a consequence, the arrangement of the communication register has a considerable influence on the performance of the multi-processor system.
  • A description will now be made of barrier synchronization as one example of the above-described synchronization control.
  • Barrier synchronization is an operation in which every processor waits in a barrier synchronization routine until all of the processors have executed that routine. This barrier synchronization routine is shown in Fig. 9. It is assumed that the number of processors executing the barrier synchronization is stored as the initial value of the word #0 of the communication register, that a non-zero value is stored in the word #1 of the communication register, and that zero is stored in the scalar registers S0 and S1.
  • The commands below are interpreted as follows:
       FDCR S0, CR#0 : after the value of the word #0 in the communication register is stored into the scalar register S0, the value of the word #0 in this communication register is decreased by 1.
       BL S0, loop 1 : when the value of the scalar register S0 is greater than zero, the process operation branches to loop 1.
       SCR S1, CR#1 : the value of the scalar register S1 is stored into the word #1 of the communication register.
       B looped : the process operation jumps unconditionally to looped.
       LCR S2, CR#1 : the value of the word #1 in the communication register is stored into the scalar register S2.
       BNE S2, loop 1 : if the value of the scalar register S2 is not zero, the process operation branches to loop 1.
  • When the respective processors enter the barrier routine, the value of the word #0 in the communication register is first saved to the scalar register S0, and is then decremented. Since the number of processors has been stored as the initial value of the word #0 of the communication register, the value of the word #0 becomes zero when all of the processors have entered this barrier routine. The processors other than the final processor to enter this barrier routine jump to loop 1, and wait in this loop until the final processor enters the routine. Whether a processor is the final processor can be judged by checking the value of the word #0 in the communication register, which has been read by the FDCR command. If the processor is the final processor, the zero value is written into the word #1 in the communication register, which announces the completion to the other processors.
  • In the above-described conventional multi-processor system, only one of the communication register access requests issued by the processors can access the communication register unit at a time. This may cause large overhead in the synchronization, mutual exclusion, and communication controls using the communication registers.
  • In this case, after the processors other than the final processor have executed the FDCR command, they repeatedly execute the LCR command within loop 1 until the final processor sets the value of the word #1 to zero. This repeated execution is referred to as a "spin lock". Since the spin lock is performed by all of the processors which have entered the routine, the accesses to the communication registers are concentrated, so that heavy access contention may occur. Because of this access contention, the FDCR command access executed by a processor that has newly entered the barrier synchronization routine is forced to wait. In the worst case, the waiting time may grow to a period determined by the number of processors that are spinning while waiting for the barrier synchronization.
  • Referring now to the time chart shown in Fig. 10, when the above-described barrier synchronization is executed by four processors, each of these processors sequentially decrements the word #0, and thereafter each processor checks whether or not the operations of the other processors are completed. As a consequence, when the barrier synchronization is performed by these four processors, 8 cycles are required to accomplish the synchronization. In other words, (2xN) cycles are required for N processors, where "N" is an integer.
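
As a rough check of the cycle count quoted for Fig. 10, the conventional scheme can be modeled in a few lines of Python. This is a minimal sketch rather than the hardware described here: the function name `conventional_barrier_cycles` and the dictionary model of the communication register are illustrative assumptions, as are the simplifications that the single communication register serves exactly one access per cycle and that the final processor's write to the word #1 is folded into its decrement cycle.

```python
def conventional_barrier_cycles(n):
    """Count cycles for a barrier on one shared communication register.

    Assumption: the register accepts exactly one access per cycle, so
    both the decrements and the later checks are fully serialized.
    """
    # Word 0 holds the processor count, word 1 a non-zero "not done" flag.
    cr = {0: n, 1: 1}
    cycle = 0

    # Decrement phase: one FDCR-style access per cycle.
    for _ in range(n):
        cycle += 1
        cr[0] -= 1
        if cr[0] == 0:
            # The final processor clears word 1 to release the others
            # (folded into the same cycle as a simplification).
            cr[1] = 0

    # Check phase: each processor's LCR-style check also takes its own
    # cycle, because all of them target the same register.
    for _ in range(n):
        cycle += 1
        assert cr[1] == 0
    return cycle


print(conventional_barrier_cycles(4))
```

With these assumptions the model yields 2 x N cycles, that is, 8 cycles for four processors, in line with the figure stated above.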
  • An object of the present invention is to solve the above-described problems, and thereby to enable the synchronization communication control via the communication register in the multi-processor system to be performed at high speed.
  • Another object of the present invention is to avoid contention among the processors when they refer to the communication registers.
  • A multi-processor system, according to one preferred embodiment of the present invention, comprises N processors ("N" being an integer), a storage unit, a communication register unit, and an interconnection network for interconnecting said processors, said storage unit, and said communication register unit.
  • The communication register unit includes N communication register modules, each storing data of the same number of words. The communication register modules are controlled so that corresponding words hold the same contents in every module, and each module is referred to by only one specific processor.
  • BRIEF DESCRIPTION OF THE DRAWING
  • Various modes of multi-processor system according to the present invention will be readily appreciated with reference to the accompanying drawings, in which:
    • Fig. 1 schematically shows an overall arrangement of a multi-processor system according to the inventive concept of the present invention;
    • Fig. 2 schematically represents an arrangement of a communication register unit according to a first embodiment of the present invention;
    • Fig. 3 is a schematic illustration for showing an arrangement of a communication register module employed in the communication register unit of Fig. 2;
    • Fig. 4 indicates a format of a request passing through an interconnection network employed in the multi-processor system of the present invention;
    • Fig. 5 is a time chart for explaining operations of the first embodiment and a second embodiment of the present invention;
    • Fig. 6 schematically represents an arrangement of a communication register unit according to the second embodiment of the present invention;
    • Fig. 7 schematically represents an arrangement of a communication register unit according to a third embodiment of the present invention;
    • Fig. 8 is a time chart for explaining operations of the third embodiment of the present invention;
    • Fig. 9 illustrates an example of the program used to realize the barrier synchronization; and
    • Fig. 10 is a time chart for explaining operations of the conventional multi-processor system.
    DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • A multi-processor system according to a preferred embodiment of the present invention will now be described in detail with reference to the drawings.
  • Referring now to Fig. 1, a multi-processor system according to an embodiment of the present invention includes n processors 100 for processing data, a storage unit 400 for storing the data, and a communication register unit 300 for synchronizing communications among the processors. These units are interconnected with each other via an interconnection network 200.
  • Each of these processors 100 has a single access port to the interconnection network 200. The storage unit 400 has a single access port for the unit as a whole.
  • Referring now to Fig. 2, the communication register unit 300 is subdivided into n communication register modules 320. A unique module number is attached to each of these communication register modules as an identifier. In this drawing, the module numbers are indicated by #1, #2, ---, #n, respectively. Each of the communication register modules 320 has a single access port to the interconnection network.
  • Referring back to Fig. 1, the interconnection network 200 has n ports in total for the respective processors, n ports in total for the respective communication register modules 320, and a single access port for the main storage unit. Access paths are provided among the access ports, and access requests are transported through the access paths. As an alternative arrangement, the multi-processor system may employ multiple access ports and access paths in order to improve the access throughput. For example, n access paths may be established between the main storage unit and the interconnection network.
  • When a processor 100 accesses either the storage unit 400 or the communication register unit 300, the processor 100 produces a request packet and sends it out via the access path to the interconnection network 200. The interconnection network 200 arbitrates contention among the request packets transferred from the plurality of processors 100, routes the respective request packets to their destinations, namely the storage unit 400 or the communication register unit 300, and sends out the request packets through the respective access paths. A request packet that arrives at either the storage unit 400 or the communication register unit 300 causes a read access or a write access in that unit. In the case of a read access, the read data is returned via the interconnection network to the processor.
  • Referring to Fig. 4, a format of a request packet transferred through the interconnection network 200 is constructed of an access type field 801 for indicating whether the storage unit 400 or the communication register unit 300 is accessed, a code field 802 for denoting whether the load access or a store access is made, an address field 803 for showing either the address of the storage unit 400 or the address of the communication register 300, and also a data field 804 for the write data. In case of the load access, the read data is held in the data field and returned via the interconnection network 200 to the processor 100.
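  • The four-field request format can be pictured as a small record type. The following Python sketch is illustrative only: the patent names the fields 801 to 804 but does not specify field widths or code values, so the enum members, their numeric encodings, and the helper `make_reply` are assumptions introduced here.

```python
from dataclasses import dataclass, replace
from enum import Enum


class AccessType(Enum):
    """Access type field 801 (numeric values are assumed, not from the patent)."""
    STORAGE_UNIT = 0
    COMMUNICATION_REGISTER = 1


class Code(Enum):
    """Code field 802 (numeric values are assumed, not from the patent)."""
    LOAD = 0
    STORE = 1


@dataclass(frozen=True)
class RequestPacket:
    access_type: AccessType  # field 801: which unit is accessed
    code: Code               # field 802: load or store
    address: int             # field 803: storage or communication register address
    data: int = 0            # field 804: write data, or read data in a reply


def make_reply(request: RequestPacket, read_data: int) -> RequestPacket:
    """Build the reply to a load by placing the read data in the data field."""
    if request.code is not Code.LOAD:
        raise ValueError("only a load access returns data to the processor")
    return replace(request, data=read_data)
```

  • A reply built this way keeps the access type, code, and address of the request and differs only in the data field, which mirrors the description of the load access above.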
  • It should be noted that although this interconnection network 200 may employ various network arrangements, a desirable arrangement is one in which no blocking occurs when a request from one processor 100 to one communication register module 320 and a request from another processor 100 to another communication register module 320 arrive simultaneously at the access ports of the interconnection network 200. For instance, a non-blocking crossbar switch is one of the desirable arrangements.
  • As apparent from Fig. 3, each of the communication register modules 320 within the communication register unit 300 includes a communication register memory 301 constructed of a plurality of words, a write register 302 for supplying the write data to the communication register memory 301, an address register 303 for supplying the address to the communication register memory 301, and a read register 304 for holding the data read out from the communication register memory 301. This communication register module 320 further includes a write enable register 305 for enabling the communication register memory 301 to write the data, a read enable register 306 for enabling the communication register memory 301 to read the data, a request packet control circuit 311 for breaking the request packet sent from the interconnection network 200 down into parts which are then distributed to the respective circuit units, a communication register control circuit 310 for controlling access to the communication register memory 301, and a reply packet control circuit 312 for producing a reply packet to the interconnection network 200.
  • Addresses are allocated serially to the words of the communication register memory 301, starting from address zero. An access to the communication register issued from the processor 100 designates an address, which determines the word position of the communication register to be accessed.
  • The contents of the data stored in the respective words of the communication register memory 301 may be determined arbitrarily. When the communication register is used for achieving synchronization, either all bits of the word or some bits thereof may be used as a synchronizing flag. Alternatively, only the most significant bit (MSB) of the word may be used as the synchronizing flag, and the remaining bits thereof may be utilized as the storage data sent and received among the processors.
  • When the data is written into the communication register memory 301, the value of "1" is set to the write enable register 305, the address of the word to be written is set to the address register 303, and then the data to be written is set into the write register 302. At the next timing, the value of the write register 302 is written into the word of the communication register memory 301 designated by the address register 303.
  • When the data is read from the communication register memory 301, the value of "1" is set into the read enable register 306, and the address of the word to be read is set into the address register 303. At the subsequent timing, the data is read out from the word of the communication register memory 301 designated by the address register 303, and is then held in the read register 304.
  • These registers provided around the communication register memory 301 are controlled by the communication register control circuit 310.
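  • The two-step write and read timing described above can be modeled in software. The following Python sketch is a behavioral illustration, not the circuit itself: the class and method names are hypothetical, and clearing the enable registers after each timing step is an assumption that the text does not state.

```python
class CommunicationRegisterMemory:
    """Behavioral model of memory 301 and the registers 302 to 306 around it."""

    def __init__(self, num_words: int) -> None:
        self.words = [0] * num_words  # communication register memory 301
        self.write_register = 0       # write register 302
        self.address_register = 0     # address register 303
        self.read_register = 0        # read register 304
        self.write_enable = 0         # write enable register 305
        self.read_enable = 0          # read enable register 306

    def request_write(self, address: int, data: int) -> None:
        """First timing of a write: set enable, address, and write data."""
        self.write_enable = 1
        self.address_register = address
        self.write_register = data

    def request_read(self, address: int) -> None:
        """First timing of a read: set enable and address."""
        self.read_enable = 1
        self.address_register = address

    def tick(self) -> None:
        """Next timing: perform whichever access was enabled."""
        if self.write_enable:
            self.words[self.address_register] = self.write_register
            self.write_enable = 0  # assumed: enable is cleared once used
        if self.read_enable:
            self.read_register = self.words[self.address_register]
            self.read_enable = 0   # assumed: enable is cleared once used
```

  • In this model a write becomes visible in the memory only after the following `tick()`, and read data appears in the read register one step after the read is requested, as in the description.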
  • The request packet control circuit 311 handles the request packet arriving from the interconnection network 200. Upon receipt of the request packet from the interconnection network 200, the request packet control circuit 311 decodes the request code field 802 and judges whether a load access or a store access has been issued. The decoded result is sent to the communication register control circuit 310. The reply packet control circuit 312 stores the data held in the read register into the data field 804 of a packet, thereby forming a reply packet to the interconnection network 200.
  • Subsequently, a description will now be made of process operations carried out in the communication register module 320 during the access operation to the communication register.
  • In case of the store access, the data within the data field 804 is written into the word addressed by the address field 803 in the communication register memory 301. That is, the address of the communication register is entered into the address register 303 at the write timing. Also, the write data within the data field 804 is entered into the write register 302. At the same time, the content of the write enable register 305 is set to the value of "1", so that the write access is completed at the next timing.
  • In case of the load access, the data is read out from the word addressed by the address field 803 in the communication register memory 301. In other words, the communication register address in the address field 803 is entered into the address register 303, and at the same time, the content of the read enable register 306 is set to the value of "1" at the read timing. At the next timing, the read data is held in the read register 304. This data held by the read register 304 is stored into the data field 804 of the packet, and is constructed as the reply packet to the interconnection network. This reply packet is sent out to the interconnection network 200.
  • Referring back to Fig. 2, each of the communication register modules 320 employed in the communication register unit 300 is equally divided into n groups of registers. Each of these groups will be referred to as a "set." A single set consists of "m" words. In other words, each of these communication register modules 320 contains n sets of "m" words each. It should be noted that mutually distinguishable set numbers are attached to the sets. In Fig. 2, the set numbers are indicated by %1, %2, ---, %n.
  • Also, it should be noted that, in each communication register module 320, the set whose set number is identical to the module number is called the "real set," and a communication register in this real set is referred to as a "real communication register." The other (n-1) sets are called "copy sets," and a communication register within such a set is referred to as a "copy of a communication register". The sets to which the same set number has been attached are controlled so as to store the same contents. For instance, in the communication register module #1, the set %1 corresponds to the real set, and the other sets correspond to the copy sets.
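  • The relation between addresses, sets, and modules can be expressed as simple arithmetic. The following Python sketch assumes that the n sets are laid out contiguously from address zero in the order %1, %2, ---, %n, with "m" words per set. The patent states that addresses run serially from zero but does not spell out this layout, so the mapping and the function names are assumptions.

```python
def set_number(address: int, words_per_set: int) -> int:
    """Return the 1-based set number (%1, %2, ...) holding a 0-based word address."""
    return address // words_per_set + 1


def is_real_set(module_number: int, address: int, words_per_set: int) -> bool:
    """A set is the real set in the module whose number equals the set number."""
    return set_number(address, words_per_set) == module_number


def real_module_for(address: int, words_per_set: int) -> int:
    """Return the 1-based module number that holds the real communication register."""
    return set_number(address, words_per_set)
```

  • With m = 8, for example, addresses 0 to 7 fall in set %1, so module #1 holds the real registers for them and every other module holds copies.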
  • When data is written into a certain communication register module 320, the same data is written into the corresponding word within each of the other communication register modules in the same cycle. In this embodiment, the interconnection network 200 controls this data writing operation. When the decrement process by the broadcast contends with the checking process in the respective communication register modules 320, the interconnection network 200 performs control such that the decrement process by the broadcast is given top priority.
  • Referring now to the time chart of Fig. 5, in accordance with this embodiment, when four processors execute the synchronization operation, the decrement is executed in the communication register module #1 at the first cycle, and the resulting write is broadcast to the other communication register modules. At the second cycle, the decrement is executed in the communication register module #2, and the write is again broadcast to the other communication register modules. Subsequently, a similar process is carried out with respect to the communication register module #3 at the third cycle and the communication register module #4 at the fourth cycle. Then, the checking process is performed at the respective communication register modules at the fifth cycle. As a result, it is confirmed that all of the processors have been synchronized with each other. In other words, although the conventional multi-processor system requires a number of checking steps proportional to the total number of processors, the checking phase of the multi-processor system according to this embodiment is completed in the number of steps required for a single processor.
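  • The five-cycle result of Fig. 5 can be reproduced with a small model of the replicated modules. The following Python sketch assumes one broadcast decrement per cycle, a barrier word initialized to the number of processors, and a single cycle in which every processor checks its own module. The function name and these scheduling assumptions are illustrative and do not appear in the patent.

```python
def replicated_barrier_cycles(num_processors: int) -> int:
    """Count cycles for a barrier when each processor has its own register module."""
    # One copy of the barrier word per communication register module.
    copies = [num_processors] * num_processors
    cycle = 0

    # Decrement phase: processor p decrements in its own module, and the
    # resulting value is broadcast to every other module in the same cycle.
    for p in range(num_processors):
        cycle += 1
        new_value = copies[p] - 1
        for i in range(num_processors):
            copies[i] = new_value

    # Check phase: all processors read their own module in one shared cycle.
    cycle += 1
    assert all(value == 0 for value in copies)
    return cycle
```

  • Under these assumptions the barrier takes N+1 cycles, that is, 5 cycles for four processors, compared with the 2×N cycles of the conventional arrangement.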
  • In the multi-processor system with the above-described arrangements according to this preferred embodiment, the communication register access is processed as follows. In the case of a read access, the read access is carried out on the communication register module having the same module number as the processor which issued the read access. At this time, each communication register is accessed regardless of whether it belongs to the real set or a copy set.
  • In the case of a write access request, the write data is broadcast by the interconnection network 200 and is then written at the same address in each of the communication register modules 320.
  • In the case of the test & set command, the test is performed on the communication register module that holds the real communication register corresponding to the address under test. If the test fails to acquire the lock, the test result is returned to the processor. Conversely, if it succeeds in acquiring the lock, the lock bit is written into the real communication register. The lock bit is also written, via the interconnection network 200, into all of the copies of the communication register having the same address.
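  • The read, write, and test & set handling described above can be summarized in a behavioral model. The following Python sketch is illustrative only: it uses 0-based module and processor indices, assumes the contiguous set layout in which address // m selects the module holding the real register, and uses bit 0 as a hypothetical lock bit. None of these names or encodings come from the patent.

```python
LOCK_BIT = 1  # hypothetical lock-bit position; the patent does not fix one


class ReplicatedRegisters:
    """Behavioral model of n register modules that hold identical contents."""

    def __init__(self, num_modules: int, words_per_set: int) -> None:
        self.words_per_set = words_per_set
        size = num_modules * words_per_set
        self.modules = [[0] * size for _ in range(num_modules)]

    def real_module(self, address: int) -> int:
        """0-based index of the module holding the real register (assumed layout)."""
        return address // self.words_per_set

    def read(self, processor: int, address: int) -> int:
        """A read goes only to the module dedicated to the issuing processor."""
        return self.modules[processor][address]

    def write(self, address: int, data: int) -> None:
        """A write is broadcast to the same address in every module."""
        for module in self.modules:
            module[address] = data

    def test_and_set(self, address: int) -> bool:
        """Test on the real register; on success broadcast the lock bit."""
        real = self.modules[self.real_module(address)]
        if real[address] & LOCK_BIT:
            return False  # lock fail: only the test result is returned
        for module in self.modules:  # real register and all of its copies
            module[address] |= LOCK_BIT
        return True
```

  • Because the test always consults the real register, two processors racing for the same lock are serialized at one module, while later reads of the lock word are served from each processor's own module.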
  • As previously described, in accordance with the first embodiment of the present invention, a communication register module 320 dedicated to each of the processors is employed. When a write is requested, the write request is broadcast via the interconnection network 200, so that the checking operation can be performed by the respective processors 100 at the same time, and the time required to carry out this checking operation can be shortened.
  • Next, a description will now be made of a multi-processor system according to a second preferred embodiment of the present invention. The multi-processor system of this second embodiment has an arrangement similar to that of the first embodiment except for the internal arrangement of the communication register unit 300, which differs as follows. That is, as represented in Fig. 6, the communication register unit 300 according to the second embodiment of the present invention includes n communication register modules 320, as in the above-described first embodiment. Unlike the first embodiment, however, this communication register unit 300 includes a network interface circuit 330 between these communication register modules 320 and the interconnection network 200.
  • The network interface circuit 330 provides an interface mechanism between the interconnection network 200 and each of the communication register modules 320. During a read access operation, the network interface circuit 330 passes the request sent from the interconnection network 200 through to the output port corresponding to the input port. During a write access operation, the network interface circuit 330 broadcasts the request to all the communication register modules 320. At this time, the request is transmitted to the respective communication register modules 320 with its format unmodified. During a test & set access operation, the network interface circuit 330 routes the request to the communication register module that holds the real communication register corresponding to the address under test. Furthermore, if the test & set access succeeds in acquiring the lock, the result is broadcast to all of the communication register modules 320.
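  • The routing rules of the network interface circuit 330 can be written as a single decision function. The following Python sketch is an assumption-laden illustration: ports and modules are numbered from 0, the request kinds are plain strings, and the real module is found with the contiguous layout address // m. The function name and parameters are hypothetical.

```python
from typing import List


def route_request(kind: str, input_port: int, address: int,
                  num_modules: int, words_per_set: int) -> List[int]:
    """Return the 0-based module indices that receive a request of the given kind."""
    if kind == "read":
        # Passed straight through: output port equals input port.
        return [input_port]
    if kind == "write":
        # Broadcast unmodified to every communication register module.
        return list(range(num_modules))
    if kind == "test_and_set":
        # Routed to the module that holds the real communication register
        # (assumed layout: sets are contiguous, m words each).
        return [address // words_per_set]
    raise ValueError(f"unknown request kind: {kind!r}")
```

  • A successful test & set would then be followed by a second, write-like step whose targets are all modules, corresponding to the broadcast of the lock result described above.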
  • In response to the request derived from the network interface circuit 330, each of the communication register modules 320 accesses the word of the communication register indicated in the address field 803 to execute the process indicated in the request code field 802.
  • According to the second embodiment, when the decrement by the broadcasting operation and the checking process contend with each other in the respective communication register modules 320, the network interface circuit 330 performs the controls in such a manner that the broadcasting operation should be carried out prior to other access.
  • In the multi-processor system with the above-described arrangement according to the second embodiment, the communication register access request is processed as follows. In the case of a read access request, the read access is carried out on the communication register module having the same module number as the processor which issued the access request. At this time, each communication register is accessed regardless of whether it belongs to the real set or a copy set.
  • In the case of a write access request, the request issued by the processor is broadcast by the network interface circuit 330, and the data is then written at the same address in each of the communication register modules 320.
  • In the case of the test & set command, the test is performed on the communication register module that holds the real communication register corresponding to the address under test. If the test fails to acquire the lock, the test result is returned to the processor. Conversely, if it succeeds in acquiring the lock, the lock bit is written into the real communication register. At the same time, the lock bit is written, via the network interface circuit 330, into all of the copies of the communication register having the same address.
  • As previously described, in accordance with the second embodiment of the present invention, a communication register module 320 dedicated to each of the processors is employed. When a write is requested, the write request is broadcast via the network interface circuit 330, so that the checking operation can be performed by the respective processors 100 at the same time, and the time required to carry out this checking operation can be shortened.
  • Next, a description will now be made of a multi-processor system according to a third preferred embodiment of the present invention. The multi-processor system of this third embodiment has an arrangement similar to that of the other embodiments except for the internal arrangement of the communication register unit 300, which differs as follows.
  • That is, as represented in Fig. 7, the communication register unit 300 according to the third embodiment of the present invention includes n communication register modules 320, as in the above-described first and second embodiments. Unlike the other embodiments, however, this communication register unit 300 includes a communication register network 340 coupling the communication register modules 320 with each other.
  • In this third embodiment, the respective communication register modules 320 have two ports capable of simultaneously accepting two accesses at maximum.
  • When data is written into a certain communication register module 320, data having the same contents is written into the corresponding words in the other communication register modules in the same writing cycle. In this embodiment, the communication register network 340 controls this data writing operation. In the respective communication register modules 320, when the decrement by the broadcasting operation contends with the checking process, the communication register network 340 performs control in such a manner that the broadcasting operation is carried out prior to other accesses.
  • Referring now to the time chart of Fig. 8, in accordance with the third embodiment of the present invention, one decrement and more than one check are allowed in the same cycle. In other words, the checking access to the communication register module #1, in which the decrement was completed at the first cycle, is performed from the second cycle onward. Thereafter, when the decrements are carried out in all of the communication register modules, the synchronization of all the processors is completed. As a consequence, the checking operations are carried out at the same time in all of the communication register modules at the fifth cycle, so that it can be confirmed that the synchronization of all of the processors has been completed.
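  • The overlap of decrements and checks in the third embodiment can be illustrated with a cycle-by-cycle model. The following Python sketch assumes that each module serves one read and one broadcast write per cycle, that a read samples the value held at the start of the cycle, and that a processor begins polling in the cycle after its own decrement. These timing details and the function name are assumptions made for illustration and are not taken from Fig. 8 itself.

```python
from typing import Dict


def overlapped_barrier_timeline(num_processors: int) -> Dict[int, int]:
    """Return, per 0-based processor, the cycle in which it first reads zero."""
    copies = [num_processors] * num_processors  # one barrier word per module
    observed_zero_at: Dict[int, int] = {}
    cycle = 0

    while len(observed_zero_at) < num_processors:
        cycle += 1

        # Read port: processors that have already decremented poll their own
        # module; the read sees the value from the start of this cycle.
        for p in range(min(cycle - 1, num_processors)):
            if p not in observed_zero_at and copies[p] == 0:
                observed_zero_at[p] = cycle

        # Write port: at most one decrement per cycle, broadcast to all modules.
        if cycle <= num_processors:
            new_value = copies[cycle - 1] - 1
            copies = [new_value] * num_processors

    return observed_zero_at
```

  • For four processors this model has the early processors polling during cycles 2 to 4 without delaying any decrement, and every processor observes completion at the fifth cycle.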
  • In the multi-processor system with the above-described arrangements according to this third preferred embodiment, the communication register access is processed as follows. In the case of a read access request, the read access is carried out on the communication register module having the same module number as the processor which issued the read access. At this time, each communication register is accessed regardless of whether it belongs to the real set or a copy set.
  • In the case of a write access request, the write data is broadcast by the communication register network 340 and is then written at the same address in each of the communication register modules 320.
  • In the case of the test & set command, the test is performed on the communication register module that holds the real communication register corresponding to the address under test. If the test fails to acquire the lock, the test result is returned to the processor. Conversely, if it succeeds in acquiring the lock, the lock bit is written into the real communication register. The lock bit is also written, via the communication register network 340, into all of the copies of the communication register at the same address.
  • As previously described, in accordance with the third embodiment of the present invention, a communication register module 320 dedicated to each of the processors is employed. When a write is requested, the write request is broadcast via the communication register network 340, so that the checking operation can be performed by the respective processors 100 at the same time, and the time required to carry out this checking operation can be shortened.
  • As previously described in detail, in the multi-processor system according to the present invention, it is possible to reduce contention among the processors when the communication registers are referred to. As a consequence, the synchronization control, the mutual exclusion control, or the communication control executed through the communication registers can be performed at high speed.

Claims (6)

  1. A multi-processor system comprising N processors (100) ("N" being an integer), a storage unit (400), a communication register unit (300), and an interconnection network (200) for interconnecting said processors, said storage unit, and said communication register unit, characterized in that:
       said communication register unit (300) includes N communication register modules (320) each for storing data having the same number of words, each of said communication register modules is so controlled as to store the respective words having the same contents with each other, and also is referred to by only one specific processor (100).
  2. A multi-processor system as claimed in claim 1, characterized in that:
       said interconnection network (200) broadcasts a write request issued from said processor to all of said N communication modules.
  3. A multi-processor system as claimed in claim 1, characterized in that:
       said communication register unit (300) further includes a network interface circuit (330), and said network interface circuit broadcasts a write request issued from said processor to all of N communication modules.
  4. A multi-processor system as claimed in claim 1, characterized in that:
       said communication register unit (300) further includes a communication register network (340), and when a write request is issued from said processor to a certain communication register module, said communication register network broadcasts said write request to all of other communication register modules.
  5. A multi-processor system as claimed in claim 1, characterized in that:
       each of said communication register modules (320) is subdivided into N sets constructed of M words ("M" being an integer);
       when the request issued from said processor corresponds to a read request, a read access is produced in the communication register module dedicated to said processor;
       when the request issued from said processor corresponds to a write request, said write request is broadcasted to all of said communication register modules and then the write accesses are produced in all of said communication register modules; and
       when the request issued from said processor corresponds to a test & set request, a test operation is carried out for such a communication register module containing a set where module numbers sequentially attached to said communication register modules from a first module number is coincident with set numbers sequentially attached to said sets from a first set number; when a result of said test operation becomes "lock fail," said test operation result is returned to said processor, whereas when a result of said test operation becomes "lock success," a write access of lock is performed in the same address of all the said communication register modules.
  6. A multi-processor system as claimed in claim 4, characterized in that:
       said communication register module (320) includes a plurality of ports which can be accessed at the same time, and allows such simultaneous accesses containing the read access from said processor (100) and the write access through said communication register network (340).
EP95101163A 1994-01-28 1995-01-27 High-speed synchronization communication control mechanism for multi-processor system. Withdrawn EP0665503A3 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP806594 1994-01-28
JP8065/94 1994-01-28
JP32263394 1994-12-26
JP322633/94 1994-12-26

Publications (2)

Publication Number Publication Date
EP0665503A2 true EP0665503A2 (en) 1995-08-02
EP0665503A3 EP0665503A3 (en) 1996-01-17

Family

ID=26342505

Family Applications (1)

Application Number Title Priority Date Filing Date
EP95101163A Withdrawn EP0665503A3 (en) 1994-01-28 1995-01-27 High-speed synchronization communication control mechanism for multi-processor system.

Country Status (3)

Country Link
US (1) US5659784A (en)
EP (1) EP0665503A3 (en)
CA (1) CA2141268C (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7356568B2 (en) 2002-12-12 2008-04-08 International Business Machines Corporation Method, processing unit and data processing system for microprocessor communication in a multi-processor system
US7360067B2 (en) 2002-12-12 2008-04-15 International Business Machines Corporation Method and data processing system for microprocessor communication in a cluster-based multi-processor wireless network
US7359932B2 (en) * 2002-12-12 2008-04-15 International Business Machines Corporation Method and data processing system for microprocessor communication in a cluster-based multi-processor system
US7493417B2 (en) * 2002-12-12 2009-02-17 International Business Machines Corporation Method and data processing system for microprocessor communication using a processor interconnect in a multi-processor system

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5790888A (en) * 1996-08-12 1998-08-04 Seeq Technology, Inc. State machine for selectively performing an operation on a single or a plurality of registers depending upon the register address specified in a packet
US5898678A (en) 1996-09-25 1999-04-27 Seeq Technology, Inc. Method and apparatus for using synthetic preamable signals to awaken repeater
KR100259276B1 (en) 1997-01-27 2000-06-15 윤종용 Interconnection network having extendable bandwidth
US6516403B1 (en) * 1999-04-28 2003-02-04 Nec Corporation System for synchronizing use of critical sections by multiple processors using the corresponding flag bits in the communication registers and access control register
JP3667585B2 (en) * 2000-02-23 2005-07-06 エヌイーシーコンピュータテクノ株式会社 Distributed memory type parallel computer and its data transfer completion confirmation method
US7058823B2 (en) * 2001-02-28 2006-06-06 Advanced Micro Devices, Inc. Integrated circuit having programmable voltage level line drivers and method of operation
US6912611B2 (en) * 2001-04-30 2005-06-28 Advanced Micro Devices, Inc. Split transactional unidirectional bus architecture and method of operation
US6813673B2 (en) * 2001-04-30 2004-11-02 Advanced Micro Devices, Inc. Bus arbitrator supporting multiple isochronous streams in a split transactional unidirectional bus architecture and method of operation
US6785758B1 (en) 2001-06-01 2004-08-31 Advanced Micro Devices, Inc. System and method for machine specific register addressing in a split transactional unidirectional bus architecture
US7162573B2 (en) * 2003-06-25 2007-01-09 Intel Corporation Communication registers for processing elements
US20050204102A1 (en) * 2004-03-11 2005-09-15 Taylor Richard D. Register access protocol for multi processor systems
US7564226B2 (en) * 2005-07-01 2009-07-21 Apple Inc. Rapid supply voltage ramp using charged capacitor and switch
US8510738B2 (en) * 2009-08-20 2013-08-13 Microsoft Corporation Preventing unnecessary context switching by employing an indicator associated with a lock on a resource

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1991020043A1 (en) * 1990-06-11 1991-12-26 Supercomputer Systems Limited Partnership Global registers for a multiprocessor system
EP0486167A2 (en) * 1990-11-13 1992-05-20 International Business Machines Corporation Multiple computer system with combiner/memory interconnection system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4412303A (en) * 1979-11-26 1983-10-25 Burroughs Corporation Array processor architecture
US4952930A (en) * 1988-11-18 1990-08-28 International Business Machines Corp. Multipath hierarchical network
US4980819A (en) * 1988-12-19 1990-12-25 Bull Hn Information Systems Inc. Mechanism for automatically updating multiple unit register file memories in successive cycles for a pipelined processing system
US5224215A (en) * 1990-07-13 1993-06-29 International Business Machines Corporation Message queue processing among cooperative processors having significant speed differences
US5265235A (en) * 1990-11-30 1993-11-23 Xerox Corporation Consistency protocols for shared memory multiprocessors
US5249283A (en) * 1990-12-24 1993-09-28 Ncr Corporation Cache coherency method and apparatus for a multiple path interconnection network
JP3309425B2 (en) * 1992-05-22 2002-07-29 松下電器産業株式会社 Cache control unit
US5522029A (en) * 1993-04-23 1996-05-28 International Business Machines Corporation Fault tolerant rendezvous and semaphore for multiple parallel processors

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7356568B2 (en) 2002-12-12 2008-04-08 International Business Machines Corporation Method, processing unit and data processing system for microprocessor communication in a multi-processor system
US7360067B2 (en) 2002-12-12 2008-04-15 International Business Machines Corporation Method and data processing system for microprocessor communication in a cluster-based multi-processor wireless network
US7359932B2 (en) * 2002-12-12 2008-04-15 International Business Machines Corporation Method and data processing system for microprocessor communication in a cluster-based multi-processor system
US7493417B2 (en) * 2002-12-12 2009-02-17 International Business Machines Corporation Method and data processing system for microprocessor communication using a processor interconnect in a multi-processor system
US7698373B2 (en) 2002-12-12 2010-04-13 International Business Machines Corporation Method, processing unit and data processing system for microprocessor communication in a multi-processor system
US7734877B2 (en) 2002-12-12 2010-06-08 International Business Machines Corporation Method and data processing system for processor-to-processor communication in a clustered multi-processor system
US7818364B2 (en) 2002-12-12 2010-10-19 International Business Machines Corporation Method and data processing system for microprocessor communication in a cluster-based multi-processor system

Also Published As

Publication number Publication date
US5659784A (en) 1997-08-19
CA2141268C (en) 1999-09-21
EP0665503A3 (en) 1996-01-17
CA2141268A1 (en) 1995-07-29

Similar Documents

Publication Publication Date Title
US5659784A (en) Multi-processor system having communication register modules using test-and-set request operation for synchronizing communications
US4636942A (en) Computer vector multiprocessing control
KR900006791B1 (en) Packet switched multiport memory nxm switch node and processing method
JP2575557B2 (en) Super computer system
US4799199A (en) Bus master having burst transfer mode
US5253346A (en) Method and apparatus for data transfer between processor elements
US7434016B2 (en) Memory fence with background lock release
JPH0581216A (en) Parallel processor
CA1218754A (en) Computer vector multi-processing control
US5495619A (en) Apparatus providing addressable storage locations as virtual links and storing predefined destination information for any messages transmitted on virtual links at these locations
EP0184791A1 (en) Information processing device capable of rapidly processing instructions of different groups
US5857111A (en) Return address adding mechanism for use in parallel processing system
JP2644185B2 (en) Data processing device
US6571301B1 (en) Multi processor system and FIFO circuit
JP2731738B2 (en) Multiprocessor system
CA1329656C (en) Method for controlling a vector register and a vector processor applying thereof
CA1228675A (en) Computer vector multi-processing control
JP3982077B2 (en) Multiprocessor system
WO1991002310A1 (en) Non-busy-waiting resource control
JPH07271654A (en) Controller
JP2878160B2 (en) Competitive mediation device
CA1311307C (en) Method and circuit for implementing an arbitrary graph on a polymorphic mesh
JPH09231188A (en) Multicluster information processing system
JPS63155352A (en) Storage control system
JPS626374A (en) Vector processor

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): CH DE FR LI NL

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): CH DE FR LI NL

17P Request for examination filed

Effective date: 19951206

17Q First examination report despatched

Effective date: 19980617

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20070201