US20070136559A1 - METHOD AND SYSTEM OF COMMUNICATING BETWEEN PEER PROCESSORS IN SoC ENVIRONMENT - Google Patents

METHOD AND SYSTEM OF COMMUNICATING BETWEEN PEER PROCESSORS IN SoC ENVIRONMENT

Info

Publication number
US20070136559A1
Authority
US
United States
Prior art keywords
pulse generator
data
processors
processor
interrupt
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US11/275,091
Other versions
US9367493B2
Inventor
Robert Devins
David Milton
Pascal Nsame
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GlobalFoundries Inc
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US11/275,091
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION (assignors: DEVINS, ROBERT J.; MILTON, DAVID W.; NSAME, PASCAL A.)
Priority to PCT/EP2006/068523 (published as WO2007065777A1)
Publication of US20070136559A1
Assigned to GLOBALFOUNDRIES U.S. 2 LLC (assignor: INTERNATIONAL BUSINESS MACHINES CORPORATION)
Assigned to GLOBALFOUNDRIES INC. (assignors: GLOBALFOUNDRIES U.S. 2 LLC; GLOBALFOUNDRIES U.S. INC.)
Publication of US9367493B2
Application granted
Security agreement in favor of WILMINGTON TRUST, NATIONAL ASSOCIATION (assignor: GLOBALFOUNDRIES INC.)
Assigned to GLOBALFOUNDRIES INC. (release by secured party: WILMINGTON TRUST, NATIONAL ASSOCIATION)
Assigned to GLOBALFOUNDRIES U.S. INC. (release by secured party: WILMINGTON TRUST, NATIONAL ASSOCIATION)
Legal status: Expired - Fee Related
Adjusted expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14: Handling requests for interconnection or transfer
    • G06F 13/20: Handling requests for interconnection or transfer for access to input/output bus
    • G06F 13/24: Handling requests for interconnection or transfer for access to input/output bus using interrupt
    • G06F 15/00: Digital computers in general; Data processing equipment in general
    • G06F 15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163: Interprocessor communication
    • G06F 15/17: Interprocessor communication using an input/output type connection, e.g. channel, I/O port


Abstract

A method and system comprises transferring data from a first processor to at least one pulse generator directly connected to an interrupt control of at least a second processor. The transferring of the data bypasses memory. The method further includes reading the transferred data directly from the at least one pulse generator by the at least a second processor.

Description

    FIELD OF THE INVENTION
  • The invention relates to a method and system of communicating between processors, and more particularly, to a method and system of communication between multiple processors in a SoC test and verification environment.
  • BACKGROUND DESCRIPTION
  • Present-day integrated circuit (IC) chips have advanced significantly in both complexity and sophistication. For example, in early generation chip designs, a chip might embody relatively simple electronic logic blocks effected by interconnections between logic gates; whereas, newer generation chips include designs having combinations of complex, modularized IC designs often called “cores”, which together constitute an entire SoC. These newer generation IC designs increase the overall functionality and performance characteristics of the chip, itself, by, for example, having the ability to include smaller feature sizes and thus increasing the amount of circuitry which can be built on a single chip. But, this comes at a cost: longer design and verification times which, in turn, translate into added development and manufacturing costs.
  • The verification phase of chip design has moved toward a software simulation approach to avoid the costs of implementing designs in hardware to verify the workability of such designs. However, multiprocessor and multicore designs can lead to very large simulation models. Even when using modern simulation tools, simulation load and execution time, as well as build time can become cost and time prohibitive. This is especially true in complex design cases with inter-processor clusters since a complete gate level representation of the design must be constructed and loaded into the simulation for each processor.
  • As the chip design becomes more complex, the verification tends to require an even more inordinate amount of time and computing resources, largely due to the modeling and verification of the interaction of functions associated with the design. This verification process becomes more complicated for verification of multi-processor cores, which interact with one another. These inefficiencies in current verification methodologies exacerbate time pressures and increase, significantly, the time-to-market, a key factor for developers and marketers of IC chips in being competitive in business.
  • To effectuate the growing trend towards SoC implementations of ICs using multiprocessor platforms, SoC systems use tightly coupled software programs and processes running in independent peer processors. These independent execution units must be able to communicate with one another in a timely manner. However, in currently known implementations, communication is through mailbox/semaphore mechanisms to implement inter-process communication (IPC) protocols. Such mechanisms tend to be non-deterministic with respect to message delivery time, and are often not sufficient for real-time SoC functionality.
  • By way of example, and referring to FIG. 1, processors 1 through n communicate with each other through an on-chip bus arbiter, via a UIC (universal interrupt controller). The system of FIG. 1 additionally includes a single or multiple port memory controller and network controller in communication with the on-chip bus arbiter. In implementation, hundreds of cycles may pass before there is full data transfer between the processors (or other logic), thus impairing real-time communications. In the example of FIG. 1, processor 1 transfers data to processor “n” by first requesting authorization from the arbiter. Once this is granted, the data is written into memory. Processor “n” polls the system and requests authorization from the arbiter to read the data from the memory. Once authorization is granted, processor “n” reads the data from the memory. This same process would also occur for non-interrupt network controllers. This, of course, can take many hundreds of cycles to perform, taking into account the arbiter's role of prioritizing data transfer between many devices.
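  • The shared-memory flow described above can be sketched in C as follows; this is a minimal illustration only, with the buffer address, flag word and polling loop all assumed for the example rather than taken from FIG. 1.

```c
/* Sketch of the conventional mailbox/shared-memory flow of FIG. 1, for
 * contrast with the pulse-generator approach. The addresses, the flag
 * layout and the helper names are hypothetical. */
#include <stdint.h>

#define SHARED_BUF   ((volatile uint32_t *)0x40000000u) /* assumed shared memory buffer */
#define MAILBOX_FLAG ((volatile uint32_t *)0x40000ffcu) /* assumed "data ready" word    */

/* Producer (processor 1): each access must first win arbitration, so it
 * may stall for many cycles while the arbiter services other masters. */
static void mailbox_send(const uint32_t *msg, int words)
{
    for (int i = 0; i < words; i++)
        SHARED_BUF[i] = msg[i];     /* write the payload into shared memory */
    *MAILBOX_FLAG = 1u;             /* publish "message available"          */
}

/* Consumer (processor n): polls the flag, then re-arbitrates for the bus
 * to read the payload back out of memory. */
static void mailbox_receive(uint32_t *msg, int words)
{
    while (*MAILBOX_FLAG == 0u)
        ;                           /* busy-wait: non-deterministic latency */
    for (int i = 0; i < words; i++)
        msg[i] = SHARED_BUF[i];
    *MAILBOX_FLAG = 0u;             /* hand the mailbox back                */
}
```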
  • SUMMARY OF THE INVENTION
  • In a first aspect of the invention, a method comprises transferring data from a first processor to at least one pulse generator directly connected to an interrupt control of at least a second processor. The transferring of the data bypasses memory. The method further includes reading the transferred data directly from the at least one pulse generator by the at least a second processor.
  • In further embodiments, the method includes transferring of the data and reading of the transferred data in real-time in a multiprocessor SoC design. The transferring step is provided in a single clock cycle. The reading step is provided in a single clock cycle. The method further includes, in embodiments, providing write access directly to the at least one pulse generator. The transferring data includes sending interrupts to the first processor or the at least a second processor through the at least one pulse generator. The method further includes, in embodiments, obtaining authorization from an arbiter to begin the transferring of data. The reading of the transferred data is directly from the at least one pulse generator. The transferring of the data and reading of the transferred data is at least (i) processor to processor data traffic, (ii) processor to interrupt enabled device data traffic and (iii) processor to non-interrupt enabled device data traffic. In further embodiments, the method further includes partitioning the at least one pulse generator such that each partition is dedicated to at least one of separate functions and separate processors. The method can be used in fabricating an integrated circuit chip and distributing the integrated circuit chip.
  • In another aspect of the invention, the method includes obtaining application and ordering requirements and selecting at least one channel, an arbitration algorithm and an interrupt type. Upon completion of the obtaining and selecting step, the method includes sending a complex message over a bus to a pulse generator and reading, by a processor, the complex message directly from the pulse generator.
  • In further embodiments, the application requirements obtained include at least one of application code, data transmission rates, and the amount of time and data type to transmit. The selecting of the channel includes selecting all channels if there is a broadcast message or one or more channels if there is no broadcast message. The selection of the one or more channels is based on at least a partitioning of the pulse generator. The selecting of the arbitration algorithm provides priority to the complex message. The selecting of the interrupt type is one of a fast interrupt type, a normal interrupt type or a non-maskable interrupt type.
  • The method further includes the pulse generator:
      • decoding the complex message and the interrupt type;
      • applying the arbitration algorithm and register ordering requirement;
      • registering the at least one channel; and
      • upon completion of the decoding, applying and registering steps, applying the ordering requirement.
        The reading step is provided after the above steps performed by the pulse generator. The reading of the transferred data is in real time.
  • In another aspect of the invention, the system includes at least two processors connected to a bus system and at least one pulse generator connected to the bus system and each of the at least two processors. The at least one pulse generator is a write-only device receiving data from the at least two processors which has bypassed memory, and the at least two processors read data directly from the at least one pulse generator, bypassing the memory.
  • In further embodiments of the system, the at least one pulse generator is equal in number to the at least two processors. The at least one pulse generator is equal to or less than the number of the at least two processors and equal to or greater than 1. The bus system is an on-chip bus arbiter or an on-chip crossbar/switch. The at least one pulse generator is connected directly to an interrupt controller of each of the at least two processors. In further embodiments, at least one interrupt enabled device and one non-interrupt enabled device write data directly to the at least one pulse generator. The at least one pulse generator is partitioned for at least one of each of the at least two processors, functions and a combination thereof. The at least one pulse generator is a single pulse generator connected to the at least two processors.
  • In yet another aspect of the invention, the system includes peer processors connected to a bus system. At least one pulse generator receives data over the bus system and is connected to an interrupt control of the peer processors such that data from one of the peer processors is read directly from the at least one pulse generator by the one of the peer processors or another of the peer processors.
  • In embodiments, the at least one pulse generator is a write-only device receiving data from the one of the peer processors. The at least one pulse generator bypasses memory such that the one or the another of the peer processors read data directly from the at least one pulse generator, bypassing the memory. The at least one pulse generator is equal to or less than the number of the peer processors and equal to or greater than 1, and the at least one pulse generator is partitioned such that the partition is dedicated to one or more of the peer processors, functions or a combination thereof. The bus system can be an on-chip bus arbiter or an on-chip crossbar/switch.
  • In a further aspect of the invention, a computer program product comprises a computer useable medium including a computer readable program. The computer readable program when executed on a computer causes the computer to provide a signal in one clock cycle to a pulse generator. The signal has data associated therewith. A processor reads the data directly from the pulse generator in one clock cycle and bypasses memory.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram describing current state of the art;
  • FIG. 2 shows an environment which implements aspects of the invention;
  • FIG. 3 shows a block diagram of an embodiment implementing the system and method of the invention;
  • FIG. 4 shows a block diagram of an embodiment implementing the system and method of the invention;
  • FIG. 5 shows a block diagram of an embodiment implementing the system and method of the invention;
  • FIG. 6 shows a block diagram of an embodiment implementing the system and method of the invention;
  • FIG. 7 is a flow diagram implementing steps of the invention; and
  • FIG. 8 is a flow diagram implementing steps of the invention.
  • DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
  • The invention relates to a method and system of communicating or transferring data between multiple (peer) processors in a system-on-chip (SoC) design verification environment. In an embodiment of the invention, the system and method uses a hardware device that sits as a slave on a bus system (e.g., on-chip bus, on-chip crossbar/switch, etc.) which includes the peer processors as masters, enabling each processor write access to this hardware device. The hardware device is a write-only device which maps bits in a word to output ports, and which minimizes the amount of time to communicate data to the processors and any shared resources. Thus, in one implementation, the system and method provides a structure for interrupt-based IPC signaling between peer processors in a real-time, multiprocessor SoC design. For instance, if the bus system is a 32 bit bus, the writeable entity on the hardware device will be any number of 32 bit words. Each bit in each word has a built-in pulse generator such that if a “1” is written to this bit, a pulse is driven out on its corresponding port. If there are two 32 bit words, there would be a total of 64 pulses which can be generated. These pulses can be connected as interrupts to the processors.
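  • By way of a non-authoritative illustration, a processor's view of such a device is a handful of memory-mapped, write-only words; the C sketch below assumes a base address and a bit-to-processor wiring purely for the example, since the description above fixes only the one-pulse-generator-per-bit behavior.

```c
/* Minimal sketch of the write side of the pulse-generator device. The base
 * address, the number of words and the mapping of bit i of word w to a
 * particular processor's interrupt input are assumptions for illustration. */
#include <stdint.h>

#define PGEN_BASE  ((volatile uint32_t *)0x50000000u) /* assumed bus address of the slave   */
#define PGEN_WORDS 2u                                 /* two 32 bit words -> 64 pulse ports */

/* Drive a one-cycle pulse on every port whose bit is set in 'mask'; a single
 * 32 bit bus write can therefore raise any combination of up to 32 interrupts. */
static inline void pgen_pulse(unsigned word, uint32_t mask)
{
    PGEN_BASE[word] = mask;
}

/* Example: interrupt the processors wired to bits 0 and 3 of word 0. */
static void notify_peers(void)
{
    pgen_pulse(0, (1u << 0) | (1u << 3));
}
```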
  • In implementation, this enables any processor to send any combination of interrupts to any processor(s), including itself. There is sufficient interlocking inside the hardware device such that if two processors attempt to send interrupts at the same time, the result is deterministic; the bus system will guarantee ordering of the write operations to the hardware device, and the hardware device, itself, will complete the first set of pulse(s) commanded by the first write before accepting the second write operation, so that it can generate the second set of pulse(s) independent of the first set. For example, in such a case, a processor may receive two pulses relatively close together and the processor's interrupt controller will maintain a record. The pulse duration is one clock cycle of the clock driving the hardware device, and the minimum width between two pulses is also one clock cycle of the hardware device's clock.
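  • The interlocking behavior just described can also be expressed as a small C reference model of the kind commonly used during SoC verification; this is a sketch under stated assumptions, not RTL, and the type and function names are invented for the example.

```c
/* Behavioral sketch of the write interlock: each accepted write produces one
 * clock cycle of pulses, followed by at least one idle cycle, and a second
 * write is held off until the first set of pulses has completed. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t next;        /* write accepted from the bus, not yet pulsed   */
    bool     have_next;
    bool     gap;         /* enforces one idle cycle between pulse sets    */
    uint32_t pulse_out;   /* one bit per output port, valid for this cycle */
} pgen_model_t;

/* Bus write: refused (i.e., the device would stall the bus) while the
 * previously commanded pulse set is still outstanding. */
static bool pgen_model_write(pgen_model_t *p, uint32_t mask)
{
    if (p->have_next)
        return false;
    p->next = mask;
    p->have_next = true;
    return true;
}

/* Advance the model by one cycle of the PGEN's clock. */
static void pgen_model_tick(pgen_model_t *p)
{
    if (p->gap) {                 /* minimum one-cycle spacing between sets   */
        p->pulse_out = 0;
        p->gap = false;
    } else if (p->have_next) {    /* drive the commanded pulses for one cycle */
        p->pulse_out = p->next;
        p->have_next = false;
        p->gap = true;
    } else {
        p->pulse_out = 0;
    }
}
```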
  • FIG. 2 shows a block diagram implementing the features of the invention. In particular, FIG. 2 shows one or more workstations denoted as reference numeral 100. The one or more workstations 100 include a memory, one or more peer processors 200, 300 and other well-known components. It should be understood by those of skill in the art that any number of processors is contemplated by the invention, and hence the designation “n” is provided with reference to processor 300. In one implementation of the invention, the one or more workstations 100 include a shared bus (or switch) 400 to support inter-TOS protocols. (AutoTOS™, or an ADF (Application Definition File) in which software resources are specified, may be used to compile user-specified parameters (e.g., resources) in order to generate a specific test to verify.)
  • Still referring to FIG. 2, the one or more workstations 100 additionally include one or more pulse generators (PGEN) 500. The PGEN 500 is arranged to connect to the interrupt inputs of each processor in the SoC. In different implementations,
      • a single PGEN 500 may be associated with all of the processors;
      • a single PGEN may be associated with each of the processors; or
      • there may be an equal number or less of PGENs than processors (but greater than one).
        In the embodiments, data can be transmitted directly between the peer processors 200, 300, via the PGEN 500, thus bypassing memory.
  • By way of one illustration, the processor 200 can generate a request to the bus 400 to transfer data to processor 300. Once the request is granted, the processor 200 will transfer data directly to the PGEN 500; that is, the data will be written directly into the PGEN 500. The processor 300 can now read the data directly from the PGEN 500 (i.e., directly connected to the interrupt control of the processor(s), bypassing memory). In this manner, data can be transferred between processors in real-time and the data transfer, in preferred implementations, will also be deterministic. This example is applicable to at least (i) processor to processor data transfer, (ii) processor to interrupt enabled device data transfer and (iii) processor to non-interrupt enabled device data transfer.
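  • A minimal software sketch of this transfer is shown below; the description above specifies the behavior (write by one processor, direct read by the other, no memory in between) but not a register layout, so the addresses, offsets and handler hook are illustrative assumptions only.

```c
/* Sketch of the FIG. 2 transfer: processor 200 writes a word to the PGEN and
 * processor 300 picks it up in its interrupt handler, with no shared-memory
 * buffer in between. Register addresses and names are assumptions. */
#include <stdint.h>

#define PGEN_RAISE ((volatile uint32_t *)0x50000000u) /* write: bits become interrupt pulses    */
#define PGEN_DATA  ((volatile uint32_t *)0x50000004u) /* assumed read port seen by the receiver */

/* Sender (processor 200): one bus write carries the payload/interrupt. */
static inline void pgen_send(uint32_t payload)
{
    *PGEN_RAISE = payload;
}

/* Receiver (processor 300): its interrupt controller latches the pulse and
 * vectors here; the handler reads the word straight from the PGEN. */
static volatile uint32_t last_message;

void pgen_irq_handler(void)
{
    last_message = *PGEN_DATA;    /* no memory copy, no mailbox polling */
}
```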
  • FIG. 3 shows an illustrative example of the invention which may equally be representative of a flow diagram showing steps of the invention. In this example, processors 1 through n, e.g., 200, 300, are in data communication with the on-chip bus arbiter 400, in embodiments via a 32 bit, 64 bit, 128 bit, etc. channel. A separate PGEN is in data communication with each of the processors 1 . . . n. For example, PGEN 500a is in data communication with the processor 200; whereas, the PGEN 500b is in data communication with processor 300. As should thus be understood, FIG. 3 represents any number of processors n in data communication with the on-chip bus arbiter 400, and an equal number of PGENs, each of which is directly connected to the interrupt control of a respective processor. A memory controller 600 and memory map device 650 are also in data communication with the on-chip bus arbiter 400.
  • In the example of FIG. 3, in one non-limiting illustrative example, processor 1, 200, is desirous of transferring data to processor “n”, 300. In this example, processor 1, 200, requests authorization from the on-chip bus arbiter 400 to transfer data to processor “n”, 300. Upon obtaining such authorization, the processor 1, 200, will write data directly to the PGEN 500b, eliminating the need for a shared memory or processor “n”, 300, having to poll the system and request its own authorization to read data from a shared memory. Processor “n”, 300, reads the data directly from the PGEN 500b. This same example can be implemented for any number of processors, each having its own PGEN.
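  • Under the FIG. 3 topology, a sender simply targets the PGEN wired to the destination processor; the table of base addresses below is an assumption made for illustration, since the figure defines only the wiring (PGEN 500a to processor 200, PGEN 500b to processor 300).

```c
/* Sketch of per-destination addressing for the one-PGEN-per-processor
 * arrangement of FIG. 3. Base addresses and the lookup table are assumed. */
#include <stdint.h>

#define NUM_CPUS 2

static volatile uint32_t *const pgen_for_cpu[NUM_CPUS] = {
    (volatile uint32_t *)0x50000000u,  /* PGEN 500a -> processor 1 (200)   */
    (volatile uint32_t *)0x50001000u,  /* PGEN 500b -> processor "n" (300) */
};

/* Processor 1 interrupting processor "n": after the on-chip bus arbiter 400
 * grants the transfer, write the pulse mask to the destination's PGEN. */
static inline void send_to(unsigned dest_cpu, uint32_t mask)
{
    *pgen_for_cpu[dest_cpu] = mask;
}
```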
  • FIG. 4 shows an illustrative example of the invention which may equally be representative of a flow diagram showing steps of the invention. In this example, processors 1 through “n”, e.g., 200, 300, are in data communication with the on-chip bus arbiter 400, in embodiments, via a 32 bit, 64 bit, 128 bit, etc. channel. A single PGEN 500 is directly connected to the interrupt control of the processors 1 . . . “n”. For example, PGEN 500 is in data communication with the processor 200 and processor 300. As should thus be understood, FIG. 4 represents any number of processors, n, in data communication with the on-chip bus arbiter 400, and a single PGEN is directly connected to the interrupt control of each of the processors. A memory controller 600 and memory map device 650 are also in data communication with the on-chip bus arbiter 400.
  • In the example of FIG. 4, in one non-limiting illustrative example, processor 1, 200, is desirous of transferring data to processor “n”, 300. In this example, processor 1, 200, requests authorization from the on-chip bus arbiter 400 to transfer data to processor “n”, 300. Upon obtaining such authorization, the processor 1, 200, will write data directly to the PGEN 500, eliminating the need for a shared memory and processor “n”, 300, having to poll the system and request its own authorization to read data from a shared memory. Processor “n”, 300, reads the data directly from the PGEN 500. This same example can be implemented for any number of processors, each sharing the PGEN 500.
  • FIG. 5 shows an illustrative example of the invention which may equally be representative of a flow diagram showing steps of the invention. In this example, processors 1 through “n”, e.g., 200, 300, are in data communication with an on-chip crossbar/switch 800, in embodiments, via a 32 bit, 64 bit, 128 bit, etc. channel. The on-chip crossbar/switch 800 provides non-blocking communication, i.e., allows more than one processor to transfer data at one time. A separate PGEN is directly connected to the interrupt control of each of the processors 1 . . . “n”. For example, PGEN 500a is in data communication with the processor 200; whereas, the PGEN 500b is in data communication with processor 300. As should thus be understood, FIG. 5 represents any number of processors “n” in data communication with the on-chip crossbar/switch 800, and an equal number of PGENs, each of which is directly connected to the interrupt control of the respective processor. A memory controller 600 and memory map device 650 are also in data communication with the on-chip bus arbiter 400.
  • In the example of FIG. 5, in one non-limiting illustrative example, processor 1, 200, is desirous of transferring data to processor “n”, 300. In this example, processor 1, 200, requests authorization from the on-chip crossbar/switch 800 to transfer data to processor “n”, 300. Upon obtaining such authorization, the processor 1, 200, will write data directly to the PGEN 500b, eliminating the need for a shared memory and processor “n”, 300, having to poll the system and request its own authorization to read data from a shared memory. Processor “n”, 300, reads the data directly from the PGEN 500b. This same example can be implemented for any number of processors, each having its own PGEN.
  • FIG. 6 shows an illustrative example of the invention which may equally be representative of a flow diagram showing steps of the invention. In this example, processors 1 through “n”, e.g., 200, 300, are in data communication with the on-chip crossbar/switch 800, in embodiments, via a 32 bit, 64 bit, 128 bit, etc. channel. A single PGEN 500 is in data communication with all of the processors 1 . . . “n”, i.e., directly connected to the interrupt control of the processor(s). For example, PGEN 500 is in data communication with the processor 200 and processor 300. As should thus be understood, FIG. 6 represents any number of processors “n” in data communication with the on-chip crossbar/switch 800, and a single PGEN in direct data communication with each of the respective processors. A memory controller 600 and memory map device 650 are also in data communication with the on-chip bus arbiter 400.
  • In the example of FIG. 6, in one non-limiting illustrative example, processor 1, 200, is desirous of transferring data to processor “n”, 300. In this example, processor 1, 200, requests authorization from the on-chip crossbar/switch 800 to transfer data to processor “n”, 300. Upon obtaining such authorization, the processor 1, 200, will write data directly to the PGEN 500, eliminating the need for a shared memory and processor “n”, 300, having to poll the system and request its own authorization to read data from a shared memory. Processor “n”, 300, reads the data directly from the PGEN 500. This same example can be implemented for any number of processors, each sharing the PGEN 500.
  • In the examples of FIGS. 3-6, the number of PGENs may be optimally matched to the system requirements. For example, the PGEN 500 may be a 32 bit, 64 bit or 128 bit channel; although other bit widths are also contemplated by the invention. By way of example, if there is a packet with 64 bits, it may be desirable to have a 64 bit data channel; although a 32 bit data channel (or less) is also possible, with the understanding that two or more cycles will be required for data transfer.
  • Also, another consideration to be taken into account is the number of processors associated with the system. So, for example, if there are one hundred processors, it may be advantageous to have more than one PGEN rather than a single PGEN with one hundred connections. Instead, as an example, there may be four PGENs connected to the system, each with 25 connections, in addition to a 32 bit (64 bit, etc.) channel to the bus.
  • It is also contemplated by the invention to have the PGEN partitioned such that a portion (partitioned section) of each PGEN is associated with a single processor, multiple processors or function(s). By way of example, a 32 bit PGEN can be partitioned into four partitions of 8 bits each, with each partition being responsible for a single processor and/or function. In this way, data can be written into a single partition of a single PGEN, which can be dedicated to a single processor (or multiple processors). The partitions can be repartitioned across any processor or within one processor such that, in one example, a single partition (8 bit channel) can be dedicated to all processors, with the remaining channels dedicated to variations of different processors and/or functions. Thus, depending on the system requirements, it is possible to reduce the number of PGENs to an optimal level. In the embodiments, it is contemplated that the number of PGENs is less than or equal to the number of processors and equal to or greater than 1.
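  • A short sketch of the 8-bit partitioning example is given below; the convention that the bit position within a partition encodes an event type, like the macro names and the address, is assumed for illustration only.

```c
/* Sketch of a single 32 bit PGEN word split into four 8 bit partitions,
 * one partition per processor (or function), as in the example above. */
#include <stdint.h>

#define PGEN_WORD      ((volatile uint32_t *)0x50000000u) /* assumed address       */
#define PARTITION_BITS 8u                                 /* 32 bits / 4 partitions */

/* Pulse event 'event' (0..7) in the partition owned by 'cpu' (0..3). */
static inline void pgen_signal(unsigned cpu, unsigned event)
{
    *PGEN_WORD = 1u << (cpu * PARTITION_BITS + event);
}

/* Broadcast one event to all four partitions with a single write. */
static inline void pgen_broadcast(unsigned event)
{
    uint32_t mask = 0;
    for (unsigned cpu = 0; cpu < 4; cpu++)
        mask |= 1u << (cpu * PARTITION_BITS + event);
    *PGEN_WORD = mask;
}
```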
  • In any of the above examples, using the PGEN of the present invention, structure is provided for interrupt-based IPC signaling between peer processors in a real-time, multiprocessor SoC design. Also, the use of the system and method of the present invention provides scalability, thus eliminating any concerns about providing additional processors and/or processes within the system. Accordingly, regardless of the number of processors and/or processes, the response time for data transfer can be improved.
  • FIG. 7 is a flow diagram implementing steps of the invention. FIG. 7 (and any other flow diagram) may equally represent a high-level block diagram of the system, implementing the steps thereof. The steps of FIG. 7 (and FIG. 8) may be implemented in computer program code in combination with the appropriate hardware. This computer program code may be stored on storage media such as a diskette, hard disk, CD-ROM, DVD-ROM or tape, as well as a memory storage device or collection of memory storage devices such as read-only memory (ROM) or random access memory (RAM).
  • Referring to FIG. 7, at step 700, the sending processor obtains application requirements. These application requirements may be, for example, application code, data transmission rates, and the amount of time and data type to transmit. At step 705, the sending processor selects at least one channel to transfer data. For example, the sending processor may select all channels if there is a broadcast message; however, if there is not a broadcast message, the sending processor may select one or more channels. The selection of one or more channels may depend on such variables as the partitioning of the PGEN, the number of PGENs in the system, etc., all readily implemented by those of skill in the art. At step 710, the sending processor obtains ordering requirements. By way of example, if processor “A” would like to write data in a certain order (e.g., 1, 2, 3, 4 . . . ), the PGEN must comply with this ordering during the data transfer cycle.
  • At step 715, the sending processor selects the arbitration algorithm. This arbitration algorithm may provide priority to certain complex messages, e.g., priority for data transfer over the on-chip bus or on-chip crossbar/switch. This may be especially relevant when using the on-chip crossbar/switch, since multiple communications can occur at the same time. At step 720, an interrupt type is selected. The interrupt type may be, for example, a fast interrupt type, a normal interrupt type or a non-maskable interrupt type, all known to those of skill in the art. At step 725, the complex message is sent over the bus (or switch) to the PGEN. At step 730, the receipt of the complex message is acknowledged. The steps described herein ensure that data is read in priority order and that data from multiple processors is read in a deterministic manner.
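A sender-side sketch of steps 715-730 is given below, reusing the hypothetical descriptor above. pgen_send() and pgen_ack_received() stand in for bus accessors that the patent does not name; they are assumptions for illustration only.

/* Hypothetical bus accessors, assumed to be provided elsewhere. */
extern void pgen_send(const struct complex_message *msg);
extern bool pgen_ack_received(void);

static void send_complex_message(struct complex_message *msg,
                                 uint8_t arbitration_algo,
                                 enum interrupt_type irq_type)
{
    msg->arbitration_algo = arbitration_algo;   /* step 715: arbitration/priority        */
    msg->irq_type         = irq_type;           /* step 720: fast, normal or non-maskable */

    pgen_send(msg);                             /* step 725: send over the bus or switch */

    while (!pgen_ack_received())                /* step 730: wait for acknowledgement    */
        ;                                       /* busy-wait kept simple for the sketch  */
}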
  • FIG. 8 is a flow diagram implementing steps of the invention. At step 800, the PGEN receives the complex message. At step 805, the PGEN decodes the message. At step 810, the PGEN decodes the interrupt type, e.g., a fast interrupt type, a normal interrupt type or a non-maskable interrupt type. At step 815, the PGEN applies the arbitration algorithm such that it can determine the priority given to the complex message. At step 820, the PGEN applies the register ordering requirement. At step 825, the PGEN registers the channel and, at step 830, the PGEN applies the ordering requirements. In this way, the PGEN can provide the information directly to the receiving processor in a reliable manner. Also, the steps described herein ensure that data is read in priority order and that data from multiple processors is read in a deterministic manner.
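Because the PGEN in the patent is hardware, the following is only a C-style rendering of the ordering of the FIG. 8 steps, continuing the illustrative types above. The helper functions are hypothetical names, not an API defined by the patent.

/* Hypothetical helpers standing in for PGEN-internal logic. */
extern int  apply_arbitration(uint8_t algo);              /* returns a priority */
extern void apply_register_ordering(const uint8_t *ord);
extern void register_channel(uint32_t channel_mask);
extern void apply_ordering(const uint8_t *ord);
extern void raise_interrupt(enum interrupt_type irq, int priority);

static void pgen_handle(const struct complex_message *msg)
{
    /* steps 800/805: the complex message has been received and decoded */
    enum interrupt_type irq = msg->irq_type;                  /* step 810 */
    int priority = apply_arbitration(msg->arbitration_algo);  /* step 815 */

    apply_register_ordering(msg->ordering);                   /* step 820 */
    register_channel(msg->channel_mask);                      /* step 825 */
    apply_ordering(msg->ordering);                            /* step 830 */

    raise_interrupt(irq, priority);  /* the receiving processor then reads the
                                        data directly from the PGEN */
}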
  • Accordingly, the system and method of the invention provide a flexible hardware pulse generator arranged to connect to the interrupt inputs of each processor in the SoC. The system of the invention provides the ability to issue complex, real-time messages between the processors in a multiple processor SoC design. The system and method of the invention further provide global access from each processor to any of the interrupt pulse controls, and allow for broadcast, sub-broadcast and individual shoulder taps, with an automatic interlocking/deterministic mechanism. That is, the system and method of the invention ensure that data is read in priority order and that data from multiple processors is read in a deterministic manner.
  • The method as described herein is used in the fabrication of integrated circuit chips. The resulting integrated circuit chips can be distributed by the fabricator in raw wafer form (that is, as a single wafer that has multiple unpackaged chips), as a bare die, or in a packaged form. In the latter case the chip is mounted in a single chip package (such as a plastic carrier, with leads that are affixed to a motherboard or other higher level carrier) or in a multichip package (such as a ceramic carrier that has either or both surface interconnections or buried interconnections). In any case the chip is then integrated with other chips, discrete circuit elements, and/or other signal processing devices as part of either (a) an intermediate product, such as a motherboard, or (b) an end product. The end product can be any product that includes integrated circuit chips, ranging from toys and other low-end applications to advanced computer products having a display, a keyboard or other input device, and a central processor.
  • The invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc. Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk—read only memory (CD-ROM), compact disk—read/write (CD-R/W) and DVD.
  • A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
  • Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
  • While the invention has been described in terms of exemplary embodiments, those skilled in the art will recognize that the invention can be practiced with modifications and within the spirit and scope of the appended claims.

Claims (35)

1. A method comprising:
transferring data from a first processor to at least one pulse generator directly connected to an interrupt control of at least a second processor, the transferring of the data bypasses memory; and
reading the transferred data directly from the at least one pulse generator by the at least a second processor.
2. The method of claim 1, wherein the transferring of the data and reading of the transferred data is in real-time in a multiprocessor SoC design.
3. The method of claim 1, wherein the transferring step is provided in a single clock cycle.
4. The method of claim 1, wherein the reading step is provided in a single clock cycle.
5. The method of claim 1, further comprising providing write access directly to the at least one pulse generator.
6. The method of claim 1, wherein the transferring data includes sending interrupts to the first processor or at least a second processor through the at least one pulse generator.
7. The method of claim 1, further comprising obtaining authorization from an arbiter to begin the transferring of data.
8. The method of claim 1, wherein the reading of the transferred data is directly from the at least one pulse generator.
9. The method of claim 1, wherein the transferring of the data and reading of the transferred data is at least (i) processor to processor data traffic, (ii) processor to interrupt enabled device data traffic and (iii) processor to non-interrupt enabled device data traffic.
10. The method of claim 1, further comprising partitioning the at least one pulse generator such that each partition is dedicated to at least one of separate functions and separate processors.
11. The method of claim 1, further comprising fabricating an integrated circuit chip using the method of claim 1.
12. The method of claim 11, further comprising distributing the integrated circuit chip.
13. A method comprising:
obtaining application and ordering requirements;
selecting at least one channel, an arbitration algorithm and an interrupt type;
upon completion of the obtaining and selecting step, sending a complex message over a bus to a pulse generator; and
reading, by a processor, the complex message directly from the pulse generator.
14. The method of claim 13, wherein the obtaining the application requirement is at least one of an application code, data transmission rates, and amount of time and data type to transmit.
15. The method of claim 13, wherein the selecting the channel includes selecting all channels if there is a broadcast message or one or more channels if there is no broadcast message.
16. The method of claim 15, wherein the selection of the one or more channels is based on at least a partitioning of the pulse generator.
17. The method of claim 13, wherein the selecting of the arbitration algorithm provides priority to the complex message.
18. The method of claim 13, wherein the selecting of the interrupt type is one of a fast interrupt type, a normal interrupt type or a non-maskable interrupt type.
19. The method of claim 13, further comprising the pulse generator:
decoding the complex message and the interrupt type;
applying the arbitration algorithm and register ordering requirement;
registering the at least one channel; and
upon completion of the decoding, applying and registering steps, applying the ordering requirement.
20. The method of claim 19, wherein the reading step is provided after the steps of claim 19.
21. The method of claim 19, wherein the reading of the transferred data is in real time.
22. A system, comprising:
at least two processors connected to a bus system; and
at least one pulse generator connected to the bus system and each of the at least two processors,
wherein the at least one pulse generator is a write-only device receiving data from the at least two processors which has bypassed memory, and the at least two processors read data directly from the at least one pulse generator, bypassing the memory.
23. The system of claim 22, wherein the at least one pulse generator is equal to an amount of the at least two processors.
24. The system of claim 22, wherein the at least one pulse generator is equal to or less than the number of the at least two processors and equal to or greater than 1.
25. The system of claim 22, wherein the bus system is an on-chip bus arbiter or an on-chip crossbar/switch.
26. The system of claim 22, wherein the at least one pulse generator is connected directly to an interrupt controller of each of the at least two processors.
27. The system of claim 22, further comprising at least one interrupt enabled device and one non-interrupt enabled device which write data directly to the at least one pulse generator.
28. The system of claim 22, wherein the at least one pulse generator is partitioned for at least one of each of the at least two processors, functions and a combination thereof.
29. The system of claim 22, wherein the at least one pulse generator is a single pulse generator connected to the at least two processors.
30. A system, comprising:
peer processors connected to a bus system; and
at least one pulse generator receiving data over the bus system and connected to an interrupt control of the peer processors such that data from one of the peer processors is read directly from the at least one pulse generator by the one of the peer processors or another of the peer processors.
31. The system of claim 30, wherein the at least one pulse generator is a write-only device receiving data from the one of the peer processors.
32. The system of claim 30, wherein the at least one pulse generator bypasses memory such that the one or the another of the peer processors read data directly from the at least one pulse generator, bypassing the memory.
33. The system of claim 30, wherein:
the at least one pulse generator is equal to or less than the number of the peer processors and equal to or greater than 1; and
the at least one pulse generator is partitioned such that the partition is dedicated to one or more of the peer processors, functions or a combination thereof.
34. The system of claim 30, wherein the bus system is an on-chip bus arbiter or an on-chip crossbar/switch.
35. A computer program product comprising a computer useable medium including a computer readable program, wherein the computer readable program when executed on a computer causes the computer to:
provide a signal in one clock cycle to a pulse generator, the signal having data associated therewith; and
a processor reading the data directly from the pulse generator in one clock cycle and bypassing memory.
US11/275,091 2005-12-09 2005-12-09 Method and system of communicating between peer processors in SoC environment Expired - Fee Related US9367493B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US11/275,091 US9367493B2 (en) 2005-12-09 2005-12-09 Method and system of communicating between peer processors in SoC environment
PCT/EP2006/068523 WO2007065777A1 (en) 2005-12-09 2006-11-15 Method and system of communicating between processors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/275,091 US9367493B2 (en) 2005-12-09 2005-12-09 Method and system of communicating between peer processors in SoC environment

Publications (2)

Publication Number Publication Date
US20070136559A1 true US20070136559A1 (en) 2007-06-14
US9367493B2 US9367493B2 (en) 2016-06-14

Family

ID=36127019

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/275,091 Expired - Fee Related US9367493B2 (en) 2005-12-09 2005-12-09 Method and system of communicating between peer processors in SoC environment

Country Status (2)

Country Link
US (1) US9367493B2 (en)
WO (1) WO2007065777A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10591638B2 (en) * 2013-03-06 2020-03-17 Exxonmobil Upstream Research Company Inversion of geophysical data on computer system having parallel processors

Citations (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3786430A (en) * 1971-11-15 1974-01-15 Ibm Data processing system including a small auxiliary processor for overcoming the effects of faulty hardware
US4583222A (en) * 1983-11-07 1986-04-15 Digital Equipment Corporation Method and apparatus for self-testing of floating point accelerator processors
US4873656A (en) * 1987-06-26 1989-10-10 Daisy Systems Corporation Multiple processor accelerator for logic simulation
US4959781A (en) * 1988-05-16 1990-09-25 Stardent Computer, Inc. System for assigning interrupts to least busy processor that already loaded same class of interrupt routines
US5097412A (en) * 1987-04-24 1992-03-17 Hitachi, Ltd. Method for simulating the operation of programs in a distributed processing system
US5167023A (en) * 1988-02-01 1992-11-24 International Business Machines Translating a dynamic transfer control instruction address in a simulated CPU processor
US5283902A (en) * 1990-09-20 1994-02-01 Siemens Aktiengesellschaft Multiprocessor system having time slice bus arbitration for controlling bus access based on determined delay time among processors requesting access to common bus
US5301302A (en) * 1988-02-01 1994-04-05 International Business Machines Corporation Memory mapping and special write detection in a system and method for simulating a CPU processor
US5488713A (en) * 1989-12-27 1996-01-30 Digital Equipment Corporation Computer simulation technique for predicting program performance
US5805867A (en) * 1994-04-06 1998-09-08 Fujitsu Limited Multi-processor simulation apparatus and method
US5862366A (en) * 1996-09-12 1999-01-19 Advanced Micro Devices, Inc. System and method for simulating a multiprocessor environment for testing a multiprocessing interrupt controller
US5923887A (en) * 1996-05-20 1999-07-13 Advanced Micro Devices, Inc. Interrupt request that defines resource usage
US6014512A (en) * 1996-10-18 2000-01-11 Samsung Electronics Co., Ltd. Method and apparatus for simulation of a multi-processor circuit
US6115763A (en) * 1998-03-05 2000-09-05 International Business Machines Corporation Multi-core chip providing external core access with regular operation function interface and predetermined service operation services interface comprising core interface units and masters interface unit
US6154785A (en) * 1998-07-17 2000-11-28 Network Equipment Technologies, Inc. Inter-processor communication system
US6199031B1 (en) * 1998-08-31 2001-03-06 Vlsi Technology, Inc. HDL simulation interface for testing and verifying an ASIC model
US6208954B1 (en) * 1994-09-16 2001-03-27 Wind River Systems, Inc. Method for scheduling event sequences
US6321181B1 (en) * 1998-08-24 2001-11-20 Agere Systems Guardian Corp. Device and method for parallel simulation
US20020083387A1 (en) * 2000-12-22 2002-06-27 Miner David E. Test access port
US6467082B1 (en) * 1998-12-02 2002-10-15 Agere Systems Guardian Corp. Methods and apparatus for simulating external linkage points and control transfers in source translation systems
US6510531B1 (en) * 1999-09-23 2003-01-21 Lucent Technologies Inc. Methods and systems for testing parallel queues
US6606676B1 (en) * 1999-11-08 2003-08-12 International Business Machines Corporation Method and apparatus to distribute interrupts to multiple interrupt handlers in a distributed symmetric multiprocessor system
US6625679B1 (en) * 1999-04-19 2003-09-23 Hewlett-Packard Company Apparatus and method for converting interrupt transactions to interrupt signals to distribute interrupts to IA-32 processors
US6633940B1 (en) * 1999-10-11 2003-10-14 Ati International Srl Method and apparatus for processing interrupts in a computing system
US6718294B1 (en) * 2000-05-16 2004-04-06 Mindspeed Technologies, Inc. System and method for synchronized control of system simulators with multiple processor cores
US6732338B2 (en) * 2002-03-20 2004-05-04 International Business Machines Corporation Method for comprehensively verifying design rule checking runsets
US20040176059A1 (en) * 2002-12-18 2004-09-09 Frederic Hayem Multi-processor platform for wireless communication terminal having partitioned protocol stack
US6904398B1 (en) * 1998-06-29 2005-06-07 Stmicroelectronics Limited Design of an application specific processor (ASP)
US6937611B1 (en) * 2000-04-21 2005-08-30 Sun Microsystems, Inc. Mechanism for efficient scheduling of communication flows
US20070067528A1 (en) * 2005-08-19 2007-03-22 Schaffer Mark M Weighted bus arbitration based on transfer direction and consumed bandwidth
US7664928B1 (en) * 2005-01-19 2010-02-16 Tensilica, Inc. Method and apparatus for providing user-defined interfaces for a configurable processor

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62157961A (en) 1985-12-30 1987-07-13 Fanuc Ltd Method of controlling interruption in multiprocessor system
JPH0273435A (en) 1988-09-09 1990-03-13 Nec Corp Test system for external subroutine of information processor
US5193187A (en) 1989-12-29 1993-03-09 Supercomputer Systems Limited Partnership Fast interrupt mechanism for interrupting processors in parallel in a multiprocessor system wherein processors are assigned process ID numbers
JPH04148461A (en) 1990-10-12 1992-05-21 Hitachi Ltd System for multiprocessor system test
JP2855298B2 (en) 1990-12-21 1999-02-10 インテル・コーポレーション Arbitration method of interrupt request and multiprocessor system
JPH09325946A (en) 1996-06-05 1997-12-16 Toshiba Corp Test circuit for multiprocessor
CA2180231C (en) 1996-06-28 2006-10-31 William Gordon Parr Portable semi-automatic computer code key cutting machine
KR100456630B1 (en) * 2001-12-11 2004-11-10 한국전자통신연구원 Method and apparatus for interrupt redirection for arm processors

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090144480A1 (en) * 2007-12-03 2009-06-04 Jun-Dong Cho Multi-processor system on chip platform and dvb-t baseband receiver using the same
WO2016003544A1 (en) * 2014-06-30 2016-01-07 Intel Corporation Data distribution fabric in scalable gpus
US9330433B2 (en) 2014-06-30 2016-05-03 Intel Corporation Data distribution fabric in scalable GPUs
CN106462939A (en) * 2014-06-30 2017-02-22 英特尔公司 Data distribution fabric in scalable GPU
US10346946B2 (en) 2014-06-30 2019-07-09 Intel Corporation Data distribution fabric in scalable GPUs
US10580109B2 (en) 2014-06-30 2020-03-03 Intel Corporation Data distribution fabric in scalable GPUs
CN113051213A (en) * 2021-03-02 2021-06-29 长沙景嘉微电子股份有限公司 Processor, data transmission method, device and system

Also Published As

Publication number Publication date
WO2007065777A1 (en) 2007-06-14
US9367493B2 (en) 2016-06-14

Similar Documents

Publication Publication Date Title
US11681645B2 (en) Independent control of multiple concurrent application graphs in a reconfigurable data processor
US4112490A (en) Data transfer control apparatus and method
US4698753A (en) Multiprocessor interface device
US9495290B2 (en) Various methods and apparatus to support outstanding requests to multiple targets while maintaining transaction ordering
CN100499556C (en) High-speed asynchronous interlinkage communication network of heterogeneous multi-nucleus processor
US10802995B2 (en) Unified address space for multiple hardware accelerators using dedicated low latency links
US11080220B2 (en) System on chip having semaphore function and method for implementing semaphore function
US8805926B2 (en) Common idle state, active state and credit management for an interface
JPS6218949B2 (en)
KR101056153B1 (en) Method and apparatus for conditional broadcast of barrier operations
US7581049B2 (en) Bus controller
CN1816012B (en) Scalable, high-performance, global interconnect scheme for multi-threaded, multiprocessing system-on-a-chip network processor unit
US7162573B2 (en) Communication registers for processing elements
US9367493B2 (en) Method and system of communicating between peer processors in SoC environment
US7370311B1 (en) Generating components on a programmable device using a high-level language
EP4020246A1 (en) Micro-network-on-chip and microsector infrastructure
US20100152866A1 (en) Information processing apparatus, information processing method and computer-readable medium having an information processing program
GB2483884A (en) Parallel processing system using dual port memories to communicate between each processor and the public memory bus
US20090144480A1 (en) Multi-processor system on chip platform and dvb-t baseband receiver using the same
US9424073B1 (en) Transaction handling between soft logic and hard logic components of a memory controller
US20080270748A1 (en) Hardware simulation accelerator design and method that exploits a parallel structure of user models to support a larger user model size
KR102584507B1 (en) Link layer data packing and packet flow control techniques
Cummings PivotPoint: Clockless crossbar switch for high-performance embedded systems
Hitanshu Optimized design of ahb multiple master slave memory controller using VHDL
CN107273312B (en) Direct memory access control device for computing unit with working memory

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DEVINS, ROBERT J.;MILTON, DAVID W.;NSAME, PASCAL A.;SIGNING DATES FROM 20051202 TO 20051206;REEL/FRAME:016893/0758

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DEVINS, ROBERT J.;MILTON, DAVID W.;NSAME, PASCAL A.;REEL/FRAME:016893/0758;SIGNING DATES FROM 20051202 TO 20051206

AS Assignment

Owner name: GLOBALFOUNDRIES U.S. 2 LLC, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:036550/0001

Effective date: 20150629

AS Assignment

Owner name: GLOBALFOUNDRIES INC., CAYMAN ISLANDS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GLOBALFOUNDRIES U.S. 2 LLC;GLOBALFOUNDRIES U.S. INC.;REEL/FRAME:036779/0001

Effective date: 20150910

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: WILMINGTON TRUST, NATIONAL ASSOCIATION, DELAWARE

Free format text: SECURITY AGREEMENT;ASSIGNOR:GLOBALFOUNDRIES INC.;REEL/FRAME:049490/0001

Effective date: 20181127

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20200614

AS Assignment

Owner name: GLOBALFOUNDRIES INC., CAYMAN ISLANDS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WILMINGTON TRUST, NATIONAL ASSOCIATION;REEL/FRAME:054636/0001

Effective date: 20201117

AS Assignment

Owner name: GLOBALFOUNDRIES U.S. INC., NEW YORK

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WILMINGTON TRUST, NATIONAL ASSOCIATION;REEL/FRAME:056987/0001

Effective date: 20201117