WO2008095201A1 - Processor chip architecture having integrated high-speed packet switched serial interface - Google Patents

Processor chip architecture having integrated high-speed packet switched serial interface

Info

Publication number
WO2008095201A1
Authority
WO
WIPO (PCT)
Prior art keywords
processor
packet
processor core
semiconductor die
die package
Prior art date
Application number
PCT/US2008/052969
Other languages
French (fr)
Inventor
Viswa Sharma
William Chu
Bart Stuck
Original Assignee
Psimast, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Psimast, Inc. filed Critical Psimast, Inc.
Priority to KR1020097018172A priority Critical patent/KR101453581B1/en
Priority to CN2008800038694A priority patent/CN101918931B/en
Publication of WO2008095201A1 publication Critical patent/WO2008095201A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/40Bus structure
    • G06F13/4004Coupling between buses
    • G06F13/4022Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/382Information transfer, e.g. on bus using universal interface adapter
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/40Bus structure
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/42Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4282Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/28Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/46Interconnection of networks
    • H04L12/4604LAN interconnection over a backbone network, e.g. Internet, Frame Relay
    • H04L12/462LAN interconnection over a bridge based backbone
    • H04L12/4625Single bridge functionality, e.g. connection of two networks over a single bridge
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/54Store-and-forward switching systems 
    • H04L12/56Packet switching systems

Definitions

  • Referring to Figures 2A-2C, there is illustrated a multi-core processor architecture 50 according to a primary embodiment of the present invention.
  • One aspect of the illustrated multi-core processor architecture 50 takes the form of a single physical package 55 (alternately "Processor Chip Package") that is received into a single processor socket (not illustrated).
  • This single physical package 55 includes a plurality of execution cores (alternatively, computational engines, or processing engines) 60 but an external operating system perceives the package as a single processor.
  • the core 60 can be pin compatible with existing processor sockets.
  • Each execution core 60 includes its own processor-specific functional blocks such as, for example, caches, arithmetic logic units (ALUs), priority interrupt controller, architectural registers, pipeline prediction mechanisms, and instruction set as seen in the illustrations of Figures 5A-5C.
  • Each execution core is capable of independently executing program instructions and a plurality of threads under the direction of the external operating system.
  • the cores can execute internal and/or external instructions in cooperation with the remaining core or cores in the package; an operating system can differentiate between the services provided by each of the cores, and the cores can access shared resources such as cache and external system memory 70 as seen in Figures 2A and 2C for example.
  • the operating system may be capable of supporting parallel execution among multiple cores and each core, or various combinations of cores, can be seen by the operating system as separate parallel processing units.
  • the present invention is not limited by any particular core or number of cores that might reside within a single physical package 55.
  • the execution cores can be one or more of the Smithfield core used in Intel's 90 nanometer Pentium D's and Pentium Extreme Edition 840, the Presler core used in Intel's 65 nanometer Pentium Extreme Edition 955 processor, AMD's 90 nanometer Egypt and Denmark cores. Other cores can be used within the scope of the present invention.
  • An important feature of the present invention is that data-communication between the processor 55 and the system devices 80 occurs via at least one serial interconnect 90 mediated by a bridge-architecture 100 that in at least one embodiment communicates with a switch-architecture 105, as seen in Figure 5C for example.
  • the switch-architecture 105 is the gateway via which the rest of the devices 80 in the system and the processor communicate.
  • the bridge-architecture 100 and optionally the switch-architecture 105 are located on the processor die in an integrated configuration as illustrated in Figure 2C. In such cases, one or more of the bridge-architecture and switch-architecture may be implemented in the form of additional core or cores on the die.
  • Exemplary embodiments of the processor die configurations are illustrated in Figures 5A-5C.
  • Alternatively, the switch may be located outside the die as illustrated in Figures 2A, 2B, 5A and 5B.
  • Other arrangements of the bridge-architecture and switch-architecture are included within the scope of the present invention. It must be emphasized that although the aforementioned embodiments are described for a multi-core architecture, the disclosed invention is equally applicable to the case where the processor package includes only one core (single processor) and to the case where the bridge-architecture and the switch-architecture are a single module, such as the parallel bus to serial interface converter 120 in Figures 2C and 5C for example.
  • the bridge-architecture is implemented using a packet processor architecture as shown in Figures 3A-3C.
  • Figure 3B is a specific embodiment of a typical packet processor according to the present invention.
  • Communications from the processor that are transferred over a parallel bus 150, such as the data, address and control information related to a "write" command to external system memory issued by a processor core 60 in an exemplary processor chip package 55, are processed by the packet processor portion 180 to generate serial packetized communications 155 (165) that are transferred via one or more serial lines 90 outside the chip package 55.
  • Serial communications 160 (170) received from outside the chip package 55 are processed by the packet processor portion 188 into parallel communications transferred over parallel bus 150 to the processor as exemplified in Figures 3A and 3B.
  • serial-to-parallel transformations may be applied to communications between processor cores within the chip package, and between processor cores and external devices, including other chip packages and I/O devices, within the scope of the present invention.
  • the functional blocks of Figures 3A and 3B may be adapted according to a bitstream processor (BSP) architecture illustrated in Figure 3C for instance.
  • the Bitstream processor is an on-the-fly programmable integrated packet processor, security engine and traffic manager using a high-performance pipelined packet switching architecture.
  • the Bitstream processor may be physically implemented as an additional "core", integrated with other logic devices on the processor die or on a stand-alone chip while remaining within the scope of the present invention.
  • the Bit Stream processor performs a forward and reverse bridging function using a programmable pipelined architecture that provides a high degree of flexibility for adaptation to legacy, existing and emerging board-level and network-level data communication/signaling protocols.
  • Each stage/block within the pipeline has specific functions or responsibilities that make available any relevant information to the subsequent blocks.
  • the architecture for each stage is different and is optimized to handle a given function.
  • Each stage can be dynamically programmed on a packet by packet basis while the processor cores transfer data/instructions by sending several bits at one time over a parallel communications link.
  • the intra-core data/instructions use signaling that is native to the processor-core and the associated system bus characterizing a vendor-specific CPU architecture such as, for example, signaling compatible with the front side bus by Intel, the HyperTransport technology based interconnect protocol by AMD, or other proprietary/non-proprietary bus protocols.
  • the Bit Stream processor bridges between the intra-processor protocol and one of a set of board-level or network-level serial communication protocols. Upstream information transfers to the processor from the bridge are parallelized, formatted and clocked so that they represent the native signaling used by the processor cores. Responses from the cores (i.e. the downstream information transfers such as, for example, memory requests or other system requests) are serialized and packetized by the Bitstream processor.
  • the Bitstream processor that processes the packets takes the form as described in more detail in the U.S. Application Serial No. 11/466,367, filed August 23, 2006, entitled "Omni-Protocol Engine for Reconfigurable Bit-Stream Processing in High- Speed Networks," the disclosure of which is hereby incorporated by reference.
  • the packet processing by the Bitstream processor causes the packets to be bridged to a desired board level or network level protocol/bus-architectures and forwarded to the switch-architecture.
  • Exemplary protocols include, without limitation, PCI-Express, 10 Gigabit Ethernet, Infiniband, Advanced Switching, RapidIO, SPI 4.2, XAUI and Serial I/O. Other protocols may be advantageously used without limiting the scope of the present invention.
  • the embodiment of Figures 5A and 5B contemplates an arrangement of the processor and bridge wherein the packet processor enables on-die connections for each of a plurality of protocols via separate ports comprised of one or more processor pins. Each port is configured to provide serial input/output to the processor in accordance with a specific pre-defined protocol.
  • the Bitstream processor is programmable to allow software based programming of the protocols characterizing communications at any particular serial interconnect or port.
  • Each of the cores can be specialized to be application specific - such as packet processing for telecommunications, graphics engine functionality for gaming, and parallel computations for high performance computing.
  • the Bitstream processor can be programmed to assign all traffic associated with a particular core to a specified port.
  • the aforementioned port can couple to an Advanced Mezzanine Card (AMC) module and provide processor support to the module where applicable or provide all or part of the Module Management Controller (MMC) functionality in an AdvancedTCA® (ATCA) based open modular system architecture.
  • the packet processor based bridge-architecture is coupled via a serial interconnect to a switch-architecture.
  • the switch-architecture is a non-blocking switch that provides serial, high-speed, point-to-point connections in a cut-through mode between multiple devices and the processor.
  • the switch-architecture may be implemented through merchant switches such as, for example, the GigPCI-Express switch, model 6468 8-port Gigabit Ethernet switch by DSS Networks, or the MB8AA3020 20-port, 10 Gbps Ethernet (10GbE) switch IC by Fujitsu Microelectronics America.
  • Referring to Figures 5B and 5C, there is illustrated a multi-core embodiment of the Ether PC of the present invention with dual cores in which one of the cores is dedicated for communication applications.
  • in this illustrated multi-core embodiment there are separate program and data spaces.
  • the cores can access any space by switching between the two.
  • the Data to I/O is switched.
  • the switch allows a memory request originating at an execution core to be switched to one or more external memory resources, thereby overcoming memory bandwidth limitations inherent in conventional architectures where memory requests traverse a single data communication bus to and from a single system memory resource.
  • Another embodiment of the present invention contemplates a switching architecture implementation using the packet processor illustrated in Figures 2C and 5C.
  • One of the features of such an embodiment is a combined bridge-switch architecture located on the processor die and capable of providing the services described above.
  • Another embodiment contemplates integrating the architectures disclosed in U.S. Application Serial No. 11/828,329, filed July 25, 2007, entitled “Telecommunication and Computing Platforms with Serial Packet Switched Integrated Memory Access Technology,” (the disclosure of which is hereby incorporated by reference) into a single die / processor package.
  • the packet protocol processor allows line-speed QoS packet switching which is utilized to implement a simple token based communication in Ethernet between the processor and the devices in the system as set forth in U.S. Application Serial No. 11/838,198, filed August 13, 2007, entitled "Enhanced Ethernet Protocol for Shortened Data Frames Within a Constrained Neighborhood Based on Unique ID," the disclosure of which is hereby incorporated by reference.
  • the packetized communication over the bridge-switch-architecture is further specialized to speed-up sustained, point-to-point communications in the system.
  • Each packet is provided with a source address (SA), a destination address (DA) and an E-type, like a VLAN tag, for use in negotiating a unique token between end points on a communication link.
  • the E-type extensions may be, for example, Request for UNIQUE ID or TOKEN GRANT; data communication with the granted token and request to retire the TOKEN.
  • the SA and DA fields are used along with the E-type to pass short data. This may also be extended to include large blocks of data for SATA and SAS.
  • a fixed frame size is used to endow the link with predictable performance in transferring the fixed frame and consequently meet various latency requirements.
  • the SA/DA pair could be used to transmit 12 bytes of data, 2 E-Type bytes and 2 TAG bytes (an illustrative layout of such a frame is sketched after this list).
  • the processor card is provided with two switchable caches (like two register files for threads). On a cache miss, the processor switches over from the first cache to the second cache to begin processing a second program thread associated with the second cache. In another embodiment, there could be a cache per extended memory.
  • concurrency control is provided as part of the extended Ethernet protocol. This could also "add" to the CPU wait cycles if more than one processor requests the same block of memory. In a sense that wait would be a component of latency, because the processor and the instructions scheduled for execution cannot distinguish between latency due to data locality (speed of access and transfer) and a data access "gap" due to concurrency control, since, barring data mirroring, concurrent access is not instantaneous access.
  • the memory modules of the illustration of Figures 2A and 2C comprise four channel Fully-Buffered Dual Inline Memory Modules (FB-DIMMs).
  • FB-DIMM memory uses a bi-directional serial memory bus which passes through each memory module.
  • the FB-DIMM transmits memory data in packets, precisely controlled by the AMB (Advanced Memory Buffer) chips built into each FB-DIMM module.
  • the four channel FB-DIMMs are connected to 40G lines and terminated to FB-DIMM lanes.
  • the AMB is 10 lanes serial southbound and 14 lanes serial northbound.
  • the AMB is configured to be a 16-lane fabric having less than 5 Gbps total bandwidth coming out of the memory controller of Figure 4A.
  • the aforementioned requirements can be met by the use of a single 10G lane. Additional bandwidth in excess of 5 Gbps is provided by the use of multiple AMCs or multiple lanes.
  • this approach requires serialization and de-serialization on the DRAM end and serialization and de-serialization on the processor side.
  • the latency penalty of the switch, and any overhead introduced by the serialization and de-serialization methods, can be overcome in the manner set forth in the succeeding paragraphs.
  • latency and contention/concurrency issues within the Ethernet switched fabric are resolved within a "contained network." Deterministic latency (tolerable margin jitter) through a "well contained network" (such as the packaging arrangement as described herein) is indeed possible. Switching priority, dedicated ports (a pseudo port to dedicated memory ports, communicating over Unique IDs between these ports) and other techniques disclosed in the previously identified application entitled "Enhanced Ethernet Protocol for Shortened Data Frames Within a Constrained Neighborhood Based on Unique ID" are advantageously utilized to overcome latency and contention/concurrency related issues.
  • the present invention can be adapted to support a mesh architecture of processor-to-processor interconnection via the switched Ethernet fabric.
  • N-1 connections are made to each node, with each node having 2 connections to all other nodes. In other embodiments, different combinations of the number of Ethernet ports per card, number of ports per switch and number of switches per packaging arrangement can provide for various combinations of connections per node.
  • the bit stream protocol processor enables prioritized switching.
  • the present invention allows the creation of an N-layered hierarchy of multiprocessors where N is both hardware independent and dynamically selectable by altering the prioritization afforded to different subsets of processors in the bit stream protocol processor mediated fabric.
  • This embodiment enables the chip architecture to be configured as a shared memory model machine as well as a message passing model multiprocessor machine.
  • the architecture in accordance with one embodiment of the present invention may be configured as a server, a storage area network controller, a high performance network node in a grid computing based model, or a switch/router in a telecommunication network. It will be recognized that the same basic machine may be programmatically or manually altered into one or more of the aforementioned special purpose machines as and when desired.
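The shortened-frame, token-based exchange described in the items above lends itself to a compact, fixed-size frame. The following C sketch is purely illustrative: it assumes a layout in which the twelve octets normally occupied by the SA and DA carry payload data, two bytes carry an E-Type code and two bytes carry the negotiated tag, as suggested above. The enum values and field names are placeholders for the example, not definitions taken from the disclosure.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative E-Type codes for the token exchange described above
 * (request a unique ID/token, grant it, send data under it, retire it).
 * The numeric values are placeholders, not defined by the disclosure. */
enum short_frame_etype {
    ETYPE_TOKEN_REQUEST = 0x88B5,  /* request for UNIQUE ID / TOKEN        */
    ETYPE_TOKEN_GRANT   = 0x88B6,  /* TOKEN GRANT                          */
    ETYPE_TOKEN_DATA    = 0x88B7,  /* data communication with granted token */
    ETYPE_TOKEN_RETIRE  = 0x88B8   /* request to retire the TOKEN          */
};

/* Fixed-size shortened frame: the SA/DA octets are reused to carry
 * 12 bytes of data, followed by 2 E-Type bytes and a 2-byte TAG. */
struct short_frame {
    uint8_t  data[12];  /* payload carried in the SA/DA positions */
    uint16_t etype;     /* one of enum short_frame_etype          */
    uint16_t tag;       /* negotiated token / unique ID           */
};

/* Pack up to 12 bytes of data into a fixed-size frame under a granted token. */
void pack_token_data(struct short_frame *f, uint16_t token,
                     const void *payload, size_t len)
{
    memset(f, 0, sizeof(*f));
    f->etype = ETYPE_TOKEN_DATA;
    f->tag   = token;
    if (len > sizeof(f->data))
        len = sizeof(f->data);       /* fixed frame: excess is not sent */
    memcpy(f->data, payload, len);
}
```

Because every such frame has the same size, the time to move it across a link is predictable, which is the property relied on above to meet latency requirements within the constrained neighborhood.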

Abstract

A computing and communication chip architecture is provided wherein the interfaces for processor access to the memory chips are implemented as a high-speed packet switched serial interface as part of each chip. In one embodiment, the interface is accomplished through a gigabit Ethernet interface provided by a protocol processor integrated as part of the chip. The protocol processor encapsulates the memory address and control information, such as read, write, and the number of successive bytes, as an Ethernet packet for communication among the processor and memory chips that are located on the same motherboard, or even on different circuit cards. In one embodiment, the communication overhead of the Ethernet protocol is further reduced by using an enhanced Ethernet protocol with shortened data frames within a constrained neighborhood, and/or by utilizing a bit stream switch where direct connection paths can be established between elements that comprise the computing or communication architecture.

Description

PROCESSOR CHIP ARCHITECTURE HAVING INTEGRATED HIGH-SPEED PACKET SWITCHED SERIAL INTERFACE
Field of the Invention
The present invention relates generally to the field of computing and communication architectures, and more specifically to an architecture for processor and memory access using a high-speed packet switched serial interface integrated directly onto the same chip as the processor arrangement.
Background of the Invention
The term computer architecture in a very broad sense connotes the interconnection of a core set of functional units that include a processing subsystem that executes instructions and acts upon data, a memory subsystem that cooperates with the processing subsystem to enable selected data and instructions to be stored and transferred between the two subsystems, and an input/output (I/O) subsystem that allows at least the processing subsystem to exchange data and instructions with the network and peripheral environment external to the computer. This core set of functional units can be configured into different computer system topologies using various communication interconnection arrangements that govern the interchange of communications between the functional units. For example, a processor and its memories can be locally coupled in a circuit card or it could be geographically spread over a system chassis via a back plane interconnection.
The Personal Computer (PC) represents the most successful and widely used computer architecture. Architecturally, not much has changed since the PC was first introduced in the 1980s. At its core, a typical PC is comprised of a single circuit board, referred to as a motherboard, that includes a microprocessor which acts as the central processing unit (CPU), a system memory and a local or system bus that provides the interconnection between the CPU chip and the system memory chips located on the motherboard and the I/O ports that are typically defined by connectors along an edge of the motherboard. One of the key reasons for the success of the PC architecture was the industry-standardized manner by which the components were interconnected.
A more recent example of a popular chassis-based computer architecture can be found in the area of high performance computing (HPC). One of the architectural innovations in the HPC area has been the adoption of the server blade configuration, where one or more blades - such as server blades, memory blades, I/O blades and PC blades - are plugged into a common rack that is based on industry standards. Instead of putting all of the chips for a computer system on a single motherboard, the functional elements of the computer system are broken out into smaller circuit cards referred to as blades that are then coupled together by a backplane that routes the large amounts of data among different blades. In most of these HPC blade configurations, the backplane fabric for the common rack has been implemented by a standardized parallel bus interconnection technology such as the PCI bus. Breaking out the functional components onto blades permits more flexibility in terms of configurations of components, while the use of a standardized interconnection such as the PCI bus permits blades from different providers to be configured together in the same common rack. Like the successful PC architecture, the use of a standardized local or system bus interface such as the PCI bus has been critical to the success of the blade architecture for HPC and server computer systems.
One of the parameters that has a significant impact on the system performance and implementation is the memory access method used by processors. There are two fundamental architectures to access memory. One is the Von Neumann architecture, wherein one shared memory is used to store instructions (program) and data, with one data bus and one address bus between processor and memory. This architecture requires instructions and data to be fetched sequentially, introducing a limitation in operation bandwidth which is often termed the "Von Neumann Bottleneck". The second architecture to access memory is referred to as the Harvard architecture, which uses physically separate memories and dedicated buses for instructions and data. Instructions and operands can therefore be fetched simultaneously. Both architectures involve a bus or buses to transfer information between the processor and memory. It will be appreciated by those skilled in the art that regardless of the processor and memory speeds, the speed of information transfer between the processor and memory can substantially impact the performance of the computer system.
While there have been significant strides with respect to the available CPU power, memory capacity, and memory speeds for the individual components of a computer system, progress in processor-memory interconnections and memory access in terms of the speed of the local or system parallel bus has lagged far behind. Processors and memories that can operate at upwards of a 3 GHz clock are known, but local system buses that can operate as a parallel bus interconnection at speeds that match the processor speeds are very rare, as such high speed buses are difficult to implement. For example, the system bus, referred to as the front side bus, that is used to externally interface to a Pentium 4 microprocessor chip operates slower than the speed of the processor. Conventionally, I/O devices external to the motherboard communicate over a slow speed I/O bus, such as the Peripheral Component Interconnect (PCI) bus, that is connected to a chipset on the motherboard, referred to as a bridge, which in turn communicates with the CPU over the front side bus. While this approach has worked well when I/O devices communicate at speeds that are much slower than the speeds of processors and main memory, current developments in I/O technologies, such as Infiniband and Multi Gigabit Ethernet, can deliver I/O communications at rates approaching upwards of several gigabits per second. These developments have blurred the conventional distinctions between CPU-memory and CPU-I/O transactions and negated the rationale for relegating I/O communications to a separate, slower legacy I/O bus such as the PCI bus.
One of the challenges in attempting to increase the speed of I/O buses, such as the PCI bus and PCI Extended (PCI X) bus, is that a parallel bus arrangement is prone to problems of clock skew between data flowing in the separate parallel data paths that may, for example, differ from each other by a very small path length. Clock recovery and data reconstruction prove to be increasingly problematic and unreliable as path lengths, data transfer speeds and/or the number of parallel paths are increased. Additionally, parallel buses take up considerable circuit board real estate.
Prior art solutions to the problems posed by increasing speeds on parallel buses for both front side buses and I/O buses have involved, for the most part, the use of proprietary protocols that are specific to a given provider of microprocessor chips and chipsets. For example, an advanced version of the front side bus on the Athlon 64/FX/Opteron, by Advanced Micro Devices, can operate at speeds approaching 1 GHz for a theoretical bandwidth of 14400 MB/s for a parallel bus that is 32 bits wide. Unfortunately, this is a proprietary solution that is incompatible with the general trend of migrating to the adoption of industry wide standards that encourage vendors to develop products which are interoperable with other vendors' solutions so as to reduce time and cost to market for new products.
The problem created by this divergence between processor speeds and memory access speeds is well known and has been referred to in the prior art as the memory gap or memory wall problem. See, e.g., Cuppu et al., "Organizational Design Trade-Offs at the DRAM, Memory Bus and Memory Controller Level: Initial Results", University of Maryland Systems & Computer Architecture Group Technical Report UMD-SCA-1999-2, November 1999. The memory gap problem is further compounded by the need to address a large memory capacity. One solution employed in the prior art to overcome the memory wall/memory gap problem is to eliminate the parallel bus interface between the processor and memory and use a serial backplane interface instead of a parallel bus like the PCI bus.
One early attempt to establish a standardized serial backplane interface between processors and memories was the Scalable Coherent Interface. Gustavson, D. and Li, Q., "The Scalable Coherent Interface (SCI)". IEEE Communications (Aug. 1996). Unfortunately, this proposal was not widely adopted.
More recently, proprietary high-speed serial interfaces between processors and memory have been developed by chip manufacturers, such as the AMD® HyperTransport and the Intel® Fully Buffered DIMM (FB-DIMM). Other alternatives have been proposed in the form of serial chip-to-chip interfaces, such as described by Trynosky, "Serial Backplane Interface to a Shared Memory," Application Note: Virtex-II Pro FPGA Family, XILINX, November 30, 2004, and multiple single-byte serial processor to memory interfaces as described by Davis, "The Memory Channel," Summit Computer Systems, Inc., September 19, 2004.
The migration from parallel to serial interfaces among components in a computing architecture is not unique to the processor/memory interface. Serial interfaces have also become the standard for almost all I/O communication channels, including back planes. Examples include Advanced Switching Interconnect (ASI) switching fabrics that utilize hierarchies of multiple high-speed clocked serial data lane channels, and proprietary packet switched DMA techniques as described, for example, in U.S. Patent No. 6,766,383. Industry standard I/O protocols, such as Infiniband, Fibre Channel and Gigabit Ethernet, can deliver I/O communications at rates approaching upwards of several gigabits per second.
While the speeds of a serial I/O protocol theoretically could approach the speeds needed for the processor/memory interface, the communication overhead associated with serial I/O protocols has curtailed any serious attempts to consider using serial I/O protocols as a basis for a processor/memory interface. Serial I/O communication protocols generally have larger packet and address sizes that are better suited for accessing large amounts of data stored on disk or over a network. The larger packet and address sizes result in an increased communication overhead penalty. The processor/memory interface conventionally has required the ability to transfer data between the processor and memory for a single address location, a requirement for which the overhead of I/O transfers and protocols has been seen as massive overkill. In addition, there are many more transmission blocking and memory contention concerns that need to be addressed for I/O communications than for processor-to-memory interfaces.
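As a rough illustration of the overhead penalty referred to above (the numbers are standard Ethernet framing figures, not figures taken from the disclosure): a standard Ethernet frame carries 18 bytes of header and CRC and is padded to a 64-byte minimum, so using such a frame to move a single 8-byte word between processor and memory spends far more wire time on framing than on data.

```c
#include <stdio.h>

/* Back-of-the-envelope efficiency of carrying one 8-byte memory word in a
 * minimum-size standard Ethernet frame. Illustrative arithmetic only. */
int main(void)
{
    const int payload_bytes   = 8;   /* one 64-bit memory word        */
    const int header_crc      = 18;  /* 14-byte header + 4-byte FCS   */
    const int min_frame_bytes = 64;  /* minimum Ethernet frame size   */

    int frame = payload_bytes + header_crc;
    if (frame < min_frame_bytes)
        frame = min_frame_bytes;     /* short frames are padded up    */

    printf("wire bytes per 8-byte word: %d (%.1f%% payload efficiency)\n",
           frame, 100.0 * payload_bytes / frame);   /* 64 bytes, 12.5% */
    return 0;
}
```

The shortened-frame protocol described later in this disclosure is directed at reducing exactly this fixed per-transfer cost.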
Some alternatives that utilize a serial I/O interface protocol for backplane connections instead of parallel bus interconnection technologies have been proposed. U.S. Publ. Appl. No. 20050091304 discloses a control system for a telecommunication portal that includes a modular chassis having an Ethernet backplane and a platform management bus which houses at least one application module, at least one functional module, and a portal executive. In this patent application, a 1000BaseT (Gigabit Ethernet) backplane provides a packet-switched network wherein each of the connected modules acts as an individual node on a network in contrast to a conventional parallel bus connection such as a PCI bus.
U.S. Publ. Appl. No. 20060123021 discloses a hierarchical packaging arrangement for electronic equipment that utilizes an Advanced Telecommunication Computing Architecture (ATCA) arrangement of daughter boards in the form of an Advanced Mezzanine Card (AMC) that are interconnected with a hierarchical packet-based interconnection fabric such as Ethernet, RapidIO, PCI Express or Infiniband. In this arrangement, the AMCs in each local cube are connected in a hierarchical configuration by a first, lower speed interface such as Gigabit Ethernet for connections within the local cube and by a second, higher speed interface such as 10G Ethernet for connections among cubes.
The problems of Ethernet switched backplane architectures in terms of latency, flow control, congestion management and quality of service are well known and described, for example, by Lee, "Computation and Communication Systems Need Advanced Switching," Embedded Intel Solutions, Winter 2005. These issues have generally discouraged the adoption of serial I/O protocols for communications between processors and memory even as such serial I/O protocols are being used in the smaller physical dimensions of a circuit board or a computer or communication rack or cabinet having multiple cards/blades interconnected by a backplane. Instead, the trend has been to increase the capacity of individual chips and the physical size of each of the server blades in order to accommodate more processors and memory on a single chip or circuit board, thereby reducing the need for processor and memory interconnection that must be mediated across the backplane.
As processor speeds, memory speeds and network speeds continue to increase, and as the external I/O is increasingly capable of delivering data at rates exceeding gigabit speeds, the current architectures for arranging the subsystems within a computing and communication architecture are no longer efficient. The problem of memory access in architectures like the Von Neumann and Harvard architectures, in light of multiple processor cores within a chip, further strains the processor and memory interconnect technology. There is therefore a need for a computing and communication chip architecture that is not constrained by the current architectural limitations and can provide a solution that is compatible with industry configuration standards and is scalable to match the speed, capacity and processing core requirements of a converged computing environment of the next generation of computers and communications equipment.
Summary of the Invention
The present invention is directed to a computing and communication chip architecture wherein the off-chip interfaces of processor and memory chips are implemented as high-speed packet switched serial interfaces as part of each chip in a semiconductor package. In one embodiment, the high-speed packet switched serial interface is a gigabit Ethernet interface implemented by a packet processor co-located with at least one processor core within the chip package. The serial interface is configured to transfer the data, address and control information required to fetch and write data from and to an external memory device, such as a system main memory, using a serial packetized protocol. Communications between at least one processor and the external memory device may be mediated by at least one bridge device capable of translating between multiple serialized protocols and optionally a switch device adapted to mediate communications between on-chip entities such as processor cores, caches, and the packet processor, as well as the communications between on-chip entities and off-chip devices such as the system main memory.
In an exemplary embodiment, the packet processor is implemented as an on-the-fly programmable bit stream protocol processor integrated as part of the chip. In one embodiment, a processor chip with cache can connect to a system or main memory chipset via a bit stream protocol processor incorporated as part of the microprocessor chip. In one embodiment the processor serial interface can be a 10 Gigabit Ethernet interface. In these embodiments, the protocol processor encapsulates the memory address and control information, such as read, write, and the number of successive bytes, as an Ethernet packet for communication among the processor(s) and memory chips that are located on the same chip, on the same motherboard, or alternatively on different circuit cards. In one embodiment, the communication overhead of the Ethernet protocol is further reduced by using an enhanced Ethernet protocol with shortened data frames within a constrained neighborhood, and/or by utilizing a bit stream switch where direct connection paths can be established between elements that comprise the computing or communication architecture.
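By way of illustration only, the encapsulation described above can be pictured as a small memory command (read or write, target address, burst length) carried as the payload of an ordinary Ethernet frame. The C sketch below is a minimal rendering of that idea; the field names, widths and the EtherType value are assumptions made for the example and are not taken from the disclosure.

```c
#include <stdint.h>

/* Hypothetical memory command carried as the payload of an Ethernet frame.
 * All names, field widths and the EtherType value are illustrative only. */

#define ETYPE_MEM_COMMAND 0x88B9U   /* placeholder EtherType for memory traffic */

enum mem_op { MEM_READ = 0, MEM_WRITE = 1 };

struct mem_command {
    uint8_t  op;          /* MEM_READ or MEM_WRITE                  */
    uint8_t  burst_len;   /* number of successive bytes requested   */
    uint64_t address;     /* target address in the external memory  */
    uint8_t  data[64];    /* write data (ignored for read commands) */
};

struct mem_frame {
    uint8_t  dst_mac[6];        /* memory controller / buffer endpoint */
    uint8_t  src_mac[6];        /* requesting processor core           */
    uint16_t etype;             /* ETYPE_MEM_COMMAND                   */
    struct mem_command cmd;     /* encapsulated read/write request     */
};
```

In such a scheme the on-chip packet processor would fill in a frame of this kind from the address and control signals presented on the core's parallel bus, and would reverse the mapping when the returned data arrives as a serial packet.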
The above summary of the various embodiments of the invention is not intended to describe each illustrated embodiment or every implementation of the invention. The figures in the detailed description that follow more particularly exemplify these embodiments.
Brief Description of the Drawings
The invention may be more completely understood in consideration of the following detailed description of various embodiments of the invention in connection with the accompanying drawings, in which:
Figures 1A, 1B, 1C, 1D, and 1E illustrate various configurations of front side bus arrangements for prior art processor chipset architectures.
Figure 2A depicts a chip architecture according to one aspect of the present invention wherein the processor chip package communicates externally via at least one serial line extending from a packet processor based parallel bus to serial interface converter located on the die.
Figure 2B is a block diagram representation of a multi-core processor chip package according to one embodiment of the present invention that is communicatively coupled to the devices external to the chip via at least one programmable serial interconnect extending from a switch and a parallel bus to serial interface module located within the chip package.
Figure 2C is a block diagram representation of a multi-core processor chip package according to one embodiment of the present invention that is communicatively coupled to the devices external to the chip via at least one serial line extending from a module located within the package and adapted to function as a combination switch and a parallel bus to serial interface.
Figure 2D is a block diagram representation of a packet processor based Ethernet bridge that provides protocol translation and serves as a "Southbridge" in a processor chip that features a unified computing, backplane, and network architecture.
Figure 3A illustrates a more detailed block diagram of a packet processor based parallel bus to serial interface converter that incorporates a token based, point-to-point communication in Ethernet between communications generating and consuming nodes in the system in accordance with one embodiment of the present invention.
Figure 3B illustrates a detailed block diagram of a packet processor based parallel bus to serial interface that converts between parallel bus communications and serial packetized communications based on a pre-defined serial packet protocol in accordance with one embodiment of the present invention.
Figure 3C is a schematic representation of a packet processor based parallel bus to serial interface converter in which the serial packet protocol output from the converter is programmable.
Figure 4 illustrates an embodiment of the present invention incorporated into a three-dimensional chip architecture.
Figure 5A is a block diagram of a processor chip package containing a single processor "core" that communicates externally via at least one serial line according to one embodiment of the present invention.
Figure 5B is a block diagram of a processor chip package containing multiple processor "cores" each of which is placed in serial communication with a port on an external switch that in turn communicates with devices external to the chip package.
Figure 5C is a block diagram of a processor chip package containing multiple processor "cores" each of which communicates with a multi-port parallel bus to serial interface converter contained within the chip package and placed in serial communication with devices external to the chip package via at least one serial line.
While the invention is amenable to various modifications and alternative forms, specifics thereof have been shown by way of example in the drawings and will be described in detail. It should be understood, however, that the intention is not to limit the invention to the particular embodiments described. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
Detailed Description of the Preferred Embodiments
Figures 1A, 1B, 1C, 1D, and 1E illustrate various configurations of front side bus (alternately "Channel") arrangements for prior art processor chipset architectures. In each of these configurations, a clocked bus interface 10 is used between the processor chip 15 and one or more support chips 20 for purposes of routing data and instructions among the various elements of the computer architecture 5.
Conventional architectures feature a channel, variously referred to as the Front Side Bus ("FSB"), the Processor Side Bus, the Memory Bus, the Data Bus, or the System Bus, over which the CPU communicates with, for example, a motherboard chipset such as the Northbridge and Southbridge controllers illustrated, for example, in Figure 1D. The Northbridge 25 interconnects the CPU 15 to the RAM memory 30 via the FSB. The Northbridge also connects peripherals such as the graphics card 35 via high speed channels such as the AGP and PCI Express. The Southbridge controller 40 handles I/O, including hard drives, USB, serial and parallel ports, and external storage devices, via other channels running communication protocols such as Ethernet and PCI Express. Currently, most front-side bus (FSB) speeds cannot deliver the performance required of telecommunication and computing applications designed to comply with contemporary industry-wide standards. For example, the PICMG® Advanced Mezzanine Card (AMC) specification defines the base-level requirements for a wide range of next generation high-speed mezzanine cards: the AMC card interconnect is specified at 12.5 Gbps per differential pair, Xilinx devices operate at 8 Gbps, and Fujitsu offers a 10-Gigabit Ethernet switch. In comparison, the Intel Itanium 2 processor front-side bus (FSB) speed is approximately 667 MHz, the AMD Opteron(TM) Front Side Bus frequency is approximately 1.4-2.8 GHz, and the Intel Hub Architecture (IHA), which substitutes the Memory Controller and the I/O Controller for the Northbridge and Southbridge controllers, features a system bus between the CPU and the Memory Controller that is capable of operating at speeds of 400 MHz, even though the dual RDRAM channels operate through the Memory Controller Hub (MCH) 25 to deliver a memory bandwidth of 3.2 GB/s, as illustrated in Figure 1D. Figure 1E illustrates the IHA based multiprocessor architecture known to the art.
One skilled in the art will appreciate that communication over the FSB and through the memory controller hub 45 of Figure 1E, for example, introduces latency in RAM memory read operations. Furthermore, RAM memory access and I/O share the FSB bandwidth, which can further degrade the performance of the FSB. Clearly, telecommunication and high performance computing applications designed to conform to the aforementioned industry specifications require an architecture that is faster than the performance limits of the aforementioned interconnects and is capable of operation under a wide range of industry standard protocols such as Ethernet and PCI Express.
Referring to Figures 2A-2C, there is illustrated a multi-core processor architecture 50 according to a primary embodiment of the present invention. One aspect of the illustrated multi-core processor architecture 50 takes the form of a single physical package 55 (alternately "Processor Chip Package") that is received into a single processor socket (not illustrated). This single physical package 55 includes a plurality of execution cores (alternatively, computational engines or processing engines) 60, but an external operating system perceives the package as a single processor. In one embodiment, the core 60 can be pin compatible with existing processor sockets. Each execution core 60 includes its own processor-specific functional blocks such as, for example, caches, arithmetic logic units (ALUs), a priority interrupt controller, architectural registers, pipeline prediction mechanisms, and an instruction set, as seen in the illustrations of Figures 5A-5C. Each execution core is capable of independently executing program instructions and a plurality of threads under the direction of the external operating system. In associated embodiments, the cores can execute internal and/or external instructions in cooperation with the remaining core or cores in the package, an operating system can differentiate between the services provided by each of the cores, and the cores can access shared resources such as cache and external system memory 70, as seen in Figures 2A and 2C for example. In other embodiments, the operating system may be capable of supporting parallel execution among multiple cores, and each core, or various combinations of cores, can be seen by the operating system as separate parallel processing units.
It will be appreciated that the present invention is not limited by any particular core or number of cores that might reside within a single physical package 55. In particular, the execution cores can be one or more of the Smithfield core used in Intel's 90 nanometer Pentium D and Pentium Extreme Edition 840 processors, the Presler core used in Intel's 65 nanometer Pentium Extreme Edition 955 processor, or AMD's 90 nanometer Egypt and Denmark cores. Other cores can be used within the scope of the present invention.
An important feature of the present invention is that data communication between the processor 55 and the system devices 80 occurs via at least one serial interconnect 90 mediated by a bridge-architecture 100 that, in at least one embodiment, communicates with a switch-architecture 105 as seen in Figure 5C for example. The switch-architecture 105 is the gateway via which the rest of the devices 80 in the system and the processor communicate. In one embodiment, the bridge-architecture 100 and optionally the switch-architecture 105 (alternately collectively "Parallel bus to serial interface converter") are located on the processor die in an integrated configuration as illustrated in Figure 2C. In such cases, one or more of the bridge-architecture and switch-architecture may be implemented in the form of an additional core or cores on the die. Exemplary embodiments of the processor die configurations are illustrated in Figures 5A-5C. In another configuration, the switch may be located outside the die as illustrated in Figures 2A, 2B, 5A and 5B. One of skill in the art will readily recognize that all such configurations of the bridge-architecture and switch-architecture are included within the scope of the present invention. It must be emphasized that although the aforementioned embodiments are described for a multi-core architecture, the disclosed invention is equally applicable to the case where the processor package includes only one core (single processor) and to the case where the bridge-architecture and the switch-architecture are combined into a single module, such as the parallel bus to serial interface converter 120 in Figures 2C and 5C for example.
In one embodiment, the bridge-architecture is implemented using a packet processor architecture as shown in Figures 3A-3C. Figure 3B is a specific embodiment of a typical packet processor according to the present invention. Communications from the processor transferred over a parallel bus 150, such as the data, address and control information related to a "write" command to external system memory issued by a processor core 60 in an exemplary processor chip package 55, are processed by the packet processor portion 180 to generate serial packetized communications 155 (165) that are transferred via one or more serial lines 90 outside the chip package 55. Serial communications 160 (170) received from outside the chip package 55 are processed by the packet processor portion 188 into parallel communications transferred over the parallel bus 150 to the processor, as exemplified in Figures 3A and 3B. It will be appreciated that the serial-to-parallel transformations may be applied to communications between processor cores within the chip package, and between processor cores and external devices, including other chip packages and I/O devices, within the scope of the present invention. The functional blocks of Figures 3A and 3B may be adapted according to a bitstream processor (BSP) architecture illustrated in Figure 3C for instance. The Bitstream processor is an on-the-fly programmable integrated packet processor, security engine and traffic manager using a high performance pipelined packet switching architecture. The Bitstream processor may be physically implemented as an additional "core", integrated with other logic devices on the processor die, or on a stand-alone chip while remaining within the scope of the present invention.
In one embodiment of the present invention, the Bit Stream processor performs a forward and reverse bridging function using a programmable pipelined architecture that provides a high degree of flexibility for adaptation to legacy, existing and emerging board-level and network-level data communication/signaling protocols. Each stage/block within the pipeline has specific functions or responsibilities that make available any relevant information to the subsequent blocks. As a consequence, the architecture for each stage is different and is optimized to handle a given function. Each stage can be dynamically programmed on a packet-by-packet basis while the processor cores transfer data/instructions by sending several bits at one time over a parallel communications link. The intra-core data/instructions use signaling that is native to the processor core and the associated system bus characterizing a vendor-specific CPU architecture, such as, for example, signaling compatible with the front side bus by Intel, the HyperTransport technology based interconnect protocol by AMD, or other proprietary/non-proprietary bus protocols. The Bit Stream processor bridges between the intra-processor protocol and one of a set of board-level or network-level serial communication protocols. Upstream information transfers to the processor from the bridge are parallelized, formatted and clocked so that they represent the native signaling used by the processor cores. Responses from the cores (i.e., the downstream information transfers, such as, for example, memory requests or other system requests) are serialized and packetized by the Bitstream processor.
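The per-packet programmable pipeline described above can be pictured with the following C sketch, in which each pipeline stage is an entry in a table of functions that can be rewritten at run time. The stage names, the packet context fields and the EtherType check are illustrative assumptions only, not a description of the Bit Stream processor's actual stages.

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Per-packet context handed from stage to stage; each stage adds the
     * information the later stages need.  Field names are illustrative. */
    struct pkt_ctx {
        const uint8_t *frame;
        size_t         len;
        uint16_t       protocol;   /* classified protocol, e.g. memory vs. I/O */
        uint8_t        out_port;   /* port chosen by the forwarding stage      */
    };

    typedef void (*stage_fn)(struct pkt_ctx *);

    static void stage_parse(struct pkt_ctx *c)
    {
        c->protocol = (uint16_t)((c->frame[12] << 8) | c->frame[13]);
    }
    static void stage_classify(struct pkt_ctx *c)
    {
        c->out_port = (c->protocol == 0x88B5) ? 0 : 1;   /* assumed EtherType */
    }
    static void stage_forward(struct pkt_ctx *c)
    {
        printf("forward %zu-byte frame to port %u\n", c->len, c->out_port);
    }

    /* The stage table plays the role of the programmable pipeline: rewriting
     * an entry "reprograms" that stage, in principle on a packet-by-packet
     * basis. */
    static stage_fn pipeline[] = { stage_parse, stage_classify, stage_forward };

    static void run_pipeline(struct pkt_ctx *c)
    {
        for (size_t i = 0; i < sizeof(pipeline) / sizeof(pipeline[0]); i++)
            pipeline[i](c);
    }

    int main(void)
    {
        uint8_t frame[64] = {0};
        frame[12] = 0x88; frame[13] = 0xB5;
        struct pkt_ctx c = { frame, sizeof(frame), 0, 0 };
        run_pipeline(&c);
        return 0;
    }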
In one embodiment the Bitstream processor that processes the packets takes the form described in more detail in U.S. Application Serial No. 11/466,367, filed August 23, 2006, entitled "Omni-Protocol Engine for Reconfigurable Bit-Stream Processing in High-Speed Networks," the disclosure of which is hereby incorporated by reference. The packet processing by the Bitstream processor causes the packets to be bridged to a desired board-level or network-level protocol/bus-architecture and forwarded to the switch-architecture. Exemplary protocols include, without limitation, PCI-Express, 10 Gigabit Ethernet, Infiniband, Advanced Switching, RapidIO, SPI 4.2, XAUI and Serial I/O. Other protocols may be advantageously used without limiting the scope of the present invention.
An alternate embodiment of the present invention, illustrated in Figures 5A and 5B, contemplates an arrangement of the processor and bridge wherein the packet processor enables on-die connections for each of the plurality of protocols via separate ports, each comprising one or more processor pins. Each port is configured to provide serial input/output to the processor in accordance with a specific pre-defined protocol.
In another related embodiment, the Bitstream processor is programmable to allow software-based programming of the protocols characterizing communications at any particular serial interconnect or port. Each of the cores can be specialized to be application specific - such as packet processing for telecommunications, graphics engine functionality for gaming, and parallel computations for high performance computing. The Bitstream processor can be programmed to assign all traffic associated with a particular core to a specified port. In another embodiment of the present invention, the aforementioned port can couple to an Advanced Mezzanine Card (AMC) module and provide processor support to the module where applicable, or provide all or part of the Module Management Controller (MMC) functionality in an AdvancedTCA® (ATCA) based open modular system architecture.
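As a purely illustrative sketch of such software-based port assignment, the following C fragment maps each specialized core to a fixed serial port. The table contents, role labels and function names are assumptions for this example; in the programmable Bitstream processor the mapping would be rewritten by software rather than fixed at compile time.

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed static assignment of each specialized core's traffic to one
     * external serial port. */
    struct core_port_map { uint8_t core_id; uint8_t port_id; const char *role; };

    static const struct core_port_map port_map[] = {
        { 0, 0, "packet processing / telecom" },
        { 1, 1, "graphics engine" },
        { 2, 2, "high performance computing" },
    };

    static int port_for_core(uint8_t core_id)
    {
        for (size_t i = 0; i < sizeof(port_map) / sizeof(port_map[0]); i++)
            if (port_map[i].core_id == core_id)
                return port_map[i].port_id;
        return -1;   /* unmapped core */
    }

    int main(void)
    {
        printf("core 1 -> port %d\n", port_for_core(1));
        return 0;
    }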
Referring again to Figure 2C, there is shown a block diagram representation of another feature of the present invention. As seen in Figure 2C, the packet processor based bridge-architecture is coupled via a serial interconnect to a switch-architecture. The switch-architecture is a non-blocking switch that provides serial, high-speed, point-to-point connections in a cut-through mode between multiple devices and the processor. The switch-architecture may be implemented through merchant switches such as, for example, the GigPCI-Express switch, model 6468 8-port Gigabit Ethernet switch by DSS Networks, or the MB8AA3020 20-port, 10 Gbps Ethernet (10GbE) switch IC by Fujitsu Microelectronics America.
In Figures 5B and 5C there is illustrated a multi-core embodiment of the Ether PC of the present invention with dual cores, in which one of the cores is dedicated for communication applications. In this illustrated multi-core embodiment, there are separate program space and data space. The cores can access either space by switching between the two. The data to I/O is switched. The switch allows a memory request originating at an execution core to be switched to one or more external memory resources, thereby overcoming memory bandwidth limitations inherent in conventional architectures where memory requests traverse a single data communication bus to and from a single system memory resource.
Another embodiment of the present invention contemplates a switching architecture implementation using the packet processor illustrated in Figures 2C and 5C. One of the features of such an embodiment is a combined bridge-switch architecture located on the processor die and capable of providing the services described above.
Another embodiment contemplates integrating the architectures disclosed in U.S. Application Serial No. 11/828,329, filed July 25, 2007, entitled "Telecommunication and Computing Platforms with Serial Packet Switched Integrated Memory Access Technology," (the disclosure of which is hereby incorporated by reference) into a single die / processor package.
In one embodiment illustrated in Figure 3A, the packet protocol processor allows line-speed QoS packet switching, which is utilized to implement a simple token based communication in Ethernet between the processor and the devices in the system as set forth in U.S. Application Serial No. 11/838,198, filed August 13, 2007, entitled "Enhanced Ethernet Protocol for Shortened Data Frames Within a Constrained Neighborhood Based on Unique ID," the disclosure of which is hereby incorporated by reference. In this embodiment, the packetized communication over the bridge-switch-architecture is further specialized to speed up sustained, point-to-point communications in the system. Each packet is provided with a source address (SA), a destination address (DA) and an E-type-like VLAN Tag for use in negotiating a unique token between end points on a communication link. The E-type extensions may be, for example, Request for UNIQUE ID or TOKEN GRANT, data communication with the granted token, and a request to retire the TOKEN. Once the TOKEN has been granted, the SA and DA fields are used along with the E-type to pass short data. This may also be extended to include large blocks of data for SATA and SAS. In other embodiments, once a UNIQUE ID is negotiated between end-points and an intermediate node connecting these end-points, a fixed frame size is used to endow the link with predictable performance in transferring the fixed frame and consequently meet various latency requirements. For example, the SA/DA pair could be used to transmit 12 bytes of data, 2 E-Type bytes and 2 TAG bytes.
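The fixed 16-byte frame mentioned above (12 bytes of data carried in the SA/DA fields, 2 E-Type bytes and a 2-byte TAG) could be represented as in the following C sketch. The numeric E-Type codes, the structure layout and all names are assumptions made for illustration only and do not reproduce the referenced enhanced Ethernet protocol.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    /* Assumed E-Type-like codes for the token negotiation described above. */
    enum etype_ext {
        ETYPE_TOKEN_REQUEST = 0xF001,   /* request for UNIQUE ID / TOKEN GRANT */
        ETYPE_TOKEN_DATA    = 0xF002,   /* data sent under a granted token     */
        ETYPE_TOKEN_RETIRE  = 0xF003    /* request to retire the TOKEN         */
    };

    /* Once a token has been granted, the 6-byte SA and 6-byte DA fields are
     * reused to carry 12 bytes of payload; with 2 E-Type bytes and a 2-byte
     * TAG this gives a fixed 16-byte frame. */
    struct short_frame {
        uint8_t  sa_da_data[12];
        uint16_t e_type;
        uint16_t tag;      /* VLAN-tag-like field holding the negotiated token */
    };

    static void build_token_data(struct short_frame *f, uint16_t token,
                                 const uint8_t payload[12])
    {
        memcpy(f->sa_da_data, payload, 12);
        f->e_type = ETYPE_TOKEN_DATA;
        f->tag    = token;
    }

    int main(void)
    {
        uint8_t payload[12] = { 0xDE, 0xAD };   /* example payload bytes */
        struct short_frame f;

        build_token_data(&f, 0x0042, payload);
        printf("frame size: %zu bytes, token 0x%04x\n", sizeof(f), f.tag);
        return 0;
    }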
One embodiment that goes along with multiple extended memories is the use of multiple caches. In one embodiment, the processor card is provided with two switchable caches (like two register files for threads). On a cache miss, the processor switches over from the first cache to the second cache to begin processing a second program thread associated with the second cache. In another embodiment, there could be a cache per extended memory.
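A minimal C sketch of this cache-switch-on-miss idea follows. The data structures, the miss model and the cache/thread association are illustrative assumptions rather than an implementation of the disclosed processor card: every lookup is forced to miss so that the switch-over is visible.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct cache  { const char *name; };
    struct thread { const char *name; };

    struct core_state {
        struct cache  *caches[2];
        struct thread *threads[2];
        int            active;      /* index of the cache/thread pair in use */
    };

    static bool cache_lookup(struct cache *c, uint64_t addr)
    {
        (void)c; (void)addr;
        return false;               /* pretend every access misses, for the demo */
    }

    static void access_memory(struct core_state *s, uint64_t addr)
    {
        if (!cache_lookup(s->caches[s->active], addr)) {
            s->active ^= 1;         /* miss: switch cache and program thread */
            printf("miss at 0x%llx -> switch to %s / %s\n",
                   (unsigned long long)addr,
                   s->caches[s->active]->name, s->threads[s->active]->name);
        }
    }

    int main(void)
    {
        struct cache  c0 = {"cache0"}, c1 = {"cache1"};
        struct thread t0 = {"thread0"}, t1 = {"thread1"};
        struct core_state s = { {&c0, &c1}, {&t0, &t1}, 0 };

        access_memory(&s, 0x2000);
        return 0;
    }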
In one embodiment, concurrency control is provided as part of the extended Ethernet protocol. Such control could also "add" to the CPU wait cycles if more than one processor requests the same block of memory. In a sense, this would be a component of latency, because the processor and the instructions scheduled for execution cannot distinguish between data-locality-dependent latency (speed of access and transfer) and the data access "gap" caused by concurrency control, since, barring data mirroring, concurrent access is not instantaneous access.
In another embodiment, the memory modules of the illustrations of Figures 2A and 2C comprise four-channel Fully-Buffered Dual Inline Memory Modules (FB-DIMMs). FB-DIMM memory uses a bi-directional serial memory bus which passes through each memory module. The FB-DIMM transmits memory data in packets, precisely controlled by the AMB (Advanced Memory Buffer) chips built into each FB-DIMM module. In one embodiment of the present invention, the four-channel FB-DIMMs are connected to 40G lines and terminated to FB-DIMM lanes. The AMB is 10 lanes serial southbound and 14 lanes serial northbound. In terms of the AMC card of Figure 2C, the AMB is configured to be a 16-lane fabric having less than 5 Gbps total bandwidth coming out of the memory controller of Figure 4A. Using commercial chips, such as, for example, the Fujitsu Axel X (by Fujitsu Microelectronics America), which can provide speeds of 10G per lane, the aforementioned requirements can be met by the use of a single 10G lane. Additional bandwidth in excess of 5 Gbps is provided by the use of multiple AMCs or multiple lanes. It will be appreciated that there is serialization and de-serialization on the DRAM end and serialization and de-serialization on the processor side. The latency penalty of the switch, and any overhead due to the serialization and de-serialization methods, can be overcome in the manner set forth in the succeeding paragraphs.
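As a simple check of the lane budget described above, the following C fragment computes how many serial lanes are needed given the approximate figures from the text (about 5 Gbps required, 10 Gbps per lane). The calculation is illustrative only; the constants are taken from the paragraph above, not from any datasheet.

    #include <stdio.h>

    int main(void)
    {
        const double required_gbps = 5.0;    /* AMC fabric bandwidth from the text */
        const double lane_gbps     = 10.0;   /* e.g. one 10G Ethernet lane          */

        /* ceiling division: the smallest number of whole lanes that covers
         * the required bandwidth */
        int lanes = (int)((required_gbps + lane_gbps - 1.0) / lane_gbps);

        printf("lanes needed: %d\n", lanes);  /* prints 1 */
        return 0;
    }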
In one embodiment, latency and contention/concurrency issues within the Ethernet switched fabric are resolved within a "contained network." Deterministic latency (with a tolerable jitter margin) through a "well contained network" (such as the packaging arrangement described herein) is indeed possible. Switching priority, dedicated ports (a pseudo port to dedicated memory ports communicating over Unique IDs between these ports), and other techniques disclosed in the previously identified application entitled "Enhanced Ethernet Protocol for Shortened Data Frames Within a Constrained Neighborhood Based on Unique ID," are advantageously utilized to overcome latency and contention/concurrency related issues.
In another embodiment, the present invention can be adapted to support a mesh architecture of processor-to-processor interconnection via the switched Ethernet fabric. In one embodiment, N-1 connections are made to each node, with each node having two connections to all other nodes. In other embodiments, different combinations of the number of Ethernet ports per card, the number of ports per switch and the number of switches per packaging arrangement can provide for various combinations of connections per node.
In another embodiment, the bit stream protocol processor enables prioritized switching. In conjunction with the modular and scalable three-dimensional chip architecture of the previous paragraph, the present invention allows the creation of an N-layered hierarchy of multiprocessors where N is both hardware independent and dynamically selectable by altering the prioritization afforded to different subsets of processors in the bit stream protocol processor mediated fabric. This embodiment enables the chip architecture to be configured as a shared memory model machine as well as a message passing model multiprocessor machine. Alternately, the architecture in accordance with one embodiment of the present invention may be configured as a server, a storage area network controller, a high performance network node in a grid computing based model, or a switch/router in a telecommunication network. It will be recognized that the same basic machine may be programmatically or manually altered into one or more of the aforementioned special purpose machines as and when desired.
Finally, while the present invention has been described with reference to certain embodiments, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the spirit and scope of the invention as defined by the appended claims.
For purposes of interpreting the claims for the present invention, it is expressly intended that the provisions of Section 112, sixth paragraph of 35 U.S.C. are not to be invoked unless the specific terms "means for" or "step for" are recited in the subject claim.

Claims

What is claimed is:
1. An apparatus implementing a computing and communication chip architecture for integrated circuitry, comprising: at least one processor core; and at least one packet processor uniquely associated with each of the at least one processor core, the at least one packet processor adapted to provide a high-speed packet switched serial interface to the at least one processor core, wherein the at least one processor core and the at least one packet processor are co-located on a semiconductor die package having at least one external port over which the high-speed packet switched serial interface is accessible, such that the high-speed packet switched serial interface transfers data, address and control information required to fetch and write data from and to an external memory device configured as a system main memory for the at least one processor core using a serial packetized protocol.
2. The apparatus of claim 1 further comprising: a plurality of processor cores, each processor core with at least one packet processor uniquely associated therewith; and at least one bridge interface operably connected to each packet processor and co-located on the semiconductor die package and adapted to translate between multiple serialized protocols communicated over the high-speed packet switched serial interface.
3. The apparatus of claim 1 further comprising: a plurality of processor cores, each processor core with at least one packet processor uniquely associated therewith; and at least one switch interface operably connected to each packet processor and to the at least one external port and co-located on the semiconductor die package and adapted to mediate serial packetized communications among the packet processors and the at least one external port.
4. The apparatus of claim 1 wherein the processor core further comprises a cache memory accessed via the associated at least one packet processor for that processor core.
5. The apparatus of claim 1 wherein the packet processor is implemented as an on-the-fly programmable bit stream processor.
6. The apparatus of claim 1 wherein the high-speed packet switched serial interface is an Ethernet interface.
7. The apparatus of claim 1 wherein the high-speed packet switched serial interface is further adapted to transfer data from and to an external packet-switched network in addition to the system main memory.
8. The apparatus of claim 7 wherein the external packet-switched network is the Internet.
9. A method of implementing a computing and communication chip architecture for integrated circuitry, comprising: providing a semiconductor die package having co-located thereon at least one processor core with at least one packet processor uniquely associated with each of the at least one processor core, the at least one packet processor adapted to provide a high-speed packet switched serial interface to the at least one processor core; and utilizing the high-speed packet switched serial interface to transfer data, address and control information required to fetch and write data from and to an external memory device configured as a system main memory for the at least one processor core using a serial packetized protocol.
10. The method of claim 9 further comprising: providing a plurality of processor cores on the semiconductor die package, each processor core with at least one packet processor uniquely associated therewith; and providing at least one bridge interface operably connected to each packet processor and co-located on the semiconductor die package; and utilizing the at least one bridge interface to translate between multiple serialized protocols communicated over the high-speed packet switched serial interface.
11. The method of claim 9 further comprising: providing a plurality of processor cores on the semiconductor die package, each processor core with at least one packet processor uniquely associated therewith; and providing at least one switch interface operably connected to each packet processor and to the at least one external port and co-located on the semiconductor die package; and utilizing the at least one switch interface to mediate serial packetized communications among the packet processors and the at least one external port.
12. The method of claim 9 further comprising providing on the semiconductor die package a cache memory for the at least one processor core adapted to be accessed via the associated at least one packet processor for that processor core.
13. The method of claim 9 wherein the high-speed packet switched serial interface further transfers data from and to an external packet-switched network in addition to the system main memory.
14. A computer readable media having recorded thereon instructions for implementing a computing and communication chip architecture for integrated circuitry on a semiconductor die package, comprising: instructions defining at least one processor core co-located on the semiconductor die package with at least one packet processor uniquely associated with each of the at least one processor core, the at least one packet processor adapted to provide a high-speed packet switched serial interface to the at least one processor core; and instructions defining at least one external port to the semiconductor die package over which the high-speed packet switched serial interface is accessible, such that the high-speed packet switched serial interface transfers data, address and control information required to fetch and write data from and to an external memory device configured as a system main memory for the at least one processor core using a serial packetized protocol.
15. The computer readable media of claim 14 further comprising: instructions defining a plurality of processor cores on the semiconductor die package, each processor core with at least one packet processor uniquely associated therewith; and instructions defining at least one bridge interface operably connected to each packet processor and co-located on the semiconductor die package adapted to be utilized to translate between multiple serialized protocols communicated over the high-speed packet switched serial interface.
16. The computer readable media of claim 14 further comprising: instructions defining a plurality of processor cores on the semiconductor die package, each processor core with at least one packet processor uniquely associated therewith; and instructions defining at least one switch interface operably connected to each packet processor and to the at least one external port and co-located on the semiconductor die package to mediate serial packetized communications among the packet processors and the at least one external port.
17. The computer readable media of claim 14 further comprising instructions defining a cache memory for the at least one processor core adapted to be accessed via the associated at least one packet processor for that processor core.
18. The computer readable media of claim 14 wherein the semiconductor die package is a field programmable gate array (FPGA) and the instructions are firmware adapted to configure the FPGA.
19. The computer readable media of claim 14 wherein the semiconductor die package is an application specific integrated circuit (ASIC) and the instructions are firmware adapted to configure the ASIC.
PCT/US2008/052969 2007-02-02 2008-02-04 Processor chip architecture having integrated high-speed packet switched serial interface WO2008095201A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020097018172A KR101453581B1 (en) 2007-02-02 2008-02-04 Processor chip architecture having integrated high-speed packet switched serial interface
CN2008800038694A CN101918931B (en) 2007-02-02 2008-02-04 Processor chip architecture having integrated high-speed packet switched serial interface

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US88798907P 2007-02-02 2007-02-02
US60/887,989 2007-02-02

Publications (1)

Publication Number Publication Date
WO2008095201A1 true WO2008095201A1 (en) 2008-08-07

Family

ID=39674524

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/052969 WO2008095201A1 (en) 2007-02-02 2008-02-04 Processor chip architecture having integrated high-speed packet switched serial interface

Country Status (4)

Country Link
US (5) US7822946B2 (en)
KR (1) KR101453581B1 (en)
CN (1) CN101918931B (en)
WO (1) WO2008095201A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8771064B2 (en) 2010-05-26 2014-07-08 Aristocrat Technologies Australia Pty Limited Gaming system and a method of gaming
CN104794088A (en) * 2015-04-22 2015-07-22 成都为开微电子有限公司 Multi-interface bus converting expanding chip design
WO2015127327A1 (en) * 2014-02-23 2015-08-27 Rambus Inc. Distributed procedure execution and file systems on a memory interface
EP2933728A1 (en) * 2014-04-17 2015-10-21 ADVA Optical Networking SE Using serdes loopbacks for low latency functional modes with full monitoring capability
EP3001610A1 (en) * 2014-09-29 2016-03-30 F5 Networks, Inc Methods for sharing bandwidth across a packetized bus and systems thereof
WO2016160736A1 (en) * 2015-03-30 2016-10-06 Integrated Device Technology, Inc. Methods and apparatus for efficient network analytics and computing card
US10042809B2 (en) 2015-03-20 2018-08-07 Electronics And Telecommunications Research Institute Method for communication using PCI express dedicated communication module and network device including the same
CN114063520A (en) * 2021-11-17 2022-02-18 首都师范大学 Switch, communication system and control method

Families Citing this family (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7912075B1 (en) * 2006-05-26 2011-03-22 Avaya Inc. Mechanisms and algorithms for arbitrating between and synchronizing state of duplicated media processing components
US8392634B2 (en) * 2007-03-09 2013-03-05 Omron Corporation Programmable controller with building blocks having modules that can be combined into a single unit
US8379656B2 (en) * 2009-09-04 2013-02-19 Equinix, Inc. Real time configuration and provisioning for a carrier ethernet exchange
US8612711B1 (en) * 2009-09-21 2013-12-17 Tilera Corporation Memory-mapped data transfers
US9082091B2 (en) 2009-12-10 2015-07-14 Equinix, Inc. Unified user login for co-location facilities
US9081653B2 (en) 2011-11-16 2015-07-14 Flextronics Ap, Llc Duplicated processing in vehicles
TW201322697A (en) * 2011-11-30 2013-06-01 Hon Hai Prec Ind Co Ltd Baseboard management controller electronic device and controlling method thereof
CN102591602B (en) * 2011-12-30 2014-07-09 浙江大学 High-speed digital printing processing system and method on basis of multi-core processor
US20140233582A1 (en) * 2012-08-29 2014-08-21 Marvell World Trade Ltd. Semaphore soft and hard hybrid architecture
US9973501B2 (en) * 2012-10-09 2018-05-15 Cupp Computing As Transaction security systems and methods
US9129071B2 (en) * 2012-10-24 2015-09-08 Texas Instruments Incorporated Coherence controller slot architecture allowing zero latency write commit
CN103838689B (en) * 2012-11-23 2016-11-23 普诚科技股份有限公司 Interface transmission method and data transmission system
US9817728B2 (en) 2013-02-01 2017-11-14 Symbolic Io Corporation Fast system state cloning
US10133636B2 (en) 2013-03-12 2018-11-20 Formulus Black Corporation Data storage and retrieval mediation system and methods for using same
US9304703B1 (en) 2015-04-15 2016-04-05 Symbolic Io Corporation Method and apparatus for dense hyper IO digital retention
US9628108B2 (en) 2013-02-01 2017-04-18 Symbolic Io Corporation Method and apparatus for dense hyper IO digital retention
US9201837B2 (en) * 2013-03-13 2015-12-01 Futurewei Technologies, Inc. Disaggregated server architecture for data centers
US9252131B2 (en) * 2013-10-10 2016-02-02 Globalfoundries Inc. Chip stack cache extension with coherency
CN103647708A (en) * 2013-11-29 2014-03-19 曙光信息产业(北京)有限公司 ATCA-based data message processing board
US10397634B2 (en) * 2014-03-25 2019-08-27 Synamedia Limited System and method for synchronized presentation of video timeline metadata
US9491886B1 (en) * 2014-08-29 2016-11-08 Znyx Networks, Inc. Compute and networking function consolidation
US9971733B1 (en) 2014-12-04 2018-05-15 Altera Corporation Scalable 2.5D interface circuitry
US10061514B2 (en) 2015-04-15 2018-08-28 Formulus Black Corporation Method and apparatus for dense hyper IO digital retention
JP2016225943A (en) * 2015-06-03 2016-12-28 富士通株式会社 System for restoring device status and restoration method of device status in system
US10091904B2 (en) * 2016-07-22 2018-10-02 Intel Corporation Storage sled for data center
US10225230B2 (en) 2016-12-14 2019-03-05 Raytheon Company System and method for address-mapped control of field programmable gate array (FPGA) via ethernet
CN108268940B (en) * 2017-01-04 2022-02-18 意法半导体股份有限公司 Tool for creating reconfigurable interconnect frameworks
US11095556B2 (en) * 2017-06-30 2021-08-17 Intel Corporation Techniques to support multiple protocols between computer system interconnects
US11249808B2 (en) 2017-08-22 2022-02-15 Intel Corporation Connecting accelerator resources using a switch
RU183879U1 (en) * 2017-10-25 2018-10-08 Публичное акционерное общество "Институт электронных управляющих машин им. И.С. Брука" Processor module
US10572186B2 (en) 2017-12-18 2020-02-25 Formulus Black Corporation Random access memory (RAM)-based computer systems, devices, and methods
CN110297793B (en) * 2018-03-22 2021-07-23 杭州海康威视数字技术股份有限公司 Chip identification method and device and electronic equipment
US11789883B2 (en) * 2018-08-14 2023-10-17 Intel Corporation Inter-die communication of programmable logic devices
US10871906B2 (en) 2018-09-28 2020-12-22 Intel Corporation Periphery shoreline augmentation for integrated circuits
KR20200066774A (en) 2018-12-03 2020-06-11 삼성전자주식회사 Semiconductor device
CN109669729B (en) * 2018-12-26 2022-11-01 杭州迪普科技股份有限公司 Starting guide method of processor
US10725853B2 (en) 2019-01-02 2020-07-28 Formulus Black Corporation Systems and methods for memory failure prevention, management, and mitigation
CN109522265A (en) * 2019-01-17 2019-03-26 蓝怡科技集团股份有限公司 A kind of DEU data exchange unit and method
CN110262993B (en) * 2019-06-11 2022-02-08 浙江华创视讯科技有限公司 Input information reading method and circuit, storage medium and electronic device
US11176065B2 (en) * 2019-08-12 2021-11-16 Micron Technology, Inc. Extended memory interface
US11164847B2 (en) 2019-12-03 2021-11-02 Intel Corporation Methods and apparatus for managing thermal behavior in multichip packages
CN112039805B (en) * 2020-08-27 2021-09-03 成都坤恒顺维科技股份有限公司 Low-delay jitter high-speed signal switching system
US20220114125A1 (en) * 2020-10-09 2022-04-14 Intel Corporation Low-latency optical connection for cxl for a server cpu
CN112395233A (en) * 2020-11-30 2021-02-23 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Software definition switching system and method based on CPU and SDI chip
RU208501U1 (en) * 2021-06-28 2021-12-22 Общество с ограниченной ответственностью "Форк" Motherboard
CN113594077B (en) * 2021-07-22 2024-03-08 重庆双芯科技有限公司 Multistage chip serial system chip positioning method and multistage chip serial system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5347514A (en) * 1993-03-26 1994-09-13 International Business Machines Corporation Processor-based smart packet memory interface
WO2002065717A1 (en) * 2001-02-14 2002-08-22 Dynarc Inc. Dba Dynamic Network Architecture Inc., In Ca Dynamic packet processor architecture
US6449273B1 (en) * 1997-09-04 2002-09-10 Hyundai Electronics America Multi-port packet processor
US7107402B1 (en) * 2001-10-23 2006-09-12 Stephen Waller Melvin Packet processor memory interface

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4979100A (en) * 1988-04-01 1990-12-18 Sprint International Communications Corp. Communication processor for a packet-switched network
CA2051222C (en) * 1990-11-30 1998-05-05 Pradeep S. Sindhu Consistent packet switched memory bus for shared memory multiprocessors
US6453388B1 (en) * 1992-06-17 2002-09-17 Intel Corporation Computer system having a bus interface unit for prefetching data from system memory
US5544162A (en) * 1995-01-10 1996-08-06 International Business Machines Corporation IP bridge for parallel machines
US5613071A (en) * 1995-07-14 1997-03-18 Intel Corporation Method and apparatus for providing remote memory access in a distributed memory multiprocessor system
US6434156B1 (en) * 1998-07-24 2002-08-13 Nortel Networks Limited Virtual switching for interconnected networks
US6928505B1 (en) * 1998-11-12 2005-08-09 Edwin E. Klingman USB device controller
US6718407B2 (en) * 1999-09-30 2004-04-06 Intel Corporation Multiplexer selecting one of input/output data from a low pin count interface and a program information to update a firmware device from a communication interface
US6778548B1 (en) * 2000-06-26 2004-08-17 Intel Corporation Device to receive, buffer, and transmit packets of data in a packet switching network
US7401126B2 (en) * 2001-03-23 2008-07-15 Neteffect, Inc. Transaction switch and network interface adapter incorporating same
JP2003084919A (en) * 2001-09-06 2003-03-20 Hitachi Ltd Control method of disk array device, and disk array device
US20040019704A1 (en) * 2002-05-15 2004-01-29 Barton Sano Multiple processor integrated circuit having configurable packet-based interfaces
US7412588B2 (en) * 2003-07-25 2008-08-12 International Business Machines Corporation Network processor system on chip with bridge coupling protocol converting multiprocessor macro core local bus to peripheral interfaces coupled system bus
US20050281282A1 (en) * 2004-06-21 2005-12-22 Gonzalez Henry J Internal messaging within a switch
US7453904B2 (en) * 2004-10-29 2008-11-18 Intel Corporation Cut-through communication protocol translation bridge
US7848825B2 (en) * 2007-01-03 2010-12-07 Apple Inc. Master/slave mode for sensor processing devices

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5347514A (en) * 1993-03-26 1994-09-13 International Business Machines Corporation Processor-based smart packet memory interface
US6449273B1 (en) * 1997-09-04 2002-09-10 Hyundai Electronics America Multi-port packet processor
WO2002065717A1 (en) * 2001-02-14 2002-08-22 Dynarc Inc. Dba Dynamic Network Architecture Inc., In Ca Dynamic packet processor architecture
US7107402B1 (en) * 2001-10-23 2006-09-12 Stephen Waller Melvin Packet processor memory interface

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8771064B2 (en) 2010-05-26 2014-07-08 Aristocrat Technologies Australia Pty Limited Gaming system and a method of gaming
WO2015127327A1 (en) * 2014-02-23 2015-08-27 Rambus Inc. Distributed procedure execution and file systems on a memory interface
EP2933728A1 (en) * 2014-04-17 2015-10-21 ADVA Optical Networking SE Using serdes loopbacks for low latency functional modes with full monitoring capability
US9407574B2 (en) 2014-04-17 2016-08-02 Adva Optical Networking Se Using SerDes loopbacks for low latency functional modes with full monitoring capability
EP3001610A1 (en) * 2014-09-29 2016-03-30 F5 Networks, Inc Methods for sharing bandwidth across a packetized bus and systems thereof
US10042809B2 (en) 2015-03-20 2018-08-07 Electronics And Telecommunications Research Institute Method for communication using PCI express dedicated communication module and network device including the same
WO2016160736A1 (en) * 2015-03-30 2016-10-06 Integrated Device Technology, Inc. Methods and apparatus for efficient network analytics and computing card
CN104794088A (en) * 2015-04-22 2015-07-22 成都为开微电子有限公司 Multi-interface bus converting expanding chip design
CN104794088B (en) * 2015-04-22 2018-05-01 成都为开微电子有限公司 A kind of multiplex roles general line system extended chip design
CN114063520A (en) * 2021-11-17 2022-02-18 首都师范大学 Switch, communication system and control method
CN114063520B (en) * 2021-11-17 2024-03-12 首都师范大学 Switch, communication system, and control method

Also Published As

Publication number Publication date
US7822946B2 (en) 2010-10-26
US20160147689A1 (en) 2016-05-26
US8234483B2 (en) 2012-07-31
US8924688B2 (en) 2014-12-30
KR20090104137A (en) 2009-10-05
CN101918931A (en) 2010-12-15
US10437764B2 (en) 2019-10-08
CN101918931B (en) 2013-09-04
KR101453581B1 (en) 2014-10-22
US20130007414A1 (en) 2013-01-03
US20180143930A1 (en) 2018-05-24
US9940279B2 (en) 2018-04-10
US20110035571A1 (en) 2011-02-10
US20080244150A1 (en) 2008-10-02

Similar Documents

Publication Publication Date Title
US10437764B2 (en) Multi protocol communication switch apparatus
US9323708B2 (en) Protocol translation method and bridge device for switched telecommunication and computing platforms
US7353362B2 (en) Multiprocessor subsystem in SoC with bridge between processor clusters interconnetion and SoC system bus
US10924430B2 (en) Streaming platform flow and architecture for an integrated circuit
CN112925735A (en) Easily expandable on-die fabric interface
US11003607B2 (en) NVMF storage to NIC card coupling over a dedicated bus
WO2020097013A1 (en) Streaming platform flow and architecture
CN111752607A (en) System, apparatus and method for bulk register access in a processor
US10657077B2 (en) HyperConverged NVMF storage-NIC card
Wu et al. A flexible FPGA-to-FPGA communication system
Sano et al. ESSPER: Elastic and scalable FPGA-cluster system for high-performance reconfigurable computing with supercomputer Fugaku
Chou et al. Sharma et al.
WO2008013888A2 (en) Telecommunication and computing platforms with serial packet switched integrated memory access technology
JP2006513489A (en) System and method for scalable interconnection of adaptive processor nodes for clustered computer systems
Wu et al. A flexible FPGA-to-FPGA interconnect interface design and implementation
KR102654610B1 (en) Multistage boot image loading and configuration of programmable logic devices
Litz Improving the scalability of high performance computer systems
Duato Introduction to Network Architectures
Moh et al. KinCA: An InfiniBand host channel adapter based on dual processor cores
KR20180063128A (en) Configuration of multistage boot image loading and programmable logic devices

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200880003869.4

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08728971

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 5094/CHENP/2009

Country of ref document: IN

WWE Wipo information: entry into national phase

Ref document number: 1020097018172

Country of ref document: KR

122 Ep: pct application non-entry in european phase

Ref document number: 08728971

Country of ref document: EP

Kind code of ref document: A1