US20080307196A1 - Integrated Processor Array, Instruction Sequencer And I/O Controller - Google Patents
- Publication number
- US20080307196A1 (application US 12/128,528)
- Authority
- US
- United States
- Prior art keywords
- data
- instructions
- processing engines
- computer system
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
- G06F15/8023—Two dimensional arrays, e.g. mesh, torus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
Definitions
- the invention relates generally to computer processors. More specifically, the invention relates to an integrated processor array, instruction sequencer, and I/O controller.
- processors are increasingly asked to perform mathematical operations, such as calculations and other data manipulation, at greater rates of speed.
- processors are also increasingly required to transfer more data at higher rates of speed, as multimedia and other applications employ larger files storing greater amounts of data.
- the invention can be implemented in numerous ways, including as a method, system, and device. Various embodiments of the invention are discussed below.
- a computer system comprises an instruction sequencing unit configured to sequence instructions for manipulating data and to transmit the sequenced instructions.
- the computer system also includes an array of processing engines configured to receive instructions corresponding to the sequenced instructions, each processing engine of the array of processing engines being configured to receive the data.
- Each processing engine has a first memory configured to store the data, a decision unit configured to store decision data, and a Boolean unit configured to store a logic state and to modify the logic state according to the received instructions.
- Each processing engine also has an integer unit configured to conditionally perform integer operations on the stored data according to the stored decision data, the received instructions, and the logic state, so as to generate integer result data, as well as a second memory configured to store I/O data.
- the Boolean unit is configured to modify the logic state in the same clock cycle as the integer unit performs the integer operations.
- the computer system also includes an I/O controller configured to transmit the I/O data to, and receive the I/O data from, the array of processing engines.
- a computer system comprises a processing array having processing engines serially interconnected in rows and columns so as to form rows of processing engines and columns of processing engines, the processing array configured to execute I/O operations by shifting I/O data sequentially through the columns of processing engines, to shift computation data sequentially across the rows of processing engines, and to execute computation operations upon the shifted computation data in parallel with the I/O operations.
- the computer system also includes an instruction sequencing unit configured to sequence instructions and to transfer the instructions to the processing engines of the processing array so as to control the computation operations. It also includes an I/O controller configured to exchange the I/O data with the processing engines of the processing array.
- FIG. 1 illustrates a block diagram representation of a processor constructed in accordance with the invention, and including an integrated instruction sequencer, an array of processing engines, and an I/O controller.
- FIG. 2 illustrates further details of processing engines constructed in accordance with the invention, as well as their interconnection.
- FIG. 3 illustrates a block diagram representation of an individual processing engine in accordance with the invention.
- FIG. 4 is a vector representation of commands to be executed by the processing engines of FIG. 3 .
- the invention relates to a computer processor having an integrated instruction sequencer, array of processing engines, and I/O controller.
- the instruction sequencer sequences instructions from a host and transfers these instructions to the processing engines, thus directing their operation.
- the I/O controller controls the transfer of I/O data to and from the processing engines in parallel with the processing controlled by the instruction sequencer.
- the processing engines themselves are constructed with an integer arithmetic and logic unit (ALU), a 1-bit ALU, a decision unit, and registers. Instructions from the instruction sequencer direct the integer ALU to perform integer operations according to a logic state stored in the 1-bit ALU and data stored in the decision unit.
- the 1-bit ALU and the decision unit can modify their stored information in the same clock cycle as the integer ALU carries out its operation, allowing for faster and more efficient processing.
- the processing engines also contain a local memory for storing instructions and data to be shifted among the engines.
- FIG. 1 illustrates a processor of the invention in block diagram form.
- the processor 100 includes an instruction sequencer 102 , an array 104 of processing engines, and an I/O controller 106 .
- the instruction sequencer 102 receives tasks from a host (not shown), and transforms each task into sequences of instructions for proper use by the array 104 .
- decoders 108 , 110 can decode instructions from the instruction sequencer 102 , translating the instructions for various applications to corresponding native instructions understood by the array 104 . Instructions are then fed to the pipeline registers 112 , where they are fed sequentially to the array 104 .
- the array 104 is also configured to handle I/O data.
- the I/O controller 106 receives I/O data from the host or from an external memory, and transfers it to an I/O interface 114 , where it is formatted for the local memories of individual processing engines of the array 104 .
- the processor 100 includes the ability to transfer I/O data to individual processing engines in a number of ways, to maximize efficiency and speed.
- When the processing engines have finished performing their various operations on their data, including shifting the data amongst the processors, the data is shifted out of the array 104 . I/O data is shifted out to the I/O controller 106 , while other data is shifted out to the instruction sequencer 102 for transfer to the host, via an adder 116 if desired.
- the processing engines have the capacity to simultaneously transfer I/O data and perform operations on other data, adding to the speed and efficiency of the processor 100 .
- FIG. 2 illustrates the interconnections between processing engines in the array 104 .
- the array 104 is constructed as a two dimensional array of processing engines PE ij .
- the processing engines PE ij are serially interconnected in rows and columns. That is, the processing engines PE ij are arranged in rows and columns, with each processing engine PE ij able to exchange data with its neighboring processing engines, both in its row and in its adjacent columns.
- the processing engines at the end of each row are able to exchange data with the first processing engine of the next row, and vice versa.
- the processing engines at the end of each column are able to transfer data to the first processing engine in the same column.
- the processing engines can thus be configured to transfer I/O data and other data both column-wise and row-wise.
- the I/O controller 106 transfers I/O data (perhaps after formatting by the I/O interface 114 ) to various processing engines, which transfer the I/O data serially down their respective columns. Simultaneously, this I/O data, or other data inserted into the various processing engines accompanying instructions from the instruction sequencer 102 , can be operated on by each processing engine and shifted row-wise. In this manner, the array 104 can both transfer I/O data as well as simultaneously perform various operations on that or other data.
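The two shift directions described above can be sketched in a few lines of Python. This is a minimal illustrative model, not the patented circuit: the function names, the use of nested lists for the array, and the choice to feed new I/O words in at the top of each column (rather than wrapping the column around) are all assumptions made for the example.

```python
# Hypothetical model of the array's two shift directions; all names are illustrative.

def shift_rows(grid, incoming):
    """Shift every word one engine along the row-wise chain.

    The last engine of each row feeds the first engine of the next row,
    so the whole array behaves like one long chain.
    Returns (new_grid, word_shifted_out).
    """
    cols = len(grid[0])
    flat = [word for row in grid for word in row]
    shifted_out = flat[-1]
    flat = [incoming] + flat[:-1]
    new_grid = [flat[r * cols:(r + 1) * cols] for r in range(len(grid))]
    return new_grid, shifted_out


def shift_columns(grid, incoming_row):
    """Shift I/O words one engine down every column.

    Assumption: a new row of I/O words enters at the top and the bottom
    row leaves the array. Returns (new_grid, row_shifted_out).
    """
    shifted_out = list(grid[-1])
    new_grid = [list(incoming_row)] + [list(row) for row in grid[:-1]]
    return new_grid, shifted_out
```

In this model the two functions touch independent state, which mirrors the claim that column-wise I/O transfer and row-wise computation shifts can proceed at the same time.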
- I/O bounded processes are dominated by the need to transfer large amounts of data without performing significant computational operations upon that data (e.g., multimedia file playback, file copying, or other transfers of large amounts of data).
- FIG. 3 illustrates a block diagram representation of an individual processing engine PE ij in accordance with the invention.
- each processing engine 300 includes an integer ALU 302 , a 1-bit ALU 304 , and a decision unit 306 that either execute, or facilitate the execution of, various operations.
- the processing engine 300 also includes a local data memory 308 and registers 310 .
- the integer ALU 302 , 1-bit ALU 304 , and decision unit 306 are connected so as to operate in parallel with each other.
- the 1-bit ALU 304 and decision unit 306 can send their current logic states to the integer ALU 302 as well as modify those states in the same clock cycle.
- the processing engine 300 receives sequenced instructions from the instruction sequencer 102 .
- the instructions are sent to the integer ALU 302 , as well as to the registers 310 and local data memory 308 .
- the instructions are also sent to the 1-bit ALU 304 and decision unit 306 .
- Instructions requiring computation direct the registers 310 and/or local data memory 308 to transfer data to the integer ALU 302 for processing.
- the data can be transferred from the registers 310 to the integer ALU 302 as left and right operands, although the invention includes any form of data transfer among the local data memory 308 , registers 310 , and integer ALU 302 .
- the instructions also modify the logic state of the 1-bit ALU 304 .
- the 1-bit ALU 304 stores a single bit whose two binary logic states are read by the integer ALU 302 . Instructions from the instruction sequencer 102 can direct the integer ALU 302 to read the logic state of the 1-bit ALU 304 and execute different operations depending on the logic state.
- an instruction can direct the integer ALU 302 to add its data to data from a neighboring processing engine 300 if the logic state is binary “0”, or subtract its data from that of the neighboring processing engine 300 if the logic state is binary “1.”
- the 1-bit ALU 304 allows a single instruction to represent more than one operation.
- the instructions also modify a decision state stored in the decision unit 306 . This decision state indicates whether the particular processor is “marked” for execution of its instruction, or “unmarked” and thus directed not to execute its instruction. This allows the instruction sequencer 102 to selectively instruct individual processing engines 300 to carry out operations, or to avoid carrying out operations, as necessary. This allows the array 104 to execute more complex and detailed processes.
- the integer ALU 302 , 1-bit ALU 304 , and decision unit 306 are arranged in parallel, so that the 1-bit ALU 304 and decision unit 306 can modify their states in the same clock cycle as the integer ALU 302 carries out its operations. This speeds the operation of each processing engine 300 , as the integer ALU 302 can thus carry out a new operation each clock cycle, rather than having to wait for the 1-bit ALU 304 and decision unit 306 to update first.
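The interplay of the three units can be illustrated with a small behavioral sketch. It assumes a 16-bit word, models the add/subtract example given above, and uses hypothetical names (`ProcessingEngine`, `step`); it is one possible reading of the description, not the actual hardware design.

```python
from dataclasses import dataclass


@dataclass
class ProcessingEngine:
    value: int        # word held in the engine's registers
    logic_state: int  # single bit held by the 1-bit ALU
    marked: bool      # decision state held by the decision unit


def step(pe, neighbor_value, next_logic_state, next_marked, word_bits=16):
    """Model one clock cycle of a single engine.

    The integer operation reads the CURRENT logic and decision states,
    while the new states are written in the same cycle.
    """
    mask = (1 << word_bits) - 1
    result = pe.value
    if pe.marked:  # unmarked engines skip the instruction
        if pe.logic_state == 0:
            result = (neighbor_value + pe.value) & mask  # state 0: add
        else:
            result = (neighbor_value - pe.value) & mask  # state 1: subtract own data from neighbor's
    return ProcessingEngine(result, next_logic_state, next_marked)
```

Because `step` returns the new value and the new states together, a fresh operation can be issued on every call, which is the property the paragraph above attributes to the parallel arrangement.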
- the local memory 308 and registers 310 store data and instructions needed for the operations performed by the integer ALU 302 .
- the registers 310 are in electronic communication with the registers of adjacent processing engines 300 (both row-wise and column-wise), and thus allow data to be exchanged between adjacent processing engines 300 .
- the local memory 308 can exchange data with the registers 310 , so that data can be shifted from the registers 310 into the local data memory 308 for storage as necessary. This data can then be retrieved by the registers and either sent to the integer ALU 302 for processing, or shifted into the registers of adjacent processing engines 300 for eventual transfer out of the array 104 .
- the local data memory 308 and registers 310 also allow for the transfer of I/O data.
- the I/O controller 106 and/or I/O interface 114 can place I/O data into various processing engines 300 , typically by transferring data to the registers 310 . If calculations are required on this I/O data, they can be performed as above, and if not, the I/O data can be shifted down column-wise out of the array 104 and to the host. Alternatively, it can be shifted into the local data memory 308 for future processing or transfer.
- the processing engine 300 has a local data memory 308 that can hold at least 256 16-bit words.
- the register 310 can hold at least 8 16-bit words, as well as 8 Boolean bits for selecting the active components of the integer vectors for processing in the integer ALU 302 .
- FIG. 4 illustrates a vector representation of such an embodiment (a vector being simply a representation of data), where 1024 processing engines 300 are shown along the top of the chart, while the various vectors, registers, and Boolean bits of each engine 300 run along the side.
- instructions and data can be thought of as being transmitted to the processing engines 300 as vectors, e.g., vector_000 is a 1024-component vector of data, each component of which is 16 bits long and is sent to one processing engine 300 .
- vector boolean_0 is a 1024-component vector of single bits, each of which is transmitted to the 1-bit ALU 304 of a processing engine 300 .
- each processing engine 300 can be represented as a column of FIG. 4 , able to store 256 16-bit words of data, 8 16-bit words of register information, and 8 Boolean bits.
- processing engine “0” can store the first 16-bit word from each of vector_000 through vector_255 in its local data memory 308 for shifting down column-wise or for manipulation in its integer ALU 302 . It can also store the first 16-bit word from each of register_0 through register_7 in its registers 310 as queued instructions or transferred data, and the first bit from each of boolean_0 through boolean_7 in its registers 310 or 1-bit ALU 304 as queued logic states.
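The layout described for FIG. 4 can be mirrored with a simple data structure in which component i of every vector lives in engine i. The sizes come from the text above; the dictionary layout and the helper names `make_array`, `store_vector`, and `load_vector` are assumptions for illustration only.

```python
NUM_ENGINES = 1024   # processing engines in the array
MEMORY_WORDS = 256   # 16-bit words of local data memory per engine
NUM_REGISTERS = 8    # 16-bit register words per engine
NUM_BOOLEANS = 8     # Boolean bits per engine


def make_array():
    """Build one storage record per engine (one column of the chart)."""
    return [
        {
            "memory": [0] * MEMORY_WORDS,
            "registers": [0] * NUM_REGISTERS,
            "booleans": [0] * NUM_BOOLEANS,
        }
        for _ in range(NUM_ENGINES)
    ]


def store_vector(array, index, vector):
    """Write vector_<index>: component i goes to engine i, truncated to 16 bits."""
    if len(vector) != len(array):
        raise ValueError("vector needs one component per engine")
    for engine, word in zip(array, vector):
        engine["memory"][index] = word & 0xFFFF


def load_vector(array, index):
    """Read vector_<index> back as one component per engine."""
    return [engine["memory"][index] for engine in array]
```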
- the first such feature relates to the decoding of instructions.
- the instruction sequencer 102 can include decoders 108 , 110 for decoding instructions. These decoders 108 , 110 can store microcode instructions corresponding to the instruction sets of any applications. The instruction sequencer 102 then transmits sequenced instructions to the decoders 108 , 110 , which retrieve the corresponding microcode instructions and transmit them to the processing engines 300 of the array 104 . This allows the processor 100 to be compatible with any application, so long as microcode corresponding to instructions for that application can be stored in the decoders 108 , 110 .
- the decoders 108 , 110 are SRAM decoders, which allows users to periodically update or otherwise alter the stored instruction sets, although the invention encompasses decoders 108 , 110 that employ any form of memory for storing microcode instructions corresponding to the instructions for various applications. Also, it is sometimes preferred that one decoder 108 is dedicated to storing the operation codes of the integer ALU 302 , while the other decoder 110 is dedicated to storing Boolean operation codes for the 1-bit ALU 304 .
- the invention is not limited to embodiments including two separate decoders 108 , 110 , although it is sometimes preferable to include separate decoders 108 , 110 for integer and Boolean operation codes, so as to allow for independent changes to be made to either.
- As the decoders 108 , 110 can store microcode corresponding to multiple applications, the stored microcode is often longer than the instructions received from the host. Thus, it is often the case that the decoders 108 , 110 act to effectively expand these received instructions.
- the expanded microcode instructions stored in the decoders 108 , 110 can be 64-bit microcode instructions (allowing for 2^64 possible unique instructions).
- While the processor 100 may receive relatively small instructions like 8- or 16-bit instructions, it may work internally with larger 64-bit instructions.
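The expansion step can be pictured as a table lookup. The sketch below is hypothetical throughout: the table contents are placeholders, and the idea that both decoders are indexed by the same short opcode is an assumption, since the text does not say how the two lookups are addressed.

```python
MICROCODE_BITS = 64

# Placeholder microcode tables (hypothetical values). Being ordinary
# writable tables, they can be updated independently, like SRAM decoders.
INTEGER_DECODER = {0x01: 0x0000_0001_0000_00A0, 0x02: 0x0000_0002_0000_00B0}
BOOLEAN_DECODER = {0x01: 0x8000_0000_0000_0001, 0x02: 0x8000_0000_0000_0002}


def expand(opcode):
    """Expand a short host opcode into two 64-bit microcode words."""
    integer_word = INTEGER_DECODER[opcode]
    boolean_word = BOOLEAN_DECODER[opcode]
    for word in (integer_word, boolean_word):
        if not 0 <= word < (1 << MICROCODE_BITS):
            raise ValueError("microcode word does not fit in 64 bits")
    return integer_word, boolean_word


def update_decoder(table, opcode, microcode):
    """Rewrite one decoder entry without touching the other decoder."""
    table[opcode] = microcode
```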
- the second such feature concerns data addressing.
- the I/O controller 106 and/or I/O interface 114 can transmit I/O data to any processing engine 300 . That is, data can be transmitted to any arbitrarily selected processing engine 300 . This allows for more efficient use of the array 104 , as I/O data can be preferentially sent to those processing engines 300 that are less active and able to more immediately handle the data.
- the arbitrary selection of particular processing engines 300 is accomplished by first instructing each processing engine 300 to transmit an available address in its local memory 308 to the I/O controller 106 .
- the addresses can be any format, but it is often convenient to transmit the addresses as a vector, where each element of the vector represents a different processing engine 300 . Each element can thus be filled by the position in the local data memory 308 that is available to hold data, if any. A zero value can represent a processing engine 300 that is unavailable for I/O data.
- each processing engine 300 is directed to transmit a position in its memory 308 , and these positions are assembled into a vector that effectively contains the identities of each available processing engine 300 and the available memory positions of each. This vector allows the I/O controller 106 to quickly determine where it can transfer I/O data.
- vectors can also be used in the transfer of data to/from memories external to the processor 100 .
- the array 104 can be instructed to construct a vector containing addresses to be used in accessing an external memory. This vector can then be transferred out through the I/O controller 106 to address desired portions of the external memory for data transfer to/from that external memory.
- processing engines 300 can be instructed to transmit memory positions of I/O data they store, and these positions can be assembled into a vector informing the I/O controller 106 of the addresses at which it can retrieve data from the processing engines 300 .
- this approach increases the overall efficiency of the processor 100 , as a single instruction from the instruction sequencer 102 allows all available processing engines 300 to be identified, and data to be transferred to/from only those processing engines.
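A minimal sketch of this addressing scheme follows. It assumes that memory positions are numbered from 1 so that 0 can serve as the "unavailable" marker, and that I/O words are handed to available engines in index order; both choices, and all function names, are illustrative assumptions.

```python
def available_address(free_positions):
    """Return one free memory position for an engine, or 0 if it has none."""
    return min(free_positions) if free_positions else 0


def build_address_vector(engines):
    """Assemble one element per engine, as if every engine answered one instruction."""
    return [available_address(free) for free in engines]


def dispatch(address_vector, io_words):
    """Pair I/O words with available engines.

    Returns a list of (engine_index, memory_position, word) tuples;
    engines that reported 0 are skipped.
    """
    targets = [(i, addr) for i, addr in enumerate(address_vector) if addr != 0]
    return [(i, addr, word) for (i, addr), word in zip(targets, io_words)]
```

For example, three engines with free positions `{3, 7}`, none, and `{12}` yield the vector `[3, 0, 12]`, and two incoming words are routed to engines 0 and 2 only.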
- the third such feature concerns data formatting.
- the I/O controller 106 and/or I/O interface 114 can format data to fit the local data memories 308 of the processing engines 300 .
- the invention encompasses the use of any data format.
- the I/O controller 106 can load/store data in shuffled mode, direct transfer mode, and indirect transfer mode.
- the I/O controller 106 can also perform byte expanded loads and byte compacted stores, as well as word expanded loads and word compacted stores.
- In shuffled mode, data from the host is divided into two vectors, one vector having the even-numbered words and one vector having the odd-numbered words. That is, if the host transmits data in 16-byte word format, each processing engine 300 stores data in 16-bit format, and the array 104 contains 1024 processing engines 300 , then the I/O controller 106 can accumulate a 2048-component double-length vector of data from the host, [w0, w1, . . . , w2047], where each component wi is a 2-byte word. The I/O controller 106 then breaks this vector up into two 1024-component vectors:
- v1 = [w0, w2, . . . , w2046]
- v2 = [w1, w3, . . . , w2047]
- the two 1024-component vectors are then sent to the 1024 processing engines 300 , where each 2-byte (i.e., 16-bit) component is already formatted for storage in the registers 310 and local data memory 308 .
- the I/O controller 106 breaks up host-formatted data into two 1024-component vectors, each component of which contains data formatted for the processing engines 300 .
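The even/odd split can be sketched with list slicing. The function names are hypothetical, and the inverse operation (`shuffled_store`) is an assumption about how a shuffled-mode store would reassemble the data; the text only describes the load direction in detail.

```python
def shuffled_load(words):
    """Split a double-length vector into even- and odd-numbered words."""
    if len(words) % 2:
        raise ValueError("expected an even number of words")
    v1 = words[0::2]  # w0, w2, ..., w(n-2)
    v2 = words[1::2]  # w1, w3, ..., w(n-1)
    return v1, v2


def shuffled_store(v1, v2):
    """Interleave two vectors back into the host's word order (assumed inverse)."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have equal length")
    merged = []
    for even_word, odd_word in zip(v1, v2):
        merged.extend((even_word, odd_word))
    return merged
```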
- In a byte expanded load, the I/O controller 106 can accumulate 512 2-byte words [w0, w1, . . . , w511], which are then divided into 1024 2-byte words, with the most significant byte of each word set to zero:
- each byte from external memory is stored as a 16-bit number with the most significant byte zero.
- In a byte compacted store, a vector of stored 16-bit numbers [w0, w1, . . . , w1023] is retrieved, and the zero-value most significant bytes are stripped out to yield 1024 2-byte words again: {w0[7:0], w1[7:0], . . . , w1023[7:0]}.
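The byte-level expansion and compaction can be sketched as follows. The order in which the two bytes of each source word are emitted (low byte first) is an assumption, as the text here does not state it, and the function names are hypothetical. Under this reading, 512 packed words expand to 1024 stored words, and 1024 stored words pack back into 512.

```python
def byte_expanded_load(words):
    """Expand each 2-byte word into two 16-bit words with a zero high byte."""
    expanded = []
    for word in words:
        expanded.append(word & 0xFF)         # low byte (assumed to come first)
        expanded.append((word >> 8) & 0xFF)  # high byte
    return expanded


def byte_compacted_store(expanded):
    """Drop the zero high bytes and repack pairs of low bytes into 2-byte words."""
    if len(expanded) % 2:
        raise ValueError("expected an even number of words")
    packed = []
    for low, high in zip(expanded[0::2], expanded[1::2]):
        packed.append(((high & 0xFF) << 8) | (low & 0xFF))
    return packed
```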
- In a word expanded load, the I/O controller 106 can accumulate a vector of 512 2-byte words [w0, w1, . . . , w511], which are then converted to 1024 2-byte words, where every other 2-byte word is set to zero.
- the 1024 2-byte words are then loaded into the array 104 as vector:
- In a word compacted store, every other 2-byte word (i.e., the zero-value words) is stripped out to once again achieve a vector of 512 2-byte words: [w0, w2, . . . , w1020, w1022].
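The word-level variant pads rather than splits. In the sketch below the data words occupy the even positions and the zero words the odd positions, which matches the [w0, w2, . . . , w1022] result quoted above; the function names are hypothetical.

```python
def word_expanded_load(words):
    """Insert a zero word after every data word (512 words become 1024)."""
    expanded = []
    for word in words:
        expanded.extend((word & 0xFFFF, 0))
    return expanded


def word_compacted_store(expanded):
    """Keep the even-numbered words and drop the zero-value padding words."""
    return expanded[0::2]
```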
- In direct transfer mode, the I/O controller 106 uses a specified increment, and transfers data to the processing engines 300 based on this increment. For example, if the increment is 2, the I/O controller 106 transfers its data to every other processing engine 300 .
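Strided delivery of this kind can be sketched with a stepped range. The function name, the `start` parameter, and the dictionary return type are assumptions made for the example.

```python
def direct_transfer(data, num_engines, increment, start=0):
    """Map successive data words to engines start, start+increment, ...

    Returns a dict of engine_index -> word. Words beyond the last
    reachable engine are not delivered in this simple model.
    """
    if increment < 1:
        raise ValueError("increment must be at least 1")
    targets = range(start, num_engines, increment)
    return dict(zip(targets, data))
```

With an increment of 2 and eight engines, three words land on engines 0, 2, and 4.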
- indirect transfer mode involves addresses provided by each processing engine 300 , similar to the data addressing techniques described above. For instance, each processing engine 300 is instructed to provide its address based on whether it is sufficiently available to receive data. The I/O controller 106 then transmits its data to the processing engines 300 that it has received addresses from.
- The ability of each processing engine 300 to shift data to and from adjacent processing engines 300 , coupled with the ability of the instruction sequencer 102 to selectively mark engines 300 for executing computational operations, allows for great flexibility and speed in computation, providing for much faster computation bounded processes.
- a single instruction from the instruction sequencer 102 can instruct every processing engine 300 in the array 104 to execute varying operations, with different engines 300 instructed to perform different operations according to the logic states set individually by the instruction, or instructed not to perform any calculations at all.
- each individual instruction can control a “global” set of operations that can vary as necessary from engine 300 to engine 300 .
- the array 104 can perform functions such as sequential multiplication algorithms much faster.
- Multiplication can be performed using a process which inspects 2 bits in each step, decides the appropriate addition, and performs two position shifts. This can be accomplished with only three instructions (init_mult, mult, end_mult, each having specific microcode generated by the programmable decoders 108 and 110 ) in the processor 100 , thus greatly speeding multiplication.
- two bits of multiplicand can be tested in each cycle:
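The patent's own per-cycle table is not reproduced in this text. As a stand-in, the following is a generic unsigned radix-4 shift-and-add sketch of the idea of inspecting two bits per step, choosing an addend, and moving two bit positions. It is not the init_mult / mult / end_mult microcode, and the function name and operand roles are assumptions.

```python
def radix4_multiply(a, b, bits=16):
    """Multiply two unsigned integers by inspecting 2 bits of b per step."""
    if bits % 2:
        raise ValueError("bits must be even")
    # Addend selected by the inspected bit pair: 0, a, 2a, or 3a.
    addends = [0, a, a << 1, (a << 1) + a]
    product = 0
    for step_index in range(bits // 2):
        pair = (b >> (2 * step_index)) & 0b11          # two bits tested this step
        product += addends[pair] << (2 * step_index)   # shift by two positions per step
    return product
```

A 16-bit operand is consumed in 8 steps instead of the 16 steps a one-bit-at-a-time shift-and-add loop would need.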
- the array 104 need not be limited to a two dimensional array of rows and columns, but can be organized in any manner.
- While certain components such as the SRAM decoders 108 , 110 and I/O interface 114 may be desirable in certain embodiments, they are not required for the practice of the invention.
- the embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.
Abstract
A computer processor having an integrated instruction sequencer, array of processing engines, and I/O controller. The instruction sequencer sequences instructions from a host, and transfers these instructions to the processing engines, thus directing their operation. The I/O controller controls the transfer of I/O data to and from the processing engines in parallel with the processing controlled by the instruction sequencer. The processing engines themselves are constructed with an integer arithmetic and logic unit (ALU), a 1-bit ALU, a decision unit, and registers. Instructions from the instruction sequencer direct the integer ALU to perform integer operations according to logic states stored in the 1-bit ALU and data stored in the decision unit. The 1-bit ALU and the decision unit can modify their stored information in the same clock cycle as the integer ALU carries out its operation. The processing engines also contain a local memory for storing instructions and data.
Description
- This application is a divisional of U.S. patent application Ser. No. 11/584,480, filed on Oct. 19, 2006, which claims the benefit of U.S. Provisional Patent Application No. 60/729,178, filed on Oct. 21, 2005, the disclosure of which is hereby incorporated by reference in its entirety and for all purposes.
- The invention relates generally to computer processors. More specifically, the invention relates to an integrated processor array, instruction sequencer, and I/O controller.
- The ever-increasing requirements for computational speed have generated unyielding demand for ever-faster and more efficient processors. In particular, processors are increasingly asked to perform mathematical operations, such as calculations and other data manipulation, at greater rates of speed. Processors are also increasingly required to transfer more data at higher rates of speed, as multimedia and other applications employ larger files storing greater amounts of data.
- Accordingly, continuing efforts exist to improve the speed and performance of computer processors. In particular, efforts exist to improve both the speed and efficiency with which processors manipulate data, and the speed at which processors transfer I/O data.
- The invention can be implemented in numerous ways, including as a method, system, and device. Various embodiments of the invention are discussed below.
- In one embodiment, a computer system comprises an instruction sequencing unit configured to sequence instructions for manipulating data and to transmit the sequenced instructions. The computer system also includes an array of processing engines configured to receive instructions corresponding to the sequenced instructions, each processing engine of the array of processing engines being configured to receive the data. Each processing engine has a first memory configured to store the data, a decision unit configured to store decision data, and a Boolean unit configured to store a logic state and to modify the logic state according to the received instructions. Each processing engine also has an integer unit configured to conditionally perform integer operations on the stored data according to the stored decision data, the received instructions, and the logic state, so as to generate integer result data, as well as a second memory configured to store I/O data. The Boolean unit is configured to modify the logic state in the same clock cycle as the integer unit performs the integer operations. The computer system also includes an I/O controller configured to transmit the I/O data to, and receive the I/O data from, the array of processing engines.
- In another embodiment, a computer system comprises a processing array having processing engines serially interconnected in rows and columns so as to form rows of processing engines and columns of processing engines, the processing array configured to execute I/O operations by shifting I/O data sequentially through the columns of processing engines, to shift computation data sequentially across the rows of processing engines, and to execute computation operations upon the shifted computation data in parallel with the I/O operations. The computer system also includes an instruction sequencing unit configured to sequence instructions and to transfer the instructions to the processing engines of the processing array so as to control the computation operations. It also includes an I/O controller configured to exchange the I/O data with the processing engines of the processing array.
- Other aspects and advantages of the invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings which illustrate, by way of example, the principles of the invention.
- For a better understanding of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings, in which:
- FIG. 1 illustrates a block diagram representation of a processor constructed in accordance with the invention, and including an integrated instruction sequencer, an array of processing engines, and an I/O controller.
- FIG. 2 illustrates further details of processing engines constructed in accordance with the invention, as well as their interconnection.
- FIG. 3 illustrates a block diagram representation of an individual processing engine in accordance with the invention.
- FIG. 4 is a vector representation of commands to be executed by the processing engines of FIG. 3.
- Like reference numerals refer to corresponding parts throughout the drawings.
- In one sense, the invention relates to a computer processor having an integrated instruction sequencer, array of processing engines, and I/O controller. The instruction sequencer sequences instructions from a host and transfers these instructions to the processing engines, thus directing their operation. The I/O controller controls the transfer of I/O data to and from the processing engines in parallel with the processing controlled by the instruction sequencer. To facilitate the efficient execution of instructions from the instruction sequencer and the exchange of I/O data with the I/O controller, the processing engines themselves are constructed with an integer arithmetic and logic unit (ALU), a 1-bit ALU, a decision unit, and registers. Instructions from the instruction sequencer direct the integer ALU to perform integer operations according to a logic state stored in the 1-bit ALU and data stored in the decision unit. The 1-bit ALU and the decision unit can modify their stored information in the same clock cycle as the integer ALU carries out its operation, allowing for faster and more efficient processing. The processing engines also contain a local memory for storing instructions and data to be shifted among the engines.
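The conditional execution summarized above can be sketched in Python as follows. This is a minimal model under stated assumptions, not the patented circuit: the class `ProcessingEngine`, the function `execute_add_or_sub`, and its parameters are hypothetical names, and the add/subtract pairing is just one illustrative choice of the two operations a single instruction might select between. The point of the sketch is the ordering: the integer operation reads the current logic and decision states, and both states are replaced within the same modeled clock cycle.

```python
from dataclasses import dataclass


@dataclass
class ProcessingEngine:
    value: int        # data held for the integer ALU
    logic_state: int  # single bit held by the 1-bit ALU
    marked: bool      # decision state held by the decision unit


def execute_add_or_sub(pe: ProcessingEngine, neighbor_value: int,
                       next_logic_state: int, next_marked: bool) -> None:
    """Model one clock cycle of a single 'add or subtract' instruction."""
    # The integer ALU acts only if the engine is marked, and it uses the
    # logic state as it stood at the start of the cycle.
    if pe.marked:
        if pe.logic_state == 0:
            pe.value = pe.value + neighbor_value
        else:
            pe.value = neighbor_value - pe.value
    # The 1-bit ALU and the decision unit update in the same cycle, so the
    # next instruction sees the new states without an extra wait.
    pe.logic_state = next_logic_state
    pe.marked = next_marked
```

A short usage example: an engine holding 5 with logic state 0 adds a neighbor value of 3 to reach 8; with logic state 1 it would instead compute 3 minus its own value; an unmarked engine leaves its value unchanged.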
-
FIG. 1 illustrates a processor of the invention in block diagram form. The processor 100 includes an instruction sequencer 102, an array 104 of processing engines, and an I/O controller 106. The instruction sequencer 102 receives tasks from a host (not shown), and transforms each task into sequences of instructions for proper use by the array 104. To facilitate the support of multiple different applications, decoders 108, 110 can be included in the instruction sequencer 102, translating the instructions for various applications to corresponding native instructions understood by the array 104. Instructions are then fed to the pipeline registers 112, from which they are fed sequentially to the array 104. - The
array 104 is also configured to handle I/O data. The I/O controller 106 receives I/O data from the host or from an external memory, and transfers it to an I/O interface 114, where it is formatted for the local memories of individual processing engines of the array 104. As will be further explained below, the processor 100 includes the ability to transfer I/O data to individual processing engines in a number of ways, to maximize efficiency and speed. - When the processing engines have finished performing their various operations on their data, including shifting the data amongst the processors, the data is shifted out of the
array 104. I/O data is shifted out to the I/O controller 106, while other data is shifted out to the instruction sequencer 102 for transfer to the host, via an adder 116 if desired. - As can be seen from the above description, the processing engines have the capacity to simultaneously transfer I/O data and perform operations on other data, adding to the speed and efficiency of the
processor 100. This is accomplished partly by the structure of the processing engines within the array 104 itself. FIG. 2 illustrates the interconnections between processing engines in the array 104. In this embodiment, the array 104 is constructed as a two-dimensional array of processing engines PEij. The processing engines PEij are serially interconnected in rows and columns. That is, the processing engines PEij are arranged in rows and columns, with each processing engine PEij able to exchange data with its neighboring processing engines, both in its row and in its adjacent columns. The processing engines at the end of each row are able to exchange data with the first processing engine of the next row, and vice versa. Similarly, the processing engines at the end of each column are able to transfer data to the first processing engine in the same column. The processing engines can thus be configured to transfer I/O data and other data both column-wise and row-wise. - In this manner, the I/
O controller 106 transfers I/O data (perhaps after formatting by the I/O interface 114) to various processing engines, which transfer the I/O data serially down their respective columns. Simultaneously, this I/O data, or other data inserted into the various processing engines accompanying instructions from the instruction sequencer 102, can be operated on by each processing engine and shifted row-wise. In this manner, the array 104 can both transfer I/O data and simultaneously perform various operations on that or other data. - It can be seen that this ability to handle both I/O data and other forms of data, as well as the ability to perform operations on both, confers advantages over other systems. First, it yields faster and more efficient processing, as data transfers can be performed in parallel with calculations and other data manipulations. Second, it allows the
processor 100 to be effectively optimized to handle the computational processes most often seen by modern computers. That is, it has been found that many computational processes are either “I/O bounded” or “computation bounded.” I/O bounded processes are dominated by the need to transfer large amounts of data without performing significant computational operations upon that data (e.g., multimedia file playback, file copying, or other transfers of large amounts of data). Conversely, computation bounded processes are dominated by the need to perform calculations, e.g., graphics rendering, simulations, and the like. By incorporating dedicated hardware for both I/O data transfer and computations, the processor 100 handles I/O bounded processes and computation bounded processes faster and more efficiently than other processors. - While one aspect of the
processor 100 includes a dedicated I/O controller 106 and instruction sequencer 102 for handling I/O data and instructions, the construction of the processing engines PEij themselves also contributes to advantageous handling of I/O bounded and computation bounded processes. FIG. 3 illustrates a block diagram representation of an individual processing engine PEij in accordance with the invention. In this embodiment, each processing engine 300 includes an integer ALU 302, a 1-bit ALU 304, and a decision unit 306 that either execute, or facilitate the execution of, various operations. The processing engine 300 also includes a local data memory 308 and registers 310. As shown, the integer ALU 302, 1-bit ALU 304, and decision unit 306 are connected so as to operate in parallel with each other. In particular, the 1-bit ALU 304 and decision unit 306 can send their current logic states to the integer ALU 302 as well as modify those states in the same clock cycle. - In operation, the
processing engine 300 receives sequenced instructions from the instruction sequencer 102. The instructions are sent to the integer ALU 302, as well as to the registers 310 and local data memory 308. The instructions are also sent to the 1-bit ALU 304 and decision unit 306. - Instructions requiring computation direct the
registers 310 and/or local data memory 308 to transfer data to the integer ALU 302 for processing. In the embodiment shown, the data can be transferred from the registers 310 to the integer ALU 302 as left and right operands, although the invention includes any form of data transfer among the local data memory 308, registers 310, and integer ALU 302. The instructions also modify the logic state of the 1-bit ALU 304. In this embodiment, the 1-bit ALU 304 stores a single bit whose two binary logic states are read by the integer ALU 302. Instructions from the instruction sequencer 102 can direct the integer ALU 302 to read the logic state of the 1-bit ALU 304 and execute different operations depending on the logic state. For example, an instruction can direct the integer ALU 302 to add its data to data from a neighboring processing engine 300 if the logic state is binary “0”, or subtract its data from that of the neighboring processing engine 300 if the logic state is binary “1.” In this manner, the 1-bit ALU 304 allows a single instruction to represent more than one operation. The instructions also modify a decision state stored in the decision unit 306. This decision state indicates whether the particular processing engine is “marked” for execution of its instruction, or “unmarked” and thus directed not to execute its instruction. This allows the instruction sequencer 102 to selectively instruct individual processing engines 300 to carry out operations, or to avoid carrying out operations, as necessary. This in turn allows the array 104 to execute more complex and detailed processes. - It should be noted that the
integer ALU 302, 1-bit ALU 304, and decision unit 306 are arranged in parallel, so that the 1-bit ALU 304 and decision unit 306 can modify their states in the same clock cycle as the integer ALU 302 carries out its operations. This speeds the operation of each processing engine 300, as the integer ALU 302 can thus carry out a new operation each clock cycle, rather than having to wait for the 1-bit ALU 304 and decision unit 306 to update first. - The
local data memory 308 and registers 310 store data and instructions needed for the operations performed by the integer ALU 302. The registers 310 are in electronic communication with the registers of adjacent processing engines 300 (both row-wise and column-wise), and thus allow data to be exchanged between adjacent processing engines 300. The local data memory 308 can exchange data with the registers 310, so that data can be shifted from the registers 310 into the local data memory 308 for storage as necessary. This data can then be retrieved by the registers and either sent to the integer ALU 302 for processing, or shifted into the registers of adjacent processing engines 300 for eventual transfer out of the array 104. - In addition to helping improve the computational abilities of the
processing engines 300, the local data memory 308 and registers 310 also allow for the transfer of I/O data. As above, the I/O controller 106 and/or I/O interface 114 can place I/O data into various processing engines 300, typically by transferring data to the registers 310. If calculations are required on this I/O data, they can be performed as above, and if not, the I/O data can be shifted down column-wise out of the array 104 and to the host. Alternatively, it can be shifted into the local data memory 308 for future processing or transfer. - One of ordinary skill in the art will realize that the invention encompasses any size for the various memories and instructions of the invention. However, in at least one embodiment, the
processing engine 300 has a local data memory 308 that can hold at least 256 16-bit words. The registers 310 can hold at least 8 16-bit words, as well as 8 Boolean bits for selecting the active components of the integer vectors for processing in the integer ALU 302. FIG. 4 illustrates a vector representation of such an embodiment (a vector being simply a representation of data), where 1024 processing engines 300 are shown along the top of the chart, while the various vectors, registers, and Boolean bits of each engine 300 run along the side. From this, it can be seen that instructions and data can be thought of as being transmitted to the processing engines 300 as vectors, e.g., vector_000 is a 1024-component vector of data, each component of which is 16 bits long and is sent to one processing engine 300. Similarly, vector boolean_0 is a 1024-component vector of single bits, each of which is transmitted to the 1-bit ALU 304 of a processing engine 300. It can also be seen that each processing engine 300 can be represented as a column of FIG. 4, able to store 256 16-bit words of data, 8 16-bit words of register information, and 8 Boolean bits. For example, processing engine “0” can store the first 16-bit word from each of vector_000 through vector_255 in its local data memory 308 for shifting down column-wise or for manipulation in its integer ALU 302. It can also store the first 16-bit word from each of register_0 through register_7 in its registers 310 as queued instructions or transferred data, and the first bit from each of boolean_0 through boolean_7 in its registers 310 or 1-bit ALU 304 as queued logic states. - The basic operation of the
processor 100 having been illustrated, attention now turns to a more detailed explanation of certain noteworthy features of the invention that convey particular advantages. - The first such feature relates to the decoding of instructions. As mentioned above, the
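instruction sequencer 102 can include decoders 108, 110. The decoders 108, 110 store microcode corresponding to the instructions of the supported applications. The instruction sequencer 102 then transmits sequenced instructions to the decoders 108, 110, which retrieve the corresponding microcode and transmit it to the processing engines 300 of the array 104. This allows the processor 100 to be compatible with any application, so long as microcode corresponding to instructions for that application can be stored in the decoders 108, 110. - In some embodiments, it is preferred that the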
decoders 108, 110 be programmable SRAM decoders. In the embodiment shown, one decoder 108 is dedicated to storing the operation codes of the integer ALU 302, while the other decoder 110 is dedicated to storing Boolean operation codes for the 1-bit ALU 304. One of ordinary skill in the art will realize that the invention is not limited to embodiments including two separate decoders 108, 110. Because the decoders 108, 110 store expanded microcode, while the processor 100 may receive relatively small instructions like 8- or 16-bit instructions, it may work internally with larger 64-bit instructions. - The second such feature concerns data addressing. The I/
O controller 106 and/or I/O interface 114 can transmit I/O data to any processing engine 300. That is, data can be transmitted to any arbitrarily selected processing engine 300. This allows for more efficient use of the array 104, as I/O data can be preferentially sent to those processing engines 300 that are less active and able to more immediately handle the data. - In one embodiment, the arbitrary selection of
particular processing engines 300 is accomplished by first instructing each processing engine 300 to transmit an available address in its local data memory 308 to the I/O controller 106. The addresses can be in any format, but it is often convenient to transmit the addresses as a vector, where each element of the vector represents a different processing engine 300. Each element can thus be filled by the position in the local data memory 308 that is available to hold data, if any. A zero value can represent a processing engine 300 that is unavailable for I/O data. In this manner, each processing engine 300 is directed to transmit a position in its memory 308, and these positions are assembled into a vector that effectively contains the identities of each available processing engine 300 and the available memory positions of each. This vector allows the I/O controller 106 to quickly determine where it can transfer I/O data. - One of ordinary skill in the art will realize that these vectors can also be used in the transfer of data to/from memories external to the
processor 100. For instance, the array 104 can be instructed to construct a vector containing addresses to be used in accessing an external memory. This vector can then be transferred out through the I/O controller 106 to address desired portions of the external memory for data transfer to/from that external memory. - One of ordinary skill in the art will also realize that these vectors can be used in the retrieval of data, i.e., processing
engines 300 can be instructed to transmit memory positions of I/O data they store, and these positions can be assembled into a vector informing the I/O controller 106 of the addresses at which it can retrieve data from the processing engines 300. One of ordinary skill will also realize that this approach increases the overall efficiency of the processor 100, as a single instruction from the instruction sequencer 102 allows all available processing engines 300 to be identified, and data to be transferred to/from only those processing engines. - The third such feature concerns data formatting. As above, the I/
O controller 106 and/or I/O interface 114 can format data to fit the local data memories 308 of the processing engines 300. The invention encompasses the use of any data format. For example, the I/O controller 106 can load/store data in shuffled mode, direct transfer mode, and indirect transfer mode. The I/O controller 106 can also perform byte expanded loads and byte compacted stores, as well as word expanded loads and word compacted stores. - The above mentioned data formats are known. However, illustrative examples are beneficial. In shuffled mode, data from the host is divided into two vectors, one vector having the even-numbered words and one vector having the odd-numbered words. That is, if the host transmits data in 16-byte word format, each
processing engine 300 stores data in 16-bit format, and the array 104 contains 1024 processing engines 300, then the I/O controller 106 can accumulate a 2048-component double-length vector of data from the host, [w0, w1, . . . , w2047], where each component wi is a 2-byte word. The I/O controller 106 then breaks this vector up into two 1024-component vectors: -
v1=[w0, w2, . . . , w2046] - and
-
v2=[w1, w3, . . . , w2047] - The two 1024-component vectors are then sent to the 1024
processing engines 300, where each 2-byte (i.e., 16-bit) component is already formatted for storage in the registers 310 and local data memory 308. In this manner, the I/O controller 106 breaks up host-formatted data into two 1024-component vectors, each component of which contains data formatted for the processing engines 300. - For byte expanded loads, the I/
O controller 106 can accumulate 512 2-byte words [w0, w1, . . . , w511], which are then divided into 1024 2-byte words, with the most significant byte of each word set to zero: -
{8′b0, w0[7:0]}, {8′b0, w0[15:8]}, -
{8′b0, w1[7:0]}, {8′b0, w1[15:8]}, -
{8′b0, w510[7:0]}, {8′b0, w510[15:8]}, -
{8′b0, w511[7:0]}, {8′b0, w511[15:8]}, - In other words, each byte from external memory is stored as a 16-bit number with the most significant byte zero. Conversely, for byte compacted stores, a vector of stored 16-bit numbers [w0, w1, . . . , w1023] is retrieved, and the zero-value most significant bytes are stripped out to yield 512 2-byte words (1024 bytes) again: {w0[7:0], w1[7:0], . . . , w1023[7:0]}.
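The byte expanded load and byte compacted store described above can be sketched in Python as follows. The function names are hypothetical, and the choice to repack the stripped low bytes into 16-bit words in low-byte-first order (mirroring the load) is an assumption of this sketch; the text only lists the resulting byte sequence.

```python
from typing import List


def byte_expanded_load(words: List[int]) -> List[int]:
    """Split each 16-bit word into two 16-bit values with a zero high byte."""
    expanded = []
    for w in words:
        expanded.append(w & 0x00FF)          # {8'b0, w[7:0]}
        expanded.append((w >> 8) & 0x00FF)   # {8'b0, w[15:8]}
    return expanded


def byte_compacted_store(expanded: List[int]) -> List[int]:
    """Drop the zero high bytes and repack pairs of low bytes into words."""
    low_bytes = [w & 0xFF for w in expanded]
    return [low_bytes[i] | (low_bytes[i + 1] << 8)
            for i in range(0, len(low_bytes), 2)]
```

With 512 input words the load produces 1024 values, one per processing engine in the 1024-engine embodiment, and the store applied to those values returns the original 512 words.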
- For word expanded loads, the I/
O controller 106 can accumulate a vector of 512 2-byte words [w0, w1, . . . , w511], which are then converted to 1024 2-byte words, where every other 2-byte word is set to zero. The 1024 2-byte words are then loaded into the array 104 as the vector: -
[w0, 16′b0, w1, 16′b0, . . . , w510, 16′b0, w511, 16′b0] - Conversely, for word compacted stores, every other 2-byte word (i.e., the zero-value words) is stripped out to once again achieve a vector of 512 2-byte words: [w0, w2, . . . , w1020, w1022].
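A minimal Python sketch of the word expanded load and word compacted store follows, together with the even/odd split used in the shuffled mode described earlier. The function names are hypothetical and the sketch treats each vector as a plain list of 16-bit integers, which is an assumption made for illustration.

```python
from typing import List, Tuple


def word_expanded_load(words: List[int]) -> List[int]:
    """Interleave a zero word after every input word: [w0, 0, w1, 0, ...]."""
    expanded = []
    for w in words:
        expanded.extend([w & 0xFFFF, 0])
    return expanded


def word_compacted_store(stored: List[int]) -> List[int]:
    """Keep every other stored word (the even positions), dropping the zeros."""
    return stored[0::2]


def shuffled_split(words: List[int]) -> Tuple[List[int], List[int]]:
    """Shuffled mode: even-numbered words in one vector, odd-numbered in the other."""
    return words[0::2], words[1::2]
```

For the 1024-engine embodiment, `word_expanded_load` turns 512 words into 1024, and `shuffled_split` turns a 2048-component vector into two 1024-component vectors.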
- In direct transfer mode, the I/
O controller 106 uses a specified increment, and transfers data to the processing engines 300 based on this increment. For example, if the increment is 2, the I/O controller 106 transfers its data to every other processing engine 300. In contrast, indirect transfer mode involves addresses provided by each processing engine 300, similar to the data addressing techniques described above. For instance, each processing engine 300 is instructed to provide its address based on whether it is sufficiently available to receive data. The I/O controller 106 then transmits its data to the processing engines 300 from which it has received addresses. - It should be recognized that the ability of each
processing engine 300 to shift data to and from adjacent processing engines 300, coupled with the ability of the instruction sequencer 102 to selectively mark engines 300 for executing computational operations, allows for great flexibility and speed in computation, providing for much faster computation bounded processes. In particular, a single instruction from the instruction sequencer 102 can instruct every processing engine 300 in the array 104 to execute varying operations, with different engines 300 instructed to perform different operations according to the logic states set individually by the instruction, or instructed not to perform any calculations at all. In this manner, each individual instruction can control a “global” set of operations that can vary as necessary from engine 300 to engine 300. For example, the array 104 can perform functions such as sequential multiplication algorithms much faster. Multiplication can be performed using a process which inspects 2 bits in each step, decides the appropriate addition, and performs two position shifts. This can be accomplished with only three instructions (init_mult, mult, end_mult, each having specific microcode generated by the programmable decoders 108 and 110) in the processor 100, thus greatly speeding multiplication. Here, two bits of multiplicand can be tested in each cycle: -
- If {b(i), b(i−1)}=00, then the partial result is shifted two binary positions right.
- If {b(i), b(i−1)}=01, then the multiplier is added, and the result is shifted two binary positions right.
- If {b(i), b(i−1)}=10, then the multiplier is shifted one binary position left, and the result is shifted two binary positions right.
- If {b(i), b(i−1)}=11, then the multiplier is subtracted, the result is shifted two binary positions right, and the multiplier is added in the next clock cycle.
In each cycle, the result is stored back in two registers, with the final result stored in a pair of registers as well.
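The two-bits-per-step scheme listed above can be illustrated with the following Python sketch. It is one interpretation under stated assumptions rather than the microcode itself: the function name `radix4_multiply` and its parameters are hypothetical, operands are assumed unsigned, the "10" case is read as adding twice the operand, and the "11" case is read as subtracting the operand now and adding it back at the next bit pair (a carry). Instead of shifting the partial result right by two positions each step, the sketch shifts the addend left, which is arithmetically equivalent.

```python
def radix4_multiply(a: int, b: int, bits: int = 16) -> int:
    """Multiply unsigned a by unsigned b, inspecting two bits of b per step."""
    acc = 0
    carry = 0
    for step in range(bits // 2):
        digit = ((b >> (2 * step)) & 0b11) + carry  # value 0..4 with carry
        carry = 0
        weight = 2 * step
        if digit == 1:
            acc += a << weight           # "01": add the operand
        elif digit == 2:
            acc += (a << 1) << weight    # "10": add the operand shifted left once
        elif digit == 3:
            acc -= a << weight           # "11": subtract now ...
            carry = 1                    # ... and add at the next bit pair
        elif digit == 4:
            carry = 1                    # "11" plus carry: defer everything
        # digit == 0 ("00"): nothing is added in this step
    if carry:
        acc += a << bits                 # apply a carry left after the last pair
    return acc
```

A 16-bit by 16-bit product needs up to 32 bits, which is consistent with the result being held in a pair of 16-bit registers.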
- The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the invention. Thus, the foregoing descriptions of specific embodiments of the present invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. For example, the
array 104 need not be limited to a two-dimensional array of rows and columns, but can be organized in any manner. Also, while certain components such as the SRAM decoders 108, 110 and I/O interface 114 may be desirable in certain embodiments, they are not required for the practice of the invention. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.
Claims (15)
1. A computer system, comprising:
a processing array having processing engines serially interconnected in rows and columns so as to form rows of processing engines and columns of processing engines, the processing array configured to execute I/O operations by shifting I/O data sequentially through the columns of processing engines, to shift computation data sequentially across the rows of processing engines, and to execute computation operations upon the shifted computation data in parallel with the I/O operations;
an instruction sequencing unit configured to sequence instructions and to transfer the instructions to the processing engines of the processing array so as to control the computation operations; and
an I/O controller configured to exchange the I/O data with the processing engines of the processing array.
2. The computer system of claim 1, wherein each of the processing engines further comprises:
a logic unit configured to store a logic state and to modify the logic state according to the transferred instructions;
a decision unit configured to store a decision state and to modify the decision state according to the transferred instructions;
an integer unit configured to conditionally perform integer operations based on the decision state, the integer operations performed upon the shifted computation data according to the logic state and the transferred instructions;
registers in communication with the logic unit, the decision unit, and the integer unit, the registers configured to receive the shifted computation data and the logic state, and to transmit the shifted computation data and the logic state to the instruction sequencing unit; and
a local memory in communication with the registers and configured to store the shifted computation data.
3. The computer system of claim 2, wherein:
the decision state is a state selectively designating the processing engines as marked processing engines and unmarked processing engines;
the integer units are configured to perform the integer operations upon the designating as marked processing engines; and
the integer units are configured to suspend the integer operations upon the designating as unmarked processing engines.
4. The computer system of claim 2 wherein each of the integer units is further configured to perform:
a first operation upon the shifted computation data according to one of the transferred instructions, when the logic state is a first logic state; and
a second operation upon the shifted computation data according to the one of the transferred instructions, when the logic state is a second logic state.
5. The computer system of claim 4 wherein the first operation and the second operation each are a shift operation shifting the stored data to another one of the processing engines, or an arithmetic operation.
6. The computer system of claim 5 further comprising an I/O interface in communication with the processing array and the I/O controller, the I/O interface configured to format the I/O data for storage in the local memories of the processing engines.
7. The computer system of claim 6 wherein the I/O interface is further configured to format the I/O data for loading in the local memories in shuffled mode.
8. The computer system of claim 6 wherein the I/O interface is further configured to format the I/O data by byte expanding the I/O data.
9. The computer system of claim 6 wherein the I/O interface is further configured to format the I/O data by word expanding the I/O data.
10. The computer system of claim 6 wherein the I/O interface is further configured to format the I/O data for loading in the local memories in direct transfer mode.
11. The computer system of claim 6 wherein the I/O interface is further configured to format the I/O data for loading in the local memories in indirect transfer mode.
12. The computer system of claim 1 further comprising a decoder unit in communication with the processing array and the instruction sequencing unit, the decoder unit having a decoder memory storing an instruction set having expanded instructions corresponding to the sequenced instructions received from the instruction sequencing unit, the decoder unit further configured to:
receive the sequenced instructions from the instruction sequencing unit;
retrieve from the decoder memory those expanded instructions corresponding to the sequenced instructions received from the instruction sequencing unit; and
transmit the retrieved expanded instructions to the array of processing engines.
13. The computer system of claim 12 wherein the decoder memory is an SRAM memory.
14. The computer system of claim 12 wherein the sequenced instructions are 8-bit instructions, and the expanded instructions are 64-bit microcode instructions.
15. The computer system of claim 1 wherein:
the instruction sequencing unit is further configured to instruct ones of the processing engines to generate addresses; and
the ones of the processing engines are further configured to generate addresses and to transmit the generated addresses to the I/O controller.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/128,528 US20080307196A1 (en) | 2005-10-21 | 2008-05-28 | Integrated Processor Array, Instruction Sequencer And I/O Controller |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US72917805P | 2005-10-21 | 2005-10-21 | |
US11/584,480 US7451293B2 (en) | 2005-10-21 | 2006-10-19 | Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing |
US12/128,528 US20080307196A1 (en) | 2005-10-21 | 2008-05-28 | Integrated Processor Array, Instruction Sequencer And I/O Controller |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/584,480 Division US7451293B2 (en) | 2005-10-21 | 2006-10-19 | Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080307196A1 true US20080307196A1 (en) | 2008-12-11 |
Family
ID=37968408
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/584,480 Expired - Fee Related US7451293B2 (en) | 2005-10-21 | 2006-10-19 | Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing |
US12/128,528 Abandoned US20080307196A1 (en) | 2005-10-21 | 2008-05-28 | Integrated Processor Array, Instruction Sequencer And I/O Controller |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/584,480 Expired - Fee Related US7451293B2 (en) | 2005-10-21 | 2006-10-19 | Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing |
Country Status (7)
Country | Link |
---|---|
US (2) | US7451293B2 (en) |
EP (1) | EP1941380A2 (en) |
JP (1) | JP2009512920A (en) |
KR (1) | KR20080091754A (en) |
CA (1) | CA2626184A1 (en) |
TW (1) | TW200745876A (en) |
WO (1) | WO2007050444A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9864717B2 (en) | 2011-04-13 | 2018-01-09 | Hewlett Packard Enterprise Development Lp | Input/output processing |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7383421B2 (en) * | 2002-12-05 | 2008-06-03 | Brightscale, Inc. | Cellular engine for a data processing system |
US8427490B1 (en) | 2004-05-14 | 2013-04-23 | Nvidia Corporation | Validating a graphics pipeline using pre-determined schedules |
US8624906B2 (en) * | 2004-09-29 | 2014-01-07 | Nvidia Corporation | Method and system for non stalling pipeline instruction fetching from memory |
US8683184B1 (en) | 2004-11-15 | 2014-03-25 | Nvidia Corporation | Multi context execution on a video processor |
US9092170B1 (en) | 2005-10-18 | 2015-07-28 | Nvidia Corporation | Method and system for implementing fragment operation processing across a graphics bus interconnect |
US7451293B2 (en) * | 2005-10-21 | 2008-11-11 | Brightscale Inc. | Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing |
WO2007082042A2 (en) * | 2006-01-10 | 2007-07-19 | Brightscale, Inc. | Method and apparatus for processing sub-blocks of multimedia data in parallel processing systems |
US20080028192A1 (en) * | 2006-07-31 | 2008-01-31 | Nec Electronics Corporation | Data processing apparatus, and data processing method |
US20080059467A1 (en) * | 2006-09-05 | 2008-03-06 | Lazar Bivolarski | Near full motion search algorithm |
US8683126B2 (en) | 2007-07-30 | 2014-03-25 | Nvidia Corporation | Optimal use of buffer space by a storage controller which writes retrieved data directly to a memory |
US8659601B1 (en) | 2007-08-15 | 2014-02-25 | Nvidia Corporation | Program sequencer for generating indeterminant length shader programs for a graphics processor |
US9024957B1 (en) | 2007-08-15 | 2015-05-05 | Nvidia Corporation | Address independent shader program loading |
US8411096B1 (en) * | 2007-08-15 | 2013-04-02 | Nvidia Corporation | Shader program instruction fetch |
US8698819B1 (en) | 2007-08-15 | 2014-04-15 | Nvidia Corporation | Software assisted shader merging |
US8028150B2 (en) * | 2007-11-16 | 2011-09-27 | Shlomo Selim Rakib | Runtime instruction decoding modification in a multi-processing array |
US9064333B2 (en) | 2007-12-17 | 2015-06-23 | Nvidia Corporation | Interrupt handling techniques in the rasterizer of a GPU |
US8780123B2 (en) | 2007-12-17 | 2014-07-15 | Nvidia Corporation | Interrupt handling techniques in the rasterizer of a GPU |
US8681861B2 (en) | 2008-05-01 | 2014-03-25 | Nvidia Corporation | Multistandard hardware video encoder |
US8923385B2 (en) | 2008-05-01 | 2014-12-30 | Nvidia Corporation | Rewind-enabled hardware encoder |
US8489851B2 (en) | 2008-12-11 | 2013-07-16 | Nvidia Corporation | Processing of read requests in a memory controller using pre-fetch mechanism |
CN103392165B (en) | 2011-06-24 | 2016-04-06 | 株式会社日立制作所 | Storage system |
JP5739758B2 (en) * | 2011-07-21 | 2015-06-24 | ルネサスエレクトロニクス株式会社 | Memory controller and SIMD processor |
CN107748674B (en) * | 2017-09-07 | 2021-08-31 | 中国科学院微电子研究所 | Information processing system oriented to bit granularity |
Citations (95)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4212076A (en) * | 1976-09-24 | 1980-07-08 | Giddings & Lewis, Inc. | Digital computer structure providing arithmetic and boolean logic operations, the latter controlling the former |
US4575818A (en) * | 1983-06-07 | 1986-03-11 | Tektronix, Inc. | Apparatus for in effect extending the width of an associative memory by serial matching of portions of the search pattern |
US4780811A (en) * | 1985-07-03 | 1988-10-25 | Hitachi, Ltd. | Vector processing apparatus providing vector and scalar processor synchronization |
US4873626A (en) * | 1986-12-17 | 1989-10-10 | Massachusetts Institute Of Technology | Parallel processing system with processor array having memory system included in system memory |
US4876644A (en) * | 1987-10-30 | 1989-10-24 | International Business Machines Corp. | Parallel pipelined processor |
US4907148A (en) * | 1985-11-13 | 1990-03-06 | Alcatel U.S.A. Corp. | Cellular array processor with individual cell-level data-dependent cell control and multiport input memory |
US4922341A (en) * | 1987-09-30 | 1990-05-01 | Siemens Aktiengesellschaft | Method for scene-model-assisted reduction of image data for digital television signals |
US4983958A (en) * | 1988-01-29 | 1991-01-08 | Intel Corporation | Vector selectable coordinate-addressable DRAM array |
US4992933A (en) * | 1986-10-27 | 1991-02-12 | International Business Machines Corporation | SIMD array processor with global instruction control and reprogrammable instruction decoders |
US5122984A (en) * | 1987-01-07 | 1992-06-16 | Bernard Strehler | Parallel associative memory system |
US5150430A (en) * | 1991-03-15 | 1992-09-22 | The Board Of Trustees Of The Leland Stanford Junior University | Lossless data compression circuit and method |
US5228098A (en) * | 1991-06-14 | 1993-07-13 | Tektronix, Inc. | Adaptive spatio-temporal compression/decompression of video image signals |
US5241635A (en) * | 1988-11-18 | 1993-08-31 | Massachusetts Institute Of Technology | Tagged token data processing system with operand matching in activation frames |
US5319762A (en) * | 1990-09-07 | 1994-06-07 | The Mitre Corporation | Associative memory capable of matching a variable indicator in one string of characters with a portion of another string |
US5329405A (en) * | 1989-01-23 | 1994-07-12 | Codex Corporation | Associative cam apparatus and method for variable length string matching |
US5373290A (en) * | 1991-09-25 | 1994-12-13 | Hewlett-Packard Corporation | Apparatus and method for managing multiple dictionaries in content addressable memory based data compression |
US5440753A (en) * | 1992-11-13 | 1995-08-08 | Motorola, Inc. | Variable length string matcher |
US5446915A (en) * | 1993-05-25 | 1995-08-29 | Intel Corporation | Parallel processing system virtual connection method and apparatus with protection and flow control |
US5448733A (en) * | 1993-07-16 | 1995-09-05 | International Business Machines Corp. | Data search and compression device and method for searching and compressing repeating data |
US5450599A (en) * | 1992-06-04 | 1995-09-12 | International Business Machines Corporation | Sequential pipelined processing for the compression and decompression of image data |
US5490264A (en) * | 1993-09-30 | 1996-02-06 | Intel Corporation | Generally-diagonal mapping of address space for row/column organizer memories |
US5497488A (en) * | 1990-06-12 | 1996-03-05 | Hitachi, Ltd. | System for parallel string search with a function-directed parallel collation of a first partition of each string followed by matching of second partitions |
US5602764A (en) * | 1993-12-22 | 1997-02-11 | Storage Technology Corporation | Comparing prioritizing memory for string searching in a data compression system |
US5631849A (en) * | 1994-11-14 | 1997-05-20 | The 3Do Company | Decompressor and compressor for simultaneously decompressing and compressng a plurality of pixels in a pixel array in a digital image differential pulse code modulation (DPCM) system |
US5640582A (en) * | 1992-05-21 | 1997-06-17 | Intel Corporation | Register stacking in a computer system |
US5682491A (en) * | 1994-12-29 | 1997-10-28 | International Business Machines Corporation | Selective processing and routing of results among processors controlled by decoding instructions using mask value derived from instruction tag and processor identifier |
US5706290A (en) * | 1994-12-15 | 1998-01-06 | Shaw; Venson | Method and apparatus including system architecture for multimedia communication |
US5758176A (en) * | 1994-09-28 | 1998-05-26 | International Business Machines Corporation | Method and system for providing a single-instruction, multiple-data execution unit for performing single-instruction, multiple-data operations within a superscalar data processing system |
US5818873A (en) * | 1992-08-03 | 1998-10-06 | Advanced Hardware Architectures, Inc. | Single clock cycle data compressor/decompressor with a string reversal mechanism |
US5822608A (en) * | 1990-11-13 | 1998-10-13 | International Business Machines Corporation | Associative parallel processing system |
US5828593A (en) * | 1996-07-11 | 1998-10-27 | Northern Telecom Limited | Large-capacity content addressable memory |
US5867598A (en) * | 1996-09-26 | 1999-02-02 | Xerox Corporation | Method and apparatus for processing of a JPEG compressed image |
US5870619A (en) * | 1990-11-13 | 1999-02-09 | International Business Machines Corporation | Array processor with asynchronous availability of a next SIMD instruction |
US5909686A (en) * | 1997-06-30 | 1999-06-01 | Sun Microsystems, Inc. | Hardware-assisted central processing unit access to a forwarding database |
US5951672A (en) * | 1997-07-02 | 1999-09-14 | International Business Machines Corporation | Synchronization method for work distribution in a multiprocessor system |
US5963746A (en) * | 1990-11-13 | 1999-10-05 | International Business Machines Corporation | Fully distributed processing memory element |
US5963210A (en) * | 1996-03-29 | 1999-10-05 | Stellar Semiconductor, Inc. | Graphics processor, system and method for generating screen pixels in raster order utilizing a single interpolator |
US6073185A (en) * | 1993-08-27 | 2000-06-06 | Teranex, Inc. | Parallel data processor |
US6085283A (en) * | 1993-11-19 | 2000-07-04 | Kabushiki Kaisha Toshiba | Data selecting memory device and selected data transfer device |
US6088044A (en) * | 1998-05-29 | 2000-07-11 | International Business Machines Corporation | Method for parallelizing software graphics geometry pipeline rendering |
US6089453A (en) * | 1997-10-10 | 2000-07-18 | Display Edge Technology, Ltd. | Article-information display system using electronically controlled tags |
US6119215A (en) * | 1998-06-29 | 2000-09-12 | Cisco Technology, Inc. | Synchronization and control system for an arrayed processing engine |
US6128720A (en) * | 1994-12-29 | 2000-10-03 | International Business Machines Corporation | Distributed processing array with component processors performing customized interpretation of instructions |
US6145075A (en) * | 1998-02-06 | 2000-11-07 | Ip-First, L.L.C. | Apparatus and method for executing a single-cycle exchange instruction to exchange contents of two locations in a register file |
US6212237B1 (en) * | 1997-06-17 | 2001-04-03 | Nippon Telegraph And Telephone Corporation | Motion vector search methods, motion vector search apparatus, and storage media storing a motion vector search program |
US6226710B1 (en) * | 1997-11-14 | 2001-05-01 | Utmc Microelectronic Systems Inc. | Content addressable memory (CAM) engine |
US20010008563A1 (en) * | 2000-01-19 | 2001-07-19 | Ricoh Company, Ltd. | Parallel processor and image processing apparatus |
US6269354B1 (en) * | 1998-11-30 | 2001-07-31 | David W. Arathorn | General purpose recognition e-circuits capable of translation-tolerant recognition, scene segmentation and attention shift, and their application to machine vision |
US6295534B1 (en) * | 1998-05-28 | 2001-09-25 | 3Com Corporation | Apparatus for maintaining an ordered list |
US6317819B1 (en) * | 1996-01-11 | 2001-11-13 | Steven G. Morton | Digital signal processor containing scalar processor and a plurality of vector processors operating from a single instruction |
US6337929B1 (en) * | 1997-09-29 | 2002-01-08 | Canon Kabushiki Kaisha | Image processing apparatus and method and storing medium |
US6389446B1 (en) * | 1996-07-12 | 2002-05-14 | Nec Corporation | Multi-processor system executing a plurality of threads simultaneously and an execution method therefor |
US6405302B1 (en) * | 1995-05-02 | 2002-06-11 | Hitachi, Ltd. | Microcomputer |
US20020090128A1 (en) * | 2000-12-01 | 2002-07-11 | Ron Naftali | Hardware configuration for parallel data processing without cross communication |
US20020107990A1 (en) * | 2000-03-03 | 2002-08-08 | Surgient Networks, Inc. | Network connected computing system including network switch |
US20020114394A1 (en) * | 2000-12-06 | 2002-08-22 | Kai-Kuang Ma | System and method for motion vector generation and analysis of digital video clips |
US20020174318A1 (en) * | 1999-04-09 | 2002-11-21 | Dave Stuttard | Parallel data processing apparatus |
US20030041163A1 (en) * | 2001-02-14 | 2003-02-27 | John Rhoades | Data processing architectures |
US20030044074A1 (en) * | 2001-03-26 | 2003-03-06 | Ramot University Authority For Applied Research And Industrial Development Ltd. | Device and method for decoding class-based codewords |
US6542989B2 (en) * | 1999-06-15 | 2003-04-01 | Koninklijke Philips Electronics N.V. | Single instruction having op code and stack control field |
US20030085902A1 (en) * | 2001-11-02 | 2003-05-08 | Koninklijke Philips Electronics N.V. | Apparatus and method for parallel multimedia processing |
US6611524B2 (en) * | 1999-06-30 | 2003-08-26 | Cisco Technology, Inc. | Programmable data packet parser |
US20030208657A1 (en) * | 2002-05-06 | 2003-11-06 | Hywire Ltd. | Variable key type search engine and method therefor |
US20030206466A1 (en) * | 2001-09-25 | 2003-11-06 | Fujitsu Limited | Associative memory circuit judging whether or not a memory cell content matches search data by performing a differential amplification to a potential of a match line and a reference potential |
US20040030872A1 (en) * | 2002-08-08 | 2004-02-12 | Schlansker Michael S. | System and method using differential branch latency processing elements |
US20040057620A1 (en) * | 1999-01-22 | 2004-03-25 | Intermec Ip Corp. | Process and device for detection of straight-line segments in a stream of digital data that are representative of an image in which the contour points of said image are identified |
US20040071215A1 (en) * | 2001-04-20 | 2004-04-15 | Bellers Erwin B. | Method and apparatus for motion vector estimation |
US20040081238A1 (en) * | 2002-10-25 | 2004-04-29 | Manindra Parhy | Asymmetric block shape modes for motion estimation |
US20040081239A1 (en) * | 2002-10-28 | 2004-04-29 | Andrew Patti | System and method for estimating motion between images |
US6745317B1 (en) * | 1999-07-30 | 2004-06-01 | Broadcom Corporation | Three level direct communication connections between neighboring multiple context processing elements |
US6760821B2 (en) * | 2001-08-10 | 2004-07-06 | Gemicer, Inc. | Memory engine for the inspection and manipulation of data |
US6772268B1 (en) * | 2000-12-22 | 2004-08-03 | Nortel Networks Ltd | Centralized look up engine architecture and interface |
US20040170201A1 (en) * | 2001-06-15 | 2004-09-02 | Kazuo Kubo | Error-correction multiplexing apparatus, error-correction demultiplexing apparatus, optical transmission system using them, and error-correction multiplexing transmission method |
US20040190632A1 (en) * | 2003-03-03 | 2004-09-30 | Cismas Sorin C. | Memory word array organization and prediction combination for memory access |
US20040215927A1 (en) * | 2003-04-23 | 2004-10-28 | Mark Beaumont | Method for manipulating data in a group of processing elements |
US20040223656A1 (en) * | 1999-07-30 | 2004-11-11 | Indinell Sociedad Anonima | Method and apparatus for processing digital images |
US6848041B2 (en) * | 1997-12-18 | 2005-01-25 | Pts Corporation | Methods and apparatus for scalable instruction set architecture with dynamic compact instructions |
US20050163220A1 (en) * | 2004-01-26 | 2005-07-28 | Kentaro Takakura | Motion vector detection device and moving picture camera |
US20060018562A1 (en) * | 2004-01-16 | 2006-01-26 | Ruggiero Carl J | Video image processing with parallel processing |
US7013302B2 (en) * | 2000-12-22 | 2006-03-14 | Nortel Networks Limited | Bit field manipulation |
US7020671B1 (en) * | 2000-03-21 | 2006-03-28 | Hitachi America, Ltd. | Implementation of an inverse discrete cosine transform using single instruction multiple data instructions |
US20060072674A1 (en) * | 2004-07-29 | 2006-04-06 | Stmicroelectronics Pvt. Ltd. | Macro-block level parallel video decoder |
US20060098229A1 (en) * | 2004-11-10 | 2006-05-11 | Canon Kabushiki Kaisha | Image processing apparatus and method of controlling an image processing apparatus |
US20060174236A1 (en) * | 2005-01-28 | 2006-08-03 | Yosef Stein | Method and apparatus for accelerating processing of a non-sequential instruction stream on a processor with multiple compute units |
US7098437B2 (en) * | 2002-01-25 | 2006-08-29 | Semiconductor Technology Academic Research Center | Semiconductor integrated circuit device having a plurality of photo detectors and processing elements |
US20060222078A1 (en) * | 2005-03-10 | 2006-10-05 | Raveendran Vijayalakshmi R | Content classification for multimedia processing |
US20060227883A1 (en) * | 2005-04-11 | 2006-10-12 | Intel Corporation | Generating edge masks for a deblocking filter |
US20060262985A1 (en) * | 2005-05-03 | 2006-11-23 | Qualcomm Incorporated | System and method for scalable encoding and decoding of multimedia data using multiple layers |
US7196708B2 (en) * | 2004-03-31 | 2007-03-27 | Sony Corporation | Parallel vector processing |
US20070071404A1 (en) * | 2005-09-29 | 2007-03-29 | Honeywell International Inc. | Controlled video event presentation |
US20070162722A1 (en) * | 2006-01-10 | 2007-07-12 | Lazar Bivolarski | Method and apparatus for processing algorithm steps of multimedia data in parallel processing systems |
US20080126278A1 (en) * | 2006-11-29 | 2008-05-29 | Alexander Bronstein | Parallel processing motion estimation for H.264 video codec |
US7428628B2 (en) * | 2004-03-02 | 2008-09-23 | Imagination Technologies Limited | Method and apparatus for management of control flow in a SIMD device |
US7451293B2 (en) * | 2005-10-21 | 2008-11-11 | Brightscale Inc. | Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing |
US7454593B2 (en) * | 2002-09-17 | 2008-11-18 | Micron Technology, Inc. | Row and column enable signal activation of processing array elements with interconnection logic to simulate bus effect |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3308436A (en) * | 1963-08-05 | 1967-03-07 | Westinghouse Electric Corp | Parallel computer system control |
US4783738A (en) * | 1986-03-13 | 1988-11-08 | International Business Machines Corporation | Adaptive instruction processing by array processor having processor identification and data dependent status registers in each processing element |
US5926642A (en) * | 1995-10-06 | 1999-07-20 | Advanced Micro Devices, Inc. | RISC86 instruction set |
EP0992916A1 (en) * | 1998-10-06 | 2000-04-12 | Texas Instruments Inc. | Digital signal processor |
US6173386B1 (en) * | 1998-12-14 | 2001-01-09 | Cisco Technology, Inc. | Parallel processor with debug capability |
GB0019341D0 (en) * | 2000-08-08 | 2000-09-27 | Easics Nv | System-on-chip solutions |
US20020133688A1 (en) * | 2001-01-29 | 2002-09-19 | Ming-Hau Lee | SIMD/MIMD processing on a reconfigurable array |
US6938183B2 (en) * | 2001-09-21 | 2005-08-30 | The Boeing Company | Fault tolerant processing architecture |

Filing date | Country | Application | Publication | Status |
---|---|---|---|---|
2006-10-19 | US | US11/584,480 | US7451293B2 (en) | Expired - Fee Related (not active) |
2006-10-20 | WO | PCT/US2006/040975 | WO2007050444A2 (en) | Application Filing (active) |
2006-10-20 | KR | KR1020087009137A | KR20080091754A (en) | Application Discontinuation (not active) |
2006-10-20 | JP | JP2008534793A | JP2009512920A (en) | Abandoned (not active) |
2006-10-20 | CA | CA002626184A | CA2626184A1 (en) | Abandoned (not active) |
2006-10-20 | EP | EP06836411A | EP1941380A2 (en) | Withdrawn (not active) |
2006-10-20 | TW | TW095138731A | TW200745876A (en) | unknown |
2008-05-28 | US | US12/128,528 | US20080307196A1 (en) | Abandoned (not active) |
Patent Citations (99)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4212076A (en) * | 1976-09-24 | 1980-07-08 | Giddings & Lewis, Inc. | Digital computer structure providing arithmetic and boolean logic operations, the latter controlling the former |
US4575818A (en) * | 1983-06-07 | 1986-03-11 | Tektronix, Inc. | Apparatus for in effect extending the width of an associative memory by serial matching of portions of the search pattern |
US4780811A (en) * | 1985-07-03 | 1988-10-25 | Hitachi, Ltd. | Vector processing apparatus providing vector and scalar processor synchronization |
US4907148A (en) * | 1985-11-13 | 1990-03-06 | Alcatel U.S.A. Corp. | Cellular array processor with individual cell-level data-dependent cell control and multiport input memory |
US4992933A (en) * | 1986-10-27 | 1991-02-12 | International Business Machines Corporation | SIMD array processor with global instruction control and reprogrammable instruction decoders |
US4873626A (en) * | 1986-12-17 | 1989-10-10 | Massachusetts Institute Of Technology | Parallel processing system with processor array having memory system included in system memory |
US5122984A (en) * | 1987-01-07 | 1992-06-16 | Bernard Strehler | Parallel associative memory system |
US4922341A (en) * | 1987-09-30 | 1990-05-01 | Siemens Aktiengesellschaft | Method for scene-model-assisted reduction of image data for digital television signals |
US4876644A (en) * | 1987-10-30 | 1989-10-24 | International Business Machines Corp. | Parallel pipelined processor |
US4983958A (en) * | 1988-01-29 | 1991-01-08 | Intel Corporation | Vector selectable coordinate-addressable DRAM array |
US5241635A (en) * | 1988-11-18 | 1993-08-31 | Massachusetts Institute Of Technology | Tagged token data processing system with operand matching in activation frames |
US5329405A (en) * | 1989-01-23 | 1994-07-12 | Codex Corporation | Associative cam apparatus and method for variable length string matching |
US5497488A (en) * | 1990-06-12 | 1996-03-05 | Hitachi, Ltd. | System for parallel string search with a function-directed parallel collation of a first partition of each string followed by matching of second partitions |
US5319762A (en) * | 1990-09-07 | 1994-06-07 | The Mitre Corporation | Associative memory capable of matching a variable indicator in one string of characters with a portion of another string |
US5822608A (en) * | 1990-11-13 | 1998-10-13 | International Business Machines Corporation | Associative parallel processing system |
US5870619A (en) * | 1990-11-13 | 1999-02-09 | International Business Machines Corporation | Array processor with asynchronous availability of a next SIMD instruction |
US5963746A (en) * | 1990-11-13 | 1999-10-05 | International Business Machines Corporation | Fully distributed processing memory element |
US5150430A (en) * | 1991-03-15 | 1992-09-22 | The Board Of Trustees Of The Leland Stanford Junior University | Lossless data compression circuit and method |
US5228098A (en) * | 1991-06-14 | 1993-07-13 | Tektronix, Inc. | Adaptive spatio-temporal compression/decompression of video image signals |
US5373290A (en) * | 1991-09-25 | 1994-12-13 | Hewlett-Packard Corporation | Apparatus and method for managing multiple dictionaries in content addressable memory based data compression |
US5640582A (en) * | 1992-05-21 | 1997-06-17 | Intel Corporation | Register stacking in a computer system |
US5450599A (en) * | 1992-06-04 | 1995-09-12 | International Business Machines Corporation | Sequential pipelined processing for the compression and decompression of image data |
US5818873A (en) * | 1992-08-03 | 1998-10-06 | Advanced Hardware Architectures, Inc. | Single clock cycle data compressor/decompressor with a string reversal mechanism |
US5440753A (en) * | 1992-11-13 | 1995-08-08 | Motorola, Inc. | Variable length string matcher |
US5446915A (en) * | 1993-05-25 | 1995-08-29 | Intel Corporation | Parallel processing system virtual connection method and apparatus with protection and flow control |
US5448733A (en) * | 1993-07-16 | 1995-09-05 | International Business Machines Corp. | Data search and compression device and method for searching and compressing repeating data |
US6073185A (en) * | 1993-08-27 | 2000-06-06 | Teranex, Inc. | Parallel data processor |
US5490264A (en) * | 1993-09-30 | 1996-02-06 | Intel Corporation | Generally-diagonal mapping of address space for row/column organizer memories |
US6085283A (en) * | 1993-11-19 | 2000-07-04 | Kabushiki Kaisha Toshiba | Data selecting memory device and selected data transfer device |
US5602764A (en) * | 1993-12-22 | 1997-02-11 | Storage Technology Corporation | Comparing prioritizing memory for string searching in a data compression system |
US5758176A (en) * | 1994-09-28 | 1998-05-26 | International Business Machines Corporation | Method and system for providing a single-instruction, multiple-data execution unit for performing single-instruction, multiple-data operations within a superscalar data processing system |
US5631849A (en) * | 1994-11-14 | 1997-05-20 | The 3Do Company | Decompressor and compressor for simultaneously decompressing and compressng a plurality of pixels in a pixel array in a digital image differential pulse code modulation (DPCM) system |
US5706290A (en) * | 1994-12-15 | 1998-01-06 | Shaw; Venson | Method and apparatus including system architecture for multimedia communication |
US6128720A (en) * | 1994-12-29 | 2000-10-03 | International Business Machines Corporation | Distributed processing array with component processors performing customized interpretation of instructions |
US5682491A (en) * | 1994-12-29 | 1997-10-28 | International Business Machines Corporation | Selective processing and routing of results among processors controlled by decoding instructions using mask value derived from instruction tag and processor identifier |
US6405302B1 (en) * | 1995-05-02 | 2002-06-11 | Hitachi, Ltd. | Microcomputer |
US6317819B1 (en) * | 1996-01-11 | 2001-11-13 | Steven G. Morton | Digital signal processor containing scalar processor and a plurality of vector processors operating from a single instruction |
US5963210A (en) * | 1996-03-29 | 1999-10-05 | Stellar Semiconductor, Inc. | Graphics processor, system and method for generating screen pixels in raster order utilizing a single interpolator |
US5828593A (en) * | 1996-07-11 | 1998-10-27 | Northern Telecom Limited | Large-capacity content addressable memory |
US6389446B1 (en) * | 1996-07-12 | 2002-05-14 | Nec Corporation | Multi-processor system executing a plurality of threads simultaneously and an execution method therefor |
US5867598A (en) * | 1996-09-26 | 1999-02-02 | Xerox Corporation | Method and apparatus for processing of a JPEG compressed image |
US6212237B1 (en) * | 1997-06-17 | 2001-04-03 | Nippon Telegraph And Telephone Corporation | Motion vector search methods, motion vector search apparatus, and storage media storing a motion vector search program |
US5909686A (en) * | 1997-06-30 | 1999-06-01 | Sun Microsystems, Inc. | Hardware-assisted central processing unit access to a forwarding database |
US5951672A (en) * | 1997-07-02 | 1999-09-14 | International Business Machines Corporation | Synchronization method for work distribution in a multiprocessor system |
US6337929B1 (en) * | 1997-09-29 | 2002-01-08 | Canon Kabushiki Kaisha | Image processing apparatus and method and storing medium |
US6089453A (en) * | 1997-10-10 | 2000-07-18 | Display Edge Technology, Ltd. | Article-information display system using electronically controlled tags |
US6473846B1 (en) * | 1997-11-14 | 2002-10-29 | Aeroflex Utmc Microelectronic Systems, Inc. | Content addressable memory (CAM) engine |
US6226710B1 (en) * | 1997-11-14 | 2001-05-01 | Utmc Microelectronic Systems Inc. | Content addressable memory (CAM) engine |
US6848041B2 (en) * | 1997-12-18 | 2005-01-25 | Pts Corporation | Methods and apparatus for scalable instruction set architecture with dynamic compact instructions |
US6145075A (en) * | 1998-02-06 | 2000-11-07 | Ip-First, L.L.C. | Apparatus and method for executing a single-cycle exchange instruction to exchange contents of two locations in a register file |
US6295534B1 (en) * | 1998-05-28 | 2001-09-25 | 3Com Corporation | Apparatus for maintaining an ordered list |
US6088044A (en) * | 1998-05-29 | 2000-07-11 | International Business Machines Corporation | Method for parallelizing software graphics geometry pipeline rendering |
US6119215A (en) * | 1998-06-29 | 2000-09-12 | Cisco Technology, Inc. | Synchronization and control system for an arrayed processing engine |
US6269354B1 (en) * | 1998-11-30 | 2001-07-31 | David W. Arathorn | General purpose recognition e-circuits capable of translation-tolerant recognition, scene segmentation and attention shift, and their application to machine vision |
US20040057620A1 (en) * | 1999-01-22 | 2004-03-25 | Intermec Ip Corp. | Process and device for detection of straight-line segments in a stream of digital data that are representative of an image in which the contour points of said image are identified |
US20020174318A1 (en) * | 1999-04-09 | 2002-11-21 | Dave Stuttard | Parallel data processing apparatus |
US6542989B2 (en) * | 1999-06-15 | 2003-04-01 | Koninklijke Philips Electronics N.V. | Single instruction having op code and stack control field |
US6611524B2 (en) * | 1999-06-30 | 2003-08-26 | Cisco Technology, Inc. | Programmable data packet parser |
US6745317B1 (en) * | 1999-07-30 | 2004-06-01 | Broadcom Corporation | Three level direct communication connections between neighboring multiple context processing elements |
US20040223656A1 (en) * | 1999-07-30 | 2004-11-11 | Indinell Sociedad Anonima | Method and apparatus for processing digital images |
US20010008563A1 (en) * | 2000-01-19 | 2001-07-19 | Ricoh Company, Ltd. | Parallel processor and image processing apparatus |
US20020107990A1 (en) * | 2000-03-03 | 2002-08-08 | Surgient Networks, Inc. | Network connected computing system including network switch |
US7020671B1 (en) * | 2000-03-21 | 2006-03-28 | Hitachi America, Ltd. | Implementation of an inverse discrete cosine transform using single instruction multiple data instructions |
US20020090128A1 (en) * | 2000-12-01 | 2002-07-11 | Ron Naftali | Hardware configuration for parallel data processing without cross communication |
US20020114394A1 (en) * | 2000-12-06 | 2002-08-22 | Kai-Kuang Ma | System and method for motion vector generation and analysis of digital video clips |
US6772268B1 (en) * | 2000-12-22 | 2004-08-03 | Nortel Networks Ltd | Centralized look up engine architecture and interface |
US7013302B2 (en) * | 2000-12-22 | 2006-03-14 | Nortel Networks Limited | Bit field manipulation |
US20030041163A1 (en) * | 2001-02-14 | 2003-02-27 | John Rhoades | Data processing architectures |
US20030044074A1 (en) * | 2001-03-26 | 2003-03-06 | Ramot University Authority For Applied Research And Industrial Development Ltd. | Device and method for decoding class-based codewords |
US20040071215A1 (en) * | 2001-04-20 | 2004-04-15 | Bellers Erwin B. | Method and apparatus for motion vector estimation |
US20040170201A1 (en) * | 2001-06-15 | 2004-09-02 | Kazuo Kubo | Error-correction multiplexing apparatus, error-correction demultiplexing apparatus, optical transmission system using them, and error-correction multiplexing transmission method |
US6760821B2 (en) * | 2001-08-10 | 2004-07-06 | Gemicer, Inc. | Memory engine for the inspection and manipulation of data |
US20030206466A1 (en) * | 2001-09-25 | 2003-11-06 | Fujitsu Limited | Associative memory circuit judging whether or not a memory cell content matches search data by performing a differential amplification to a potential of a match line and a reference potential |
US20030085902A1 (en) * | 2001-11-02 | 2003-05-08 | Koninklijke Philips Electronics N.V. | Apparatus and method for parallel multimedia processing |
US7098437B2 (en) * | 2002-01-25 | 2006-08-29 | Semiconductor Technology Academic Research Center | Semiconductor integrated circuit device having a plurality of photo detectors and processing elements |
US20030208657A1 (en) * | 2002-05-06 | 2003-11-06 | Hywire Ltd. | Variable key type search engine and method therefor |
US6901476B2 (en) * | 2002-05-06 | 2005-05-31 | Hywire Ltd. | Variable key type search engine and method therefor |
US20040030872A1 (en) * | 2002-08-08 | 2004-02-12 | Schlansker Michael S. | System and method using differential branch latency processing elements |
US7454593B2 (en) * | 2002-09-17 | 2008-11-18 | Micron Technology, Inc. | Row and column enable signal activation of processing array elements with interconnection logic to simulate bus effect |
US20040081238A1 (en) * | 2002-10-25 | 2004-04-29 | Manindra Parhy | Asymmetric block shape modes for motion estimation |
US20040081239A1 (en) * | 2002-10-28 | 2004-04-29 | Andrew Patti | System and method for estimating motion between images |
US20040190632A1 (en) * | 2003-03-03 | 2004-09-30 | Cismas Sorin C. | Memory word array organization and prediction combination for memory access |
US20040215927A1 (en) * | 2003-04-23 | 2004-10-28 | Mark Beaumont | Method for manipulating data in a group of processing elements |
US20060018562A1 (en) * | 2004-01-16 | 2006-01-26 | Ruggiero Carl J | Video image processing with parallel processing |
US20050163220A1 (en) * | 2004-01-26 | 2005-07-28 | Kentaro Takakura | Motion vector detection device and moving picture camera |
US7428628B2 (en) * | 2004-03-02 | 2008-09-23 | Imagination Technologies Limited | Method and apparatus for management of control flow in a SIMD device |
US7196708B2 (en) * | 2004-03-31 | 2007-03-27 | Sony Corporation | Parallel vector processing |
US20060072674A1 (en) * | 2004-07-29 | 2006-04-06 | Stmicroelectronics Pvt. Ltd. | Macro-block level parallel video decoder |
US20060098229A1 (en) * | 2004-11-10 | 2006-05-11 | Canon Kabushiki Kaisha | Image processing apparatus and method of controlling an image processing apparatus |
US20060174236A1 (en) * | 2005-01-28 | 2006-08-03 | Yosef Stein | Method and apparatus for accelerating processing of a non-sequential instruction stream on a processor with multiple compute units |
US20060222078A1 (en) * | 2005-03-10 | 2006-10-05 | Raveendran Vijayalakshmi R | Content classification for multimedia processing |
US20060227883A1 (en) * | 2005-04-11 | 2006-10-12 | Intel Corporation | Generating edge masks for a deblocking filter |
US20060262985A1 (en) * | 2005-05-03 | 2006-11-23 | Qualcomm Incorporated | System and method for scalable encoding and decoding of multimedia data using multiple layers |
US20070071404A1 (en) * | 2005-09-29 | 2007-03-29 | Honeywell International Inc. | Controlled video event presentation |
US7451293B2 (en) * | 2005-10-21 | 2008-11-11 | Brightscale Inc. | Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing |
US20070188505A1 (en) * | 2006-01-10 | 2007-08-16 | Lazar Bivolarski | Method and apparatus for scheduling the processing of multimedia data in parallel processing systems |
US20070189618A1 (en) * | 2006-01-10 | 2007-08-16 | Lazar Bivolarski | Method and apparatus for processing sub-blocks of multimedia data in parallel processing systems |
US20070162722A1 (en) * | 2006-01-10 | 2007-07-12 | Lazar Bivolarski | Method and apparatus for processing algorithm steps of multimedia data in parallel processing systems |
US20080126278A1 (en) * | 2006-11-29 | 2008-05-29 | Alexander Bronstein | Parallel processing motion estimation for H.264 video codec |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9864717B2 (en) | 2011-04-13 | 2018-01-09 | Hewlett Packard Enterprise Development Lp | Input/output processing |
Also Published As
Publication number | Publication date |
---|---|
EP1941380A2 (en) | 2008-07-09 |
US20070130444A1 (en) | 2007-06-07 |
JP2009512920A (en) | 2009-03-26 |
TW200745876A (en) | 2007-12-16 |
US7451293B2 (en) | 2008-11-11 |
WO2007050444A3 (en) | 2009-04-30 |
WO2007050444A2 (en) | 2007-05-03 |
CA2626184A1 (en) | 2007-05-03 |
KR20080091754A (en) | 2008-10-14 |
Similar Documents
Publication | Title |
---|---|
US7451293B2 (en) | Array of Boolean logic controlled processing elements with concurrent I/O processing and instruction sequencing |
CN110383237B (en) | Reconfigurable matrix multiplier system and method | |
KR101202445B1 (en) | Processor | |
JP7315317B2 (en) | Processors and how they transfer data | |
US8181003B2 (en) | Instruction set design, control and communication in programmable microprocessor cores and the like | |
US9015390B2 (en) | Active memory data compression system and method | |
JP3559046B2 (en) | Data processing management system | |
KR100904318B1 (en) | Conditional instruction for a single instruction, multiple data execution engine | |
EP0428326A1 (en) | Processor array system | |
CN100472505C (en) | Parallel processing array | |
WO2001031418A2 (en) | Wide connections for transferring data between pe's of an n-dimensional mesh-connected simd array while transferring operands from memory | |
US20050024983A1 (en) | Providing a register file memory with local addressing in a SIMD parallel processor | |
US20080059764A1 (en) | Integral parallel machine | |
US11907158B2 (en) | Vector processor with vector first and multiple lane configuration | |
US20080059763A1 (en) | System and method for fine-grain instruction parallelism for increased efficiency of processing compressed multimedia data | |
CN110050259B (en) | Vector processor and control method thereof | |
JP2021108104A (en) | Partially readable/writable reconfigurable systolic array system and method | |
JP2002529847A (en) | Digital signal processor with bit FIFO | |
US6728863B1 (en) | Wide connections for transferring data between PE's of an N-dimensional mesh-connected SIMD array while transferring operands from memory | |
CN101217673B (en) | Format conversion apparatus from band interleave format to band separate format | |
JP2812292B2 (en) | Image processing device | |
US20110208951A1 (en) | Instruction processor and method therefor | |
CN112579971B (en) | Matrix operation circuit, matrix operation device and matrix operation method | |
JP2004252556A (en) | Information processor | |
WO2023199014A1 (en) | Technique for handling data elements stored in an array storage |
Legal Events
Code | Title | Description |
---|---|---|
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |