US20070073925A1 - Systems and methods for synchronizing multiple processing engines of a microprocessor - Google Patents
- Publication number
- US20070073925A1 (application US 11/528,470)
- Authority
- US
- United States
- Prior art keywords
- dma
- instruction
- pipeline
- instruction pipeline
- extended
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30018—Bit or string instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3802—Instruction prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
- G06F9/3875—Pipelining a single stage, e.g. superpipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3877—Concurrent instruction execution, e.g. pipeline, look ahead using a slave processor, e.g. coprocessor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
- G06F9/3895—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
- G06F9/3897—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4007—Interpolation-based scaling, e.g. bilinear interpolation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/14—Coding unit complexity, e.g. amount of activity or edge presence estimation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/182—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/43—Hardware specially adapted for motion estimation or compensation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/436—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/523—Motion estimation or motion compensation with sub-pixel accuracy
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
- H04N19/82—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
- H04N19/86—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
Definitions
- the invention relates generally to embedded microprocessor architecture and more specifically to systems and methods for synchronizing the operation of multiple processing engines in a microprocessor-based system.
- Processor extension logic is utilized to extend a microprocessor's capability.
- this logic is in parallel and accessible by the main processor pipeline. It is often used to perform specific, repetitive, computationally intensive functions thereby freeing up the main processor pipeline.
- processor extension logic such as an extended instruction pipeline that is distinct from the main instruction pipeline
- At least one embodiment of the invention may provide a method for synchronization of multiple processing engines in an extended processor core.
- the method may comprise placing direct memory access (DMA) functionality in a single instruction multiple data (SIMD) pipeline, where the DMA functionality comprises a data-in engine and a data-out engine, and each DMA engine is allowed to buffer at least one instruction issued to it in a queue without stopping the SIMD pipeline.
- the method may also comprise, when the DMA engine queue is full and a new DMA instruction is trying to enter the queue, blocking the SIMD pipeline from executing any instructions that follow until the current DMA operation is complete, thereby allowing the DMA engine and SIMD pipeline to maximize parallel operation while still remaining synchronized.
- Another embodiment of the invention provides a method for synchronizing multiple processing engines of a microprocessor.
- the method according to this embodiment comprises coupling an extended instruction pipeline to a main instruction pipeline, coupling direct memory access (DMA) engines to the extended instruction pipeline, buffering at least one instruction in a queue in the DMA engine without stopping the extended instruction pipeline, and blocking the extended instruction pipeline from further execution when a DMA engine queue is full and a new DMA instruction arrives at the queue until a current DMA operation is complete.
- DMA: direct memory access
- An additional embodiment of the invention provides, in a microprocessor having a main instruction pipeline and processor extension logic comprising an extended instruction pipeline that is coupled to the main instruction pipeline via an instruction queue, wherein the extended instruction pipeline is adapted to be selectively decoupled from the main instruction pipeline to perform autonomous operation, and where the extended instruction pipeline is further coupled to DMA engines for moving data into and moving data out of a local memory, a method for maximizing simultaneous operation of the extended instruction pipeline and the DMA engines.
- the method according to this embodiment comprises executing an instruction from the extended instruction pipeline requiring the DMA engine, buffering the instruction if sufficient queue space is available in the DMA engine, and preventing the extended instruction pipeline from further execution if insufficient queue space is available until a current DMA operation is complete, freeing up a space in the queue to accept a blocked DMA instruction on the instruction pipeline, and thereafter resuming execution of the extended processor pipeline.
- FIG. 1 is a functional block diagram illustrating a microprocessor-based system including a main processor core and a SIMD media accelerator according to at least one embodiment of the invention.
- FIG. 2 is an instruction sequence flow diagram and corresponding event time line illustrating a method for synchronizing processing between DMA tasks and SIMD tasks according to at least one embodiment of the invention.
- FIG. 3 is a flow chart detailing steps of an exemplary method for synchronizing multiple processing engines in a microprocessor according to various embodiments of the invention.
- Referring to FIG. 1, a functional block diagram illustrating a microprocessor-based system 5 including a main processor core 10 and a SIMD media accelerator 50 according to at least one embodiment of the invention is provided.
- the diagram illustrates a microprocessor 5 comprising a standard single instruction single data (SISD) processor core 10 having a multistage instruction pipeline 12 and a SIMD media engine 50 .
- the processor core 10 may be a processor core such as the ARC 700 embedded processor core available from ARC International Limited of Elstree, United Kingdom, and as described in provisional patent application No. 60/572,238 filed May 19, 2004 entitled “Microprocessor Architecture”, which is hereby incorporated by reference in its entirety.
- the processor core may be a different processor core.
- a single instruction issued by the processor pipeline 12 may cause up to sixteen 16-bit elements to be operated on in parallel through the use of the 128-bit data path 55 in the media engine 50 .
- the SIMD engine 50 utilizes closely coupled memory units.
- the SIMD data memory 52 (SDM) is a 128-bit wide data memory that provides low latency access to perform loads to and stores from the 128-bit vector register file 51 .
- the SDM contents are transferable via a DMA unit 54 thereby freeing up the processor core 10 and the SIMD core 50 .
- the DMA unit 54 comprises a DMA in engine 61 and a DMA out engine 62 .
- both the DMA in engine 61 and DMA out engine 62 may comprise instruction queues (labeled Q in the Figure) for buffering one or more instructions.
- a SIMD code memory 56 allows the SIMD unit to fetch instructions from a localized code memory, allowing the SIMD pipeline to dynamically decouple from the processor core 10 resulting in truly parallel operation between the processor core and SIMD media engine as discussed in commonly assigned U.S. patent application Ser. No. ______, titled, “Systems and Methods for Recording Instruction Sequences in a Microprocessor Having a Dynamically Decoupleable Extended Instruction Pipeline,” filed concurrently herewith, the disclosure of which is hereby incorporated by reference in its entirety.
- the microprocessor architecture may permit the processor to operate in both closely coupled and decoupled modes of operation.
- the SIMD program code fetch and program stream supply is exclusively handled by the processor core 10 .
- the SIMD pipeline 53 executes code from a local memory 56 independent of the processor core 10 .
- the processor core 10 may control the SIMD pipeline 53 to execute video tasks such as audio processing, entropy encoding/decoding, discrete cosine transforms (DCTs) and inverse DCTs, motion compensation and de-block filtering.
- DCTs: discrete cosine transforms
- the main processor pipeline 12 has been extended with a high performance SIMD engine 50 and two direct memory access (DMA) engines 61 and 62 , one for moving data into a local memory, SIMD data memory (SDM), and one for moving data out of local memory.
- the SIMD engine 50 and DMA engines 61, 62 all execute instructions that are fetched and issued from the main processor pipeline 10.
- these individual engines need to be able to operate in parallel, and hence, as discussed above, instruction queues (Q) are placed between the main processor core 10 and the SIMD engine 50, and between the SIMD engine 50 and the DMA engines 61, 62, so that they can all operate out of step with each other.
- a local SIMD code memory (SCM) is introduced so that macros can be called and can be executed from these memories. This allows the main processor core, the SIMD engines and the DMA engines to execute out of step of each other.
- the DMA engines 61 , 62 are placed in the SIMD pipeline 53 itself, but each DMA engine is allowed to buffer one or more instructions issued to it in a queue without stopping the SIMD pipeline execution.
- the SIMD engine pipeline 53 will be blocked from executing further instructions only when another DMA instruction arrives at a DMA engine whose queue is already full. This allows the software to be re-organized so that SIMD code does not have to wait for a DMA operation to complete, or vice versa, as long as a double (or deeper) buffering approach is used, that is, two or more buffers are used to allow overlapping of data transfer and data computation.
- each DMA channel is allowed to buffer at least one instruction in a queue.
- a queue there are two independent video pixel data blocks to be processed, and that each requires multiple blocks of pixel data to be moved into local memory and to be processed, before moving the results out of local memory.
- this Figure illustrates an instruction sequence flow diagram 100 and corresponding event time line 110 illustrating a method for synchronizing processing between DMA tasks and SIMD tasks, with only a one-deep instruction queue in each DMA engine, according to at least one embodiment of the invention.
- the DI 2 DMA operation is blocked if the buffered DI 1 DMA operation is not completed, causing the DI 2 DMA instruction to be blocked from entering the DMA instruction queue, which in turn results in the S 1 SIMD operation being blocked. Since S 1 operation depends on data from DI 1 operation, the blocking action prevents the S 1 SIMD instruction sequence from proceeding until the DI 1 operation is completed.
- the DI 3 DMA operation is executed only after S 1 is completed.
- DI 2 and S 1 , DI 3 and S 2 , and DI 4 and S 3 are shown as starting at the same time respectively.
- S 1 will start one clock cycle after DI 2, S 2 will start one clock cycle after DI 3, and S 3 will start one clock cycle after DI 4.
- the time line is intended to demonstrate that S 1 cannot start before DI 1 is complete, S 2 cannot start before DI 2 is complete, S 3 cannot start before DI 3 is complete, and S 4 cannot start before DI 4 is complete.
- This approach avoids the need for the main processor core to intervene continuously in order to achieve synchronization between the DMA unit and the SIMD pipeline.
- the processor core 10 does need to ensure that the instruction sequence sent uses this functionality to achieve the best performance by parallelizing SIMD and DMA operations.
- an advantage of this approach is that it facilitates the synchronization of SIMD and DMA operations in a multi-engine video processing core with minimal intervention by the main control processor core.
- This approach can be extended by increasing the depth of the DMA non-blocking instruction queue so as to allow more DMA instructions to be buffered in the DMA channels, allowing double, triple or more buffering.
- FIG. 3 is a flow chart of an exemplary method for synchronizing multiple processing engines in a microprocessor-based system according to at least one embodiment of the invention.
- FIG. 3 demonstrates a method for coding the instruction sequence to allow both the SIMD engine and DMA engines to operate simultaneously as much as possible.
- the method begins in step 200 and proceeds to step 205 where an instruction requiring the DMA engine is executed by the SIMD pipeline.
- the SIMD pipeline accesses the required DMA engine queue. If in step 210 , the DMA engine instruction queue is already full when it is accessed, the SIMD pipeline is paused from further execution, as described in step 215 .
- in step 220, the SIMD pipeline waits for a free space in the instruction queue of the targeted DMA engine.
- the DMA engine corresponding to the target queue performs its current DMA operation instructed by the DMA instruction(s) already in the queue.
- the DMA engine instruction queue opens up a free space so that in step 225 , the stalled DMA instruction can be buffered in the queue.
- the SIMD pipeline then resumes execution in step 230 after the DMA instruction has been buffered. Accordingly, through the various systems and methods disclosed herein, simultaneous operation of the SIMD pipeline and the DMA engines is maximized without the risk of overwrite.
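The stall-and-resume behavior of steps 205 through 230, and the resulting DI/S ordering on the time line, can be illustrated with a small timing model. The following Python sketch is not taken from the application: the `DmaEngine` class, the `run` function, the cycle counts, the one-cycle buffering cost, and the exact instruction order in `PROGRAM` are all assumptions chosen for illustration. It models a single data-in engine whose queue slot stays occupied until the buffered operation completes, and an in-order SIMD pipeline that pauses only when it tries to issue a DMA instruction into a full queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DmaEngine:
    """Hypothetical DMA engine: a queue slot stays occupied until its operation completes."""
    depth: int = 1        # queue depth (assumed parameter; 1 = one-deep queue)
    busy_until: int = 0   # cycle at which the last accepted operation finishes
    in_flight: deque = field(default_factory=deque)  # completion cycles of buffered operations

    def issue(self, now, duration):
        """Return (accept, start, end) cycles for a DMA instruction issued at cycle `now`."""
        # Operations that have already completed free their queue slots.
        while self.in_flight and self.in_flight[0] <= now:
            self.in_flight.popleft()
        accept = now
        if len(self.in_flight) >= self.depth:
            # Queue full (steps 210-220): the pipeline waits for the oldest operation to finish.
            accept = self.in_flight.popleft()
        start = max(accept, self.busy_until)  # the engine runs its operations one at a time
        end = start + duration
        self.busy_until = end
        self.in_flight.append(end)            # step 225: the instruction is buffered
        return accept, start, end


def run(program, depth=1):
    """Simulate an in-order SIMD pipeline; return {name: (start_cycle, end_cycle)}."""
    engine = DmaEngine(depth=depth)
    now = 0
    times = {}
    for name, kind, duration in program:
        if kind == "dma":
            accept, start, end = engine.issue(now, duration)
            times[name] = (start, end)
            now = accept + 1                  # step 230: resume one cycle after buffering (assumed)
        else:
            times[name] = (now, now + duration)
            now += duration                   # SIMD work occupies the pipeline itself
    return times


# Assumed instruction order and durations, for illustration only.
PROGRAM = [
    ("DI1", "dma", 10), ("DI2", "dma", 10), ("S1", "simd", 6),
    ("DI3", "dma", 10), ("S2", "simd", 6),
    ("DI4", "dma", 10), ("S3", "simd", 6),
]

if __name__ == "__main__":
    for name, (start, end) in run(PROGRAM).items():
        print(f"{name}: cycles {start}-{end}")
```

Under these assumptions each SIMD block starts one cycle after the following data-in operation is accepted, and never before its own input transfer has finished, without any intervention from a main core. The ordering in `PROGRAM` relies on the one-deep queue; with a deeper queue the same sequence would no longer guarantee that S 1 waits for DI 1, so the instruction stream would have to be reorganized to match the buffering depth.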
Abstract
Systems and methods for synchronizing multiple processing engines of a microprocessor. In a microprocessor engine employing processor extension logic, DMA engines are used to permit the processor extension logic to move data into and out of local memory independent of the main instruction pipeline. Synchronization between the extended instruction pipeline and DMA engines is performed to maximize simultaneous operation of these elements. The DMA engines include a data-in engine and a data-out engine, each adapted to buffer at least one instruction in a queue. If, for either DMA engine, the queue is full and a new instruction is trying to enter the buffer, the DMA engine will cause the extended pipeline to pause execution until the current DMA operation is complete. This prevents data overwrites while maximizing simultaneous operation.
Description
- This application claims priority to U.S. Provisional Patent Application No. 60/721,108 titled “SIMD Architecture and Associated Systems and Methods,” filed Sep. 28, 2005, the disclosure of which is hereby incorporated by reference in its entirety.
- The invention relates generally to embedded microprocessor architecture and more specifically to systems and methods for synchronizing the operation of multiple processing engines in a microprocessor-based system.
- Processor extension logic is utilized to extend a microprocessor's capability.
- Typically, this logic is in parallel and accessible by the main processor pipeline. It is often used to perform specific, repetitive, computationally intensive functions thereby freeing up the main processor pipeline.
- A design issue that must be addressed in microprocessor architectures, and in microprocessor-based systems in general, that employ processor extension logic, such as an extended instruction pipeline that is distinct from the main instruction pipeline, is synchronization and control. It is difficult to balance the competing interests of simplifying implementation and debugging while maximizing parallelism.
- Thus, there exists a need for a parallel pipeline architecture that can fully exploit the advantages of parallelism without suffering from the design complexity of loosely or completely decoupled pipelines.
- At least one embodiment of the invention may provide a method for synchronization of multiple processing engines in an extended processor core. The method according to this embodiment may comprise placing direct memory access (DMA) functionality in a single instruction multiple data (SIMD) pipeline, where the DMA functionality comprises a data-in engine and a data-out engine, and each DMA engine is allowed to buffer at least one instruction issued to it in a queue without stopping the SIMD pipeline. The method may also comprise, when the DMA engine queue is full and a new DMA instruction is trying to enter the queue, blocking the SIMD pipeline from executing any instructions that follow until the current DMA operation is complete, thereby allowing the DMA engine and SIMD pipeline to maximize parallel operation while still remaining synchronized.
- Another embodiment of the invention provides a method for synchronizing multiple processing engines of a microprocessor. The method according to this embodiment comprises coupling an extended instruction pipeline to a main instruction pipeline, coupling direct memory access (DMA) engines to the extended instruction pipeline, buffering at least one instruction in a queue in the DMA engine without stopping the extended instruction pipeline, and blocking the extended instruction pipeline from further execution when a DMA engine queue is full and a new DMA instruction arrives at the queue until a current DMA operation is complete.
- A further embodiment of the invention provides a multi-processing engine architecture for a microprocessor. The multi-processing engine architecture for a microprocessor according to this embodiment comprises a main instruction pipeline, an extended instruction pipeline coupled to the main instruction pipeline via an instruction queue, and direct memory access (DMA) engines coupled to the extended instruction pipeline, the DMA engines comprising a data-in engine and a data-out engine, wherein each of the data-in and data-out engines comprises an instruction queue adapted to buffer at least one instruction.
- An additional embodiment of the invention provides, in a microprocessor having a main instruction pipeline and processor extension logic comprising an extended instruction pipeline that is coupled to the main instruction pipeline via an instruction queue, wherein the extended instruction pipeline is adapted to be selectively decoupled from the main instruction pipeline to perform autonomous operation, and where the extended instruction pipeline is further coupled to DMA engines for moving data into and moving data out of a local memory, a method for maximizing simultaneous operation of the extended instruction pipeline and the DMA engines. The method according to this embodiment comprises executing an instruction from the extended instruction pipeline requiring the DMA engine, buffering the instruction if sufficient queue space is available in the DMA engine, and preventing the extended instruction pipeline from further execution if insufficient queue space is available until a current DMA operation is complete, freeing up a space in the queue to accept a blocked DMA instruction on the instruction pipeline, thereafter resuming execution of the extended processor pipeline.
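- By way of illustration only, the buffer-or-block rule summarized above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not the disclosed hardware: the names `DmaEngine` and `issue_dma`, the per-cycle `tick` model, and the cycle counts are hypothetical. The sketch assumes that an instruction keeps its queue slot until its DMA operation completes, which is how the description characterizes the queue freeing a space.

```python
from collections import deque


class DmaEngine:
    """Hypothetical model of one DMA engine with a bounded instruction queue.

    Assumption: the instruction at the head of the queue is the one being
    executed, and it keeps its slot until its operation completes.
    """

    def __init__(self, depth=1):
        self.depth = depth
        self.queue = deque()   # entries are [name, remaining_cycles]
        self.completed = []    # names of finished DMA operations, in order

    def is_full(self):
        return len(self.queue) >= self.depth

    def enqueue(self, name, cycles):
        assert not self.is_full(), "caller must stall while the queue is full"
        self.queue.append([name, cycles])

    def tick(self):
        """Advance the engine by one cycle of its current DMA operation."""
        if self.queue:
            head = self.queue[0]
            head[1] -= 1
            if head[1] == 0:
                self.completed.append(head[0])
                self.queue.popleft()   # a queue slot opens up


def issue_dma(engine, name, cycles):
    """Issue a DMA instruction from the extended pipeline.

    Returns the number of cycles the pipeline was blocked. The pipeline
    only stalls when the queue is full AND a new DMA instruction arrives.
    """
    stalled = 0
    while engine.is_full():
        engine.tick()          # the engine keeps working while the pipeline waits
        stalled += 1
    engine.enqueue(name, cycles)
    return stalled
```

  With `depth=1`, a second DMA instruction issued immediately after the first stalls the pipeline for the full length of the first transfer; with `depth=2`, the second instruction is buffered without a stall and only a third one blocks.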
- These and other embodiments and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
- In order to facilitate a fuller understanding of the present disclosure, reference is now made to the accompanying drawings, in which like elements are referenced with like numerals. These drawings should not be construed as limiting the present disclosure, but are intended to be exemplary only.
- FIG. 1 is a functional block diagram illustrating a microprocessor-based system including a main processor core and a SIMD media accelerator according to at least one embodiment of the invention;
- FIG. 2 is an instruction sequence flow diagram and corresponding event time line illustrating a method for synchronizing processing between DMA tasks and SIMD tasks according to at least one embodiment of the invention; and
- FIG. 3 is a flow chart detailing steps of an exemplary method for synchronizing multiple processing engines in a microprocessor according to various embodiments of the invention.
- The following description is intended to convey a thorough understanding of the embodiments described by providing a number of specific embodiments and details involving microprocessor architecture and systems and methods for synchronizing multiple processing engines in a microprocessor-based system. It should be appreciated, however, that the present invention is not limited to these specific embodiments and details, which are exemplary only. It is further understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.
- Commonly assigned U.S. patent application Ser. No. ______ titled “System and Method for Selectively Decoupling a Parallel Extended Processor Pipeline,” filed concurrently with this application is hereby incorporated by reference in its entirety into the disclosure of this application.
- Referring now to
FIG. 1, a functional block diagram illustrating a microprocessor-based system 5 including a main processor core 10 and a SIMD media accelerator 50 according to at least one embodiment of the invention is provided. The diagram illustrates a microprocessor 5 comprising a standard single instruction single data (SISD) processor core 10 having a multistage instruction pipeline 12 and a SIMD media engine 50. In various embodiments, the processor core 10 may be a processor core such as the ARC 700 embedded processor core available from ARC International Limited of Elstree, United Kingdom, and as described in provisional patent application No. 60/572,238 filed May 19, 2004 entitled “Microprocessor Architecture,” which is hereby incorporated by reference in its entirety. Alternatively, in various embodiments, the processor core may be a different processor core.
- In various embodiments, a single instruction issued by the
processor pipeline 12 may cause up to sixteen 16-bit elements to be operated on in parallel through the use of the 128-bit data path 55 in the media engine 50. In various embodiments, the SIMD engine 50 utilizes closely coupled memory units. In various embodiments, the SIMD data memory 52 (SDM) is a 128-bit wide data memory that provides low latency access to perform loads to and stores from the 128-bit vector register file 51. The SDM contents are transferable via a DMA unit 54, thereby freeing up the processor core 10 and the SIMD core 50. In various embodiments, the DMA unit 54 comprises a DMA in engine 61 and a DMA out engine 62. In various embodiments, both the DMA in engine 61 and DMA out engine 62 may comprise instruction queues (labeled Q in the Figure) for buffering one or more instructions. In various embodiments, a SIMD code memory 56 (SCM) allows the SIMD unit to fetch instructions from a localized code memory, allowing the SIMD pipeline to dynamically decouple from the processor core 10, resulting in truly parallel operation between the processor core and SIMD media engine as discussed in commonly assigned U.S. patent application Ser. No. ______, titled “Systems and Methods for Recording Instruction Sequences in a Microprocessor Having a Dynamically Decoupleable Extended Instruction Pipeline,” filed concurrently herewith, the disclosure of which is hereby incorporated by reference in its entirety.
- Therefore, in various embodiments, the microprocessor architecture according to various embodiments of the invention may permit the processor to operate in both closely coupled and decoupled modes of operation. In the closely coupled mode of operation, the SIMD program code fetch and program stream supply is exclusively handled by the
processor core 10. In the decoupled mode of operation, the SIMD pipeline 53 executes code from a local memory 56 independent of the processor core 10. The processor core 10 may control the SIMD pipeline 53 to execute video tasks such as audio processing, entropy encoding/decoding, discrete cosine transforms (DCTs) and inverse DCTs, motion compensation and de-block filtering.
- With continued reference to the microprocessor architecture in
FIG. 1, the main processor pipeline 12 has been extended with a high performance SIMD engine 50 and two direct memory access (DMA) engines 61, 62 [...] SIMD engine 50 and DMA engines 61, 62 [...] main processor pipeline 10. To achieve high performance, these individual engines need to be able to operate in parallel, and hence, as discussed above, instruction queues (Q) are placed between the main processor core 10 and the SIMD engine 50, and between the SIMD engine 50 and the DMA engines 61, 62.
- As discussed above, operating the main pipeline, extended pipeline and DMA engines in parallel introduces the problem of synchronization. For example, a SIMD code segment will have to wait for a DMA operation, kicked off by the instruction just preceding it, to finish transferring data into the SDM. On the other hand, the DMA engine cannot start transferring data out of the SDM until the previously issued SIMD code has been executed. This type of synchronization is normally performed by using software to probe status bits toggled by these engines, or by using interrupts and their associated service routines to kick off the dependent processes. Both of these solutions require large overheads in terms of cycles as well as coding effort to achieve the synchronization desired.
- In order to reduce these overheads, in various embodiments of the invention, the
DMA engines 61, 62 are placed in the SIMD pipeline 53 itself, but each DMA engine is allowed to buffer one or more instructions issued to it in a queue without stopping the SIMD pipeline execution. When the DMA engine instruction queue is full, the SIMD engine pipeline 53 will be blocked from executing further instructions only when another DMA instruction arrives at the DMA engine. This allows the software to be re-organized so that SIMD code will have to wait for a DMA operation to complete, or vice versa, as long as a double or more buffering approach is used, that is, two or more buffers are used to allow overlapping of data transfer and data computation.
- With continued reference to the processor architecture of
FIG. 1, there are two DMA engines: the DMA in engine 61 and the DMA out engine 62.
- Referring to
FIG. 2, this Figure illustrates an instruction sequence flow diagram 100 and corresponding event time line 110 illustrating a method for synchronizing processing between DMA tasks and SIMD tasks, with an instruction queue only one deep in each DMA engine, according to at least one embodiment of the invention. Looking at the instruction sequence flow diagram 100, the DI2 DMA operation is blocked if the buffered DI1 DMA operation is not completed, causing the DI2 DMA instruction to be blocked from entering the DMA instruction queue, which in turn results in the S1 SIMD operation being blocked. Since the S1 operation depends on data from the DI1 operation, the blocking action prevents the S1 SIMD instruction sequence from proceeding until the DI1 operation is completed. The DI3 DMA operation is executed only after S1 is completed. This eliminates any chance of DI3 overwriting the same data region targeted by the DI1 operation before the data is used by the computation S1. By the time DI3 has completed, the DI2 operation would have completed, allowing S2 to start. If, however, the DI2 operation is not completed, the DI3 operation will be blocked, preventing S2 from starting. Likewise, the DO operation is only executed when S4 has completed. It should be appreciated that in the timeline 110 of FIG. 2, DI2 and S1, DI3 and S2, and DI4 and S3 are shown as starting at the same time respectively. In actual operation, S1 will start one clock cycle after DI2, S2 will start one clock cycle after DI3, and S3 will start one clock cycle after DI4. The time line is intended to demonstrate that S1 cannot start before DI1 is complete, S2 cannot start before DI2 is complete, S3 cannot start before DI3 is complete, and S4 cannot start before DI4 is complete.
- This approach avoids the need for the main processor core to intervene continuously in order to achieve synchronization between the DMA unit and the SIMD pipeline. However, the
processor core 10 does need to ensure that the instruction sequence sent uses this functionality to achieve the best performance by parallelizing SIMD and DMA operations. Thus, an advantage of this approach is that it facilitates the synchronization of SIMD and DMA operations in a multi-engine video processing core with minimal interaction with the main control processor core. This approach can be extended by increasing the depth of the DMA non-blocking instruction queue so as to allow more DMA instructions to be buffered in the DMA channels, allowing double, triple or more buffering.
- Referring now to
FIG. 3, this Figure is a flow chart of an exemplary method for synchronizing multiple processing engines in a microprocessor-based system according to at least one embodiment of the invention. FIG. 3 demonstrates a method for coding the instruction sequence to allow both the SIMD engine and DMA engines to operate simultaneously as much as possible. The method begins in step 200 and proceeds to step 205 where an instruction requiring the DMA engine is executed by the SIMD pipeline. In step 210, the SIMD pipeline accesses the required DMA engine queue. If, in step 210, the DMA engine instruction queue is already full when it is accessed, the SIMD pipeline is paused from further execution, as described in step 215. In step 220, the SIMD pipeline waits for a free space in the instruction queue of the targeted DMA engine. In the meantime, the DMA engine corresponding to the target queue performs its current DMA operation instructed by the DMA instruction(s) already in the queue. After this operation is performed, the DMA engine instruction queue opens up a free space so that in step 225, the stalled DMA instruction can be buffered in the queue. The SIMD pipeline then resumes execution in step 230 after the DMA instruction has been buffered. Accordingly, through the various systems and methods disclosed herein, simultaneous operation of the SIMD pipeline and the DMA engines is maximized without the risk of overwrite.
- The embodiments of the present inventions are not to be limited in scope by the specific embodiments described herein. For example, although many of the embodiments disclosed herein have been described with reference to systems and methods for synchronizing multiple processing engines in a microprocessor-based system having a main instruction pipeline and an extended instruction pipeline, the principles herein are equally applicable to other aspects of microprocessor design and function. 
Indeed, various modifications of the embodiments of the present inventions, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such modifications are intended to fall within the scope of the following appended claims. Further, although some of the embodiments of the present invention have been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that its usefulness is not limited thereto and that the embodiments of the present inventions can be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the embodiments of the present inventions as disclosed herein.
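- As a purely illustrative aid to the FIG. 2 time line discussed in the description, the following Python sketch computes start and end times for an interleaved sequence of data-in DMA instructions and SIMD code segments. It is a simplified model under stated assumptions, not the disclosed implementation: the function name `simulate`, the program order DI1, DI2, S1, DI3, S2, DI4, S3, the one-cycle issue cost, and all durations are hypothetical choices made for the example. Only a single data-in engine is modeled.

```python
def simulate(program, depth=1, issue_cycles=1):
    """Return {name: (start, end)} for a list of (kind, name, cycles) items.

    kind is "dma" (data-in DMA instruction) or "simd" (SIMD code segment).
    Assumptions: a DMA instruction holds its queue slot until its transfer
    completes, and the engine runs its buffered instructions one at a time.
    """
    t = 0            # current time of the SIMD pipeline
    pending = []     # completion times of DMA instructions still queued
    last_end = 0     # time at which the DMA engine finishes its latest work
    times = {}
    for kind, name, cycles in program:
        if kind == "dma":
            # Drop instructions that have already completed by time t.
            pending = [end for end in pending if end > t]
            if len(pending) >= depth:
                # Queue full: the pipeline is blocked until the oldest
                # buffered DMA operation completes and frees a slot.
                t = pending.pop(0)
            start = max(t, last_end)
            last_end = start + cycles
            pending.append(last_end)
            times[name] = (start, last_end)
            t += issue_cycles      # SIMD code resumes after the issue
        else:
            times[name] = (t, t + cycles)
            t += cycles
    return times


# Hypothetical workload: 10-cycle transfers, 6-cycle SIMD segments.
PROGRAM = [
    ("dma", "DI1", 10), ("dma", "DI2", 10), ("simd", "S1", 6),
    ("dma", "DI3", 10), ("simd", "S2", 6),
    ("dma", "DI4", 10), ("simd", "S3", 6),
]
```

  In this model, with a one-deep queue, issuing DI2 blocks the pipeline until DI1 completes, so S1 cannot begin before the data it depends on has arrived, while the DI2 transfer overlaps the S1 computation. Passing a larger `depth` models the deeper queues mentioned in the description.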
Claims (8)
1. A method for synchronizing multiple processing engines of a microprocessor comprising:
coupling an extended instruction pipeline to a main instruction pipeline;
coupling direct memory access (DMA) engines to the extended instruction pipeline;
buffering at least one instruction in the DMA engine, using a queue, without stopping the extended instruction pipeline; and
blocking the extended instruction pipeline from further execution when a DMA engine instruction queue is full and a new DMA instruction arrives at the queue, until a current DMA operation is complete.
2. The method according to claim 1, wherein coupling DMA engines to the extended instruction pipeline comprises coupling a DMA engine having a data-in channel for moving data into a local memory and a DMA engine having a data-out channel for moving data out of a local memory of the extended instruction pipeline.
3. The method according to claim 2, wherein blocking the extended instruction pipeline from issuing subsequent instructions until a DMA operation is complete when any DMA engine instruction queue is full, and a new DMA instruction is being issued to it.
4. The method according to claim 1, further comprising restarting execution of the extended instruction pipeline when the DMA operation is complete and the new DMA operation that was trying to enter the buffer successfully leaves the instruction pipeline and enters the instruction queue.
5. A multi-processing engine architecture for a microprocessor comprising:
a main instruction pipeline;
an extended instruction pipeline coupled to the main instruction pipeline via an instruction queue; and
direct memory access (DMA) engines coupled to the extended instruction pipeline, the DMA access engines comprising a data-in engine and a data-out engine, wherein each of the data-in and data-out engines comprise an instruction queue adapted to buffer at least one instruction.
6. The architecture according to claim 5, wherein the DMA access engines are adapted to prevent the extended instruction pipeline from executing additional instructions when the DMA instruction queue is full and a new DMA instruction is blocked from entering the buffer, until a current DMA operation is completed allowing the blocked DMA operation to enter the buffer from the instruction pipeline.
7. The architecture according to claim 6, wherein the DMA access engine is adapted to cause the extended instruction pipeline to resume execution once the DMA instruction trying to enter the DMA buffer that was blocked previously enters the DMA instruction buffer when the current DMA operation is completed.
8. In a microprocessor having a main instruction pipeline and processor extension logic comprising an extended instruction pipeline that is coupled to the main instruction pipeline via an instruction queue, wherein the extended instruction pipeline is adapted to be selectively decoupled from the main instruction pipeline to perform autonomous operation, and where the extended instruction pipeline is further coupled to DMA engines for moving data into and moving data out of a local memory, a method for maximizing simultaneous operation of the extended instruction pipeline and the DMA engines comprising:
executing an instruction from the extended instruction pipeline requiring the DMA engine;
buffering the instruction if sufficient buffer space is available in the DMA engine instruction queue; and
preventing the extended instruction pipeline from further execution if insufficient queue space is available until a current DMA operation is complete, freeing up a space in the queue to accept a blocked DMA instruction on the instruction pipeline, thereafter resuming execution of the extended processor pipeline.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/528,470 US20070073925A1 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for synchronizing multiple processing engines of a microprocessor |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US72110805P | 2005-09-28 | 2005-09-28 | |
US11/528,470 US20070073925A1 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for synchronizing multiple processing engines of a microprocessor |
Publications (1)
Publication Number | Publication Date |
---|---|
US20070073925A1 (en) | 2007-03-29 |
Family
ID=37968194
Family Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/528,327 Active 2029-03-30 US7747088B2 (en) | 2005-09-28 | 2006-09-28 | System and methods for performing deblocking in microprocessor-based video codec applications |
US11/528,434 Abandoned US20070074004A1 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for selectively decoupling a parallel extended instruction pipeline |
US11/528,325 Active 2030-06-13 US8212823B2 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
US11/528,326 Abandoned US20070074007A1 (en) | 2005-09-28 | 2006-09-28 | Parameterizable clip instruction and method of performing a clip operation using the same |
US11/528,432 Active 2031-04-08 US8218635B2 (en) | 2005-09-28 | 2006-09-28 | Systolic-array based systems and methods for performing block matching in motion compensation |
US11/528,338 Active 2027-04-24 US7971042B2 (en) | 2005-09-28 | 2006-09-28 | Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline |
US11/528,470 Abandoned US20070073925A1 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for synchronizing multiple processing engines of a microprocessor |
Country Status (2)
Country | Link |
---|---|
US (7) | US7747088B2 (en) |
WO (1) | WO2007049150A2 (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5884057A (en) * | 1994-01-11 | 1999-03-16 | Exponential Technology, Inc. | Temporal re-alignment of a floating point pipeline to an integer pipeline for emulation of a load-operate architecture on a load/store processor |
US5923892A (en) * | 1997-10-27 | 1999-07-13 | Levy; Paul S. | Host processor and coprocessor arrangement for processing platform-independent code |
US6757019B1 (en) * | 1999-03-13 | 2004-06-29 | The Board Of Trustees Of The Leland Stanford Junior University | Low-power parallel processor and imager having peripheral control circuitry |
US6865663B2 (en) * | 2000-02-24 | 2005-03-08 | Pts Corporation | Control processor dynamically loading shadow instruction register associated with memory entry of coprocessor in flexible coupling mode |
US6950929B2 (en) * | 2001-05-24 | 2005-09-27 | Samsung Electronics Co., Ltd. | Loop instruction processing using loop buffer in a data processing device having a coprocessor |
US20060047934A1 (en) * | 2004-08-31 | 2006-03-02 | Schmisseur Mark A | Integrated circuit capable of memory access control |
US7079147B2 (en) * | 2003-05-14 | 2006-07-18 | Lsi Logic Corporation | System and method for cooperative operation of a processor and coprocessor |
US20070070080A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
Family Cites Families (213)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4594659A (en) * | 1982-10-13 | 1986-06-10 | Honeywell Information Systems Inc. | Method and apparatus for prefetching instructions for a central execution pipeline unit |
JPS63225822A (en) * | 1986-08-11 | 1988-09-20 | Toshiba Corp | Barrel shifter |
US4905178A (en) * | 1986-09-19 | 1990-02-27 | Performance Semiconductor Corporation | Fast shifter method and structure |
JPS6398729A (en) * | 1986-10-15 | 1988-04-30 | Fujitsu Ltd | Barrel shifter |
US4914622A (en) * | 1987-04-17 | 1990-04-03 | Advanced Micro Devices, Inc. | Array-organized bit map with a barrel shifter |
DE3889812T2 (en) | 1987-08-28 | 1994-12-15 | Nec Corp | Data processor with a test structure for multi-position shifters. |
KR970005453B1 (en) | 1987-12-25 | 1997-04-16 | 가부시기가이샤 히다찌세이사꾸쇼 | Data processing apparatus for high speed processing |
US4926323A (en) * | 1988-03-03 | 1990-05-15 | Advanced Micro Devices, Inc. | Streamlined instruction processor |
JPH01263820A (en) * | 1988-04-15 | 1989-10-20 | Hitachi Ltd | Microprocessor |
EP0344347B1 (en) | 1988-06-02 | 1993-12-29 | Deutsche ITT Industries GmbH | Digital signal processing unit |
GB2229832B (en) | 1989-03-30 | 1993-04-07 | Intel Corp | Byte swap instruction for memory format conversion within a microprocessor |
JPH03185530A (en) | 1989-12-14 | 1991-08-13 | Mitsubishi Electric Corp | Data processor |
DE69030648T2 (en) * | 1990-01-02 | 1997-11-13 | Motorola Inc | Method for sequential prefetching of 1-word, 2-word or 3-word instructions |
JPH03248226A (en) * | 1990-02-26 | 1991-11-06 | Nec Corp | Microprocessor |
JP2560889B2 (en) | 1990-05-22 | 1996-12-04 | 日本電気株式会社 | Microprocessor |
EP0463973A3 (en) * | 1990-06-29 | 1993-12-01 | Digital Equipment Corp | Branch prediction in high performance processor |
US5778423A (en) | 1990-06-29 | 1998-07-07 | Digital Equipment Corporation | Prefetch instruction for improving performance in reduced instruction set processor |
US5155843A (en) | 1990-06-29 | 1992-10-13 | Digital Equipment Corporation | Error transition mode for multi-processor system |
JP2556612B2 (en) | 1990-08-29 | 1996-11-20 | 日本電気アイシーマイコンシステム株式会社 | Barrel shifter circuit |
US5636363A (en) * | 1991-06-14 | 1997-06-03 | Integrated Device Technology, Inc. | Hardware control structure and method for off-chip monitoring entries of an on-chip cache |
US5493687A (en) * | 1991-07-08 | 1996-02-20 | Seiko Epson Corporation | RISC microprocessor architecture implementing multiple typed register sets |
US5539911A (en) | 1991-07-08 | 1996-07-23 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution |
US5450586A (en) | 1991-08-14 | 1995-09-12 | Hewlett-Packard Company | System for analyzing and debugging embedded software through dynamic and interactive use of code markers |
US5283874A (en) * | 1991-10-21 | 1994-02-01 | Intel Corporation | Cross coupling mechanisms for simultaneously completing consecutive pipeline instructions even if they begin to process at the same microprocessor of the issue fee |
CA2073516A1 (en) | 1991-11-27 | 1993-05-28 | Peter Michael Kogge | Dynamic multi-mode parallel processor array architecture computer system |
FR2690299B1 (en) * | 1992-04-17 | 1994-06-17 | Telecommunications Sa | METHOD AND DEVICE FOR SPATIAL FILTERING OF DIGITAL IMAGES DECODED BY BLOCK TRANSFORMATION. |
US5423011A (en) * | 1992-06-11 | 1995-06-06 | International Business Machines Corporation | Apparatus for initializing branch prediction information |
US5542074A (en) | 1992-10-22 | 1996-07-30 | Maspar Computer Corporation | Parallel processor system with highly flexible local control capability, including selective inversion of instruction signal and control of bit shift amount |
US5696958A (en) | 1993-01-11 | 1997-12-09 | Silicon Graphics, Inc. | Method and apparatus for reducing delays following the execution of a branch instruction in an instruction pipeline |
GB2275119B (en) | 1993-02-03 | 1997-05-14 | Motorola Inc | A cached processor |
US5937202A (en) | 1993-02-11 | 1999-08-10 | 3-D Computing, Inc. | High-speed, parallel, processor architecture for front-end electronics, based on a single type of ASIC, and method use thereof |
US5454117A (en) | 1993-08-25 | 1995-09-26 | Nexgen, Inc. | Configurable branch prediction for a processor performing speculative execution |
JP2801135B2 (en) * | 1993-11-26 | 1998-09-21 | 富士通株式会社 | Instruction reading method and instruction reading device for pipeline processor |
US5590350A (en) * | 1993-11-30 | 1996-12-31 | Texas Instruments Incorporated | Three input arithmetic logic unit with mask generator |
US5509129A (en) * | 1993-11-30 | 1996-04-16 | Guttag; Karl M. | Long instruction word controlling plural independent processor operations |
US6116768A (en) | 1993-11-30 | 2000-09-12 | Texas Instruments Incorporated | Three input arithmetic logic unit with barrel rotator |
US5590351A (en) | 1994-01-21 | 1996-12-31 | Advanced Micro Devices, Inc. | Superscalar execution unit for sequential instruction pointer updates and segment limit checks |
JPH07253922A (en) * | 1994-03-14 | 1995-10-03 | Texas Instr Japan Ltd | Address generating circuit |
US5530825A (en) * | 1994-04-15 | 1996-06-25 | Motorola, Inc. | Data processor with branch target address cache and method of operation |
US5517436A (en) * | 1994-06-07 | 1996-05-14 | Andreas; David C. | Digital signal processor for audio applications |
BR9508403A (en) * | 1994-07-14 | 1997-11-11 | Johnson Grace Company | Method and apparatus for image compression |
US5809293A (en) | 1994-07-29 | 1998-09-15 | International Business Machines Corporation | System and method for program execution tracing within an integrated processor |
US5692168A (en) | 1994-10-18 | 1997-11-25 | Cyrix Corporation | Prefetch buffer using flow control bit to identify changes of flow within the code stream |
US5600674A (en) | 1995-03-02 | 1997-02-04 | Motorola Inc. | Method and apparatus of an enhanced digital signal processor |
US5655122A (en) | 1995-04-05 | 1997-08-05 | Sequent Computer Systems, Inc. | Optimizing compiler with static prediction of branch probability, branch frequency and function frequency |
US5835753A (en) | 1995-04-12 | 1998-11-10 | Advanced Micro Devices, Inc. | Microprocessor with dynamically extendable pipeline stages and a classifying circuit |
US5920711A (en) | 1995-06-02 | 1999-07-06 | Synopsys, Inc. | System for frame-based protocol, graphical capture, synthesis, analysis, and simulation |
US5842004A (en) | 1995-08-04 | 1998-11-24 | Sun Microsystems, Inc. | Method and apparatus for decompression of compressed geometric three-dimensional graphics data |
US6292879B1 (en) | 1995-10-25 | 2001-09-18 | Anthony S. Fong | Method and apparatus to specify access control list and cache enabling and cache coherency requirement enabling on individual operands of an instruction of a computer |
US5727211A (en) * | 1995-11-09 | 1998-03-10 | Chromatic Research, Inc. | System and method for fast context switching between tasks |
US5996071A (en) | 1995-12-15 | 1999-11-30 | Via-Cyrix, Inc. | Detecting self-modifying code in a pipelined processor with branch processing by comparing latched store address to subsequent target address |
US5896305A (en) * | 1996-02-08 | 1999-04-20 | Texas Instruments Incorporated | Shifter circuit for an arithmetic logic unit in a microprocessor |
US5752014A (en) * | 1996-04-29 | 1998-05-12 | International Business Machines Corporation | Automatic selection of branch prediction methodology for subsequent branch instruction based on outcome of previous branch prediction |
US5784636A (en) | 1996-05-28 | 1998-07-21 | National Semiconductor Corporation | Reconfigurable computer architecture for use in signal processing applications |
US20010025337A1 (en) | 1996-06-10 | 2001-09-27 | Frank Worrell | Microprocessor including a mode detector for setting compression mode |
US5805876A (en) | 1996-09-30 | 1998-09-08 | International Business Machines Corporation | Method and system for reducing average branch resolution time and effective misprediction penalty in a processor |
US5964884A (en) | 1996-09-30 | 1999-10-12 | Advanced Micro Devices, Inc. | Self-timed pulse control circuit |
US5848264A (en) | 1996-10-25 | 1998-12-08 | S3 Incorporated | Debug and video queue for multi-processor chip |
US5909572A (en) | 1996-12-02 | 1999-06-01 | Compaq Computer Corp. | System and method for conditionally moving an operand from a source register to a destination register |
US6061521A (en) * | 1996-12-02 | 2000-05-09 | Compaq Computer Corp. | Computer having multimedia operations executable as two distinct sets of operations within a single instruction cycle |
US5909566A (en) * | 1996-12-31 | 1999-06-01 | Texas Instruments Incorporated | Microprocessor circuits, systems, and methods for speculatively executing an instruction using its most recently used data while concurrently prefetching data for the instruction |
KR100236533B1 (en) * | 1997-01-16 | 2000-01-15 | 윤종용 | Digital signal processor |
US6154857A (en) | 1997-04-08 | 2000-11-28 | Advanced Micro Devices, Inc. | Microprocessor-based device incorporating a cache for capturing software performance profiling data |
US6185732B1 (en) * | 1997-04-08 | 2001-02-06 | Advanced Micro Devices, Inc. | Software debug port for a microprocessor |
US6088786A (en) | 1997-06-27 | 2000-07-11 | Sun Microsystems, Inc. | Method and system for coupling a stack based processor to register based functional unit |
US6760833B1 (en) | 1997-08-01 | 2004-07-06 | Micron Technology, Inc. | Split embedded DRAM processor |
US6226738B1 (en) * | 1997-08-01 | 2001-05-01 | Micron Technology, Inc. | Split embedded DRAM processor |
US6157988A (en) | 1997-08-01 | 2000-12-05 | Micron Technology, Inc. | Method and apparatus for high performance branching in pipelined microsystems |
US6026478A (en) * | 1997-08-01 | 2000-02-15 | Micron Technology, Inc. | Split embedded DRAM processor |
JPH1185515A (en) | 1997-09-10 | 1999-03-30 | Ricoh Co Ltd | Microprocessor |
US5978909A (en) | 1997-11-26 | 1999-11-02 | Intel Corporation | System for speculative branch target prediction having a dynamic prediction history buffer and a static prediction history buffer |
US6044458A (en) * | 1997-12-12 | 2000-03-28 | Motorola, Inc. | System for monitoring program flow utilizing fixwords stored sequentially to opcodes |
US6014743A (en) * | 1998-02-05 | 2000-01-11 | Integrated Device Technology, Inc. | Apparatus and method for recording a floating point error pointer in zero cycles |
US6151672A (en) * | 1998-02-23 | 2000-11-21 | Hewlett-Packard Company | Methods and apparatus for reducing interference in a branch history table of a microprocessor |
US6374349B2 (en) | 1998-03-19 | 2002-04-16 | Mcfarling Scott | Branch predictor with serially connected predictor stages for improving branch prediction accuracy |
US6377970B1 (en) * | 1998-03-31 | 2002-04-23 | Intel Corporation | Method and apparatus for computing a sum of packed data elements using SIMD multiply circuitry |
US6584585B1 (en) | 1998-05-08 | 2003-06-24 | Gateway, Inc. | Virtual device driver and methods employing the same |
US6289417B1 (en) | 1998-05-18 | 2001-09-11 | Arm Limited | Operand supply to an execution unit |
US6466333B2 (en) | 1998-06-26 | 2002-10-15 | Canon Kabushiki Kaisha | Streamlined tetrahedral interpolation |
US20020053015A1 (en) * | 1998-07-14 | 2002-05-02 | Sony Corporation And Sony Electronics Inc. | Digital signal processor particularly suited for decoding digital audio |
US6327651B1 (en) | 1998-09-08 | 2001-12-04 | International Business Machines Corporation | Wide shifting in the vector permute unit |
US6253287B1 (en) * | 1998-09-09 | 2001-06-26 | Advanced Micro Devices, Inc. | Using three-dimensional storage to make variable-length instructions appear uniform in two dimensions |
US6339822B1 (en) * | 1998-10-02 | 2002-01-15 | Advanced Micro Devices, Inc. | Using padded instructions in a block-oriented cache |
US6671743B1 (en) | 1998-11-13 | 2003-12-30 | Creative Technology, Ltd. | Method and system for exposing proprietary APIs in a privileged device driver to an application |
US6529930B1 (en) * | 1998-11-16 | 2003-03-04 | Hitachi America, Ltd. | Methods and apparatus for performing a signed saturation operation |
US6189091B1 (en) * | 1998-12-02 | 2001-02-13 | Ip First, L.L.C. | Apparatus and method for speculatively updating global history and restoring same on branch misprediction detection |
US6341348B1 (en) | 1998-12-03 | 2002-01-22 | Sun Microsystems, Inc. | Software branch prediction filtering for a microprocessor |
US6957327B1 (en) * | 1998-12-31 | 2005-10-18 | Stmicroelectronics, Inc. | Block-based branch target buffer |
US6477683B1 (en) | 1999-02-05 | 2002-11-05 | Tensilica, Inc. | Automated processor generation system for designing a configurable processor and method for the same |
US6418530B2 (en) | 1999-02-18 | 2002-07-09 | Hewlett-Packard Company | Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions |
US6499101B1 (en) * | 1999-03-18 | 2002-12-24 | I.P. First L.L.C. | Static branch prediction mechanism for conditional branch instructions |
US6427206B1 (en) | 1999-05-03 | 2002-07-30 | Intel Corporation | Optimized branch predictions for strongly predicted compiler branches |
US6560754B1 (en) * | 1999-05-13 | 2003-05-06 | Arc International Plc | Method and apparatus for jump control in a pipelined processor |
US6622240B1 (en) | 1999-06-18 | 2003-09-16 | Intrinsity, Inc. | Method and apparatus for pre-branch instruction |
US6518974B2 (en) * | 1999-07-16 | 2003-02-11 | Intel Corporation | Pixel engine |
JP2001034504A (en) * | 1999-07-19 | 2001-02-09 | Mitsubishi Electric Corp | Source level debugger |
US6772325B1 (en) | 1999-10-01 | 2004-08-03 | Hitachi, Ltd. | Processor architecture and operation for exploiting improved branch control instruction |
US6546481B1 (en) | 1999-11-05 | 2003-04-08 | Ip - First Llc | Split history tables for branch prediction |
US7072398B2 (en) * | 2000-12-06 | 2006-07-04 | Kai-Kuang Ma | System and method for motion vector generation and analysis of digital video clips |
US6609194B1 (en) | 1999-11-12 | 2003-08-19 | Ip-First, Llc | Apparatus for performing branch target address calculation based on branch type |
US6909744B2 (en) | 1999-12-09 | 2005-06-21 | Redrock Semiconductor, Inc. | Processor architecture for compression and decompression of video and images |
KR100395763B1 (en) | 2000-02-01 | 2003-08-25 | 삼성전자주식회사 | A branch predictor for microprocessor having multiple processes |
US6412038B1 (en) | 2000-02-14 | 2002-06-25 | Intel Corporation | Integral modular cache for a processor |
US6629167B1 (en) * | 2000-02-18 | 2003-09-30 | Hewlett-Packard Development Company, L.P. | Pipeline decoupling buffer for handling early data and late data |
US6519696B1 (en) * | 2000-03-30 | 2003-02-11 | I.P. First, Llc | Paired register exchange using renaming register map |
US6876703B2 (en) | 2000-05-11 | 2005-04-05 | Ub Video Inc. | Method and apparatus for video coding |
US7079579B2 (en) * | 2000-07-13 | 2006-07-18 | Samsung Electronics Co., Ltd. | Block matching processor and method for block matching motion estimation in video compression |
US6681295B1 (en) * | 2000-08-31 | 2004-01-20 | Hewlett-Packard Development Company, L.P. | Fast lane prefetching |
US6718460B1 (en) * | 2000-09-05 | 2004-04-06 | Sun Microsystems, Inc. | Mechanism for error handling in a computer system |
US20020065860A1 (en) * | 2000-10-04 | 2002-05-30 | Grisenthwaite Richard Roy | Data processing apparatus and method for saturating data values |
US20030070013A1 (en) * | 2000-10-27 | 2003-04-10 | Daniel Hansson | Method and apparatus for reducing power consumption in a digital processor |
US6948054B2 (en) * | 2000-11-29 | 2005-09-20 | Lsi Logic Corporation | Simple branch prediction and misprediction recovery method |
KR100386639B1 (en) * | 2000-12-04 | 2003-06-02 | 주식회사 오픈비주얼 | Method for decompression of images and video using regularized dequantizer |
TW477954B (en) * | 2000-12-05 | 2002-03-01 | Faraday Tech Corp | Memory data accessing architecture and method for a processor |
US20020073301A1 (en) * | 2000-12-07 | 2002-06-13 | International Business Machines Corporation | Hardware for use with compiler generated branch information |
US7139903B2 (en) | 2000-12-19 | 2006-11-21 | Hewlett-Packard Development Company, L.P. | Conflict free parallel read access to a bank interleaved branch predictor in a processor |
US6963554B1 (en) | 2000-12-27 | 2005-11-08 | National Semiconductor Corporation | Microwire dynamic sequencer pipeline stall |
US6877089B2 (en) | 2000-12-27 | 2005-04-05 | International Business Machines Corporation | Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program |
US20020087851A1 (en) | 2000-12-28 | 2002-07-04 | Matsushita Electric Industrial Co., Ltd. | Microprocessor and an instruction converter |
US8285976B2 (en) | 2000-12-28 | 2012-10-09 | Micron Technology, Inc. | Method and apparatus for predicting branches using a meta predictor |
US6925634B2 (en) | 2001-01-24 | 2005-08-02 | Texas Instruments Incorporated | Method for maintaining cache coherency in software in a shared memory system |
US7039901B2 (en) | 2001-01-24 | 2006-05-02 | Texas Instruments Incorporated | Software shared memory bus |
US6823447B2 (en) | 2001-03-01 | 2004-11-23 | International Business Machines Corporation | Software hint to improve the branch target prediction accuracy |
AU2002240742A1 (en) | 2001-03-02 | 2002-09-19 | Astana Semiconductor Corp. | Apparatus for variable word length computing in an array processor |
JP3890910B2 (en) | 2001-03-21 | 2007-03-07 | 株式会社日立製作所 | Instruction execution result prediction device |
US7010558B2 (en) * | 2001-04-19 | 2006-03-07 | Arc International | Data processor with enhanced instruction execution and method |
US7200740B2 (en) | 2001-05-04 | 2007-04-03 | Ip-First, Llc | Apparatus and method for speculatively performing a return instruction in a microprocessor |
US7165168B2 (en) | 2003-01-14 | 2007-01-16 | Ip-First, Llc | Microprocessor with branch target address cache update queue |
US20020194462A1 (en) | 2001-05-04 | 2002-12-19 | Ip First Llc | Apparatus and method for selecting one of multiple target addresses stored in a speculative branch target address cache per instruction cache line |
US7165169B2 (en) | 2001-05-04 | 2007-01-16 | Ip-First, Llc | Speculative branch target address cache with selective override by secondary predictor based on branch instruction type |
US20020194461A1 (en) | 2001-05-04 | 2002-12-19 | Ip First Llc | Speculative branch target address cache |
US6886093B2 (en) * | 2001-05-04 | 2005-04-26 | Ip-First, Llc | Speculative hybrid branch direction predictor |
GB0112275D0 (en) | 2001-05-21 | 2001-07-11 | Micron Technology Inc | Method and circuit for normalization of floating point significands in a simd array mpp |
GB0112269D0 (en) * | 2001-05-21 | 2001-07-11 | Micron Technology Inc | Method and circuit for alignment of floating point significands in a simd array mpp |
WO2003003195A1 (en) | 2001-06-29 | 2003-01-09 | Koninklijke Philips Electronics N.V. | Method, apparatus and compiler for predicting indirect branch target addresses |
US6823444B1 (en) | 2001-07-03 | 2004-11-23 | Ip-First, Llc | Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap |
US7162619B2 (en) * | 2001-07-03 | 2007-01-09 | Ip-First, Llc | Apparatus and method for densely packing a branch instruction predicted by a branch target address cache and associated target instructions into a byte-wide instruction buffer |
JP4145586B2 (en) * | 2001-07-24 | 2008-09-03 | セイコーエプソン株式会社 | Image processing apparatus, image processing program, and image processing method |
US7010675B2 (en) * | 2001-07-27 | 2006-03-07 | Stmicroelectronics, Inc. | Fetch branch architecture for reducing branch penalty without branch prediction |
US7191445B2 (en) * | 2001-08-31 | 2007-03-13 | Texas Instruments Incorporated | Method using embedded real-time analysis components with corresponding real-time operating system software objects |
JP2003131902A (en) | 2001-10-24 | 2003-05-09 | Toshiba Corp | Software debugger, system-level debugger, debug method and debug program |
US7685212B2 (en) | 2001-10-29 | 2010-03-23 | Intel Corporation | Fast full search motion estimation with SIMD merge instruction |
US20040054877A1 (en) * | 2001-10-29 | 2004-03-18 | Macy William W. | Method and apparatus for shuffling data |
US7272622B2 (en) | 2001-10-29 | 2007-09-18 | Intel Corporation | Method and apparatus for parallel shift right merge of data |
US7051239B2 (en) | 2001-12-28 | 2006-05-23 | Hewlett-Packard Development Company, L.P. | Method and apparatus for efficiently implementing trace and/or logic analysis mechanisms on a processor chip |
EP1470476A4 (en) | 2002-01-31 | 2007-05-30 | Arc Int | Configurable data processor with multi-length instruction set architecture |
US7168067B2 (en) | 2002-02-08 | 2007-01-23 | Agere Systems Inc. | Multiprocessor system with cache-based software breakpoints |
US7529912B2 (en) | 2002-02-12 | 2009-05-05 | Via Technologies, Inc. | Apparatus and method for instruction-level specification of floating point format |
US7181596B2 (en) * | 2002-02-12 | 2007-02-20 | Ip-First, Llc | Apparatus and method for extending a microprocessor instruction set |
US7328328B2 (en) | 2002-02-19 | 2008-02-05 | Ip-First, Llc | Non-temporal memory reference control mechanism |
US7315921B2 (en) | 2002-02-19 | 2008-01-01 | Ip-First, Llc | Apparatus and method for selective memory attribute control |
US7395412B2 (en) | 2002-03-08 | 2008-07-01 | Ip-First, Llc | Apparatus and method for extending data modes in a microprocessor |
US7546446B2 (en) | 2002-03-08 | 2009-06-09 | Ip-First, Llc | Selective interrupt suppression |
US7180943B1 (en) * | 2002-03-26 | 2007-02-20 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Compression of a data stream by selection among a set of compression tools |
US7185180B2 (en) | 2002-04-02 | 2007-02-27 | Ip-First, Llc | Apparatus and method for selective control of condition code write back |
US7380103B2 (en) | 2002-04-02 | 2008-05-27 | Ip-First, Llc | Apparatus and method for selective control of results write back |
US7373483B2 (en) | 2002-04-02 | 2008-05-13 | Ip-First, Llc | Mechanism for extending the number of registers in a microprocessor |
US7155598B2 (en) | 2002-04-02 | 2006-12-26 | Ip-First, Llc | Apparatus and method for conditional instruction execution |
US7302551B2 (en) | 2002-04-02 | 2007-11-27 | Ip-First, Llc | Suppression of store checking |
US20030198295A1 (en) * | 2002-04-12 | 2003-10-23 | Liang-Gee Chen | Global elimination algorithm for motion estimation and the hardware architecture thereof |
US7380109B2 (en) | 2002-04-15 | 2008-05-27 | Ip-First, Llc | Apparatus and method for providing extended address modes in an existing instruction set for a microprocessor |
US20030204705A1 (en) | 2002-04-30 | 2003-10-30 | Oldfield William H. | Prediction of branch instructions in a data processing apparatus |
KR100450753B1 (en) | 2002-05-17 | 2004-10-01 | 한국전자통신연구원 | Programmable variable length decoder including interface of CPU processor |
US6938151B2 (en) | 2002-06-04 | 2005-08-30 | International Business Machines Corporation | Hybrid branch prediction using a global selection counter and a prediction method comparison table |
US6718504B1 (en) | 2002-06-05 | 2004-04-06 | Arc International | Method and apparatus for implementing a data processor adapted for turbo decoding |
US7493480B2 (en) * | 2002-07-18 | 2009-02-17 | International Business Machines Corporation | Method and apparatus for prefetching branch history information |
US7392368B2 (en) * | 2002-08-09 | 2008-06-24 | Marvell International Ltd. | Cross multiply and add instruction and multiply and subtract instruction SIMD execution on real and imaginary components of a plurality of complex data elements |
US7000095B2 (en) * | 2002-09-06 | 2006-02-14 | Mips Technologies, Inc. | Method and apparatus for clearing hazards using jump instructions |
AU2003279015A1 (en) * | 2002-09-27 | 2004-04-19 | Videosoft, Inc. | Real-time video coding/decoding |
US20050125634A1 (en) | 2002-10-04 | 2005-06-09 | Fujitsu Limited | Processor and instruction control method |
US6968444B1 (en) | 2002-11-04 | 2005-11-22 | Advanced Micro Devices, Inc. | Microprocessor employing a fixed position dispatch unit |
US8667252B2 (en) * | 2002-11-21 | 2014-03-04 | Stmicroelectronics, Inc. | Method and apparatus to adapt the clock rate of a programmable coprocessor for optimal performance and power dissipation |
US7227901B2 (en) * | 2002-11-21 | 2007-06-05 | Ub Video Inc. | Low-complexity deblocking filter |
US7266676B2 (en) | 2003-03-21 | 2007-09-04 | Analog Devices, Inc. | Method and apparatus for branch prediction based on branch targets utilizing tag and data arrays |
US6774832B1 (en) | 2003-03-25 | 2004-08-10 | Raytheon Company | Multi-bit output DDS with real time delta sigma modulation look up from memory |
US20040193855A1 (en) | 2003-03-31 | 2004-09-30 | Nicolas Kacevas | System and method for branch prediction access |
US7590829B2 (en) | 2003-03-31 | 2009-09-15 | Stretch, Inc. | Extension adapter |
US7174444B2 (en) | 2003-03-31 | 2007-02-06 | Intel Corporation | Preventing a read of a next sequential chunk in branch prediction of a subject chunk |
US20040225870A1 (en) | 2003-05-07 | 2004-11-11 | Srinivasan Srikanth T. | Method and apparatus for reducing wrong path execution in a speculative multi-threaded processor |
US7010676B2 (en) | 2003-05-12 | 2006-03-07 | International Business Machines Corporation | Last iteration loop branch prediction upon counter threshold and resolution upon counter one |
US20040252766A1 (en) * | 2003-06-11 | 2004-12-16 | Daeyang Foundation (Sejong University) | Motion vector search method and apparatus |
US20040255104A1 (en) | 2003-06-12 | 2004-12-16 | Intel Corporation | Method and apparatus for recycling candidate branch outcomes after a wrong-path execution in a superscalar processor |
US7668897B2 (en) | 2003-06-16 | 2010-02-23 | Arm Limited | Result partitioning within SIMD data processing systems |
US7539714B2 (en) * | 2003-06-30 | 2009-05-26 | Intel Corporation | Method, apparatus, and instruction for performing a sign operation that multiplies |
US7424501B2 (en) | 2003-06-30 | 2008-09-09 | Intel Corporation | Nonlinear filtering and deblocking applications utilizing SIMD sign and absolute value operations |
US7783871B2 (en) | 2003-06-30 | 2010-08-24 | Intel Corporation | Method to remove stale branch predictions for an instruction prior to execution within a microprocessor |
US7373642B2 (en) | 2003-07-29 | 2008-05-13 | Stretch, Inc. | Defining instruction extensions in a standard programming language |
US20050024486A1 (en) | 2003-07-31 | 2005-02-03 | Viresh Ratnakar | Video codec system with real-time complexity adaptation |
US20050027974A1 (en) * | 2003-07-31 | 2005-02-03 | Oded Lempel | Method and system for conserving resources in an instruction pipeline |
US7133950B2 (en) * | 2003-08-19 | 2006-11-07 | Sun Microsystems, Inc. | Request arbitration in multi-core processor |
JP2005078234A (en) * | 2003-08-29 | 2005-03-24 | Renesas Technology Corp | Information processor |
US7237098B2 (en) * | 2003-09-08 | 2007-06-26 | Ip-First, Llc | Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence |
US20050066305A1 (en) * | 2003-09-22 | 2005-03-24 | Lisanke Robert John | Method and machine for efficient simulation of digital hardware within a software development environment |
US7277592B1 (en) | 2003-10-21 | 2007-10-02 | Redrock Semiconductor Ltd. | Spacial deblocking method using limited edge differences only to linearly correct blocking artifact |
KR100980076B1 (en) * | 2003-10-24 | 2010-09-06 | 삼성전자주식회사 | System and method for branch prediction with low-power consumption |
US7457362B2 (en) * | 2003-10-24 | 2008-11-25 | Texas Instruments Incorporated | Loop deblock filtering of block coded video in a very long instruction word processor |
US7363544B2 (en) * | 2003-10-30 | 2008-04-22 | International Business Machines Corporation | Program debug method and apparatus |
US8069336B2 (en) | 2003-12-03 | 2011-11-29 | Globalfoundries Inc. | Transitioning from instruction cache to trace cache on label boundaries |
US7219207B2 (en) | 2003-12-03 | 2007-05-15 | Intel Corporation | Reconfigurable trace cache |
US7401328B2 (en) | 2003-12-18 | 2008-07-15 | Lsi Corporation | Software-implemented grouping techniques for use in a superscalar data processing system |
US7293164B2 (en) | 2004-01-14 | 2007-11-06 | International Business Machines Corporation | Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions |
US8607209B2 (en) | 2004-02-04 | 2013-12-10 | Bluerisc Inc. | Energy-focused compiler-assisted branch prediction |
US7613911B2 (en) | 2004-03-12 | 2009-11-03 | Arm Limited | Prefetching exception vectors by early lookup exception vectors within a cache memory |
US20050216713A1 (en) | 2004-03-25 | 2005-09-29 | International Business Machines Corporation | Instruction text controlled selectively stated branches for prediction via a branch target buffer |
US7281120B2 (en) | 2004-03-26 | 2007-10-09 | International Business Machines Corporation | Apparatus and method for decreasing the latency between an instruction cache and a pipeline processor |
US20050223202A1 (en) | 2004-03-31 | 2005-10-06 | Intel Corporation | Branch prediction in a pipelined processor |
CN101002169A (en) | 2004-05-19 | 2007-07-18 | ARC International (UK) Limited | Microprocessor architecture |
US20060015706A1 (en) * | 2004-06-30 | 2006-01-19 | Chunrong Lai | TLB correlated branch predictor and method for use thereof |
TWI253024B (en) * | 2004-07-20 | 2006-04-11 | Realtek Semiconductor Corp | Method and apparatus for block matching |
TWI305323B (en) * | 2004-08-23 | 2009-01-11 | Faraday Tech Corp | Method for verification branch prediction mechanisms and readable recording medium for storing program thereof |
US20060095713A1 (en) * | 2004-11-03 | 2006-05-04 | Stexar Corporation | Clip-and-pack instruction for processor |
WO2006096612A2 (en) | 2005-03-04 | 2006-09-14 | The Trustees Of Columbia University In The City Of New York | System and method for motion estimation and mode decision for low-complexity h.264 decoder |
EP2163097A2 (en) | 2007-05-25 | 2010-03-17 | Arc International, Plc | Adaptive video encoding apparatus and methods |
2006
Filing date | Country | Application number | Publication | Status |
---|---|---|---|---|
2006-09-28 | US | US11/528,327 | US7747088B2 | Active |
2006-09-28 | US | US11/528,434 | US20070074004A1 | Abandoned |
2006-09-28 | WO | PCT/IB2006/003358 | WO2007049150A2 | Application Filing |
2006-09-28 | US | US11/528,325 | US8212823B2 | Active |
2006-09-28 | US | US11/528,326 | US20070074007A1 | Abandoned |
2006-09-28 | US | US11/528,432 | US8218635B2 | Active |
2006-09-28 | US | US11/528,338 | US7971042B2 | Active |
2006-09-28 | US | US11/528,470 | US20070073925A1 | Abandoned |
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5884057A (en) * | 1994-01-11 | 1999-03-16 | Exponential Technology, Inc. | Temporal re-alignment of a floating point pipeline to an integer pipeline for emulation of a load-operate architecture on a load/store processor |
US5923892A (en) * | 1997-10-27 | 1999-07-13 | Levy; Paul S. | Host processor and coprocessor arrangement for processing platform-independent code |
US6757019B1 (en) * | 1999-03-13 | 2004-06-29 | The Board Of Trustees Of The Leland Stanford Junior University | Low-power parallel processor and imager having peripheral control circuitry |
US6865663B2 (en) * | 2000-02-24 | 2005-03-08 | Pts Corporation | Control processor dynamically loading shadow instruction register associated with memory entry of coprocessor in flexible coupling mode |
US6950929B2 (en) * | 2001-05-24 | 2005-09-27 | Samsung Electronics Co., Ltd. | Loop instruction processing using loop buffer in a data processing device having a coprocessor |
US7079147B2 (en) * | 2003-05-14 | 2006-07-18 | Lsi Logic Corporation | System and method for cooperative operation of a processor and coprocessor |
US20060047934A1 (en) * | 2004-08-31 | 2006-03-02 | Schmisseur Mark A | Integrated circuit capable of memory access control |
US20070070080A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
US20070074012A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for recording instruction sequences in a microprocessor having a dynamically decoupleable extended instruction pipeline |
US20070074004A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for selectively decoupling a parallel extended instruction pipeline |
US20070071101A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systolic-array based systems and methods for performing block matching in motion compensation |
US20070071106A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for performing deblocking in microprocessor-based video codec applications |
US20070074007A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Parameterizable clip instruction and method of performing a clip operation using the same |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7971042B2 (en) | 2005-09-28 | 2011-06-28 | Synopsys, Inc. | Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline |
US20070074012A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for recording instruction sequences in a microprocessor having a dynamically decoupleable extended instruction pipeline |
US20070070080A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
US20070074004A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for selectively decoupling a parallel extended instruction pipeline |
US20070071101A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systolic-array based systems and methods for performing block matching in motion compensation |
US20070074007A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Parameterizable clip instruction and method of performing a clip operation using the same |
US20070071106A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for performing deblocking in microprocessor-based video codec applications |
US8218635B2 (en) | 2005-09-28 | 2012-07-10 | Synopsys, Inc. | Systolic-array based systems and methods for performing block matching in motion compensation |
US7747088B2 (en) | 2005-09-28 | 2010-06-29 | Arc International (Uk) Limited | System and methods for performing deblocking in microprocessor-based video codec applications |
US8212823B2 (en) | 2005-09-28 | 2012-07-03 | Synopsys, Inc. | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
US20090063827A1 (en) * | 2007-08-28 | 2009-03-05 | Shunichi Ishiwata | Parallel processor and arithmetic method of the same |
WO2010017263A1 (en) * | 2008-08-06 | 2010-02-11 | Sandbridge Technologies, Inc. | Haltable and restartable dma engine |
US8732382B2 (en) | 2008-08-06 | 2014-05-20 | Qualcomm Incorporated | Haltable and restartable DMA engine |
US20100180100A1 (en) * | 2009-01-13 | 2010-07-15 | Mavrix Technology, Inc. | Matrix microprocessor and method of operation |
CN102346769A (en) * | 2011-09-20 | 2012-02-08 | Qizhi Software (Beijing) Co., Ltd. | Method and device for consolidating registry file |
US9015397B2 (en) | 2012-11-29 | 2015-04-21 | Sandisk Technologies Inc. | Method and apparatus for DMA transfer with synchronization optimization |
US9547493B2 (en) * | 2013-10-03 | 2017-01-17 | Synopsys, Inc. | Self-timed user-extension instructions for a processing device |
US20150100767A1 (en) * | 2013-10-03 | 2015-04-09 | Synopsys, Inc. | Self-timed user-extension instructions for a processing device |
US9715464B2 (en) | 2015-03-27 | 2017-07-25 | Microsoft Technology Licensing, Llc | Direct memory access descriptor processing |
US10528494B2 (en) | 2015-03-27 | 2020-01-07 | Microsoft Technology Licensing, Llc | Direct memory access (“DMA”) descriptor processing using identifiers assigned to descriptors on DMA engines |
US10572401B2 (en) | 2015-03-27 | 2020-02-25 | Microsoft Technology Licensing, Llc | Direct memory access descriptor processing using timestamps |
CN109416630A (en) * | 2016-07-22 | 2019-03-01 | Intel Corporation | The technology of self-adaptive processing for multiple buffers |
EP3994573A4 (en) * | 2019-07-03 | 2022-08-10 | Huaxia General Processor Technologies Inc. | System and architecture of pure functional neural network accelerator |
CN113312088A (en) * | 2021-06-29 | 2021-08-27 | 北京熵核科技有限公司 | Method and device for executing program instruction |
Also Published As
Publication number | Publication date |
---|---|
US8218635B2 (en) | 2012-07-10 |
US7971042B2 (en) | 2011-06-28 |
US20070070080A1 (en) | 2007-03-29 |
US7747088B2 (en) | 2010-06-29 |
US20070074004A1 (en) | 2007-03-29 |
US20070071101A1 (en) | 2007-03-29 |
US20070074007A1 (en) | 2007-03-29 |
WO2007049150A3 (en) | 2007-12-27 |
WO2007049150A2 (en) | 2007-05-03 |
US8212823B2 (en) | 2012-07-03 |
US20070074012A1 (en) | 2007-03-29 |
US20070071106A1 (en) | 2007-03-29 |
Similar Documents
Publication | Title |
---|---|
US20070073925A1 (en) | Systems and methods for synchronizing multiple processing engines of a microprocessor |
US6865663B2 (en) | Control processor dynamically loading shadow instruction register associated with memory entry of coprocessor in flexible coupling mode |
US7366874B2 (en) | Apparatus and method for dispatching very long instruction word having variable length |
TW448374B (en) | Processing circuit and method for variable-length coding and decoding |
US6757820B2 (en) | Decompression bit processing with a general purpose alignment tool |
US8006069B2 (en) | Inter-processor communication method |
US6209086B1 (en) | Method and apparatus for fast response time interrupt control in a pipelined data processor |
EP1422617A2 (en) | Coprocessor architecture based on a split-instruction transaction model |
US7620804B2 (en) | Central processing unit architecture with multiple pipelines which decodes but does not execute both branch paths |
US20210294639A1 (en) | Entering protected pipeline mode without annulling pending instructions |
US11048513B2 (en) | Entering protected pipeline mode with clearing |
US20020002639A1 (en) | Methods and apparatus for loading a very long instruction word memory |
US8854382B2 (en) | Multi-function encoder and decoder devices, and methods thereof |
JP2928684B2 (en) | VLIW type arithmetic processing unit |
US6275924B1 (en) | System for buffering instructions in a processor by reissuing instruction fetches during decoder stall time |
JP2004510244A (en) | Variable width instruction alignment engine |
US7984204B2 (en) | Programmable direct memory access controller having pipelined and sequentially connected stages |
US7594103B1 (en) | Microprocessor and method of processing instructions for responding to interrupt condition |
JP3668643B2 (en) | Information processing device |
US6735686B1 (en) | Data processing device including two instruction decoders for decoding branch instructions |
JPS60250438A (en) | Information processor |
JPH0844561A (en) | Boosting control method and processor device provided with boosting control mechanism |
JPH04353928A (en) | Arithmetic processing unit |
JPH0421126A (en) | Data processor |
JP2004521408A (en) | Event vector table override |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: ARC INTERNATIONAL(UK) LIMITED, UNITED KINGDOM. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: LIM, SEOW CHUAN; GRAHAM, CARL NORMAN; WONG, KAR-LIK; AND OTHERS; REEL/FRAME: 018356/0347; SIGNING DATES FROM 20060926 TO 20060927 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |