US20070074007A1 - Parameterizable clip instruction and method of performing a clip operation using the same - Google Patents
- Publication number
- US20070074007A1 (application US11/528,326)
- Authority
- US
- United States
- Prior art keywords
- instruction
- range
- address
- controlling parameter
- clip
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G06F9/3867—Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30018—Bit or string instructions
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
- G06F9/3875—Pipelining a single stage, e.g. superpipelining
- G06F9/3877—Concurrent instruction execution, e.g. pipeline, look ahead using a slave processor, e.g. coprocessor
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
- G06F9/3893—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
- G06F9/3897—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
- G06T3/4007—Interpolation-based scaling, e.g. bilinear interpolation
- H04N19/117—Filters, e.g. for pre-processing or post-processing
- H04N19/14—Coding unit complexity, e.g. amount of activity or edge presence estimation
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
- H04N19/182—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
- H04N19/43—Hardware specially adapted for motion estimation or compensation
- H04N19/436—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
- H04N19/523—Motion estimation or motion compensation with sub-pixel accuracy
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
- H04N19/82—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
- H04N19/86—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
Definitions
- At least one embodiment of the invention may provide a parameterizable microprocessor clip instruction.
- The parameterizable microprocessor clip instruction may comprise a destination register operand, a source register operand holding a value to be clipped, and a second source operand containing the control parameter that specifies the manner in which clipping is to be performed, wherein the control parameter comprises a range type and a range specifier.
- The source operand containing the “value” to be clipped in fact holds several values: a 128-bit register is used to hold eight 16-bit values, all of which are clipped by a single instruction.
- At least one embodiment of the invention may provide a method of causing a microprocessor to perform a clip operation.
- The method according to this embodiment may comprise providing an assembly instruction to the microprocessor, the instruction comprising an input address, an output address and a controlling parameter; decoding the instruction with logic in the microprocessor; retrieving a data input from the input address; determining a specific clip operation based on the controlling parameter; performing the clip operation on the data input; and writing the result to the output address.
- Another embodiment of the invention may provide a method of performing a clip operation with a single parameterizable assembly language-based clip instruction executing on a microprocessor.
- The method of performing a clip operation with a single parameterizable assembly language-based clip instruction executing on a microprocessor may comprise specifying a source address of a data input, a destination address of a clipped output and a controlling parameter in a single instruction; obtaining the data input at the source address; performing the clip operation on the data input in accordance with the controlling parameter; and storing the result at the destination address.
- At least one other embodiment of the invention may provide a parameterizable assembly language program instruction for performing a clip operation in a video processing application.
- The parameterizable assembly language program instruction may comprise an instruction name for a particular microprocessor instruction, a first instruction input operand comprising a destination register address to which the instruction result is written, a second instruction input operand comprising a source register address containing a value to be clipped, and a third instruction input operand comprising a controlling parameter.
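As a purely illustrative sketch, the three-operand form described above can be modelled in Python. The textual syntax `VBCLIP rD, rB, rC`, the `r<number>` register naming, and the names `ClipInstruction` and `parse_clip` are assumptions made here, not part of the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipInstruction:
    """Hypothetical decoded form of the three-operand clip instruction."""
    name: str  # instruction name, e.g. "VBCLIP"
    rd: int    # destination register address for the clipped result
    rb: int    # source register address holding the value(s) to be clipped
    rc: int    # register address holding the controlling parameter


def parse_clip(line: str) -> ClipInstruction:
    """Parse an assembly line of the assumed form 'VBCLIP rD, rB, rC'."""
    name, operands = line.strip().split(maxsplit=1)
    regs = [op.strip() for op in operands.split(",")]
    if len(regs) != 3:
        raise ValueError("expected exactly three operands: rd, rb, rc")
    # Assumed register naming: one letter ('r' or 'R') followed by a decimal number.
    rd, rb, rc = (int(r[1:]) for r in regs)
    return ClipInstruction(name.upper(), rd, rb, rc)
```

For example, `parse_clip("VBCLIP r3, r1, r2")` returns `ClipInstruction(name='VBCLIP', rd=3, rb=1, rc=2)`.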
- FIG. 1 is a diagram illustrating the components of a parameterizable clip instruction for either SISD or SIMD processor architectures according to at least one embodiment of the invention.
- FIG. 2 illustrates the format of a 32-bit parameter input to the parameterizable clip instruction of FIG. 1 according to at least one embodiment of the invention.
- FIG. 3 is a table illustrating the ways in which the parameters of the parameterizable clip instruction may be specified.
- FIG. 4 is a flow chart of an exemplary method of performing a clip operation with a parameterizable clip instruction according to at least one embodiment of the invention.
- Referring now to FIG. 1, a diagram illustrating the components of a parameterizable clip instruction for either SISD or SIMD processor architectures according to at least one embodiment of the invention is provided.
- Some processors do implement clipping at the hardware level using specialized processor instructions; however, the clipping ranges of these instructions are fixed to some value, typically a power of two. Therefore, various embodiments of this invention provide a parameterizable clip instruction for a microprocessor that enables adjustment of the clipping parameters.
- The instruction 100, labeled “VBCLIP”, takes three operands: rd, rb and rc.
- rb and rd are the source and destination register addresses, respectively. That is, rb is the register address of the value to be clipped and rd is the register address to which the clipped value is written.
- rc is the controlling parameter for the instruction. The value of rc dictates how the value located at address rb will be clipped. This instruction permits eight 16-bit values to be clipped within the range specified by the control parameter rc.
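The sketch below shows what it means for one 128-bit source register to hold eight 16-bit values. It assumes two's-complement 16-bit lanes with lane 0 in the least significant bits; the helper names `pack_lanes` and `unpack_lanes` are hypothetical.

```python
LANES = 8                        # values per 128-bit register
LANE_BITS = 16                   # width of each value
LANE_MASK = (1 << LANE_BITS) - 1


def pack_lanes(values):
    """Pack eight signed 16-bit values into one 128-bit integer (lane 0 lowest)."""
    if len(values) != LANES:
        raise ValueError("expected exactly eight values")
    reg = 0
    for i, v in enumerate(values):
        if not -(1 << 15) <= v < (1 << 15):
            raise ValueError(f"lane {i}: {v} does not fit in 16 bits")
        reg |= (v & LANE_MASK) << (i * LANE_BITS)  # two's-complement lane bits
    return reg


def unpack_lanes(reg):
    """Split a 128-bit integer back into eight signed 16-bit values."""
    values = []
    for i in range(LANES):
        u = (reg >> (i * LANE_BITS)) & LANE_MASK
        values.append(u - (1 << LANE_BITS) if u & 0x8000 else u)  # sign-extend
    return values
```

A single clip instruction operating on such a register would therefore clip all eight lanes at once.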
- FIG. 2 illustrates the format of controlling parameter rc in the form of a 32-bit operand and FIG. 3 is a table illustrating the ways in which the parameters of the parameterizable clip instruction may be specified.
- In the illustrated embodiment, rc is a 32-bit input.
- In other embodiments, rc may be 16, 32, 64, 128 or some other number of bits wide.
- The most significant 16 bits, that is, bits 31 to 16, are unused, as shown in the table.
- Bits 15 and 14 hold the range type, while bits 13 to 0 hold the range specifier.
- Four range types are available. Specifically, the range types $[0, 2^N - 1]$, $[-N, N]$, $[-2^N, 2^N - 1]$ and $[0, N]$ correspond to the 2-bit binary values 00, 01, 10 and 11, respectively. The remaining 14 least significant bits, bits 13 to 0, represent N, the range specifier. These bits contain a binary number with a maximum value of 11111111111111 (16383). Thus, by using range type 01 or 11, ranges that are not limited to powers of two may be used.
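The four range types can be written compactly as below, with $RT$ the two-bit range type and $N$ the 14-bit range specifier. Writing the clip itself as a min/max composition is a standard formulation used here for illustration, and the numeric instances are example values chosen here, not taken from the source.

```latex
\[
[\ell, h] =
\begin{cases}
[0,\; 2^{N} - 1]      & \text{if } RT = 00 \\
[-N,\; N]             & \text{if } RT = 01 \\
[-2^{N},\; 2^{N} - 1] & \text{if } RT = 10 \\
[0,\; N]              & \text{if } RT = 11
\end{cases}
\qquad
\operatorname{clip}(x) = \min\bigl(\max(x, \ell),\, h\bigr)
\]
% Examples: RT = 00, N = 8   gives [0, 255]
%           RT = 10, N = 8   gives [-256, 255]
%           RT = 11, N = 235 gives [0, 235], which is not of the form [0, 2^k - 1]
```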
- The range specifier N is itself a parameter supplied to the VBCLIP instruction 100.
- The two-bit range type RT selects which of the four possible ways of defining the clipping range from the range specifier N is used.
- Range types 00 and 10 are designed to work with unsigned and signed clipping ranges respectively, while types 01 and 11 are designed to work with signed and unsigned clipping ranges that are not powers of two.
- The VBCLIP instruction is therefore a highly flexible processor implementation of clipping.
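A minimal Python sketch of the field layout and range types described above, assuming a 32-bit rc whose upper 16 bits are simply ignored. The helper names `decode_rc`, `clip_range` and `encode_rc` are hypothetical.

```python
RT_SHIFT = 14
RT_MASK = 0b11
N_MASK = (1 << 14) - 1  # bits 13..0


def decode_rc(rc):
    """Split a 32-bit controlling parameter into (range_type, range_specifier).

    Bits 31..16 are unused and are ignored here.
    """
    rt = (rc >> RT_SHIFT) & RT_MASK  # bits 15..14
    n = rc & N_MASK                  # bits 13..0
    return rt, n


def clip_range(rt, n):
    """Return the inclusive (low, high) bounds for range type rt and specifier n."""
    if rt == 0b00:
        return 0, (1 << n) - 1          # unsigned, power of two
    if rt == 0b01:
        return -n, n                    # signed, arbitrary N
    if rt == 0b10:
        return -(1 << n), (1 << n) - 1  # signed, power of two
    if rt == 0b11:
        return 0, n                     # unsigned, arbitrary N
    raise ValueError("range type must be a 2-bit value")


def encode_rc(rt, n):
    """Build a controlling parameter from its fields (upper 16 bits left zero)."""
    if not 0 <= rt <= 3 or not 0 <= n <= N_MASK:
        raise ValueError("field out of range")
    return (rt << RT_SHIFT) | n
```

For instance, `clip_range(*decode_rc(encode_rc(0b11, 235)))` yields `(0, 235)`, a bound that a fixed power-of-two clip cannot express.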
- Although FIGS. 2 and 3 describe VBCLIP as an SISD instruction, the instruction syntax can easily be extended to SIMD architectures in which both registers rb and rc are vector registers.
- In that case, the clipping specified in rc is applied to each slice of the vector register rb, and each result is written to the corresponding slice of rd.
- An additional advantage of a SIMD version of the clipping instruction is that it bypasses the data-dependent, sequential nature of clipping operations, which is awkward to implement on parallel machines.
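The per-slice behaviour can be sketched as follows. This is a scalar Python model, not real SIMD code: the bounds `low` and `high` are assumed to have already been derived from rc, and the all-ones/zero mask convention is borrowed, as an assumption, from common SIMD compare instructions. Every lane runs the same compare-and-select sequence, with no data-dependent `if` branch; `select` and `vclip_lanes` are hypothetical names.

```python
def select(mask, a, b):
    """Branch-free select: a where mask is all ones (-1), b where mask is 0."""
    return (a & mask) | (b & ~mask)


def vclip_lanes(lanes, low, high):
    """Clip every lane against the same (low, high) bounds.

    Each lane goes through an identical compare-and-select sequence, modelling
    SIMD compare masks where 'true' is all ones and 'false' is zero.
    """
    out = []
    for x in lanes:
        too_low = -int(x < low)    # -1 (all ones) if x < low, else 0
        too_high = -int(x > high)  # -1 (all ones) if x > high, else 0
        y = select(too_low, low, x)
        y = select(too_high, high, y)
        out.append(y)
    return out
```

On real SIMD hardware the loop body would be a handful of vector instructions applied to all lanes at once; the loop here only stands in for that.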
- Referring now to FIG. 4, this figure is a flow chart of an exemplary method for performing a clip operation with a parameterizable clip instruction according to at least one embodiment of the invention.
- The method begins in step 200 and proceeds to step 205, where the clip instruction is fed to the microprocessor pipeline.
- The instruction takes the form of a name and three input operands: a destination address, a source address and a controlling parameter.
- In step 210, the data to be operated on is fetched from the source address specified in the instruction.
- After the instruction is decoded, the range type indicated in the instruction is referenced to determine the actual range.
- The range type is represented by two bits of the controlling parameter operand rc.
- A table stored in a memory register of the processor maintains a list of the range types, indexed by the two-bit code.
- In step 220, the range specifier is extracted from the instruction and, together with the range type, is used to determine the range.
- The value fetched in step 210 is then clipped in accordance with the range determined in step 220.
- Finally, the result is written to the destination address specified by the instruction's destination operand rd. Operation of the method stops in step 235.
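The flow of FIG. 4 can be approximated with a toy register file. In this hypothetical sketch, registers are entries in a Python dict, `RANGE_TABLE` stands in for the range-type table indexed by the two-bit code, and only the step numbers given in the text (210, 220, 235) are cited in comments; `execute_clip` is an illustrative name.

```python
# Hypothetical range-type table, indexed by the two-bit code.
RANGE_TABLE = {
    0b00: lambda n: (0, (1 << n) - 1),
    0b01: lambda n: (-n, n),
    0b10: lambda n: (-(1 << n), (1 << n) - 1),
    0b11: lambda n: (0, n),
}


def execute_clip(regs, rd, rb, rc):
    """Model the FIG. 4 flow for a scalar clip on a dict-based register file."""
    value = regs[rb]                        # step 210: fetch data from the source address
    param = regs[rc]                        # controlling parameter
    range_type = (param >> 14) & 0b11       # two-bit range type (bits 15..14)
    n = param & 0x3FFF                      # step 220: range specifier (bits 13..0) ...
    low, high = RANGE_TABLE[range_type](n)  # ... combined with the range type
    regs[rd] = min(max(value, low), high)   # clip, then write to the destination
    return regs                             # step 235: operation stops
```

With `regs = {1: 300, 2: (0b11 << 14) | 235, 3: 0}`, calling `execute_clip(regs, 3, 1, 2)` writes 235 into register 3.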
Abstract
A parameterizable clip instruction for SIMD microprocessor architectures and a method of performing a clip operation using the same. A single instruction is provided with three input operands: a destination address, a source address and a controlling parameter. The controlling parameter includes a range type and a range specifier. The range type is a multi-bit integer in the operand that is used to index a table of range types. The range specifier is plugged into the range type to define a range. The data input at the source address is clipped according to the controlling parameter. The instruction is particularly suited to video encoding/decoding applications, where the result of an interpolation or other calculation may lie outside the maximum value, so that the final result must be clipped to a saturation value, for example, the maximum pixel value. Signed and unsigned clipping ranges that are not limited to powers of two may be used.
Description
- This application claims priority to U.S. Provisional Patent Application No. 60/721,108 titled “SIMD Architecture and Associated Systems and Methods,” filed Sep. 28, 2005, the disclosure of which is hereby incorporated by reference in its entirety.
- The invention relates generally to embedded microprocessor architectures and more specifically to a clip instruction for SIMD microprocessor architectures and a method of performing a clip operation using such a clip instruction.
- Single instruction multiple data (SIMD) architectures have become increasingly important as demand for video processing in electronic devices has increased. The SIMD architecture exploits the data parallelism that is abundant in data manipulations often found in media related applications, such as discrete cosine transforms (DCT) and filters. Data parallelism exists when a large mass of data of uniform type needs the same instruction performed on it. Thus, in contrast to a single instruction single data (SISD) architecture, in a SIMD architecture a single instruction may be used to effect an operation on a wide block of data. SIMD architecture exploits parallelism in the data stream while SISD can only operate on data sequentially.
- An example of an application that takes advantage of SIMD is one where the same value is added to a large number of data points, a common operation in many media applications. One example of this is changing the brightness of a graphic image. Each pixel of the image may consist of three values for the brightness of the red, green and blue portions of the color. To change the brightness, the R, G and B values, or alternatively the YUV values, are read from memory, a value is added to them, and the resulting values are written back to memory. A SIMD processor enhances performance of this type of operation over that of a SISD processor. One reason for this improvement is that in SIMD architectures, data is understood to be in blocks and a number of values can be loaded at once. Instead of a series of instructions to incrementally fetch individual pixels, a SIMD processor will have a single instruction that effectively says “get all these pixels.” Another advantage of SIMD machines is that multiple pieces of data are operated on simultaneously. Thus, a single instruction can say “perform this operation on all the pixels.” As a result, SIMD machines are much more efficient than SISD machines at exploiting data parallelism.
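The difference between fetching and adding one component at a time (SISD) and operating on a whole block at once (SIMD) can be modelled in portable C. This is only an illustrative sketch: the `vec8_u16` type and the function names are hypothetical, and the eight-iteration inner loop merely stands in for what SIMD hardware would perform as a single instruction on a 128-bit register of eight 16-bit lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit SIMD register holding eight 16-bit lanes. */
typedef struct {
    uint16_t lane[8];
} vec8_u16;

/* SISD style: one add per pixel component, issued one element at a time. */
void brighten_sisd(uint16_t *px, size_t n, uint16_t delta) {
    for (size_t i = 0; i < n; i++) {
        px[i] = (uint16_t)(px[i] + delta);
    }
}

/* SIMD style (modelled): one "instruction" adds delta to all eight lanes.
 * Real hardware would perform the eight additions in parallel. */
vec8_u16 vadd_u16(vec8_u16 a, uint16_t delta) {
    vec8_u16 r;
    for (int i = 0; i < 8; i++) {
        r.lane[i] = (uint16_t)(a.lane[i] + delta);
    }
    return r;
}

/* Process a buffer eight components at a time; n must be a multiple of 8. */
void brighten_simd(uint16_t *px, size_t n, uint16_t delta) {
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {
        vec8_u16 v;
        for (int k = 0; k < 8; k++) v.lane[k] = px[i + k]; /* "get all these pixels" */
        v = vadd_u16(v, delta);                             /* one operation on all lanes */
        for (int k = 0; k < 8; k++) px[i + k] = v.lane[k];  /* write the block back */
    }
}
```

Neither version limits its result: an 8-bit component of 250 brightened by 10 becomes 260, which is exactly the kind of out-of-range value that motivates clipping.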
- SIMD architectures have particular promise for video encoding/decoding applications, where many repetitive numerical computations must be performed on relatively large blocks of data. Numerical computation algorithms, such as those common in video encoding/decoding, often require results to be clipped to within a specified range of values. For example, in video processing, a system will have a maximum pixel depth depending on the system's resolution. If the value of an intermediate calculation result, such as an interpolation or other calculation, lies outside the maximum value, the final result will have to be clipped to the saturation value, for example, the maximum pixel value.
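As a concrete illustration of an interpolation whose intermediate result can leave the pixel range, consider a six-tap half-sample filter with taps (1, −5, 20, 20, −5, 1), the luma filter used in H.264/AVC. The sketch assumes 8-bit samples, so the maximum pixel value is 255; the function names are illustrative and are not taken from the patent.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional three-operand clip: clamp v into [lo, hi]. */
static int clip3(int lo, int hi, int v) {
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Half-sample value between p[2] and p[3] using taps (1, -5, 20, 20, -5, 1).
 * The weighted sum can overshoot 255 * 32 near sharp edges or go negative,
 * so the rounded, scaled result must be clipped back to the 8-bit range. */
uint8_t half_pel_8bit(const uint8_t p[6]) {
    int sum = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5];
    int t = sum + 16;                    /* rounding offset before dividing by 32 */
    int scaled = (t < 0) ? 0 : (t >> 5); /* negative sums clip to 0 anyway; avoids shifting a negative int */
    return (uint8_t)clip3(0, 255, scaled);
}
```

A flat area of value 100 interpolates to 100, whereas the edge pattern {0, 0, 255, 255, 0, 0} produces an intermediate value of 319 that has to be clipped to 255.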
- Clipping is typically implemented in software using a sequence of instructions that first test the intermediate value and then conditionally assign the final value, for example, if value>maximum, then value=maximum. Such a software clipping implementation incurs a high overhead due to the number of calculations required to test each value. The sequential nature of a software implementation makes it very difficult to optimize on processors designed to exploit instruction level parallelism, such as, for example, SISD reduced instruction set computer (RISC) machines or very long instruction word (VLIW) machines. Some processors do implement clipping at the hardware level using specialized processor instructions; however, the clipping ranges of these instructions are fixed to some value, typically a power of two.
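The two conventional approaches described above can be contrasted in a short C sketch. `clip_sw` is the test-and-assign sequence; `saturate_signed_bits` is a hypothetical software stand-in for a fixed-range hardware saturation, whose bounds are always the power-of-two range [−2^(b−1), 2^(b−1)−1] of a b-bit signed integer. Neither function is taken from the patent.

```c
#include <assert.h>
#include <stdint.h>

/* Software clipping: two data-dependent tests per value, each followed by a
 * conditional assignment. Every element needs its own compare/branch pair. */
int32_t clip_sw(int32_t value, int32_t minimum, int32_t maximum) {
    assert(minimum <= maximum);
    if (value > maximum) {
        value = maximum;
    }
    if (value < minimum) {
        value = minimum;
    }
    return value;
}

/* Fixed-range saturation: the bounds can only be powers of two,
 * here the signed range of a b-bit integer, 1 <= b <= 31. */
int32_t saturate_signed_bits(int32_t value, unsigned b) {
    assert(b >= 1 && b <= 31);
    int32_t hi = (int32_t)((1u << (b - 1)) - 1u);
    int32_t lo = -hi - 1;
    return clip_sw(value, lo, hi);
}
```

A range such as the nominal 8-bit luma range [16, 235] of ITU-R BT.601 cannot be produced by `saturate_signed_bits` for any b; `clip_sw` handles it, but only at the cost of the per-value tests.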
- Thus, there exists a need for a SIMD microprocessor architecture that ameliorates at least some of the above-noted deficiencies of conventional systems. At least one embodiment of the invention may provide a parameterizable microprocessor clip instruction. The parameterizable microprocessor clip instruction according to this embodiment may comprise a destination register operand, a source register operand holding a value to be clipped, and a second source operand containing the control parameter specifying the manner in which clipping is to be performed, wherein the control parameter comprises a range type and a range specifier. It should be appreciated that in the context of a SIMD machine, the source operand containing the “value” to be clipped actually holds several values to be clipped, because a 128-bit register is used to hold eight 16-bit values that are clipped by a single instruction.
- Accordingly, at least one embodiment of the invention may provide a method of causing a microprocessor to perform a clip operation. The method according to this embodiment may comprise providing an assembly instruction to the microprocessor, the instruction comprising an input address, an output address and a controlling parameter, decoding the instruction with logic in the microprocessor, retrieving a data input from the input address, determining a specific clip operation based on the controlling parameter, performing the clip operation on the data input, and writing the result to the output address.
- Another embodiment of the invention may provide a method of performing a clip operation with a single parameterizable assembly language-based clip instruction executing on a microprocessor. The method of performing a clip operation with a single parameterizable assembly language-based clip instruction executing on a microprocessor may comprise specifying a source address of a data input, a destination address of a clipped output and a controlling parameter in a single instruction, obtaining the data input at the source address, performing the clip operation on the data input in accordance with the controlling parameter, and storing the result at the destination address.
- At least one other embodiment of the invention may provide a parameterizable assembly language program instruction for performing a clip operation in a video processing application. The parameterizable assembly language program instruction according to this embodiment may comprise an instruction name for a particular microprocessor instruction, a first instruction input operand comprising a destination register address to write an instruction result, a second instruction input operand comprising a source register address containing a value to be clipped, and a third instruction input operand comprising a controlling parameter.
- These and other embodiments and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
- In order to facilitate a fuller understanding of the present disclosure, reference is now made to the accompanying drawings, in which like elements are referenced with like numerals. These drawings should not be construed as limiting the present disclosure, but are intended to be exemplary only.
- FIG. 1 is a diagram illustrating the components of a parameterizable clip instruction for either SISD or SIMD processor architectures according to at least one embodiment of the invention;
- FIG. 2 illustrates the format of a 32-bit parameter input to the parameterizable clip instruction of FIG. 1 according to at least one embodiment of the invention;
- FIG. 3 is a table illustrating the ways in which the parameters of the parameterizable clip instruction may be specified; and
- FIG. 4 is a flow chart of an exemplary method of performing a clip operation with a parameterizable clip instruction according to at least one embodiment of the invention.
- The following description is intended to convey a thorough understanding of the embodiments described by providing a number of specific embodiments and details involving microprocessor architecture and systems and methods for performing clip operations with a parameterizable clip instruction. It should be appreciated, however, that the present invention is not limited to these specific embodiments and details, which are exemplary only. It is further understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.
- Referring now to FIG. 1, a diagram illustrating the components of a parameterizable clip instruction for either SISD or SIMD processor architectures according to at least one embodiment of the invention is provided. As discussed above, algorithms in numerical computations, such as those common in video encoding/decoding, often require results to be clipped to within a specified range of values. For example, in video processing, a system will have a maximum pixel depth depending on the system's resolution. If the value of an intermediate calculation result, such as an interpolation or other calculation, lies outside the maximum value, the final result will have to be clipped to a saturation value, for example, the maximum pixel value.
- Conventionally, clipping is implemented in software using a sequence of instructions that first test the intermediate value and then conditionally assign the final value, for example, if value>maximum, then value=maximum. Such a software clipping implementation incurs a high overhead due to the number of calculations required to test each value. The sequential nature of a software implementation makes it very difficult to optimize on processors designed to exploit instruction level parallelism, such as, for example, SISD reduced instruction set computer (RISC) machines or very long instruction word (VLIW) machines. Some processors do implement clipping at the hardware level using specialized processor instructions; however, the clipping ranges of these instructions are fixed to some value, typically a power of two. Therefore, various embodiments of this invention provide a parameterizable clip instruction for a microprocessor that enables adjustment of clipping parameters.
- Referring to FIG. 1, the instruction 100, labeled “VBCLIP,” contains three elements: rd, rb and rc. Rb and rd are the source and destination register addresses, respectively. That is, rb is the register address of the value to be clipped and rd is the register address where the clipped value is to be written. Rc is the controlling parameter for the instruction. The value of rc dictates how the value located at address rb will be clipped. This instruction permits eight 16-bit values to be clipped within the range specified by the control parameter rc.
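The lane-wise behaviour of the instruction 100 can be modelled in C. This is a behavioural sketch under stated assumptions, not the patented hardware: the register file is modelled as an array of eight-lane vectors, the lanes are assumed to hold signed 16-bit values, and the range [lo, hi] is passed in already decoded rather than extracted from rc. The names `vreg`, `vbclip_model` and `NUM_VREGS` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8      /* 128-bit register / 16-bit lanes */
#define NUM_VREGS 32 /* hypothetical register-file size */

typedef struct {
    int16_t lane[LANES];
} vreg;

/* Behavioural model of VBCLIP rd, rb, rc with the range already decoded:
 * every 16-bit lane of register rb is clamped into [lo, hi] and written
 * to the same lane of register rd. */
void vbclip_model(vreg regs[NUM_VREGS], unsigned rd, unsigned rb,
                  int32_t lo, int32_t hi) {
    assert(rd < NUM_VREGS && rb < NUM_VREGS);
    assert(lo <= hi);
    /* Guarantees that any clamped result still fits in an int16_t lane. */
    assert(lo <= INT16_MAX && hi >= INT16_MIN);
    vreg src = regs[rb]; /* copy first so that rd == rb also works */
    for (int i = 0; i < LANES; i++) {
        int32_t v = src.lane[i];
        if (v < lo) v = lo;
        if (v > hi) v = hi;
        regs[rd].lane[i] = (int16_t)v;
    }
}
```

For example, clipping register 1 into [0, 255] and writing register 2 leaves register 1 unchanged, which matches the separate source and destination operands rb and rd.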
- FIG. 2 illustrates the format of the controlling parameter rc in the form of a 32-bit operand, and FIG. 3 is a table illustrating the ways in which the parameters of the parameterizable clip instruction may be specified. As seen from these Figures, in this example, the input rc is a 32-bit input. However, it should be appreciated that, depending upon the native word size of the processor, rc may be 16, 32, 64, 128 or another number of bits. In various embodiments, the most significant 16 bits, that is, bits 31 to 16, are unused, as seen in the table. In various embodiments, bits 15 and 14 are reserved for the range type, while bits 13-0 are used for the range specifier.
- In the example of FIG. 3, four range types are available. Specifically, range types of [0, 2^N−1], [−N, N], [−2^N, 2^N−1] and [0, N], corresponding to the 2-bit binary values 00, 01, 10 and 11, respectively, are provided. Bits 13 to 0 are used to represent N, the range specifier. These bits contain a binary number having a maximum value of 11111111111111 (16383). Thus, by using range type …
- In the table 110 of FIG. 3, the range specifier N is itself a parameter supplied to the VBCLIP instruction 100. The range type RT specifies one of the four possible ways the clipping range can be defined using the range specifier N. Range types 00 and 10 are designed to work with unsigned and signed clipping ranges, respectively, while types … Although FIGS. 2 and 3 describe VBCLIP as an SISD instruction, the instruction syntax can easily be extended to SIMD architectures in which both registers rb and rd are vector registers. In this case, clipping, as specified in rc, is applied to each slice of the vector register rb, with the results assigned to the corresponding slice in rd. An additional advantage of a SIMD version of the clipping instruction is that it bypasses the data-dependent, sequential nature of clipping operations, which is awkward to implement on parallel machines.
- Referring now to FIG. 4, this figure is a flow chart of an exemplary method for performing a clip operation with a parameterizable clip instruction according to at least one embodiment of the invention. The method begins in step 200 and proceeds to step 205, where the clip instruction is fed to the microprocessor pipeline. As discussed above in the context of FIGS. 1-3, in various embodiments, the instruction takes the form of a name and three input operands: a destination address, a source address and a controlling parameter. Then, in step 210, the data to be operated on is fetched from the source address specified in the instruction. Also, in step 215, after the instruction is decoded, the range type indicated in the instruction is referenced to determine the actual range. In various embodiments, the range type is represented by two bits of the controlling parameter input operand rc. In various embodiments, a table is stored in a memory register of the processor that maintains a list of the range types indexed by the two-bit code. In step 220, the range specifier is extracted from the instruction and, using the range type, a range is determined. In step 225, the value fetched in step 210 is clipped in accordance with the range determined in step 220. In step 230, the result is written to the destination address specified in the destination address input operand rd of the instruction. Operation of the method stops in step 235.
- The embodiments of the present inventions are not to be limited in scope by the specific embodiments described herein. For example, although many of the embodiments disclosed herein have been described with reference to systems and methods for performing clip operations with a parameterizable clip instruction, the principles herein are equally applicable to other aspects of microprocessor design and function. Indeed, various modifications of the embodiments of the present inventions, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such modifications are intended to fall within the scope of the following appended claims. Further, although some of the embodiments of the present invention have been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that its usefulness is not limited thereto and that the embodiments of the present inventions can be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the embodiments of the present inventions as disclosed herein.
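Steps 215 to 230 of FIG. 4, together with the operand layout of FIGS. 2 and 3, can be sketched as a decoder for rc plus a lane-wise clip. The sketch rests on several assumptions: it interprets the four range types as [0, 2^N−1], [−N, N], [−2^N, 2^N−1] and [0, N], assigns them the codes 00, 01, 10 and 11 in the order listed, ignores bits 31 to 16 of rc, and models a SIMD register as eight signed 16-bit lanes. A `switch` statement stands in for the range-type table that the description says may be held in the processor, and every identifier is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8

typedef struct {
    int64_t lo, hi; /* closed clipping interval [lo, hi] */
} clip_range;

/* Bits 15..14 of rc: range type RT. */
unsigned rc_range_type(uint32_t rc) { return (rc >> 14) & 0x3u; }

/* Bits 13..0 of rc: range specifier N (0..16383). */
uint32_t rc_range_spec(uint32_t rc) { return rc & 0x3FFFu; }

/* Assumed encoding: RT in bits 15..14, N in bits 13..0, bits 31..16 zero. */
uint32_t rc_encode(unsigned rt, uint32_t n) {
    assert(rt < 4 && n <= 0x3FFFu);
    return ((uint32_t)rt << 14) | n;
}

/* Steps 215 and 220: map (RT, N) to an interval. 2^N is capped at 2^32,
 * which already exceeds any 16-bit lane value, so larger N changes nothing. */
clip_range decode_range(uint32_t rc) {
    unsigned rt = rc_range_type(rc);
    int64_t n = rc_range_spec(rc);
    int64_t pow2 = (int64_t)1 << (n > 32 ? 32 : n);
    clip_range r;
    switch (rt) {
    case 0:  r.lo = 0;     r.hi = pow2 - 1; break; /* [0, 2^N - 1]    */
    case 1:  r.lo = -n;    r.hi = n;        break; /* [-N, N]         */
    case 2:  r.lo = -pow2; r.hi = pow2 - 1; break; /* [-2^N, 2^N - 1] */
    default: r.lo = 0;     r.hi = n;        break; /* [0, N]          */
    }
    return r;
}

/* Steps 225 and 230: clip every lane of src (fetched in step 210) and
 * write the results to dst. dst may alias src. */
void vbclip(int16_t dst[LANES], const int16_t src[LANES], uint32_t rc) {
    clip_range r = decode_range(rc);
    for (int i = 0; i < LANES; i++) {
        int64_t v = src[i];
        if (v < r.lo) v = r.lo;
        if (v > r.hi) v = r.hi;
        dst[i] = (int16_t)v; /* every bound that can be selected here fits in int16_t */
    }
}
```

For instance, `rc_encode(3, 235)` selects [0, 235], a range that no power-of-two saturation can express, while `rc_encode(0, 8)` selects the familiar 8-bit pixel range [0, 255].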
Claims (18)
1. A method of causing a microprocessor to perform a clip operation comprising:
providing an assembly instruction to the microprocessor, the instruction comprising an input address, an output address and a controlling parameter;
decoding the instruction with logic in the microprocessor;
retrieving a data input from the input address;
determining a specific clip operation based on the controlling parameter;
performing the clip operation on the data input; and
writing the result to the output address.
2. The method according to claim 1, wherein determining a clip operation based on the controlling parameter comprises decoding the controlling parameter into a range type and a range specifier.
3. The method according to claim 2, wherein the range type is a type selected from the group consisting of [0, 2^N−1], [−N, N], [−2^N, 2^N−1] and [0, N], where N is the range specifier.
4. The method according to claim 2, wherein decoding the controlling parameter into a range type comprises performing a table look-up of an X-bit number in the controlling parameter, where 2^X is the number of range types.
5. The method according to claim 2, wherein performing the clip operation comprises clipping the input value according to the range type and range specifier.
6. The method according to claim 1, wherein the input address and output address comprise vector registers.
7. A method of performing a clip operation with a single parameterizable assembly language-based clip instruction executing on a microprocessor comprising:
specifying a source address of a data input, a destination address of a clipped output and a controlling parameter in a single instruction;
obtaining the data input at the source address;
performing the clip operation on the data input in accordance with the controlling parameter; and
storing the result at the destination address.
8. The method according to claim 7, wherein specifying a controlling parameter comprises specifying a Y-bit number including a range type and a range specifier, where Y is an integer power of 2.
9. The method according to claim 8, wherein the range type is a type selected from the group consisting of [0, 2^N−1], [−N, N], [−2^N, 2^N−1] and [0, N], where N is the range specifier.
10. The method according to claim 9, wherein the range specifier is a positive integer.
11. The method according to claim 8, wherein performing the clip operation in accordance with the controlling parameter comprises clipping the data input based on the instruction's range specifier and range type.
12. The method according to claim 7, wherein the source address and destination address comprise vector registers and performing the clip operation comprises performing the clip operation in accordance with the controlling parameter on each slice of the source address vector register and storing the results at a corresponding slice of the destination address vector register.
13. A parameterizable assembly language program instruction for performing a clip operation in a video processing application comprising:
an instruction name for a particular microprocessor instruction;
a first instruction input operand comprising a destination register address to write an instruction result;
a second instruction input operand comprising a source register address containing a value to be clipped; and
a third instruction input operand comprising a controlling parameter.
14. The instruction according to claim 13, wherein the controlling parameter comprises a Z-bit number wherein Z is an integer power of 2.
15. The instruction according to claim 13, wherein the controlling parameter includes a range type and a range specifier.
16. The instruction according to claim 15, wherein the range type is a type selected from the group consisting of [0, 2^N−1], [−N, N], [−2^N, 2^N−1] and [0, N], where N is the range specifier.
17. The instruction according to claim 16, wherein N is a positive integer.
18. The instruction according to claim 13, wherein the destination register address and the source register address are vector register addresses.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/528,326 US20070074007A1 (en) | 2005-09-28 | 2006-09-28 | Parameterizable clip instruction and method of performing a clip operation using the same |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US72110805P | 2005-09-28 | 2005-09-28 | |
US11/528,326 US20070074007A1 (en) | 2005-09-28 | 2006-09-28 | Parameterizable clip instruction and method of performing a clip operation using the same |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/016,447 Division US20090022710A1 (en) | 2002-09-19 | 2008-01-18 | Nuclear factor of activated t cells receptor |
Publications (1)
Publication Number | Publication Date |
---|---|
US20070074007A1 true US20070074007A1 (en) | 2007-03-29 |
Family
ID=37968194
Family Applications (7)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/528,470 Abandoned US20070073925A1 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for synchronizing multiple processing engines of a microprocessor |
US11/528,432 Active 2031-04-08 US8218635B2 (en) | 2005-09-28 | 2006-09-28 | Systolic-array based systems and methods for performing block matching in motion compensation |
US11/528,327 Active 2029-03-30 US7747088B2 (en) | 2005-09-28 | 2006-09-28 | System and methods for performing deblocking in microprocessor-based video codec applications |
US11/528,434 Abandoned US20070074004A1 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for selectively decoupling a parallel extended instruction pipeline |
US11/528,338 Active 2027-04-24 US7971042B2 (en) | 2005-09-28 | 2006-09-28 | Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline |
US11/528,326 Abandoned US20070074007A1 (en) | 2005-09-28 | 2006-09-28 | Parameterizable clip instruction and method of performing a clip operation using the same |
US11/528,325 Active 2030-06-13 US8212823B2 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
Family Applications Before (5)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/528,470 Abandoned US20070073925A1 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for synchronizing multiple processing engines of a microprocessor |
US11/528,432 Active 2031-04-08 US8218635B2 (en) | 2005-09-28 | 2006-09-28 | Systolic-array based systems and methods for performing block matching in motion compensation |
US11/528,327 Active 2029-03-30 US7747088B2 (en) | 2005-09-28 | 2006-09-28 | System and methods for performing deblocking in microprocessor-based video codec applications |
US11/528,434 Abandoned US20070074004A1 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for selectively decoupling a parallel extended instruction pipeline |
US11/528,338 Active 2027-04-24 US7971042B2 (en) | 2005-09-28 | 2006-09-28 | Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/528,325 Active 2030-06-13 US8212823B2 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
Country Status (2)
Country | Link |
---|---|
US (7) | US20070073925A1 (en) |
WO (1) | WO2007049150A2 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070071101A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systolic-array based systems and methods for performing block matching in motion compensation |
WO2010051298A2 (en) * | 2008-10-31 | 2010-05-06 | Intel Corporation | Instruction and logic for performing range detection |
US20100180100A1 (en) * | 2009-01-13 | 2010-07-15 | Mavrix Technology, Inc. | Matrix microprocessor and method of operation |
US8437410B1 (en) | 2007-11-21 | 2013-05-07 | Marvell International Ltd. | System and method to execute a clipping instruction |
US20140215186A1 (en) * | 2011-12-22 | 2014-07-31 | Elmoustapha Ould-Ahmed-Vall | Systems, apparatuses, and methods for mapping a source operand to a different range |
CN104011670A (en) * | 2011-12-22 | 2014-08-27 | 英特尔公司 | Instructions for storing in general purpose registers one of two scalar constants based on the contents of vector write masks |
US9513917B2 (en) | 2011-04-01 | 2016-12-06 | Intel Corporation | Vector friendly instruction format and execution thereof |
US11567775B1 (en) * | 2021-10-25 | 2023-01-31 | Sap Se | Dynamic generation of logic for computing systems |
Families Citing this family (51)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9015397B2 (en) | 2012-11-29 | 2015-04-21 | Sandisk Technologies Inc. | Method and apparatus for DMA transfer with synchronization optimization |
US9330060B1 (en) | 2003-04-15 | 2016-05-03 | Nvidia Corporation | Method and device for encoding and decoding video image data |
US8660182B2 (en) | 2003-06-09 | 2014-02-25 | Nvidia Corporation | MPEG motion estimation based on dual start points |
TWI239474B (en) * | 2004-07-28 | 2005-09-11 | Novatek Microelectronics Corp | Circuit for counting sum of absolute difference |
TWI295540B (en) * | 2005-06-15 | 2008-04-01 | Novatek Microelectronics Corp | Motion estimation circuit and operating method thereof |
TWI296091B (en) * | 2005-11-15 | 2008-04-21 | Novatek Microelectronics Corp | Motion estimation circuit and motion estimation processing element |
US8731071B1 (en) | 2005-12-15 | 2014-05-20 | Nvidia Corporation | System for performing finite input response (FIR) filtering in motion estimation |
US20070217515A1 (en) * | 2006-03-15 | 2007-09-20 | Yu-Jen Wang | Method for determining a search pattern for motion estimation |
US8724702B1 (en) | 2006-03-29 | 2014-05-13 | Nvidia Corporation | Methods and systems for motion estimation used in video coding |
US8660380B2 (en) | 2006-08-25 | 2014-02-25 | Nvidia Corporation | Method and system for performing two-dimensional transform on data value array with reduced power consumption |
US9094686B2 (en) * | 2006-09-06 | 2015-07-28 | Broadcom Corporation | Systems and methods for faster throughput for compressed video data decoding |
KR101354659B1 (en) * | 2006-11-08 | 2014-01-28 | 삼성전자주식회사 | Method and apparatus for motion compensation supporting multicodec |
US7958177B2 (en) * | 2006-11-29 | 2011-06-07 | Arcsoft, Inc. | Method of parallelly filtering input data words to obtain final output data words containing packed half-pel pixels |
US8756482B2 (en) | 2007-05-25 | 2014-06-17 | Nvidia Corporation | Efficient encoding/decoding of a sequence of data frames |
US9118927B2 (en) * | 2007-06-13 | 2015-08-25 | Nvidia Corporation | Sub-pixel interpolation and its application in motion compensated encoding of a video signal |
US8873625B2 (en) | 2007-07-18 | 2014-10-28 | Nvidia Corporation | Enhanced compression in representing non-frame-edge blocks of image frames |
US8634470B2 (en) * | 2007-07-24 | 2014-01-21 | Samsung Electronics Co., Ltd. | Multimedia decoding method and multimedia decoding apparatus based on multi-core processor |
JP2009054032A (en) * | 2007-08-28 | 2009-03-12 | Toshiba Corp | Parallel processor |
JP5159258B2 (en) * | 2007-11-06 | 2013-03-06 | 株式会社東芝 | Arithmetic processing unit |
US20090188521A1 (en) * | 2008-01-17 | 2009-07-30 | Evazynajad Ali M | Dental Floss Formed from Botanic and Botanically Derived Fiber |
US8726289B2 (en) * | 2008-02-22 | 2014-05-13 | International Business Machines Corporation | Streaming attachment of hardware accelerators to computer systems |
US8250578B2 (en) * | 2008-02-22 | 2012-08-21 | International Business Machines Corporation | Pipelining hardware accelerators to computer systems |
US7953912B2 (en) * | 2008-02-22 | 2011-05-31 | International Business Machines Corporation | Guided attachment of accelerators to computer systems |
CN102112971A (en) * | 2008-08-06 | 2011-06-29 | 阿斯奔收购公司 | Haltable and restartable dma engine |
US9179166B2 (en) * | 2008-12-05 | 2015-11-03 | Nvidia Corporation | Multi-protocol deblock engine core system and method |
US8666181B2 (en) | 2008-12-10 | 2014-03-04 | Nvidia Corporation | Adaptive multiple engine image motion detection system and method |
CN102055969B (en) * | 2009-10-30 | 2012-12-19 | 鸿富锦精密工业(深圳)有限公司 | Image deblocking filter and image processing device using same |
US9390539B2 (en) * | 2009-11-04 | 2016-07-12 | Intel Corporation | Performing parallel shading operations |
TWI449433B (en) * | 2011-08-01 | 2014-08-11 | Novatek Microelectronics Corp | Image processing circuit and image processing method |
CN102346769B (en) * | 2011-09-20 | 2014-10-22 | 奇智软件(北京)有限公司 | Method and device for consolidating registry file |
WO2013095605A1 (en) * | 2011-12-23 | 2013-06-27 | Intel Corporation | Apparatus and method for sliding window data gather |
US9152424B2 (en) * | 2012-06-14 | 2015-10-06 | International Business Machines Corporation | Mitigating instruction prediction latency with independently filtered presence predictors |
US9241163B2 (en) * | 2013-03-15 | 2016-01-19 | Intersil Americas LLC | VC-2 decoding using parallel decoding paths |
US11228769B2 (en) | 2013-06-03 | 2022-01-18 | Texas Instruments Incorporated | Multi-threading in a video hardware engine |
US9330022B2 (en) * | 2013-06-25 | 2016-05-03 | Intel Corporation | Power logic for memory address conversion |
JP6262621B2 (en) * | 2013-09-25 | 2018-01-17 | 株式会社メガチップス | Image enlargement / reduction processing apparatus and image enlargement / reduction processing method |
US9547493B2 (en) * | 2013-10-03 | 2017-01-17 | Synopsys, Inc. | Self-timed user-extension instructions for a processing device |
GB2524063B (en) | 2014-03-13 | 2020-07-01 | Advanced Risc Mach Ltd | Data processing apparatus for executing an access instruction for N threads |
US20160125263A1 (en) * | 2014-11-03 | 2016-05-05 | Texas Instruments Incorporated | Method to compute sliding window block sum using instruction based selective horizontal addition in vector processor |
KR102332523B1 (en) * | 2014-12-24 | 2021-11-29 | 삼성전자주식회사 | Apparatus and method for execution processing |
US9715464B2 (en) | 2015-03-27 | 2017-07-25 | Microsoft Technology Licensing, Llc | Direct memory access descriptor processing |
US10390114B2 (en) * | 2016-07-22 | 2019-08-20 | Intel Corporation | Memory sharing for physical accelerator resources in a data center |
US10108581B1 (en) | 2017-04-03 | 2018-10-23 | Google Llc | Vector reduction processor |
GB2563384B (en) * | 2017-06-07 | 2019-12-25 | Advanced Risc Mach Ltd | Programmable instruction buffering |
US10437740B2 (en) * | 2017-12-15 | 2019-10-08 | Exten Technologies, Inc. | High performance raid operations offload with minimized local buffering |
CA3113538A1 (en) * | 2018-09-24 | 2020-04-02 | Huawei Technologies Co., Ltd. | Image processing device and method for performing quality optimized deblocking |
US11099973B2 (en) * | 2019-01-28 | 2021-08-24 | Salesforce.Com, Inc. | Automated test case management systems and methods |
EP3994573A4 (en) * | 2019-07-03 | 2022-08-10 | Huaxia General Processor Technologies Inc. | System and architecture of pure functional neural network accelerator |
US11880231B2 (en) * | 2020-12-14 | 2024-01-23 | Microsoft Technology Licensing, Llc | Accurate timestamp or derived counter value generation on a complex CPU |
CN113312088B (en) * | 2021-06-29 | 2022-05-17 | 北京熵核科技有限公司 | Method and device for executing program instruction |
WO2023235004A1 (en) * | 2022-06-02 | 2023-12-07 | Micron Technology, Inc. | Time-division multiplexed simd function unit |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5884057A (en) * | 1994-01-11 | 1999-03-16 | Exponential Technology, Inc. | Temporal re-alignment of a floating point pipeline to an integer pipeline for emulation of a load-operate architecture on a load/store processor |
US20020065860A1 (en) * | 2000-10-04 | 2002-05-30 | Grisenthwaite Richard Roy | Data processing apparatus and method for saturating data values |
US6529930B1 (en) * | 1998-11-16 | 2003-03-04 | Hitachi America, Ltd. | Methods and apparatus for performing a signed saturation operation |
US20060015702A1 (en) * | 2002-08-09 | 2006-01-19 | Khan Moinul H | Method and apparatus for SIMD complex arithmetic |
US20060095713A1 (en) * | 2004-11-03 | 2006-05-04 | Stexar Corporation | Clip-and-pack instruction for processor |
US20070071101A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systolic-array based systems and methods for performing block matching in motion compensation |
Family Cites Families (215)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4594659A (en) * | 1982-10-13 | 1986-06-10 | Honeywell Information Systems Inc. | Method and apparatus for prefetching instructions for a central execution pipeline unit |
JPS63225822A (en) * | 1986-08-11 | 1988-09-20 | Toshiba Corp | Barrel shifter |
US4905178A (en) * | 1986-09-19 | 1990-02-27 | Performance Semiconductor Corporation | Fast shifter method and structure |
JPS6398729A (en) * | 1986-10-15 | 1988-04-30 | Fujitsu Ltd | Barrel shifter |
US4914622A (en) * | 1987-04-17 | 1990-04-03 | Advanced Micro Devices, Inc. | Array-organized bit map with a barrel shifter |
DE3889812T2 (en) | 1987-08-28 | 1994-12-15 | Nec Corp | Data processor with a test structure for multi-position shifters. |
KR970005453B1 (en) | 1987-12-25 | 1997-04-16 | 가부시기가이샤 히다찌세이사꾸쇼 | Data processing apparatus for high speed processing |
US4926323A (en) * | 1988-03-03 | 1990-05-15 | Advanced Micro Devices, Inc. | Streamlined instruction processor |
JPH01263820A (en) * | 1988-04-15 | 1989-10-20 | Hitachi Ltd | Microprocessor |
EP0344347B1 (en) | 1988-06-02 | 1993-12-29 | Deutsche ITT Industries GmbH | Digital signal processing unit |
GB2229832B (en) | 1989-03-30 | 1993-04-07 | Intel Corp | Byte swap instruction for memory format conversion within a microprocessor |
JPH03185530A (en) | 1989-12-14 | 1991-08-13 | Mitsubishi Electric Corp | Data processor |
DE69030648T2 (en) * | 1990-01-02 | 1997-11-13 | Motorola Inc | Method for sequential prefetching of 1-word, 2-word or 3-word instructions |
JPH03248226A (en) * | 1990-02-26 | 1991-11-06 | Nec Corp | Microprocessor |
JP2560889B2 (en) | 1990-05-22 | 1996-12-04 | 日本電気株式会社 | Microprocessor |
US5778423A (en) | 1990-06-29 | 1998-07-07 | Digital Equipment Corporation | Prefetch instruction for improving performance in reduced instruction set processor |
CA2045790A1 (en) * | 1990-06-29 | 1991-12-30 | Richard Lee Sites | Branch prediction in high-performance processor |
US5155843A (en) | 1990-06-29 | 1992-10-13 | Digital Equipment Corporation | Error transition mode for multi-processor system |
JP2556612B2 (en) | 1990-08-29 | 1996-11-20 | 日本電気アイシーマイコンシステム株式会社 | Barrel shifter circuit |
US5636363A (en) * | 1991-06-14 | 1997-06-03 | Integrated Device Technology, Inc. | Hardware control structure and method for off-chip monitoring entries of an on-chip cache |
US5539911A (en) | 1991-07-08 | 1996-07-23 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution |
US5493687A (en) * | 1991-07-08 | 1996-02-20 | Seiko Epson Corporation | RISC microprocessor architecture implementing multiple typed register sets |
US5450586A (en) | 1991-08-14 | 1995-09-12 | Hewlett-Packard Company | System for analyzing and debugging embedded software through dynamic and interactive use of code markers |
US5283874A (en) | 1991-10-21 | 1994-02-01 | Intel Corporation | Cross coupling mechanisms for simultaneously completing consecutive pipeline instructions even if they begin to process at the same microprocessor of the issue fee |
CA2073516A1 (en) | 1991-11-27 | 1993-05-28 | Peter Michael Kogge | Dynamic multi-mode parallel processor array architecture computer system |
FR2690299B1 (en) * | 1992-04-17 | 1994-06-17 | Telecommunications Sa | METHOD AND DEVICE FOR SPATIAL FILTERING OF DIGITAL IMAGES DECODED BY BLOCK TRANSFORMATION. |
US5423011A (en) * | 1992-06-11 | 1995-06-06 | International Business Machines Corporation | Apparatus for initializing branch prediction information |
US5542074A (en) | 1992-10-22 | 1996-07-30 | Maspar Computer Corporation | Parallel processor system with highly flexible local control capability, including selective inversion of instruction signal and control of bit shift amount |
US5696958A (en) | 1993-01-11 | 1997-12-09 | Silicon Graphics, Inc. | Method and apparatus for reducing delays following the execution of a branch instruction in an instruction pipeline |
GB2275119B (en) | 1993-02-03 | 1997-05-14 | Motorola Inc | A cached processor |
US5937202A (en) * | 1993-02-11 | 1999-08-10 | 3-D Computing, Inc. | High-speed, parallel, processor architecture for front-end electronics, based on a single type of ASIC, and method use thereof |
US5454117A (en) | 1993-08-25 | 1995-09-26 | Nexgen, Inc. | Configurable branch prediction for a processor performing speculative execution |
JP2801135B2 (en) * | 1993-11-26 | 1998-09-21 | 富士通株式会社 | Instruction reading method and instruction reading device for pipeline processor |
US5509129A (en) * | 1993-11-30 | 1996-04-16 | Guttag; Karl M. | Long instruction word controlling plural independent processor operations |
US5590350A (en) | 1993-11-30 | 1996-12-31 | Texas Instruments Incorporated | Three input arithmetic logic unit with mask generator |
US6116768A (en) | 1993-11-30 | 2000-09-12 | Texas Instruments Incorporated | Three input arithmetic logic unit with barrel rotator |
US5590351A (en) | 1994-01-21 | 1996-12-31 | Advanced Micro Devices, Inc. | Superscalar execution unit for sequential instruction pointer updates and segment limit checks |
JPH07253922A (en) * | 1994-03-14 | 1995-10-03 | Texas Instr Japan Ltd | Address generating circuit |
US5530825A (en) * | 1994-04-15 | 1996-06-25 | Motorola, Inc. | Data processor with branch target address cache and method of operation |
US5517436A (en) * | 1994-06-07 | 1996-05-14 | Andreas; David C. | Digital signal processor for audio applications |
WO1996002895A1 (en) * | 1994-07-14 | 1996-02-01 | Johnson Grace Company | Method and apparatus for compressing images |
US5809293A (en) | 1994-07-29 | 1998-09-15 | International Business Machines Corporation | System and method for program execution tracing within an integrated processor |
US5692168A (en) | 1994-10-18 | 1997-11-25 | Cyrix Corporation | Prefetch buffer using flow control bit to identify changes of flow within the code stream |
US5600674A (en) * | 1995-03-02 | 1997-02-04 | Motorola Inc. | Method and apparatus of an enhanced digital signal processor |
US5655122A (en) | 1995-04-05 | 1997-08-05 | Sequent Computer Systems, Inc. | Optimizing compiler with static prediction of branch probability, branch frequency and function frequency |
US5835753A (en) | 1995-04-12 | 1998-11-10 | Advanced Micro Devices, Inc. | Microprocessor with dynamically extendable pipeline stages and a classifying circuit |
US5920711A (en) | 1995-06-02 | 1999-07-06 | Synopsys, Inc. | System for frame-based protocol, graphical capture, synthesis, analysis, and simulation |
US5842004A (en) | 1995-08-04 | 1998-11-24 | Sun Microsystems, Inc. | Method and apparatus for decompression of compressed geometric three-dimensional graphics data |
US6292879B1 (en) | 1995-10-25 | 2001-09-18 | Anthony S. Fong | Method and apparatus to specify access control list and cache enabling and cache coherency requirement enabling on individual operands of an instruction of a computer |
US5727211A (en) * | 1995-11-09 | 1998-03-10 | Chromatic Research, Inc. | System and method for fast context switching between tasks |
US5996071A (en) | 1995-12-15 | 1999-11-30 | Via-Cyrix, Inc. | Detecting self-modifying code in a pipelined processor with branch processing by comparing latched store address to subsequent target address |
US5896305A (en) * | 1996-02-08 | 1999-04-20 | Texas Instruments Incorporated | Shifter circuit for an arithmetic logic unit in a microprocessor |
US5752014A (en) * | 1996-04-29 | 1998-05-12 | International Business Machines Corporation | Automatic selection of branch prediction methodology for subsequent branch instruction based on outcome of previous branch prediction |
US5784636A (en) | 1996-05-28 | 1998-07-21 | National Semiconductor Corporation | Reconfigurable computer architecture for use in signal processing applications |
US20010025337A1 (en) | 1996-06-10 | 2001-09-27 | Frank Worrell | Microprocessor including a mode detector for setting compression mode |
US5964884A (en) | 1996-09-30 | 1999-10-12 | Advanced Micro Devices, Inc. | Self-timed pulse control circuit |
US5805876A (en) | 1996-09-30 | 1998-09-08 | International Business Machines Corporation | Method and system for reducing average branch resolution time and effective misprediction penalty in a processor |
US5848264A (en) | 1996-10-25 | 1998-12-08 | S3 Incorporated | Debug and video queue for multi-processor chip |
US6061521A (en) * | 1996-12-02 | 2000-05-09 | Compaq Computer Corp. | Computer having multimedia operations executable as two distinct sets of operations within a single instruction cycle |
US5909572A (en) | 1996-12-02 | 1999-06-01 | Compaq Computer Corp. | System and method for conditionally moving an operand from a source register to a destination register |
EP0855645A3 (en) * | 1996-12-31 | 2000-05-24 | Texas Instruments Incorporated | System and method for speculative execution of instructions with data prefetch |
KR100236533B1 (en) * | 1997-01-16 | 2000-01-15 | 윤종용 | Digital signal processor |
US6185732B1 (en) * | 1997-04-08 | 2001-02-06 | Advanced Micro Devices, Inc. | Software debug port for a microprocessor |
US6154857A (en) | 1997-04-08 | 2000-11-28 | Advanced Micro Devices, Inc. | Microprocessor-based device incorporating a cache for capturing software performance profiling data |
US6088786A (en) | 1997-06-27 | 2000-07-11 | Sun Microsystems, Inc. | Method and system for coupling a stack based processor to register based functional unit |
US6760833B1 (en) | 1997-08-01 | 2004-07-06 | Micron Technology, Inc. | Split embedded DRAM processor |
US6226738B1 (en) * | 1997-08-01 | 2001-05-01 | Micron Technology, Inc. | Split embedded DRAM processor |
US6026478A (en) * | 1997-08-01 | 2000-02-15 | Micron Technology, Inc. | Split embedded DRAM processor |
US6157988A (en) | 1997-08-01 | 2000-12-05 | Micron Technology, Inc. | Method and apparatus for high performance branching in pipelined microsystems |
JPH1185515A (en) | 1997-09-10 | 1999-03-30 | Ricoh Co Ltd | Microprocessor |
US5923892A (en) * | 1997-10-27 | 1999-07-13 | Levy; Paul S. | Host processor and coprocessor arrangement for processing platform-independent code |
US5978909A (en) | 1997-11-26 | 1999-11-02 | Intel Corporation | System for speculative branch target prediction having a dynamic prediction history buffer and a static prediction history buffer |
US6044458A (en) * | 1997-12-12 | 2000-03-28 | Motorola, Inc. | System for monitoring program flow utilizing fixwords stored sequentially to opcodes |
US6014743A (en) * | 1998-02-05 | 2000-01-11 | Intergrated Device Technology, Inc. | Apparatus and method for recording a floating point error pointer in zero cycles |
US6151672A (en) | 1998-02-23 | 2000-11-21 | Hewlett-Packard Company | Methods and apparatus for reducing interference in a branch history table of a microprocessor |
US6374349B2 (en) | 1998-03-19 | 2002-04-16 | Mcfarling Scott | Branch predictor with serially connected predictor stages for improving branch prediction accuracy |
US6377970B1 (en) * | 1998-03-31 | 2002-04-23 | Intel Corporation | Method and apparatus for computing a sum of packed data elements using SIMD multiply circuitry |
US6584585B1 (en) | 1998-05-08 | 2003-06-24 | Gateway, Inc. | Virtual device driver and methods employing the same |
US6289417B1 (en) | 1998-05-18 | 2001-09-11 | Arm Limited | Operand supply to an execution unit |
US6466333B2 (en) | 1998-06-26 | 2002-10-15 | Canon Kabushiki Kaisha | Streamlined tetrahedral interpolation |
US20020053015A1 (en) * | 1998-07-14 | 2002-05-02 | Sony Corporation And Sony Electronics Inc. | Digital signal processor particularly suited for decoding digital audio |
US6327651B1 (en) * | 1998-09-08 | 2001-12-04 | International Business Machines Corporation | Wide shifting in the vector permute unit |
US6253287B1 (en) * | 1998-09-09 | 2001-06-26 | Advanced Micro Devices, Inc. | Using three-dimensional storage to make variable-length instructions appear uniform in two dimensions |
US6339822B1 (en) * | 1998-10-02 | 2002-01-15 | Advanced Micro Devices, Inc. | Using padded instructions in a block-oriented cache |
US6671743B1 (en) | 1998-11-13 | 2003-12-30 | Creative Technology, Ltd. | Method and system for exposing proprietary APIs in a privileged device driver to an application |
US6189091B1 (en) * | 1998-12-02 | 2001-02-13 | Ip First, L.L.C. | Apparatus and method for speculatively updating global history and restoring same on branch misprediction detection |
US6341348B1 (en) | 1998-12-03 | 2002-01-22 | Sun Microsystems, Inc. | Software branch prediction filtering for a microprocessor |
US6957327B1 (en) * | 1998-12-31 | 2005-10-18 | Stmicroelectronics, Inc. | Block-based branch target buffer |
US6477683B1 (en) | 1999-02-05 | 2002-11-05 | Tensilica, Inc. | Automated processor generation system for designing a configurable processor and method for the same |
US6418530B2 (en) | 1999-02-18 | 2002-07-09 | Hewlett-Packard Company | Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions |
US6757019B1 (en) * | 1999-03-13 | 2004-06-29 | The Board Of Trustees Of The Leland Stanford Junior University | Low-power parallel processor and imager having peripheral control circuitry |
US6499101B1 (en) * | 1999-03-18 | 2002-12-24 | I.P. First L.L.C. | Static branch prediction mechanism for conditional branch instructions |
US6427206B1 (en) | 1999-05-03 | 2002-07-30 | Intel Corporation | Optimized branch predictions for strongly predicted compiler branches |
US6560754B1 (en) * | 1999-05-13 | 2003-05-06 | Arc International Plc | Method and apparatus for jump control in a pipelined processor |
US6622240B1 (en) | 1999-06-18 | 2003-09-16 | Intrinsity, Inc. | Method and apparatus for pre-branch instruction |
US6518974B2 (en) * | 1999-07-16 | 2003-02-11 | Intel Corporation | Pixel engine |
JP2001034504A (en) * | 1999-07-19 | 2001-02-09 | Mitsubishi Electric Corp | Source level debugger |
US6772325B1 (en) | 1999-10-01 | 2004-08-03 | Hitachi, Ltd. | Processor architecture and operation for exploiting improved branch control instruction |
US6546481B1 (en) | 1999-11-05 | 2003-04-08 | Ip - First Llc | Split history tables for branch prediction |
US7072398B2 (en) * | 2000-12-06 | 2006-07-04 | Kai-Kuang Ma | System and method for motion vector generation and analysis of digital video clips |
US6609194B1 (en) | 1999-11-12 | 2003-08-19 | Ip-First, Llc | Apparatus for performing branch target address calculation based on branch type |
US6909744B2 (en) | 1999-12-09 | 2005-06-21 | Redrock Semiconductor, Inc. | Processor architecture for compression and decompression of video and images |
KR100395763B1 (en) | 2000-02-01 | 2003-08-25 | 삼성전자주식회사 | A branch predictor for microprocessor having multiple processes |
US6412038B1 (en) | 2000-02-14 | 2002-06-25 | Intel Corporation | Integral modular cache for a processor |
US6629167B1 (en) | 2000-02-18 | 2003-09-30 | Hewlett-Packard Development Company, L.P. | Pipeline decoupling buffer for handling early data and late data |
US6865663B2 (en) * | 2000-02-24 | 2005-03-08 | Pts Corporation | Control processor dynamically loading shadow instruction register associated with memory entry of coprocessor in flexible coupling mode |
US6519696B1 (en) * | 2000-03-30 | 2003-02-11 | I.P. First, Llc | Paired register exchange using renaming register map |
US6876703B2 (en) * | 2000-05-11 | 2005-04-05 | Ub Video Inc. | Method and apparatus for video coding |
US7079579B2 (en) * | 2000-07-13 | 2006-07-18 | Samsung Electronics Co., Ltd. | Block matching processor and method for block matching motion estimation in video compression |
US6681295B1 (en) * | 2000-08-31 | 2004-01-20 | Hewlett-Packard Development Company, L.P. | Fast lane prefetching |
US6718460B1 (en) * | 2000-09-05 | 2004-04-06 | Sun Microsystems, Inc. | Mechanism for error handling in a computer system |
US20030070013A1 (en) * | 2000-10-27 | 2003-04-10 | Daniel Hansson | Method and apparatus for reducing power consumption in a digital processor |
US6948054B2 (en) * | 2000-11-29 | 2005-09-20 | Lsi Logic Corporation | Simple branch prediction and misprediction recovery method |
KR100386639B1 (en) * | 2000-12-04 | 2003-06-02 | 주식회사 오픈비주얼 | Method for decompression of images and video using regularized dequantizer |
TW477954B (en) * | 2000-12-05 | 2002-03-01 | Faraday Tech Corp | Memory data accessing architecture and method for a processor |
US20020073301A1 (en) * | 2000-12-07 | 2002-06-13 | International Business Machines Corporation | Hardware for use with compiler generated branch information |
US7139903B2 (en) | 2000-12-19 | 2006-11-21 | Hewlett-Packard Development Company, L.P. | Conflict free parallel read access to a bank interleaved branch predictor in a processor |
US6963554B1 (en) | 2000-12-27 | 2005-11-08 | National Semiconductor Corporation | Microwire dynamic sequencer pipeline stall |
US6877089B2 (en) | 2000-12-27 | 2005-04-05 | International Business Machines Corporation | Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program |
US20020087851A1 (en) | 2000-12-28 | 2002-07-04 | Matsushita Electric Industrial Co., Ltd. | Microprocessor and an instruction converter |
US8285976B2 (en) | 2000-12-28 | 2012-10-09 | Micron Technology, Inc. | Method and apparatus for predicting branches using a meta predictor |
US6925634B2 (en) | 2001-01-24 | 2005-08-02 | Texas Instruments Incorporated | Method for maintaining cache coherency in software in a shared memory system |
US7039901B2 (en) | 2001-01-24 | 2006-05-02 | Texas Instruments Incorporated | Software shared memory bus |
US6823447B2 (en) | 2001-03-01 | 2004-11-23 | International Business Machines Corporation | Software hint to improve the branch target prediction accuracy |
EP1381957A2 (en) | 2001-03-02 | 2004-01-21 | Atsana Semiconductor Corp. | Data processing apparatus and system and method for controlling memory access |
JP3890910B2 (en) | 2001-03-21 | 2007-03-07 | 株式会社日立製作所 | Instruction execution result prediction device |
US7010558B2 (en) * | 2001-04-19 | 2006-03-07 | Arc International | Data processor with enhanced instruction execution and method |
US20020194462A1 (en) | 2001-05-04 | 2002-12-19 | Ip First Llc | Apparatus and method for selecting one of multiple target addresses stored in a speculative branch target address cache per instruction cache line |
US7200740B2 (en) | 2001-05-04 | 2007-04-03 | Ip-First, Llc | Apparatus and method for speculatively performing a return instruction in a microprocessor |
US6886093B2 (en) * | 2001-05-04 | 2005-04-26 | Ip-First, Llc | Speculative hybrid branch direction predictor |
US20020194461A1 (en) | 2001-05-04 | 2002-12-19 | Ip First Llc | Speculative branch target address cache |
US7165169B2 (en) | 2001-05-04 | 2007-01-16 | Ip-First, Llc | Speculative branch target address cache with selective override by secondary predictor based on branch instruction type |
US7165168B2 (en) | 2003-01-14 | 2007-01-16 | Ip-First, Llc | Microprocessor with branch target address cache update queue |
GB0112275D0 (en) | 2001-05-21 | 2001-07-11 | Micron Technology Inc | Method and circuit for normalization of floating point significands in a simd array mpp |
GB0112269D0 (en) * | 2001-05-21 | 2001-07-11 | Micron Technology Inc | Method and circuit for alignment of floating point significands in a simd array mpp |
US6950929B2 (en) * | 2001-05-24 | 2005-09-27 | Samsung Electronics Co., Ltd. | Loop instruction processing using loop buffer in a data processing device having a coprocessor |
CN1265286C (en) | 2001-06-29 | 2006-07-19 | 皇家菲利浦电子有限公司 | Method, appts. and compiler for predicting indirect branch target addresses |
US6823444B1 (en) | 2001-07-03 | 2004-11-23 | Ip-First, Llc | Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap |
US7162619B2 (en) * | 2001-07-03 | 2007-01-09 | Ip-First, Llc | Apparatus and method for densely packing a branch instruction predicted by a branch target address cache and associated target instructions into a byte-wide instruction buffer |
JP4145586B2 (en) * | 2001-07-24 | 2008-09-03 | セイコーエプソン株式会社 | Image processing apparatus, image processing program, and image processing method |
US7010675B2 (en) * | 2001-07-27 | 2006-03-07 | Stmicroelectronics, Inc. | Fetch branch architecture for reducing branch penalty without branch prediction |
US7191445B2 (en) * | 2001-08-31 | 2007-03-13 | Texas Instruments Incorporated | Method using embedded real-time analysis components with corresponding real-time operating system software objects |
JP2003131902A (en) | 2001-10-24 | 2003-05-09 | Toshiba Corp | Software debugger, system-level debugger, debug method and debug program |
US20040054877A1 (en) * | 2001-10-29 | 2004-03-18 | Macy William W. | Method and apparatus for shuffling data |
US7685212B2 (en) | 2001-10-29 | 2010-03-23 | Intel Corporation | Fast full search motion estimation with SIMD merge instruction |
US7272622B2 (en) | 2001-10-29 | 2007-09-18 | Intel Corporation | Method and apparatus for parallel shift right merge of data |
US7051239B2 (en) | 2001-12-28 | 2006-05-23 | Hewlett-Packard Development Company, L.P. | Method and apparatus for efficiently implementing trace and/or logic analysis mechanisms on a processor chip |
WO2003065165A2 (en) | 2002-01-31 | 2003-08-07 | Arc International | Configurable data processor with multi-length instruction set architecture |
US7168067B2 (en) | 2002-02-08 | 2007-01-23 | Agere Systems Inc. | Multiprocessor system with cache-based software breakpoints |
US7529912B2 (en) | 2002-02-12 | 2009-05-05 | Via Technologies, Inc. | Apparatus and method for instruction-level specification of floating point format |
US7181596B2 (en) * | 2002-02-12 | 2007-02-20 | Ip-First, Llc | Apparatus and method for extending a microprocessor instruction set |
US7328328B2 (en) | 2002-02-19 | 2008-02-05 | Ip-First, Llc | Non-temporal memory reference control mechanism |
US7315921B2 (en) | 2002-02-19 | 2008-01-01 | Ip-First, Llc | Apparatus and method for selective memory attribute control |
US7546446B2 (en) | 2002-03-08 | 2009-06-09 | Ip-First, Llc | Selective interrupt suppression |
US7395412B2 (en) | 2002-03-08 | 2008-07-01 | Ip-First, Llc | Apparatus and method for extending data modes in a microprocessor |
US7180943B1 (en) * | 2002-03-26 | 2007-02-20 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Compression of a data stream by selection among a set of compression tools |
US7373483B2 (en) | 2002-04-02 | 2008-05-13 | Ip-First, Llc | Mechanism for extending the number of registers in a microprocessor |
US7302551B2 (en) | 2002-04-02 | 2007-11-27 | Ip-First, Llc | Suppression of store checking |
US7155598B2 (en) | 2002-04-02 | 2006-12-26 | Ip-First, Llc | Apparatus and method for conditional instruction execution |
US7380103B2 (en) | 2002-04-02 | 2008-05-27 | Ip-First, Llc | Apparatus and method for selective control of results write back |
US7185180B2 (en) | 2002-04-02 | 2007-02-27 | Ip-First, Llc | Apparatus and method for selective control of condition code write back |
US20030198295A1 (en) * | 2002-04-12 | 2003-10-23 | Liang-Gee Chen | Global elimination algorithm for motion estimation and the hardware architecture thereof |
US7380109B2 (en) | 2002-04-15 | 2008-05-27 | Ip-First, Llc | Apparatus and method for providing extended address modes in an existing instruction set for a microprocessor |
US20030204705A1 (en) | 2002-04-30 | 2003-10-30 | Oldfield William H. | Prediction of branch instructions in a data processing apparatus |
KR100450753B1 (en) | 2002-05-17 | 2004-10-01 | 한국전자통신연구원 | Programmable variable length decoder including interface of CPU processor |
US6938151B2 (en) | 2002-06-04 | 2005-08-30 | International Business Machines Corporation | Hybrid branch prediction using a global selection counter and a prediction method comparison table |
US6718504B1 (en) | 2002-06-05 | 2004-04-06 | Arc International | Method and apparatus for implementing a data processor adapted for turbo decoding |
US7493480B2 (en) * | 2002-07-18 | 2009-02-17 | International Business Machines Corporation | Method and apparatus for prefetching branch history information |
US7000095B2 (en) * | 2002-09-06 | 2006-02-14 | Mips Technologies, Inc. | Method and apparatus for clearing hazards using jump instructions |
AU2003279015A1 (en) * | 2002-09-27 | 2004-04-19 | Videosoft, Inc. | Real-time video coding/decoding |
US20050125634A1 (en) | 2002-10-04 | 2005-06-09 | Fujitsu Limited | Processor and instruction control method |
US6968444B1 (en) | 2002-11-04 | 2005-11-22 | Advanced Micro Devices, Inc. | Microprocessor employing a fixed position dispatch unit |
US8667252B2 (en) * | 2002-11-21 | 2014-03-04 | Stmicroelectronics, Inc. | Method and apparatus to adapt the clock rate of a programmable coprocessor for optimal performance and power dissipation |
US7227901B2 (en) * | 2002-11-21 | 2007-06-05 | Ub Video Inc. | Low-complexity deblocking filter |
US7266676B2 (en) | 2003-03-21 | 2007-09-04 | Analog Devices, Inc. | Method and apparatus for branch prediction based on branch targets utilizing tag and data arrays |
US6774832B1 (en) | 2003-03-25 | 2004-08-10 | Raytheon Company | Multi-bit output DDS with real time delta sigma modulation look up from memory |
US7590829B2 (en) | 2003-03-31 | 2009-09-15 | Stretch, Inc. | Extension adapter |
US20040193855A1 (en) | 2003-03-31 | 2004-09-30 | Nicolas Kacevas | System and method for branch prediction access |
US7174444B2 (en) | 2003-03-31 | 2007-02-06 | Intel Corporation | Preventing a read of a next sequential chunk in branch prediction of a subject chunk |
US20040225870A1 (en) | 2003-05-07 | 2004-11-11 | Srinivasan Srikanth T. | Method and apparatus for reducing wrong path execution in a speculative multi-threaded processor |
US7010676B2 (en) | 2003-05-12 | 2006-03-07 | International Business Machines Corporation | Last iteration loop branch prediction upon counter threshold and resolution upon counter one |
US7079147B2 (en) * | 2003-05-14 | 2006-07-18 | Lsi Logic Corporation | System and method for cooperative operation of a processor and coprocessor |
US20040252766A1 (en) * | 2003-06-11 | 2004-12-16 | Daeyang Foundation (Sejong University) | Motion vector search method and apparatus |
US20040255104A1 (en) | 2003-06-12 | 2004-12-16 | Intel Corporation | Method and apparatus for recycling candidate branch outcomes after a wrong-path execution in a superscalar processor |
US7668897B2 (en) | 2003-06-16 | 2010-02-23 | Arm Limited | Result partitioning within SIMD data processing systems |
US7424501B2 (en) * | 2003-06-30 | 2008-09-09 | Intel Corporation | Nonlinear filtering and deblocking applications utilizing SIMD sign and absolute value operations |
US7539714B2 (en) * | 2003-06-30 | 2009-05-26 | Intel Corporation | Method, apparatus, and instruction for performing a sign operation that multiplies |
US7783871B2 (en) | 2003-06-30 | 2010-08-24 | Intel Corporation | Method to remove stale branch predictions for an instruction prior to execution within a microprocessor |
US7373642B2 (en) * | 2003-07-29 | 2008-05-13 | Stretch, Inc. | Defining instruction extensions in a standard programming language |
US20050027974A1 (en) * | 2003-07-31 | 2005-02-03 | Oded Lempel | Method and system for conserving resources in an instruction pipeline |
US20050024486A1 (en) | 2003-07-31 | 2005-02-03 | Viresh Ratnakar | Video codec system with real-time complexity adaptation |
US7133950B2 (en) * | 2003-08-19 | 2006-11-07 | Sun Microsystems, Inc. | Request arbitration in multi-core processor |
JP2005078234A (en) * | 2003-08-29 | 2005-03-24 | Renesas Technology Corp | Information processor |
US7237098B2 (en) * | 2003-09-08 | 2007-06-26 | Ip-First, Llc | Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence |
US20050066305A1 (en) * | 2003-09-22 | 2005-03-24 | Lisanke Robert John | Method and machine for efficient simulation of digital hardware within a software development environment |
US7277592B1 (en) * | 2003-10-21 | 2007-10-02 | Redrock Semiconductory Ltd. | Spacial deblocking method using limited edge differences only to linearly correct blocking artifact |
US7457362B2 (en) | 2003-10-24 | 2008-11-25 | Texas Instruments Incorporated | Loop deblock filtering of block coded video in a very long instruction word processor |
KR100980076B1 (en) * | 2003-10-24 | 2010-09-06 | 삼성전자주식회사 | System and method for branch prediction with low-power consumption |
US7363544B2 (en) * | 2003-10-30 | 2008-04-22 | International Business Machines Corporation | Program debug method and apparatus |
US7219207B2 (en) | 2003-12-03 | 2007-05-15 | Intel Corporation | Reconfigurable trace cache |
US8069336B2 (en) | 2003-12-03 | 2011-11-29 | Globalfoundries Inc. | Transitioning from instruction cache to trace cache on label boundaries |
US7401328B2 (en) | 2003-12-18 | 2008-07-15 | Lsi Corporation | Software-implemented grouping techniques for use in a superscalar data processing system |
US7293164B2 (en) | 2004-01-14 | 2007-11-06 | International Business Machines Corporation | Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions |
US8607209B2 (en) | 2004-02-04 | 2013-12-10 | Bluerisc Inc. | Energy-focused compiler-assisted branch prediction |
US7613911B2 (en) | 2004-03-12 | 2009-11-03 | Arm Limited | Prefetching exception vectors by early lookup exception vectors within a cache memory |
US20050216713A1 (en) | 2004-03-25 | 2005-09-29 | International Business Machines Corporation | Instruction text controlled selectively stated branches for prediction via a branch target buffer |
US7281120B2 (en) | 2004-03-26 | 2007-10-09 | International Business Machines Corporation | Apparatus and method for decreasing the latency between an instruction cache and a pipeline processor |
US20050223202A1 (en) | 2004-03-31 | 2005-10-06 | Intel Corporation | Branch prediction in a pipelined processor |
US20050278517A1 (en) | 2004-05-19 | 2005-12-15 | Kar-Lik Wong | Systems and methods for performing branch prediction in a variable length instruction set microprocessor |
US20060015706A1 (en) * | 2004-06-30 | 2006-01-19 | Chunrong Lai | TLB correlated branch predictor and method for use thereof |
TWI253024B (en) * | 2004-07-20 | 2006-04-11 | Realtek Semiconductor Corp | Method and apparatus for block matching |
TWI305323B (en) * | 2004-08-23 | 2009-01-11 | Faraday Tech Corp | Method for verification branch prediction mechanisms and readable recording medium for storing program thereof |
US20060047934A1 (en) * | 2004-08-31 | 2006-03-02 | Schmisseur Mark A | Integrated circuit capable of memory access control |
WO2006096612A2 (en) | 2005-03-04 | 2006-09-14 | The Trustees Of Columbia University In The City Of New York | System and method for motion estimation and mode decision for low-complexity h.264 decoder |
US8879636B2 (en) | 2007-05-25 | 2014-11-04 | Synopsys, Inc. | Adaptive video encoding apparatus and methods |
2006
- 2006-09-28 US US11/528,470: US20070073925A1 (Abandoned)
- 2006-09-28 US US11/528,432: US8218635B2 (Active)
- 2006-09-28 US US11/528,327: US7747088B2 (Active)
- 2006-09-28 US US11/528,434: US20070074004A1 (Abandoned)
- 2006-09-28 WO PCT/IB2006/003358: WO2007049150A2 (Application Filing)
- 2006-09-28 US US11/528,338: US7971042B2 (Active)
- 2006-09-28 US US11/528,326: US20070074007A1 (Abandoned)
- 2006-09-28 US US11/528,325: US8212823B2 (Active)
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5884057A (en) * | 1994-01-11 | 1999-03-16 | Exponential Technology, Inc. | Temporal re-alignment of a floating point pipeline to an integer pipeline for emulation of a load-operate architecture on a load/store processor |
US6529930B1 (en) * | 1998-11-16 | 2003-03-04 | Hitachi America, Ltd. | Methods and apparatus for performing a signed saturation operation |
US20020065860A1 (en) * | 2000-10-04 | 2002-05-30 | Grisenthwaite Richard Roy | Data processing apparatus and method for saturating data values |
US20060015702A1 (en) * | 2002-08-09 | 2006-01-19 | Khan Moinul H | Method and apparatus for SIMD complex arithmetic |
US20060095713A1 (en) * | 2004-11-03 | 2006-05-04 | Stexar Corporation | Clip-and-pack instruction for processor |
US20070071101A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systolic-array based systems and methods for performing block matching in motion compensation |
US20070073925A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for synchronizing multiple processing engines of a microprocessor |
US20070070080A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
US20070071106A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for performing deblocking in microprocessor-based video codec applications |
US20070074012A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for recording instruction sequences in a microprocessor having a dynamically decoupleable extended instruction pipeline |
US20070074004A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for selectively decoupling a parallel extended instruction pipeline |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8212823B2 (en) | 2005-09-28 | 2012-07-03 | Synopsys, Inc. | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
US20070074012A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for recording instruction sequences in a microprocessor having a dynamically decoupleable extended instruction pipeline |
US20070070080A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
US20070074004A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for selectively decoupling a parallel extended instruction pipeline |
US20070071106A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for performing deblocking in microprocessor-based video codec applications |
US20070073925A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for synchronizing multiple processing engines of a microprocessor |
US7747088B2 (en) | 2005-09-28 | 2010-06-29 | Arc International (Uk) Limited | System and methods for performing deblocking in microprocessor-based video codec applications |
US20070071101A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systolic-array based systems and methods for performing block matching in motion compensation |
US7971042B2 (en) | 2005-09-28 | 2011-06-28 | Synopsys, Inc. | Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline |
US8218635B2 (en) | 2005-09-28 | 2012-07-10 | Synopsys, Inc. | Systolic-array based systems and methods for performing block matching in motion compensation |
US8804850B1 (en) | 2007-11-21 | 2014-08-12 | Marvell International, Ltd. | System and method to execute a clipping instruction |
US8437410B1 (en) | 2007-11-21 | 2013-05-07 | Marvell International Ltd. | System and method to execute a clipping instruction |
US8386547B2 (en) | 2008-10-31 | 2013-02-26 | Intel Corporation | Instruction and logic for performing range detection |
TWI470545B (en) * | 2008-10-31 | 2015-01-21 | Intel Corp | Apparatus, processor, system, method, instruction, and logic for performing range detection |
KR101105474B1 (en) | 2008-10-31 | 2012-01-13 | Intel Corporation | Instruction and logic for performing range detection |
WO2010051298A3 (en) * | 2008-10-31 | 2010-07-08 | Intel Corporation | Instruction and logic for performing range detection |
WO2010051298A2 (en) * | 2008-10-31 | 2010-05-06 | Intel Corporation | Instruction and logic for performing range detection |
US20100180100A1 (en) * | 2009-01-13 | 2010-07-15 | Mavrix Technology, Inc. | Matrix microprocessor and method of operation |
US10795680B2 (en) | 2011-04-01 | 2020-10-06 | Intel Corporation | Vector friendly instruction format and execution thereof |
US9513917B2 (en) | 2011-04-01 | 2016-12-06 | Intel Corporation | Vector friendly instruction format and execution thereof |
US11210096B2 (en) | 2011-04-01 | 2021-12-28 | Intel Corporation | Vector friendly instruction format and execution thereof |
US11740904B2 (en) | 2011-04-01 | 2023-08-29 | Intel Corporation | Vector friendly instruction format and execution thereof |
CN104011670A (en) * | 2011-12-22 | 2014-08-27 | Intel Corporation | Instructions for storing in general purpose registers one of two scalar constants based on the contents of vector write masks |
US20140297991A1 (en) * | 2011-12-22 | 2014-10-02 | Jesus Corbal | Instructions for storing in general purpose registers one of two scalar constants based on the contents of vector write masks |
CN104011668A (en) * | 2011-12-22 | 2014-08-27 | Intel Corporation | Systems, Apparatuses, And Methods For Mapping A Source Operand To A Different Range |
US9389861B2 (en) * | 2011-12-22 | 2016-07-12 | Intel Corporation | Systems, apparatuses, and methods for mapping a source operand to a different range |
CN106843811A (en) * | 2011-12-22 | 2017-06-13 | Intel Corporation | System, apparatus and method for source operand to be mapped to different range |
US10157061B2 (en) * | 2011-12-22 | 2018-12-18 | Intel Corporation | Instructions for storing in general purpose registers one of two scalar constants based on the contents of vector write masks |
US20140215186A1 (en) * | 2011-12-22 | 2014-07-31 | Elmoustapha Ould-Ahmed-Vall | Systems, apparatuses, and methods for mapping a source operand to a different range |
US11567775B1 (en) * | 2021-10-25 | 2023-01-31 | Sap Se | Dynamic generation of logic for computing systems |
Also Published As
Publication number | Publication date |
---|---|
US20070070080A1 (en) | 2007-03-29 |
WO2007049150A2 (en) | 2007-05-03 |
US20070071101A1 (en) | 2007-03-29 |
US7971042B2 (en) | 2011-06-28 |
US20070073925A1 (en) | 2007-03-29 |
US7747088B2 (en) | 2010-06-29 |
US8218635B2 (en) | 2012-07-10 |
US20070071106A1 (en) | 2007-03-29 |
WO2007049150A3 (en) | 2007-12-27 |
US20070074004A1 (en) | 2007-03-29 |
US20070074012A1 (en) | 2007-03-29 |
US8212823B2 (en) | 2012-07-03 |
Similar Documents
Publication | Title |
---|---|
US20070074007A1 (en) | Parameterizable clip instruction and method of performing a clip operation using the same | |
US11669330B2 (en) | Method for performing random read access to a block of data using parallel LUT read instruction in vector processors | |
US7042466B1 (en) | Efficient clip-testing in graphics acceleration | |
US9229718B2 (en) | Method and apparatus for shuffling data | |
US8386754B2 (en) | Renaming wide register source operand with plural short register source operands for select instructions to detect dependency fast with existing mechanism | |
US7739319B2 (en) | Method and apparatus for parallel table lookup using SIMD instructions | |
US7127593B2 (en) | Conditional execution with multiple destination stores | |
US20100274988A1 (en) | Flexible vector modes of operation for SIMD processor | |
US6502115B2 (en) | Conversion between packed floating point data and packed 32-bit integer data in different architectural registers | |
US7631025B2 (en) | Method and apparatus for rearranging data between multiple registers | |
US10678540B2 (en) | Arithmetic operation with shift | |
US6247116B1 (en) | Conversion from packed floating point data to packed 16-bit integer data in different architectural registers | |
US20010016902A1 (en) | Conversion from packed floating point data to packed 8-bit integer data in different architectural registers | |
US20090100247A1 (en) | Simd permutations with extended range in a data processor | |
US7546442B1 (en) | Fixed length memory to memory arithmetic and architecture for direct memory access using fixed length instructions | |
US20080077772A1 (en) | Method and apparatus for performing select operations | |
US20070061550A1 (en) | Instruction execution in a processor | |
US20060095713A1 (en) | Clip-and-pack instruction for processor | |
US20110072238A1 (en) | Method for variable length opcode mapping in a VLIW processor | |
CN114721624A (en) | Processor, method and system for processing matrix | |
US6275925B1 (en) | Program execution method and program execution device | |
EP3944077A1 (en) | Systems, apparatuses, and methods for generating an index by sort order and reordering elements based on sort order | |
US7430573B2 (en) | Processor | |
US20190102199A1 (en) | Methods and systems for executing vectorized pythagorean tuple instructions | |
US11704124B2 (en) | Instructions for vector multiplication of unsigned words with rounding |
Legal Events
Code | Title | Description |
---|---|---|
STCB | Information on status: application discontinuation | ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |