US20050273559A1 - Microprocessor architecture including unified cache debug unit

Info

Publication number
US20050273559A1
Authority
US (United States)
Prior art keywords
cache, unit, microprocessor, pipeline, instruction
Legal status (an assumption by Google Patents, not a legal conclusion)
Abandoned
Application number
US11/132,432
Inventors
Aris Aristodemou
Daniel Hansson
Morgyn Taylor
Kar-Lik Wong
Current Assignee
ARC International
Original Assignee
ARC International
Application filed by ARC International
Assigned to ARC International (assignment of assignors' interest; assignors: Aris Aristodemou, Daniel Hansson, Morgyn Taylor, Kar-Lik Wong)
Publication of US20050273559A1

Classifications

    • G06F9/3861 Recovery, e.g. branch miss-prediction, exception handling
    • G06F5/01 Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
    • G06F11/3648 Software debugging using additional hardware
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F9/30032 Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/30149 Instruction analysis, e.g. decoding, instruction word fields of variable length instructions
    • G06F9/30181 Instruction operation extension or modification
    • G06F9/32 Address formation of the next instruction, e.g. by incrementing the instruction counter
    • G06F9/325 Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address for loops, e.g. loop detection or loop counter
    • G06F9/3802 Instruction prefetching
    • G06F9/3806 Instruction prefetching for branches, e.g. hedging, branch folding using address prediction, e.g. return stack, branch history buffer
    • G06F9/3816 Instruction alignment, e.g. cache line crossing
    • G06F9/3844 Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
    • G06F9/3846 Speculative instruction execution using static prediction, e.g. branch taken strategy
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F9/3897 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • At least one exemplary embodiment of the invention provides a microprocessor core comprising a multistage pipeline, a cache debug unit, a data pathway between the cache debug unit and an instruction cache unit, a data pathway between the cache debug unit and a data cache unit, and a data pathway between a unit of the multistage pipeline and the cache debug unit.
  • At least one additional exemplary embodiment provides a microprocessor comprising a multistage pipeline, a data cache unit, an instruction cache unit, and a unified cache debug unit operatively connected to the data cache unit, the instruction cache unit, and the multistage pipeline.
  • Yet another exemplary embodiment of this invention provides a RISC-type microprocessor comprising a multistage pipeline, and a cache debug unit, wherein the cache debug unit comprises an interface to an instruction cache unit of the microprocessor, and an interface to a data cache unit of the microprocessor.
  • FIG. 1 is a block diagram illustrating a processor core in accordance with at least one exemplary embodiment of this invention.
  • FIG. 2 is a block diagram illustrating an architecture for a unified cache debug unit for a microprocessor in accordance with at least one embodiment of this invention.
  • FIG. 1 illustrates, in block diagram form, an architecture for a microprocessor core 100 and peripheral hardware structure in accordance with at least one exemplary embodiment of this invention.
  • FIG. 1 features a processor core 100 having a seven-stage instruction pipeline.
  • A fetch stage (FET) 110 includes an instruction cache 112, a branch prediction unit (BPU) 114, and connections to instruction RAM 190 and to a cache debug unit (CDU) 195.
  • An align stage (ALN) 120 is shown in FIG. 1 following the fetch stage 110.
  • The align stage 120 formats the words coming from the fetch stage 110 into the appropriate instructions.
  • Instructions are fetched from memory in 32-bit words.
  • The fetch stage 110 retrieves, or fetches, a 32-bit word at a specified fetch address.
  • The entry at that fetch address may contain an aligned 16-bit or 32-bit instruction, an unaligned 16-bit instruction preceded by a portion of a previous instruction, or an unaligned portion of a larger instruction preceded by a portion of a previous instruction, depending on the actual instruction address.
  • For example, a fetched word may have an instruction fetch address of 0x4 but an actual instruction address of 0x6.
  • The 32-bit word fetched from memory is passed to the align stage 120, where it is aligned into a complete instruction.
  • This alignment may include discarding superfluous 16-bit instructions or assembling unaligned 32-bit or larger instructions into a single instruction.
  • The resulting N-bit instruction is then forwarded to the decoder (DEC) 130.
  • An instruction extension interface 180 is also shown. It permits the interfacing of customized processor instructions that complement the standard instruction set architecture of the microprocessor. These customized instructions interface with the various stages of the microprocessor pipeline 100 through a timing-registered interface, which minimizes the effect of critical-path loading when customized logic is attached to a pre-existing processor pipeline.
  • A custom opcode slot is defined in the extension instruction interface for the specific custom instruction. This allows the microprocessor to correctly acknowledge the presence of a custom instruction 182 and to extract the source operand addresses used to index the register file 142.
  • The custom instruction flag interface 184 allows the addition of custom instruction flags that the microprocessor uses for conditional evaluation. Using either the standard condition code evaluators or the custom extension condition code evaluators 184, the microprocessor determines whether the instruction is executed based upon the condition evaluation result (EXEC) 150.
  • A custom ALU interface 186 permits user-defined arithmetic and logical extension instructions, the results of which are selected in the result select stage 186.
  • The fast results forwarding block 156 selects the relevant results from a group of simple execution units 154 (comprising the Normalizing Unit, Barrel Shifter, Logical Unit and Fast Adder) of the execute stage 150. These results are written directly to the register file 142 on the same output clock pulse, reducing the number of clock cycles required for non-computationally intensive operations. More complex arithmetic instructions 152 require an entire cycle to compute their results. They forward those results in the write back stage (WB) 170 through the select stage (SEL) 160, which contains a results selector 162 used to select the correct output from the multiple arithmetic units 152.
  • Yet another novel feature of the microprocessor architecture shown in this figure is the inclusion of a cache debug unit (CDU) 195, shown in the example of FIG. 1 as connected to the fetch stage 110 of the instruction pipeline.
  • The cache debug unit 195 is referred to herein as a unified cache debug unit.
  • The unified cache debug unit architecture serves as a debug unit for both the instruction cache and the data cache of the microprocessor.
  • Referring to FIG. 2, an exemplary architecture of a cache debug unit (CDU) such as that depicted in FIG. 1 is illustrated.
  • The cache debug unit provides a facility to check whether particular data or instructions are stored in cache and to selectively change the contents of cache memory. Under certain circumstances it may be necessary to flush the cache, pre-load the cache, or look at or change specific cache locations based on instructions or current processor pipeline conditions.
  • In conventional designs, a portion of each of the instruction cache and the data cache is allocated to debug logic.
  • These debug functions are performed off-line, rather than at run time, and/or are expected to be slow.
  • Moreover, the debug functions of the instruction cache and the data cache are strongly similar. Providing them separately in each cache therefore requires redundant logic in the processor design, which increases the cost and complexity of the design.
  • The debug units are seldom used during run time. Even so, they consume power when not specifically invoked, because they are included in the instruction and data cache components themselves.
  • Various exemplary embodiments therefore provide a unified cache debug unit 200 such as that shown in FIG. 2.
  • The unified cache debug unit 200 ameliorates at least some of these problems by providing a single unit that is located separately from the instruction cache 210 and data cache 220 units.
  • The unified cache debug unit 200 may interface with the instruction pipeline through the auxiliary unit 240.
  • The auxiliary unit 240 interface allows requests to be sent to the CDU 200 and responses to those requests to be received from the CDU 200. These are labeled Aux request and Aux response in FIG. 2.
  • A state control device 250 may indicate the current state to the CDU 200. For example, it may signal pipeline flushes or other system changes that preempt a previous command from the auxiliary unit 240.
  • The instruction cache 210 comprises an instruction cache RAM 212, a branch prediction unit (BPU) 214 and a multi-way instruction cache (MWIC) 216.
  • The CDU 200 communicates with the instruction cache RAM 212 through the BPU 214 via the instruction cache RAM access line 201, labeled I$ RAM Access. In various embodiments, this line permits communication only between the CDU 200 and the instruction cache RAM 212.
  • Calls to the external memory subsystem 230 are made through the multi-way instruction cache (MWIC) 216, over request fill line 202. For example, if the CDU 200 needs to pull a piece of information from the memory subsystem 230 into the instruction cache RAM 212, it uses the path through the request fill line 202.
  • The structure of the data cache 220 in some respects mirrors that of the instruction cache 210.
  • The data cache 220 comprises a data cache RAM 222, a data cache RAM control 224 and a data burst unit 226.
  • The CDU 200 communicates with the data cache RAM 222 through the data cache RAM control 224 via the data cache RAM access line 203. In various embodiments, this line may permit communication only between the CDU 200 and the data cache RAM 222.
  • Calls to the external memory subsystem 230 through the data cache 220 are made through the data burst unit (DBU) 226, over fill/flush request line 204.
  • The data cache 220 may contain data not stored in the memory subsystem 230. In that case, the CDU 200 may need to take data from the data cache 220 and write it to the memory subsystem 230.
  • Because the CDU 200 is located outside of both the instruction cache 210 and the data cache 220, the architecture of each of these structures is simplified. In various exemplary embodiments, the CDU 200 may also be selectively turned off when it is not being used. It therefore consumes less power than conventional cache-based debug units, which receive power even when not in use. In various embodiments, the cache debug unit 200 remains powered off until one of two events occurs: a call is received from the auxiliary unit 240, or the pipeline determines that an instruction from the auxiliary unit 240 to the cache debug unit 200 is in the pipeline. In various embodiments, the cache debug unit remains powered on until an instruction to power off is received.
  • Alternatively, the cache debug unit 200 may power off after all requested information has been sent back to the auxiliary unit 240. Moreover, conventional instruction and data cache debug units have similar structures, so sharing logic hardware in the CDU 200 may reduce the total amount of silicon required.
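The following Python sketch illustrates the fetch-and-align behavior of stages 110 and 120 described above for FIG. 1. It is not the processor's actual align logic. The instruction-length rule (`is_long`), the halfword ordering inside a fetch word (`split_word`) and the function names are hypothetical assumptions chosen for illustration, and only 16-bit and 32-bit instructions are modeled. The sketch discards halfwords that precede the actual instruction address. It also joins a 32-bit instruction whose halves straddle two fetch words.

```python
from typing import Iterable, List, Tuple

def is_long(first_hw: int) -> bool:
    # Hypothetical length rule (NOT the real ARC encoding): bit 15 of the
    # first halfword set => 32-bit instruction, clear => 16-bit instruction.
    return bool(first_hw & 0x8000)

def split_word(word: int) -> Tuple[int, int]:
    # Assumed convention: the halfword at the lower address is held in the
    # upper 16 bits of the 32-bit fetch word.
    return (word >> 16) & 0xFFFF, word & 0xFFFF

def align(fetch_addr: int, words: Iterable[int], start_addr: int) -> List[Tuple[int, int, int]]:
    """Assemble complete instructions from 32-bit fetch words.

    fetch_addr: word-aligned address of the first fetched word (e.g. 0x4).
    start_addr: actual address of the first instruction (e.g. 0x6).
    Returns a list of (instruction address, size in bits, encoding).
    """
    if fetch_addr % 4 or start_addr % 2:
        raise ValueError("fetch address must be word-aligned, start address halfword-aligned")
    # Flatten the fetched words into (address, halfword) pairs.
    stream = []
    for n, word in enumerate(words):
        first_hw, second_hw = split_word(word)
        stream.append((fetch_addr + 4 * n, first_hw))
        stream.append((fetch_addr + 4 * n + 2, second_hw))
    # Discard superfluous halfwords that precede the actual instruction address.
    stream = [(a, hw) for a, hw in stream if a >= start_addr]
    out: List[Tuple[int, int, int]] = []
    i = 0
    while i < len(stream):
        addr, hw = stream[i]
        if is_long(hw):
            if i + 1 == len(stream):
                break  # second half not fetched yet; it arrives with the next word
            out.append((addr, 32, (hw << 16) | stream[i + 1][1]))
            i += 2
        else:
            out.append((addr, 16, hw))
            i += 1
    return out
```

Take the example in the text, with fetch address 0x4 and actual instruction address 0x6. Here `align(0x4, [0x11111234, 0x8ABCDEF0], 0x6)` yields a 16-bit instruction at 0x6 followed by a 32-bit instruction at 0x8. A 32-bit instruction whose second half has not yet been fetched is left out of the result.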

Abstract

A microprocessor architecture including a unified cache debug unit. A debug unit on the processor chip receives data/command signals from a unit of the execute stage of the multi-stage instruction pipeline of the processor and returns information to the execute stage unit. The cache debug unit is operatively connected to both instruction and data cache units of the microprocessor. The memory subsystem of the processor may be accessed by the cache debug unit through either of the instruction or data cache units. By unifying the cache debug in a separate structure, the need for redundant debug structure in both cache units is obviated. Also, the unified cache debug unit can be powered down when not accessed by the instruction pipeline, thereby saving power.
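The abstract and FIG. 2 describe one debug unit with four properties:
- It serves both caches.
- It is driven by Aux requests from the execute stage.
- It reaches the memory subsystem through either cache.
- It can be powered down when idle.

The Python sketch below is a hypothetical behavioral model of that arrangement, not ARC's hardware. The class names, the request operations (`probe`, `read`, `preload`, `flush`) and the dictionary-based caches are all assumptions. The model follows the embodiment in which the unit powers up when a request arrives and powers off once the response has been returned.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class CacheModel:
    """Toy stand-in for a cache RAM (I$ RAM 212 or D$ RAM 222): addr -> (value, dirty)."""
    lines: Dict[int, Tuple[int, bool]] = field(default_factory=dict)

@dataclass
class UnifiedCDU:
    """Hypothetical behavioral model of a unified cache debug unit (CDU 200).

    One copy of the request-handling logic serves both caches; `memory` stands in
    for the memory subsystem 230 reached through the MWIC 216 or the DBU 226.
    """
    icache: CacheModel
    dcache: CacheModel
    memory: Dict[int, int]
    powered: bool = False
    wake_count: int = 0

    def handle(self, target: str, op: str, addr: Optional[int] = None):
        """Service one Aux request ("I" or "D" cache) and return the Aux response."""
        if target not in ("I", "D"):
            raise ValueError(f"unknown cache {target!r}")
        self.powered = True          # power up only when a request arrives
        self.wake_count += 1
        cache = self.icache if target == "I" else self.dcache  # shared logic, one selector
        try:
            if op == "probe":        # is this address resident in the cache?
                return addr in cache.lines
            if op == "read":         # look at a cache location
                return cache.lines[addr][0] if addr in cache.lines else None
            if op == "preload":      # fill the cache from the memory subsystem
                cache.lines[addr] = (self.memory.get(addr, 0), False)
                return True
            if op == "flush":        # write back dirty data, then empty the cache
                dirty = {a: v for a, (v, d) in cache.lines.items() if d}
                self.memory.update(dirty)
                cache.lines.clear()
                return len(dirty)
            raise ValueError(f"unknown op {op!r}")
        finally:
            self.powered = False     # power down once the response has been returned
```

In this model the operation decoding exists only once, and only the selected cache differs. That is where the claimed logic and silicon savings over per-cache debug units would come from.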

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application claims priority to provisional application No. 60/572,238 filed May 19, 2004, entitled “Microprocessor Architecture,” hereby incorporated by reference in its entirety.
  • FIELD OF THE INVENTION
  • This invention relates generally to microprocessor architecture and more specifically to an improved cache debug unit for a microprocessor.
  • BACKGROUND OF THE INVENTION
  • A major focus of microprocessor design has been to increase effective clock speed through hardware simplifications. Exploiting the property of locality of memory references, cache memories have been successful in achieving high performance in many computer systems. In the past, cache memories of microprocessor-based systems were provided off-chip using high performance memory components. This was primarily because the amount of silicon area necessary to provide an on-chip cache memory of reasonable performance would have been impractical. Increasing the size of an integrated circuit to accommodate a cache memory adversely impacts the yield of the integrated circuit in a given manufacturing process. However, with the density achieved recently in integrated circuit technology, it is now possible to provide on-chip cache memory economically.
  • In a computer system with a cache memory, when a memory word is needed, the central processing unit (CPU) looks into the cache memory for a copy of the memory word. If the memory word is found in the cache memory, a cache “hit” is said to have occurred, and the main memory is not accessed. Thus, a figure of merit which can be used to measure the effectiveness of the cache memory is the “hit” ratio. The hit ratio is the percentage of total memory references in which the desired datum is found in the cache memory without accessing the main memory. When the desired datum is not found in the cache memory, a “cache miss” is said to have occurred and the main memory is then accessed for the desired datum. In addition, in many computer systems there are portions of the address space which are not mapped to the cache memory. This portion of the address space is said to be “uncached” or “uncacheable”. For example, the addresses assigned to input/output (I/O) devices are almost always uncached. Both a cache miss and an uncacheable memory reference result in an access to the main memory.
  • In the course of developing or debugging a computer system, it is often necessary to monitor program execution by the CPU or to interrupt one instruction stream to direct the CPU to execute certain alternate instructions. A known method used to debug a processor utilizes means for observing the program flow during operation of the processor. With systems having off-chip cache, program observability is relatively straightforward using probes. However, observing the program flow of processors having cache integrated on-chip is much more difficult because most of the processing operations are performed internally within the chip.
  • As integrated circuit manufacturing techniques have improved, on-chip cache has become standard in most microprocessor designs. Due to difficulties in interfacing with the on-chip cache, debugging systems have also had to move onto the chip. Modern on-chip cache memories may now employ cache debug units directly in the cache memories themselves.
  • There is therefore a need for a cached processor with a relatively simple design, a reduced silicon footprint and reduced power consumption. Such a processor should allow the real-time capture of data for debug purposes and should be usable at high frequencies.
  • It should be appreciated that the description herein of various advantages and disadvantages associated with known apparatus, methods, and materials is not intended to limit the scope of the invention to their exclusion. Indeed, various embodiments of the invention may include one or more of the known apparatus, methods, and materials without suffering from their disadvantages.
  • As background to the techniques discussed herein, the following references are incorporated herein by reference: U.S. Pat. No. 6,862,563 issued Mar. 1, 2005 entitled “Method And Apparatus For Managing The Configuration And Functionality Of A Semiconductor Design” (Hakewill et al.); U.S. Ser. No. 10/423,745 filed Apr. 25, 2003, entitled “Apparatus and Method for Managing Integrated Circuit Designs”; and U.S. Ser. No. 10/651,560 filed Aug. 29, 2003, entitled “Improved Computerized Extension Apparatus and Methods”, all assigned to the assignee of the present invention.
  • SUMMARY OF THE INVENTION
  • Various embodiments of the invention are disclosed that overcome one or more of the shortcomings of conventional microprocessors through a microprocessor architecture having a unified cache debug unit. In these embodiments, a separate cache debug unit is provided that serves as an interface to both the instruction cache and the data cache. In various exemplary embodiments, the cache debug unit has shared hardware logic accessible to both the instruction cache and the data cache. In various exemplary embodiments, the cache debug unit may be selectively switched off or run on a clock separate from that of the instruction pipeline. In various exemplary embodiments, an auxiliary unit of the execute stage of the microprocessor core is used to pass instructions to the cache debug unit and to receive responses back from it. Through the instruction cache and the data cache respectively, the cache debug unit may also access the memory subsystem to perform cache flushes, cache updates and various other debugging functions.
  • At least one exemplary embodiment of the invention provides a microprocessor core comprising a multistage pipeline, a cache debug unit, a data pathway between the cache debug unit and an instruction cache unit, a data pathway between the cache debug unit and a data cache unit, and a data pathway between a unit of the multistage pipeline and the cache debug unit.
  • At least one additional exemplary embodiment provides a microprocessor comprising a multistage pipeline, a data cache unit, an instruction cache unit, and a unified cache debug unit operatively connected to the data cache unit, the instruction cache unit, and the multistage pipeline.
  • Yet another exemplary embodiment of this invention provides a RISC-type microprocessor comprising a multistage pipeline, and a cache debug unit, wherein the cache debug unit comprises an interface to an instruction cache unit of the microprocessor, and an interface to a data cache unit of the microprocessor.
  • Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram illustrating a processor core in accordance with at least one exemplary embodiment of this invention; and
  • FIG. 2 is a block diagram illustrating an architecture for a unified cache debug unit for a microprocessor in accordance with at least one embodiment of this invention.
  • DETAILED DESCRIPTION OF THE DISCLOSURE
  • The following description is intended to convey a thorough understanding of the invention by providing specific embodiments and details involving various aspects of a new and useful microprocessor architecture. It is understood, however, that the invention is not limited to these specific embodiments and details, which are exemplary only. It further is understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.
  • Discussion of the invention will now be made by way of example with reference to the various drawing figures. FIG. 1 illustrates, in block diagram form, an architecture for a microprocessor core 100 and peripheral hardware structure in accordance with at least one exemplary embodiment of this invention. Several novel features apparent from FIG. 1 distinguish the illustrated microprocessor architecture from a conventional microprocessor architecture. First, the microprocessor architecture of FIG. 1 features a processor core 100 having a seven-stage instruction pipeline. A fetch stage (FET) 110 includes an instruction cache 112, a branch prediction unit (BPU) 114, and connections to an instruction RAM 190 and a cache debug unit (CDU) 195. An align stage (ALN) 120 is shown in FIG. 1 following the fetch stage 110.
  • Because the microprocessor core 100 shown in FIG. 1 is operable to work with a variable bit-length instruction set, namely 16-bit, 32-bit, 48-bit or 64-bit instructions, the align stage 120 formats the words coming from the fetch stage 110 into the appropriate instructions. In various exemplary embodiments, instructions are fetched from memory in 32-bit words. Thus, when the fetch stage 110 retrieves a 32-bit word at a specified fetch address, the entry at that fetch address may contain any of three things, depending on the actual instruction address:
    - an aligned 16-bit or 32-bit instruction;
    - an unaligned 16-bit instruction preceded by a portion of a previous instruction;
    - an unaligned portion of a larger instruction preceded by a portion of a previous instruction.
  For example, a fetched word may have an instruction fetch address of 0x4 but an actual instruction address of 0x6. In various exemplary embodiments, the 32-bit word fetched from memory is passed to the align stage 120, where it is aligned into a complete instruction. In various exemplary embodiments, this alignment may include discarding superfluous 16-bit instructions or assembling unaligned 32-bit or larger instructions into a single instruction. After the instruction has been completely assembled, the N-bit instruction is forwarded to the decoder (DEC) 130.
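The alignment step described above can be sketched in Python as a model that splits fetched 32-bit words into 16-bit parcels. It drops the parcels that precede the actual instruction address, for example one parcel when the fetch address is 0x4 and the instruction starts at 0x6. It then groups the remaining parcels into complete instructions. This is a hedged illustration, not the ARC implementation, and it rests on several assumptions. The length-decoding rule (`insn_length_bytes`) is hypothetical; it is not the real instruction encoding. The parcel ordering within a word is also an assumption. The function processes a whole buffer of words rather than one word per cycle.

```python
from typing import Iterator, List, Tuple

PARCEL_BYTES = 2  # one 16-bit parcel


def insn_length_bytes(first_parcel: int) -> int:
    """HYPOTHETICAL length decode: the top two bits of the first parcel
    select a 16/32/48/64-bit instruction. Not the real ARC encoding."""
    return {0: 2, 1: 4, 2: 6, 3: 8}[(first_parcel >> 14) & 0x3]


def split_word(word: int) -> Tuple[int, int]:
    # Assumption: the lower-addressed parcel sits in the low half of the word.
    return word & 0xFFFF, (word >> 16) & 0xFFFF


def align(words: List[int], fetch_addr: int,
          insn_addr: int) -> Iterator[Tuple[int, List[int]]]:
    """Yield (address, parcels) for each complete instruction.

    words       32-bit words fetched sequentially from fetch_addr
    fetch_addr  word-aligned fetch address (e.g. 0x4)
    insn_addr   actual address of the first instruction (e.g. 0x6)
    """
    assert fetch_addr % 4 == 0 and insn_addr % PARCEL_BYTES == 0
    parcels = [p for w in words for p in split_word(w)]
    # Discard parcels belonging to a previous instruction.
    pos = (insn_addr - fetch_addr) // PARCEL_BYTES
    addr = insn_addr
    while pos < len(parcels):
        n = insn_length_bytes(parcels[pos]) // PARCEL_BYTES
        if pos + n > len(parcels):
            break  # incomplete instruction: wait for the next fetched word
        yield addr, parcels[pos:pos + n]
        pos += n
        addr += n * PARCEL_BYTES
```

Take the words `[0x40011234, 0x0005AAAA, 0x11118000]` fetched at 0x4, with the first instruction at 0x6. Here `align` yields a 32-bit instruction at 0x6 (`[0x4001, 0xAAAA]`) and a 16-bit instruction at 0xA (`[0x0005]`). It withholds the 48-bit instruction at 0xC because only two of its three parcels have arrived.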
  • Still referring to FIG. 1, an instruction extension interface 180 is also shown, which permits the interfacing of customized processor instructions that complement the standard instruction set architecture of the microprocessor. These customized instructions interface with the various stages of the microprocessor pipeline 100 through a timing-registered interface. This minimizes the effect of critical path loading when customized logic is attached to a pre-existing processor pipeline. Specifically, a custom opcode slot is defined in the extension instruction interface for the specific custom instruction. This allows the microprocessor to correctly recognize the presence of a custom instruction 182 and to extract the source operand addresses used to index the register file 142. The custom instruction flag interface 184 allows the addition of custom instruction flags. The microprocessor uses these flags for conditional evaluation, with either the standard condition code evaluators or custom extension condition code evaluators 184. Based on the condition evaluation result, the microprocessor determines whether the instruction is executed in the execute stage (EXEC) 150. A custom ALU interface 186 permits user-defined arithmetic and logical extension instructions, the results of which are selected in the result select stage 186.
  • Another novel feature of the microprocessor architecture illustrated in FIG. 1 is the fast result forwarding block 156 in the execute stage 150 of the instruction pipeline. The fast result forwarding block 156 selects the relevant results from a group of simple execution units 154 in the execute stage 150: the Normalizing Unit, Barrel Shifter, Logical Unit and Fast Adder. These results are written directly to the register file 142 on the same output clock pulse, reducing the number of clock cycles required for non-computationally intensive operations. More complex arithmetic instructions require an entire cycle to compute their results. For these, the arithmetic units 152 forward the results to the write back stage (WB) 170 through the select stage (SEL) 160. The select stage contains a results selector 162 that selects the correct output from the multiple arithmetic units 152.
  • With continued reference to FIG. 1, yet another novel feature of the microprocessor architecture shown in this figure is the inclusion of a cache debug unit (CDU) 195 shown in the example of FIG. 1 as connected to the fetch stage 110 of the instruction pipeline. Throughout this specification and claims the cache debug unit 195 will be referred to as a unified cache debug unit. In various embodiments, the unified cache debug unit architecture serves as a debug unit for both an instruction cache and a data cache of the microprocessor.
  • Referring now to FIG. 2, an exemplary architecture of a cache debug unit (CDU) such as that depicted in FIG. 1 is illustrated. In general, the cache debug unit provides a facility to check whether particular data or instructions are stored in cache and to selectively change the contents of cache memory. Under certain circumstances it may be necessary to flush the cache, pre-load the cache, or inspect or modify particular cache locations, based on instructions or on current processor pipeline conditions.
  • As noted herein, in a conventional microprocessor architecture employing cache debug, a portion of each of the instruction cache and the data cache is allocated for debug logic. Usually, however, these debug functions are performed offline rather than at run time, and/or are expected to be slow. Furthermore, the debug functions of the instruction cache and the data cache are strongly similar, so implementing them separately duplicates logic in the processor design, increasing its cost and complexity. Although the debug units are seldom used during runtime, they consume power even when not being invoked, because they are built into the instruction and data cache components themselves.
  • In various exemplary embodiments, this drawback of conventional cache debug units is overcome by a unified cache debug unit 200, such as that shown in FIG. 2. The unified cache debug unit 200 ameliorates at least some of these problems by providing a single unit located separately from the instruction cache 210 and data cache 220 units. In various exemplary embodiments, the unified cache debug unit 200 may interface with the instruction pipeline through the auxiliary unit 240. In various embodiments, the auxiliary unit 240 interface allows requests to be sent to the CDU 200 and responses to those requests to be received from the CDU 200. These are labeled Aux request and Aux response in FIG. 2. In the example shown in FIG. 2, a state control device 250 may inform the CDU 200 of the current state, for example a pipeline flush or another system change that preempts a previous command from the auxiliary unit 240.
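The request/response flow described above can be modelled behaviourally in Python. In this sketch, one unit decodes and sequences Aux requests for both caches, selecting only which cache RAM to access. A state-control notification such as a pipeline flush cancels a pending request. This is a minimal sketch, not the patented design. The class and method names (`UnifiedCDU`, `aux_request`, `state_control`, `step`) and the READ/WRITE request format are hypothetical. The cache RAMs are modelled as plain dictionaries.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Optional


class Target(Enum):
    ICACHE = auto()
    DCACHE = auto()


class Op(Enum):
    READ = auto()   # look at a cache location
    WRITE = auto()  # change a cache location


@dataclass
class AuxRequest:
    target: Target
    op: Op
    index: int
    value: int = 0


@dataclass
class AuxResponse:
    ok: bool
    value: Optional[int] = None


class UnifiedCDU:
    """Hypothetical model: one request decoder/sequencer shared by both
    caches, instead of a copy of this logic inside each cache."""

    def __init__(self, icache_ram: Dict[int, int], dcache_ram: Dict[int, int]):
        self.rams = {Target.ICACHE: icache_ram, Target.DCACHE: dcache_ram}
        self.pending: Optional[AuxRequest] = None

    def aux_request(self, req: AuxRequest) -> None:
        self.pending = req

    def state_control(self, pipeline_flushed: bool) -> None:
        # A pipeline flush (or similar system change) preempts the command.
        if pipeline_flushed:
            self.pending = None

    def step(self) -> Optional[AuxResponse]:
        """Complete the pending request, if any, and return the Aux response."""
        req, self.pending = self.pending, None
        if req is None:
            return None
        ram = self.rams[req.target]  # shared logic; only the RAM port differs
        if req.op is Op.READ:
            return AuxResponse(ok=req.index in ram, value=ram.get(req.index))
        ram[req.index] = req.value
        return AuxResponse(ok=True)
```

The same `step` logic serves both caches, which reflects the shared-hardware argument made for the unified CDU. A read issued just before a pipeline flush produces no response, because `state_control` discards it.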
  • As shown in the exemplary embodiment illustrated in FIG. 2, the instruction cache 210 comprises an instruction cache RAM 212, a branch prediction unit (BPU) 214 and a multi-way instruction cache (MWIC) 216. In various embodiments, the CDU 200 communicates with the instruction cache RAM 212 through the BPU 214 via the instruction cache RAM access line 201, labeled I$ RAM Access. In various embodiments, this line permits communication only between the CDU 200 and the instruction cache RAM 212. Calls to the external memory subsystem 230 are made through the multi-way instruction cache (MWIC) 216, over the request fill line 202. For example, if the CDU 200 needs to pull a piece of information from the memory subsystem 230 into the instruction cache RAM 212, the path through the request fill line 202 is used.
  • With continued reference to FIG. 2, in various exemplary embodiments, the structure of the data cache 220 in some respects mirrors that of the instruction cache 210. In the example illustrated in FIG. 2, the data cache 220 comprises a data cache RAM 222, a data cache RAM control 224 and a data burst unit 226. In various exemplary embodiments, the CDU 200 communicates with the data cache RAM 222 through the data cache RAM control 224 via the data cache RAM access line 203. In various embodiments, this line may permit communication only between the CDU 200 and the data cache RAM 222. Thus, in various embodiments, calls to the external memory subsystem 230 through the data cache 220 are made through the data burst unit (DBU) 226, over the fill/flush request line 204. Because, in various embodiments, the data cache 220 may contain data not yet stored in the memory subsystem 230, the CDU 200 may need to take data from the data cache 220 and write it to the memory subsystem 230.
  • In various exemplary embodiments, because the CDU 200 is located outside both the instruction cache 210 and the data cache 220, the architecture of each of these structures is simplified. Moreover, because in various exemplary embodiments the CDU 200 may be selectively turned off when it is not being used, less power is consumed than with conventional cache-based debug units, which receive power even when not in use. In various embodiments, the cache debug unit 200 remains powered off until a call is received from the auxiliary unit 240, or until the pipeline determines that an instruction from the auxiliary unit 240 to the cache debug unit 200 is in the pipeline. In some embodiments, the cache debug unit remains powered on until an instruction to power off is received. In various other embodiments, however, the cache debug unit 200 powers off after all requested information has been sent back to the auxiliary unit 240. Moreover, because conventional instruction cache and data cache debug units have similar structures, sharing logic hardware in the CDU 200 may reduce the total amount of silicon required.
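The two memory paths described above can be sketched in Python. A CDU pre-load travels through the instruction cache's request/fill path, and a CDU flush travels through the data cache's fill/flush path to write back data that memory does not yet hold. This is a simplified sketch under stated assumptions. All class and function names are hypothetical. Storage is tracked per word rather than per cache line, and the data cache is assumed to be write-back, consistent with the statement that it may hold data not stored in the memory subsystem.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class MemorySubsystem:
    words: Dict[int, int] = field(default_factory=dict)


@dataclass
class InstructionCache:
    ram: Dict[int, int] = field(default_factory=dict)

    def request_fill(self, mem: MemorySubsystem, addr: int) -> None:
        # Stand-in for the MWIC request/fill path: memory -> I$ RAM.
        self.ram[addr] = mem.words.get(addr, 0)


@dataclass
class DataCache:
    ram: Dict[int, int] = field(default_factory=dict)
    dirty: Set[int] = field(default_factory=set)

    def store(self, addr: int, value: int) -> None:
        # Write-back assumption: memory is not updated until a flush.
        self.ram[addr] = value
        self.dirty.add(addr)

    def flush(self, mem: MemorySubsystem) -> int:
        # Stand-in for the DBU fill/flush path: dirty D$ data -> memory.
        for addr in sorted(self.dirty):
            mem.words[addr] = self.ram[addr]
        written = len(self.dirty)
        self.dirty.clear()
        return written


def cdu_preload_icache(ic: InstructionCache, mem: MemorySubsystem,
                       addrs: Iterable[int]) -> None:
    """CDU-initiated pre-load, routed through the I$ (never direct to memory)."""
    for addr in addrs:
        ic.request_fill(mem, addr)


def cdu_flush_dcache(dc: DataCache, mem: MemorySubsystem) -> int:
    """CDU-initiated flush, routed through the D$; returns words written back."""
    return dc.flush(mem)
```

In this model, the CDU never touches `MemorySubsystem` directly. It always goes through the cache that owns the corresponding path, mirroring how the description routes CDU memory calls through the MWIC and the DBU.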
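The power behaviour described above amounts to a small state machine, sketched below in Python. The CDU powers on when an Aux call arrives or when the pipeline detects a CDU-bound instruction in flight. It then powers off under one of two policies: on an explicit power-off instruction, or once every outstanding request has been answered. This is a hypothetical behavioural sketch. The names `CDUPower`, `wake`, `aux_call`, `response_sent` and `power_off_instruction` are illustrative, and the sketch assumes software issues the power-off instruction only when no requests are outstanding.

```python
from enum import Enum, auto


class PowerPolicy(Enum):
    STAY_ON_UNTIL_TOLD = auto()  # power off only on an explicit instruction
    OFF_WHEN_IDLE = auto()       # power off once all responses are returned


class CDUPower:
    def __init__(self, policy: PowerPolicy):
        self.policy = policy
        self.powered = False     # the CDU starts powered off
        self.outstanding = 0

    def wake(self) -> None:
        """Pipeline has detected a CDU-bound auxiliary instruction in flight."""
        self.powered = True

    def aux_call(self) -> None:
        """A request arrives from the auxiliary unit."""
        self.powered = True
        self.outstanding += 1

    def response_sent(self) -> None:
        if self.outstanding == 0:
            raise RuntimeError("no outstanding request to respond to")
        self.outstanding -= 1
        if self.policy is PowerPolicy.OFF_WHEN_IDLE and self.outstanding == 0:
            self.powered = False

    def power_off_instruction(self) -> None:
        # Assumes software issues this only when no requests are outstanding.
        self.powered = False
```

Under `OFF_WHEN_IDLE`, the unit drops power as soon as its last response is returned. Under `STAY_ON_UNTIL_TOLD`, it stays powered between requests, which avoids repeated wake-ups at the cost of idle power.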
  • While the foregoing description includes many details and specificities, it is to be understood that these have been included for purposes of explanation only. The embodiments of the present invention are not to be limited in scope by the specific embodiments described herein. For example, although many of the embodiments disclosed herein have been described with reference to a cache debug unit in a RISC-type embedded microprocessor, the principles herein are equally applicable to cache debug units in microprocessors in general. Indeed, various modifications of the embodiments of the present inventions, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such modifications are intended to fall within the scope of the following appended claims. Further, although the embodiments of the present inventions have been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that their usefulness is not limited thereto and that the embodiments of the present inventions can be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the embodiments of the present inventions as disclosed herein.

Claims (21)

1. In a microprocessor, a microprocessor core comprising:
a multistage pipeline;
a cache debug unit;
a data pathway between the cache debug unit and an instruction cache unit;
a data pathway between the cache debug unit and a data cache unit; and
a data pathway between a unit of the multistage pipeline and the cache debug unit.
2. The microprocessor according to claim 1, wherein the unit of the multistage pipeline is an auxiliary unit of an execute stage of the pipeline.
3. The microprocessor according to claim 1, further comprising a state control unit adapted to provide a current state of the pipeline to the cache debug unit.
4. The microprocessor according to claim 3, wherein a current state comprises at least one of a pipeline flush or other system change that preempts a previous command from the pipeline.
5. The microprocessor according to claim 1, further comprising a data pathway between the cache debug unit and a memory subsystem of the microprocessor through each of the instruction cache and data cache units.
6. The microprocessor according to claim 1, further comprising a power management control adapted to selectively power down the cache debug unit when not in demand by the microprocessor.
7. The microprocessor according to claim 1, wherein the microprocessor core is a RISC-type embedded microprocessor core.
8. A microprocessor comprising:
a multistage pipeline;
a data cache unit;
an instruction cache unit; and
a unified cache debug unit operatively connected to the data cache unit, the instruction cache unit, and the multistage pipeline.
9. The microprocessor according to claim 8, wherein the unified cache debug unit is operatively connected to the multistage pipeline through an auxiliary unit in an execute stage of the multistage pipeline.
10. The microprocessor according to claim 8, further comprising a state control unit adapted to provide a current state of the pipeline to the unified cache debug unit.
11. The microprocessor according to claim 10, wherein a current state comprises at least one of a pipeline flush or other system change that preempts a previous command from the multistage pipeline.
12. The microprocessor according to claim 8, further comprising a data pathway between the unified cache debug unit and a memory subsystem of the microprocessor through each of the instruction cache and data cache units.
13. The microprocessor according to claim 8, further comprising a power management control adapted to selectively power down the cache debug unit when not in demand by the microprocessor.
14. The microprocessor according to claim 8, wherein the architecture is a RISC-type embedded microprocessor architecture.
15. A RISC-type microprocessor comprising:
a multistage pipeline; and
a cache debug unit, wherein the cache debug unit comprises:
an interface to an instruction cache unit of the microprocessor; and
an interface to a data cache unit of the microprocessor.
16. The microprocessor according to claim 15, further comprising an interface between the cache debug unit and at least one stage of the multistage pipeline.
17. The microprocessor according to claim 16, wherein the at least one stage of the multistage pipeline comprises an auxiliary unit of an execute stage of the multistage pipeline.
18. The microprocessor according to claim 15, further comprising a state control unit adapted to provide a current state of the multistage pipeline to the cache debug unit.
19. The microprocessor according to claim 18, wherein a current state comprises at least one of a pipeline flush or other system change that preempts a previous command from the unit of the multistage pipeline.
20. The microprocessor according to claim 15, further comprising an interface between the cache debug unit and a memory subsystem through each of the instruction cache and data cache units.
21. The microprocessor according to claim 15, further comprising a power management control adapted to selectively power down the cache debug unit when not in demand by the multistage pipeline.
US11/132,432 2004-05-19 2005-05-19 Microprocessor architecture including unified cache debug unit Abandoned US20050273559A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/132,432 US20050273559A1 (en) 2004-05-19 2005-05-19 Microprocessor architecture including unified cache debug unit

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US57223804P 2004-05-19 2004-05-19
US11/132,432 US20050273559A1 (en) 2004-05-19 2005-05-19 Microprocessor architecture including unified cache debug unit

Publications (1)

Publication Number Publication Date
US20050273559A1 true US20050273559A1 (en) 2005-12-08

Family

ID=35429033

Family Applications (7)

Application Number Title Priority Date Filing Date
US11/132,428 Abandoned US20050278517A1 (en) 2004-05-19 2005-05-19 Systems and methods for performing branch prediction in a variable length instruction set microprocessor
US11/132,424 Active 2031-02-12 US8719837B2 (en) 2004-05-19 2005-05-19 Microprocessor architecture having extendible logic
US11/132,447 Abandoned US20050278505A1 (en) 2004-05-19 2005-05-19 Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory
US11/132,423 Abandoned US20050278513A1 (en) 2004-05-19 2005-05-19 Systems and methods of dynamic branch prediction in a microprocessor
US11/132,448 Abandoned US20050289323A1 (en) 2004-05-19 2005-05-19 Barrel shifter for a microprocessor
US11/132,432 Abandoned US20050273559A1 (en) 2004-05-19 2005-05-19 Microprocessor architecture including unified cache debug unit
US14/222,194 Active US9003422B2 (en) 2004-05-19 2014-03-21 Microprocessor architecture having extendible logic

Country Status (5)

Country Link
US (7) US20050278517A1 (en)
CN (1) CN101002169A (en)
GB (1) GB2428842A (en)
TW (1) TW200602974A (en)
WO (1) WO2005114441A2 (en)

US5848264A (en) * 1996-10-25 1998-12-08 S3 Incorporated Debug and video queue for multi-processor chip
US5920711A (en) * 1995-06-02 1999-07-06 Synopsys, Inc. System for frame-based protocol, graphical capture, synthesis, analysis, and simulation
US5964884A (en) * 1996-09-30 1999-10-12 Advanced Micro Devices, Inc. Self-timed pulse control circuit
US5978909A (en) * 1997-11-26 1999-11-02 Intel Corporation System for speculative branch target prediction having a dynamic prediction history buffer and a static prediction history buffer
US6154857A (en) * 1997-04-08 2000-11-28 Advanced Micro Devices, Inc. Microprocessor-based device incorporating a cache for capturing software performance profiling data
US6185732B1 (en) * 1997-04-08 2001-02-06 Advanced Micro Devices, Inc. Software debug port for a microprocessor
US6292879B1 (en) * 1995-10-25 2001-09-18 Anthony S. Fong Method and apparatus to specify access control list and cache enabling and cache coherency requirement enabling on individual operands of an instruction of a computer
US20020100020A1 (en) * 2001-01-24 2002-07-25 Hunter Jeff L. Method for maintaining cache coherency in software in a shared memory system
US20020100019A1 (en) * 2001-01-24 2002-07-25 Hunter Jeff L. Software shared memory bus
US20030046614A1 (en) * 2001-08-31 2003-03-06 Brokish Charles W. System and method for using embedded real-time analysis components
US6550056B1 (en) * 1999-07-19 2003-04-15 Mitsubishi Denki Kabushiki Kaisha Source level debugger for debugging source programs
US20030126508A1 (en) * 2001-12-28 2003-07-03 Timothe Litt Method and apparatus for efficiently implementing trace and/or logic analysis mechanisms on a processor chip
US20030154463A1 (en) * 2002-02-08 2003-08-14 Betker Michael Richard Multiprocessor system with cache-based software breakpoints
US6609194B1 (en) * 1999-11-12 2003-08-19 Ip-First, Llc Apparatus for performing branch target address calculation based on branch type
US6622240B1 (en) * 1999-06-18 2003-09-16 Intrinsity, Inc. Method and apparatus for pre-branch instruction
US6774832B1 (en) * 2003-03-25 2004-08-10 Raytheon Company Multi-bit output DDS with real time delta sigma modulation look up from memory
US6823444B1 (en) * 2001-07-03 2004-11-23 Ip-First, Llc Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap
US20050097398A1 (en) * 2003-10-30 2005-05-05 International Business Machines Corporation Program debug method and apparatus
US6963554B1 (en) * 2000-12-27 2005-11-08 National Semiconductor Corporation Microwire dynamic sequencer pipeline stall
US7093165B2 (en) * 2001-10-24 2006-08-15 Kabushiki Kaisha Toshiba Debugging Method

Family Cites Families (193)

Publication number Priority date Publication date Assignee Title
US4342082A (en) 1977-01-13 1982-07-27 International Business Machines Corp. Program instruction mechanism for shortened recursive handling of interruptions
US4216539A (en) 1978-05-05 1980-08-05 Zehntel, Inc. In-circuit digital tester
US4400773A (en) 1980-12-31 1983-08-23 International Business Machines Corp. Independent handling of I/O interrupt requests and associated status information transfers
US4594659A (en) * 1982-10-13 1986-06-10 Honeywell Information Systems Inc. Method and apparatus for prefetching instructions for a central execution pipeline unit
JPS63225822A (en) 1986-08-11 1988-09-20 Toshiba Corp Barrel shifter
US4905178A (en) 1986-09-19 1990-02-27 Performance Semiconductor Corporation Fast shifter method and structure
JPS6398729A (en) 1986-10-15 1988-04-30 Fujitsu Ltd Barrel shifter
US4914622A (en) 1987-04-17 1990-04-03 Advanced Micro Devices, Inc. Array-organized bit map with a barrel shifter
DE3889812T2 (en) 1987-08-28 1994-12-15 Nec Corp Data processor with a test structure for multi-position shifters.
KR970005453B1 (en) * 1987-12-25 1997-04-16 가부시기가이샤 히다찌세이사꾸쇼 Data processing apparatus for high speed processing
US4926323A (en) 1988-03-03 1990-05-15 Advanced Micro Devices, Inc. Streamlined instruction processor
JPH01263820A (en) 1988-04-15 1989-10-20 Hitachi Ltd Microprocessor
EP0344347B1 (en) 1988-06-02 1993-12-29 Deutsche ITT Industries GmbH Digital signal processing unit
GB2229832B (en) 1989-03-30 1993-04-07 Intel Corp Byte swap instruction for memory format conversion within a microprocessor
DE69032318T2 (en) * 1989-08-31 1998-09-24 Canon Kk Image processing device
DE69030648T2 (en) * 1990-01-02 1997-11-13 Motorola Inc Method for sequential prefetching of 1-word, 2-word or 3-word instructions
JPH03248226A (en) 1990-02-26 1991-11-06 Nec Corp Microprocessor
JP2560889B2 (en) 1990-05-22 1996-12-04 日本電気株式会社 Microprocessor
CA2045790A1 (en) * 1990-06-29 1991-12-30 Richard Lee Sites Branch prediction in high-performance processor
US5778423A (en) * 1990-06-29 1998-07-07 Digital Equipment Corporation Prefetch instruction for improving performance in reduced instruction set processor
US5155843A (en) * 1990-06-29 1992-10-13 Digital Equipment Corporation Error transition mode for multi-processor system
JP2556612B2 (en) 1990-08-29 1996-11-20 日本電気アイシーマイコンシステム株式会社 Barrel shifter circuit
US5539911A (en) * 1991-07-08 1996-07-23 Seiko Epson Corporation High-performance, superscalar-based computer system with out-of-order instruction execution
DE69229084T2 (en) * 1991-07-08 1999-10-21 Canon Kk Color imaging process, color image reader and color image processing apparatus
CA2073516A1 (en) 1991-11-27 1993-05-28 Peter Michael Kogge Dynamic multi-mode parallel processor array architecture computer system
US5485625A (en) 1992-06-29 1996-01-16 Ford Motor Company Method and apparatus for monitoring external events during a microprocessor's sleep mode
US5274770A (en) 1992-07-29 1993-12-28 Tritech Microelectronics International Pte Ltd. Flexible register-based I/O microcontroller with single cycle instruction execution
US5294928A (en) 1992-08-31 1994-03-15 Microchip Technology Incorporated A/D converter with zero power mode
US5333119A (en) 1992-09-30 1994-07-26 Regents Of The University Of Minnesota Digital signal processor with delayed-evaluation array multipliers and low-power memory addressing
US5542074A (en) 1992-10-22 1996-07-30 Maspar Computer Corporation Parallel processor system with highly flexible local control capability, including selective inversion of instruction signal and control of bit shift amount
US5696958A (en) 1993-01-11 1997-12-09 Silicon Graphics, Inc. Method and apparatus for reducing delays following the execution of a branch instruction in an instruction pipeline
US5577217A (en) * 1993-05-14 1996-11-19 Intel Corporation Method and apparatus for a branch target buffer with shared branch pattern tables for associated branch predictions
JPH06332693A (en) 1993-05-27 1994-12-02 Hitachi Ltd Issuing system of suspending instruction with time-out function
US5454117A (en) * 1993-08-25 1995-09-26 Nexgen, Inc. Configurable branch prediction for a processor performing speculative execution
US5584031A (en) 1993-11-09 1996-12-10 Motorola Inc. System and method for executing a low power delay instruction
JP2801135B2 (en) 1993-11-26 1998-09-21 富士通株式会社 Instruction reading method and instruction reading device for pipeline processor
US5509129A (en) 1993-11-30 1996-04-16 Guttag; Karl M. Long instruction word controlling plural independent processor operations
US5590350A (en) 1993-11-30 1996-12-31 Texas Instruments Incorporated Three input arithmetic logic unit with mask generator
US6116768A (en) 1993-11-30 2000-09-12 Texas Instruments Incorporated Three input arithmetic logic unit with barrel rotator
US5590351A (en) 1994-01-21 1996-12-31 Advanced Micro Devices, Inc. Superscalar execution unit for sequential instruction pointer updates and segment limit checks
TW253946B (en) * 1994-02-04 1995-08-11 Ibm Data processor with branch prediction and method of operation
JPH07253922A (en) * 1994-03-14 1995-10-03 Texas Instr Japan Ltd Address generating circuit
US5517436A (en) 1994-06-07 1996-05-14 Andreas; David C. Digital signal processor for audio applications
US5566357A (en) 1994-10-06 1996-10-15 Qualcomm Incorporated Power reduction in a cellular radiotelephone
US5692168A (en) 1994-10-18 1997-11-25 Cyrix Corporation Prefetch buffer using flow control bit to identify changes of flow within the code stream
JPH08202469A (en) 1995-01-30 1996-08-09 Fujitsu Ltd Microcontroller unit equipped with universal asychronous transmitting and receiving circuit
US5600674A (en) 1995-03-02 1997-02-04 Motorola Inc. Method and apparatus of an enhanced digital signal processor
US5655122A (en) 1995-04-05 1997-08-05 Sequent Computer Systems, Inc. Optimizing compiler with static prediction of branch probability, branch frequency and function frequency
US5835753A (en) 1995-04-12 1998-11-10 Advanced Micro Devices, Inc. Microprocessor with dynamically extendable pipeline stages and a classifying circuit
US5659752A (en) * 1995-06-30 1997-08-19 International Business Machines Corporation System and method for improving branch prediction in compiled program code
US5768602A (en) 1995-08-04 1998-06-16 Apple Computer, Inc. Sleep mode controller for power management
US5842004A (en) 1995-08-04 1998-11-24 Sun Microsystems, Inc. Method and apparatus for decompression of compressed geometric three-dimensional graphics data
US5727211A (en) * 1995-11-09 1998-03-10 Chromatic Research, Inc. System and method for fast context switching between tasks
US5778438A (en) 1995-12-06 1998-07-07 Intel Corporation Method and apparatus for maintaining cache coherency in a computer system with a highly pipelined bus and multiple conflicting snoop requests
US5774709A (en) 1995-12-06 1998-06-30 Lsi Logic Corporation Enhanced branch delay slot handling with single exception program counter
US5996071A (en) 1995-12-15 1999-11-30 Via-Cyrix, Inc. Detecting self-modifying code in a pipelined processor with branch processing by comparing latched store address to subsequent target address
JP3663710B2 (en) * 1996-01-17 2005-06-22 ヤマハ株式会社 Program generation method and processor interrupt control method
US5896305A (en) 1996-02-08 1999-04-20 Texas Instruments Incorporated Shifter circuit for an arithmetic logic unit in a microprocessor
JPH09261490A (en) * 1996-03-22 1997-10-03 Minolta Co Ltd Image forming device
US5752014A (en) * 1996-04-29 1998-05-12 International Business Machines Corporation Automatic selection of branch prediction methodology for subsequent branch instruction based on outcome of previous branch prediction
US5784636A (en) 1996-05-28 1998-07-21 National Semiconductor Corporation Reconfigurable computer architecture for use in signal processing applications
US20010025337A1 (en) 1996-06-10 2001-09-27 Frank Worrell Microprocessor including a mode detector for setting compression mode
US5826079A (en) 1996-07-05 1998-10-20 Ncr Corporation Method for improving the execution efficiency of frequently communicating processes utilizing affinity process scheduling by identifying and assigning the frequently communicating processes to the same processor
US5805876A (en) * 1996-09-30 1998-09-08 International Business Machines Corporation Method and system for reducing average branch resolution time and effective misprediction penalty in a processor
US6058142A (en) 1996-11-29 2000-05-02 Sony Corporation Image processing apparatus
US5909572A (en) 1996-12-02 1999-06-01 Compaq Computer Corp. System and method for conditionally moving an operand from a source register to a destination register
US6061521A (en) 1996-12-02 2000-05-09 Compaq Computer Corp. Computer having multimedia operations executable as two distinct sets of operations within a single instruction cycle
EP0855645A3 (en) * 1996-12-31 2000-05-24 Texas Instruments Incorporated System and method for speculative execution of instructions with data prefetch
KR100236533B1 (en) 1997-01-16 2000-01-15 윤종용 Digital signal processor
EP0855718A1 (en) 1997-01-28 1998-07-29 Hewlett-Packard Company Memory low power mode control
US6584525B1 (en) 1998-11-19 2003-06-24 Edwin E. Klingman Adaptation of standard microprocessor architectures via an interface to a configurable subsystem
US6021500A (en) 1997-05-07 2000-02-01 Intel Corporation Processor with sleep and deep sleep modes
US5950120A (en) 1997-06-17 1999-09-07 Lsi Logic Corporation Apparatus and method for shutdown of wireless communications mobile station with multiple clocks
US5931950A (en) 1997-06-17 1999-08-03 Pc-Tel, Inc. Wake-up-on-ring power conservation for host signal processing communication system
US6035374A (en) 1997-06-25 2000-03-07 Sun Microsystems, Inc. Method of executing coded instructions in a multiprocessor having shared execution resources including active, nap, and sleep states in accordance with cache miss latency
US6088786A (en) 1997-06-27 2000-07-11 Sun Microsystems, Inc. Method and system for coupling a stack based processor to register based functional unit
US5878264A (en) 1997-07-17 1999-03-02 Sun Microsystems, Inc. Power sequence controller with wakeup logic for enabling a wakeup interrupt handler procedure
US6760833B1 (en) 1997-08-01 2004-07-06 Micron Technology, Inc. Split embedded DRAM processor
US6026478A (en) 1997-08-01 2000-02-15 Micron Technology, Inc. Split embedded DRAM processor
US6157988A (en) 1997-08-01 2000-12-05 Micron Technology, Inc. Method and apparatus for high performance branching in pipelined microsystems
US6226738B1 (en) 1997-08-01 2001-05-01 Micron Technology, Inc. Split embedded DRAM processor
JPH1185515A (en) 1997-09-10 1999-03-30 Ricoh Co Ltd Microprocessor
JPH11143571A (en) 1997-11-05 1999-05-28 Mitsubishi Electric Corp Data processor
US6044458A (en) * 1997-12-12 2000-03-28 Motorola, Inc. System for monitoring program flow utilizing fixwords stored sequentially to opcodes
US6014743A (en) 1998-02-05 2000-01-11 Intergrated Device Technology, Inc. Apparatus and method for recording a floating point error pointer in zero cycles
US6151672A (en) * 1998-02-23 2000-11-21 Hewlett-Packard Company Methods and apparatus for reducing interference in a branch history table of a microprocessor
US6374349B2 (en) 1998-03-19 2002-04-16 Mcfarling Scott Branch predictor with serially connected predictor stages for improving branch prediction accuracy
US6289417B1 (en) 1998-05-18 2001-09-11 Arm Limited Operand supply to an execution unit
US6308279B1 (en) 1998-05-22 2001-10-23 Intel Corporation Method and apparatus for power mode transition in a multi-thread processor
JPH11353225A (en) 1998-05-26 1999-12-24 Internatl Business Mach Corp <Ibm> Memory that processor addressing gray code system in sequential execution style accesses and method for storing code and data in memory
US6466333B2 (en) * 1998-06-26 2002-10-15 Canon Kabushiki Kaisha Streamlined tetrahedral interpolation
US20020053015A1 (en) 1998-07-14 2002-05-02 Sony Corporation And Sony Electronics Inc. Digital signal processor particularly suited for decoding digital audio
US6327651B1 (en) 1998-09-08 2001-12-04 International Business Machines Corporation Wide shifting in the vector permute unit
US6253287B1 (en) * 1998-09-09 2001-06-26 Advanced Micro Devices, Inc. Using three-dimensional storage to make variable-length instructions appear uniform in two dimensions
US6240521B1 (en) 1998-09-10 2001-05-29 International Business Machines Corp. Sleep mode transition between processors sharing an instruction set and an address space
US6347379B1 (en) 1998-09-25 2002-02-12 Intel Corporation Reducing power consumption of an electronic device
US6339822B1 (en) 1998-10-02 2002-01-15 Advanced Micro Devices, Inc. Using padded instructions in a block-oriented cache
US6862563B1 (en) 1998-10-14 2005-03-01 Arc International Method and apparatus for managing the configuration and functionality of a semiconductor design
US6671743B1 (en) * 1998-11-13 2003-12-30 Creative Technology, Ltd. Method and system for exposing proprietary APIs in a privileged device driver to an application
EP1351154A2 (en) * 1998-11-20 2003-10-08 Altera Corporation Reconfigurable programmable logic device computer system
US6189091B1 (en) * 1998-12-02 2001-02-13 Ip First, L.L.C. Apparatus and method for speculatively updating global history and restoring same on branch misprediction detection
US6341348B1 (en) * 1998-12-03 2002-01-22 Sun Microsystems, Inc. Software branch prediction filtering for a microprocessor
US6957327B1 (en) * 1998-12-31 2005-10-18 Stmicroelectronics, Inc. Block-based branch target buffer
US6826748B1 (en) * 1999-01-28 2004-11-30 Ati International Srl Profiling program execution into registers of a computer
US6477683B1 (en) 1999-02-05 2002-11-05 Tensilica, Inc. Automated processor generation system for designing a configurable processor and method for the same
US6418530B2 (en) 1999-02-18 2002-07-09 Hewlett-Packard Company Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions
US6499101B1 (en) 1999-03-18 2002-12-24 I.P. First L.L.C. Static branch prediction mechanism for conditional branch instructions
US6427206B1 (en) * 1999-05-03 2002-07-30 Intel Corporation Optimized branch predictions for strongly predicted compiler branches
US6560754B1 (en) 1999-05-13 2003-05-06 Arc International Plc Method and apparatus for jump control in a pipelined processor
US6438700B1 (en) 1999-05-18 2002-08-20 Koninklijke Philips Electronics N.V. System and method to reduce power consumption in advanced RISC machine (ARM) based systems
US6772325B1 (en) 1999-10-01 2004-08-03 Hitachi, Ltd. Processor architecture and operation for exploiting improved branch control instruction
US6571333B1 (en) 1999-11-05 2003-05-27 Intel Corporation Initializing a memory controller by executing software in second memory to wakeup a system
US6546481B1 (en) * 1999-11-05 2003-04-08 Ip - First Llc Split history tables for branch prediction
US6909744B2 (en) 1999-12-09 2005-06-21 Redrock Semiconductor, Inc. Processor architecture for compression and decompression of video and images
KR100395763B1 (en) 2000-02-01 2003-08-25 삼성전자주식회사 A branch predictor for microprocessor having multiple processes
US6412038B1 (en) 2000-02-14 2002-06-25 Intel Corporation Integral modular cache for a processor
JP2001282548A (en) 2000-03-29 2001-10-12 Matsushita Electric Ind Co Ltd Communication equipment and communication method
US6519696B1 (en) 2000-03-30 2003-02-11 I.P. First, Llc Paired register exchange using renaming register map
US6681295B1 (en) 2000-08-31 2004-01-20 Hewlett-Packard Development Company, L.P. Fast lane prefetching
US6718460B1 (en) * 2000-09-05 2004-04-06 Sun Microsystems, Inc. Mechanism for error handling in a computer system
US20030070013A1 (en) 2000-10-27 2003-04-10 Daniel Hansson Method and apparatus for reducing power consumption in a digital processor
US6948054B2 (en) * 2000-11-29 2005-09-20 Lsi Logic Corporation Simple branch prediction and misprediction recovery method
TW477954B (en) * 2000-12-05 2002-03-01 Faraday Tech Corp Memory data accessing architecture and method for a processor
US20020073301A1 (en) * 2000-12-07 2002-06-13 International Business Machines Corporation Hardware for use with compiler generated branch information
US7139903B2 (en) * 2000-12-19 2006-11-21 Hewlett-Packard Development Company, L.P. Conflict free parallel read access to a bank interleaved branch predictor in a processor
US6877089B2 (en) 2000-12-27 2005-04-05 International Business Machines Corporation Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program
US20020087851A1 (en) * 2000-12-28 2002-07-04 Matsushita Electric Industrial Co., Ltd. Microprocessor and an instruction converter
US8285976B2 (en) * 2000-12-28 2012-10-09 Micron Technology, Inc. Method and apparatus for predicting branches using a meta predictor
US6823447B2 (en) 2001-03-01 2004-11-23 International Business Machines Corporation Software hint to improve the branch target prediction accuracy
EP1381957A2 (en) 2001-03-02 2004-01-21 Atsana Semiconductor Corp. Data processing apparatus and system and method for controlling memory access
JP3890910B2 (en) * 2001-03-21 2007-03-07 株式会社日立製作所 Instruction execution result prediction device
US7010558B2 (en) 2001-04-19 2006-03-07 Arc International Data processor with enhanced instruction execution and method
US7200740B2 (en) 2001-05-04 2007-04-03 Ip-First, Llc Apparatus and method for speculatively performing a return instruction in a microprocessor
US7165169B2 (en) * 2001-05-04 2007-01-16 Ip-First, Llc Speculative branch target address cache with selective override by secondary predictor based on branch instruction type
US20020194462A1 (en) * 2001-05-04 2002-12-19 Ip First Llc Apparatus and method for selecting one of multiple target addresses stored in a speculative branch target address cache per instruction cache line
US7165168B2 (en) * 2003-01-14 2007-01-16 Ip-First, Llc Microprocessor with branch target address cache update queue
US20020194461A1 (en) 2001-05-04 2002-12-19 Ip First Llc Speculative branch target address cache
US6886093B2 (en) * 2001-05-04 2005-04-26 Ip-First, Llc Speculative hybrid branch direction predictor
GB0112275D0 (en) 2001-05-21 2001-07-11 Micron Technology Inc Method and circuit for normalization of floating point significands in a simd array mpp
GB0112269D0 (en) 2001-05-21 2001-07-11 Micron Technology Inc Method and circuit for alignment of floating point significands in a simd array mpp
CN1265286C (en) * 2001-06-29 2006-07-19 皇家菲利浦电子有限公司 Method, appts. and compiler for predicting indirect branch target addresses
US7162619B2 (en) 2001-07-03 2007-01-09 Ip-First, Llc Apparatus and method for densely packing a branch instruction predicted by a branch target address cache and associated target instructions into a byte-wide instruction buffer
US7010675B2 (en) * 2001-07-27 2006-03-07 Stmicroelectronics, Inc. Fetch branch architecture for reducing branch penalty without branch prediction
US6751331B2 (en) 2001-10-11 2004-06-15 United Global Sourcing Incorporated Communication headset
WO2003065165A2 (en) 2002-01-31 2003-08-07 Arc International Configurable data processor with multi-length instruction set architecture
US7529912B2 (en) 2002-02-12 2009-05-05 Via Technologies, Inc. Apparatus and method for instruction-level specification of floating point format
US7181596B2 (en) 2002-02-12 2007-02-20 Ip-First, Llc Apparatus and method for extending a microprocessor instruction set
US7328328B2 (en) 2002-02-19 2008-02-05 Ip-First, Llc Non-temporal memory reference control mechanism
US7315921B2 (en) 2002-02-19 2008-01-01 Ip-First, Llc Apparatus and method for selective memory attribute control
US7546446B2 (en) 2002-03-08 2009-06-09 Ip-First, Llc Selective interrupt suppression
US7395412B2 (en) 2002-03-08 2008-07-01 Ip-First, Llc Apparatus and method for extending data modes in a microprocessor
US7185180B2 (en) 2002-04-02 2007-02-27 Ip-First, Llc Apparatus and method for selective control of condition code write back
US7302551B2 (en) 2002-04-02 2007-11-27 Ip-First, Llc Suppression of store checking
US7155598B2 (en) 2002-04-02 2006-12-26 Ip-First, Llc Apparatus and method for conditional instruction execution
US7380103B2 (en) 2002-04-02 2008-05-27 Ip-First, Llc Apparatus and method for selective control of results write back
US7373483B2 (en) 2002-04-02 2008-05-13 Ip-First, Llc Mechanism for extending the number of registers in a microprocessor
US7380109B2 (en) 2002-04-15 2008-05-27 Ip-First, Llc Apparatus and method for providing extended address modes in an existing instruction set for a microprocessor
US20030204705A1 (en) * 2002-04-30 2003-10-30 Oldfield William H. Prediction of branch instructions in a data processing apparatus
KR100450753B1 (en) 2002-05-17 2004-10-01 한국전자통신연구원 Programmable variable length decoder including interface of CPU processor
US6938151B2 (en) * 2002-06-04 2005-08-30 International Business Machines Corporation Hybrid branch prediction using a global selection counter and a prediction method comparison table
US6718504B1 (en) 2002-06-05 2004-04-06 Arc International Method and apparatus for implementing a data processor adapted for turbo decoding
US7493480B2 (en) * 2002-07-18 2009-02-17 International Business Machines Corporation Method and apparatus for prefetching branch history information
US7000095B2 (en) * 2002-09-06 2006-02-14 Mips Technologies, Inc. Method and apparatus for clearing hazards using jump instructions
US20050125634A1 (en) * 2002-10-04 2005-06-09 Fujitsu Limited Processor and instruction control method
US6968444B1 (en) 2002-11-04 2005-11-22 Advanced Micro Devices, Inc. Microprocessor employing a fixed position dispatch unit
US7266676B2 (en) * 2003-03-21 2007-09-04 Analog Devices, Inc. Method and apparatus for branch prediction based on branch targets utilizing tag and data arrays
US7590829B2 (en) 2003-03-31 2009-09-15 Stretch, Inc. Extension adapter
US7174444B2 (en) * 2003-03-31 2007-02-06 Intel Corporation Preventing a read of a next sequential chunk in branch prediction of a subject chunk
US20040193855A1 (en) * 2003-03-31 2004-09-30 Nicolas Kacevas System and method for branch prediction access
US20040225870A1 (en) 2003-05-07 2004-11-11 Srinivasan Srikanth T. Method and apparatus for reducing wrong path execution in a speculative multi-threaded processor
US7010676B2 (en) * 2003-05-12 2006-03-07 International Business Machines Corporation Last iteration loop branch prediction upon counter threshold and resolution upon counter one
US20040255104A1 (en) * 2003-06-12 2004-12-16 Intel Corporation Method and apparatus for recycling candidate branch outcomes after a wrong-path execution in a superscalar processor
US7668897B2 (en) 2003-06-16 2010-02-23 Arm Limited Result partitioning within SIMD data processing systems
US7783871B2 (en) * 2003-06-30 2010-08-24 Intel Corporation Method to remove stale branch predictions for an instruction prior to execution within a microprocessor
US7373642B2 (en) 2003-07-29 2008-05-13 Stretch, Inc. Defining instruction extensions in a standard programming language
US20050027974A1 (en) * 2003-07-31 2005-02-03 Oded Lempel Method and system for conserving resources in an instruction pipeline
US7133950B2 (en) 2003-08-19 2006-11-07 Sun Microsystems, Inc. Request arbitration in multi-core processor
JP2005078234A (en) * 2003-08-29 2005-03-24 Renesas Technology Corp Information processor
US7237098B2 (en) 2003-09-08 2007-06-26 Ip-First, Llc Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence
US20050066305A1 (en) 2003-09-22 2005-03-24 Lisanke Robert John Method and machine for efficient simulation of digital hardware within a software development environment
KR100980076B1 (en) 2003-10-24 2010-09-06 삼성전자주식회사 System and method for branch prediction with low-power consumption
US7219207B2 (en) * 2003-12-03 2007-05-15 Intel Corporation Reconfigurable trace cache
US8069336B2 (en) 2003-12-03 2011-11-29 Globalfoundries Inc. Transitioning from instruction cache to trace cache on label boundaries
US7401328B2 (en) * 2003-12-18 2008-07-15 Lsi Corporation Software-implemented grouping techniques for use in a superscalar data processing system
US7293164B2 (en) * 2004-01-14 2007-11-06 International Business Machines Corporation Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions
US8607209B2 (en) * 2004-02-04 2013-12-10 Bluerisc Inc. Energy-focused compiler-assisted branch prediction
US7613911B2 (en) * 2004-03-12 2009-11-03 Arm Limited Prefetching exception vectors by early lookup exception vectors within a cache memory
US20050216713A1 (en) 2004-03-25 2005-09-29 International Business Machines Corporation Instruction text controlled selectively stated branches for prediction via a branch target buffer
US7281120B2 (en) 2004-03-26 2007-10-09 International Business Machines Corporation Apparatus and method for decreasing the latency between an instruction cache and a pipeline processor
US20050223202A1 (en) * 2004-03-31 2005-10-06 Intel Corporation Branch prediction in a pipelined processor
US20050278517A1 (en) * 2004-05-19 2005-12-15 Kar-Lik Wong Systems and methods for performing branch prediction in a variable length instruction set microprocessor
US20060015706A1 (en) * 2004-06-30 2006-01-19 Chunrong Lai TLB correlated branch predictor and method for use thereof
TWI305323B (en) * 2004-08-23 2009-01-11 Faraday Tech Corp Method for verification branch prediction mechanisms and readable recording medium for storing program thereof

Patent Citations (30)

Publication number Priority date Publication date Assignee Title
US5560036A (en) * 1989-12-14 1996-09-24 Mitsubishi Denki Kabushiki Kaisha Data processing having incircuit emulation function
US5636363A (en) * 1991-06-14 1997-06-03 Integrated Device Technology, Inc. Hardware control structure and method for off-chip monitoring entries of an on-chip cache
US5493687A (en) * 1991-07-08 1996-02-20 Seiko Epson Corporation RISC microprocessor architecture implementing multiple typed register sets
US5450586A (en) * 1991-08-14 1995-09-12 Hewlett-Packard Company System for analyzing and debugging embedded software through dynamic and interactive use of code markers
US5423011A (en) * 1992-06-11 1995-06-06 International Business Machines Corporation Apparatus for initializing branch prediction information
US5586279A (en) * 1993-02-03 1996-12-17 Motorola Inc. Data processing system and method for testing a data processor having a cache memory
US5530825A (en) * 1994-04-15 1996-06-25 Motorola, Inc. Data processor with branch target address cache and method of operation
US5809293A (en) * 1994-07-29 1998-09-15 International Business Machines Corporation System and method for program execution tracing within an integrated processor
US5920711A (en) * 1995-06-02 1999-07-06 Synopsys, Inc. System for frame-based protocol, graphical capture, synthesis, analysis, and simulation
US6292879B1 (en) * 1995-10-25 2001-09-18 Anthony S. Fong Method and apparatus to specify access control list and cache enabling and cache coherency requirement enabling on individual operands of an instruction of a computer
US5964884A (en) * 1996-09-30 1999-10-12 Advanced Micro Devices, Inc. Self-timed pulse control circuit
US5848264A (en) * 1996-10-25 1998-12-08 S3 Incorporated Debug and video queue for multi-processor chip
US6185732B1 (en) * 1997-04-08 2001-02-06 Advanced Micro Devices, Inc. Software debug port for a microprocessor
US6154857A (en) * 1997-04-08 2000-11-28 Advanced Micro Devices, Inc. Microprocessor-based device incorporating a cache for capturing software performance profiling data
US5808876A (en) * 1997-06-20 1998-09-15 International Business Machines Corporation Multi-function power distribution system
US5978909A (en) * 1997-11-26 1999-11-02 Intel Corporation System for speculative branch target prediction having a dynamic prediction history buffer and a static prediction history buffer
US6622240B1 (en) * 1999-06-18 2003-09-16 Intrinsity, Inc. Method and apparatus for pre-branch instruction
US6550056B1 (en) * 1999-07-19 2003-04-15 Mitsubishi Denki Kabushiki Kaisha Source level debugger for debugging source programs
US6609194B1 (en) * 1999-11-12 2003-08-19 Ip-First, Llc Apparatus for performing branch target address calculation based on branch type
US6963554B1 (en) * 2000-12-27 2005-11-08 National Semiconductor Corporation Microwire dynamic sequencer pipeline stall
US20020100020A1 (en) * 2001-01-24 2002-07-25 Hunter Jeff L. Method for maintaining cache coherency in software in a shared memory system
US20020100019A1 (en) * 2001-01-24 2002-07-25 Hunter Jeff L. Software shared memory bus
US6925634B2 (en) * 2001-01-24 2005-08-02 Texas Instruments Incorporated Method for maintaining cache coherency in software in a shared memory system
US6823444B1 (en) * 2001-07-03 2004-11-23 Ip-First, Llc Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap
US20030046614A1 (en) * 2001-08-31 2003-03-06 Brokish Charles W. System and method for using embedded real-time analysis components
US7093165B2 (en) * 2001-10-24 2006-08-15 Kabushiki Kaisha Toshiba Debugging Method
US20030126508A1 (en) * 2001-12-28 2003-07-03 Timothe Litt Method and apparatus for efficiently implementing trace and/or logic analysis mechanisms on a processor chip
US20030154463A1 (en) * 2002-02-08 2003-08-14 Betker Michael Richard Multiprocessor system with cache-based software breakpoints
US6774832B1 (en) * 2003-03-25 2004-08-10 Raytheon Company Multi-bit output DDS with real time delta sigma modulation look up from memory
US20050097398A1 (en) * 2003-10-30 2005-05-05 International Business Machines Corporation Program debug method and apparatus

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050289323A1 (en) * 2004-05-19 2005-12-29 Kar-Lik Wong Barrel shifter for a microprocessor
US20050278505A1 (en) * 2004-05-19 2005-12-15 Lim Seow C Microprocessor architecture including zero impact predictive data pre-fetch mechanism for pipeline data memory
US8719837B2 (en) 2004-05-19 2014-05-06 Synopsys, Inc. Microprocessor architecture having extendible logic
US9003422B2 (en) 2004-05-19 2015-04-07 Synopsys, Inc. Microprocessor architecture having extendible logic
US7971042B2 (en) 2005-09-28 2011-06-28 Synopsys, Inc. Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline
USRE47851E1 (en) * 2006-09-28 2020-02-11 Rambus Inc. Data processing system having cache memory debugging support and method therefor
USRE49305E1 (en) * 2006-09-28 2022-11-22 Rambus Inc. Data processing system having cache memory debugging support and method therefor
US20110320716A1 (en) * 2010-06-24 2011-12-29 International Business Machines Corporation Loading and unloading a memory element for debug
US8495287B2 (en) * 2010-06-24 2013-07-23 International Business Machines Corporation Clock-based debugging for embedded dynamic random access memory element in a processor core
US10372590B2 (en) * 2013-11-22 2019-08-06 International Business Machines Corporation Determining instruction execution history in a debugger
US10552297B2 (en) 2013-11-22 2020-02-04 International Business Machines Corporation Determining instruction execution history in a debugger
US20150149984A1 (en) * 2013-11-22 2015-05-28 International Business Machines Corporation Determining instruction execution history in a debugger
US10977160B2 (en) 2013-11-22 2021-04-13 International Business Machines Corporation Determining instruction execution history in a debugger

Also Published As

Publication number Publication date
GB2428842A (en) 2007-02-07
US20050289323A1 (en) 2005-12-29
US20140208087A1 (en) 2014-07-24
US9003422B2 (en) 2015-04-07
TW200602974A (en) 2006-01-16
US20050278513A1 (en) 2005-12-15
WO2005114441A2 (en) 2005-12-01
US20050278505A1 (en) 2005-12-15
US20050289321A1 (en) 2005-12-29
US20050278517A1 (en) 2005-12-15
US8719837B2 (en) 2014-05-06
GB0622477D0 (en) 2006-12-20
WO2005114441A3 (en) 2007-01-18
CN101002169A (en) 2007-07-18

Similar Documents

Publication Publication Date Title
US20050273559A1 (en) Microprocessor architecture including unified cache debug unit
US5226130A (en) Method and apparatus for store-into-instruction-stream detection and maintaining branch prediction cache consistency
JP2939003B2 (en) High performance multiprocessor with floating point unit and method for improving its performance
US5835705A (en) Method and system for performance per-thread monitoring in a multithreaded processor
US5860107A (en) Processor and method for store gathering through merged store operations
JP2962876B2 (en) Conversion of internal processor register commands to I/O space addresses
US6549985B1 (en) Method and apparatus for resolving additional load misses and page table walks under orthogonal stalls in a single pipeline processor
JP3195378B2 (en) Branch prediction for high performance processors
US6279105B1 (en) Pipelined two-cycle branch target address cache
EP1343076A2 (en) integrated circuit with multiple functions sharing multiple internal signal buses according to distributed bus access and control arbitration
US20060236080A1 (en) Reducing the fetch time of target instructions of a predicted taken branch instruction
US6564315B1 (en) Scheduler which discovers non-speculative nature of an instruction after issuing and reissues the instruction
JPH0695964A (en) Error transition mode for multiprocessor system
JPH0695963A (en) Bus protocol for high-performance processor
JP2005508546A (en) System and method for reducing execution of instructions containing unreliable data in speculative processors
JPH06103167A (en) Combination queue for invalidate and returned data in multiprocessor system
KR19980063489A (en) Background Completion of Instructions and Associated Fetch Requests in a Multithreaded Processor
US20090164758A1 (en) System and Method for Performing Locked Operations
US5649137A (en) Method and apparatus for store-into-instruction-stream detection and maintaining branch prediction cache consistency
US6094711A (en) Apparatus and method for reducing data bus pin count of an interface while substantially maintaining performance
US11086631B2 (en) Illegal instruction exception handling
US11048516B2 (en) Systems, methods, and apparatuses for last branch record support compatible with binary translation and speculative execution using an architectural bit array and a write bit array
US11023342B2 (en) Cache diagnostic techniques
US7305586B2 (en) Accessing and manipulating microprocessor state
US5721867A (en) Method and apparatus for executing single beat write store instructions during a cache store linefill operation

Legal Events

Date Code Title Description
AS Assignment

Owner name: ARC INTERNATIONAL, UNITED KINGDOM

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ARISTODEMOU, ARIS;HANSSON, DANIEL;TAYLOR, MORGYN;AND OTHERS;REEL/FRAME:016909/0825

Effective date: 20050721

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION