US20090089510A1 - Speculative read in a cache coherent microprocessor - Google Patents
- Publication number
- US20090089510A1 (application US11/864,363)
- Authority
- US
- United States
- Prior art keywords
- request
- speculative
- configurable
- coherent
- response
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
- G06F12/0817—Cache consistency protocols using directory methods
- G06F12/0828—Cache consistency protocols using directory methods with concurrent directory accessing, i.e. handling multiple concurrent coherency transactions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
- G06F12/0831—Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
- G06F12/0871—Allocation or management of cache space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/50—Control mechanisms for virtual memory, cache or TLB
- G06F2212/507—Control mechanisms for virtual memory, cache or TLB using speculative control
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/62—Details of cache specific to multiprocessor cache arrangements
- G06F2212/621—Coherency control relating to peripheral accessing, e.g. from DMA or I/O device
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present invention relates to multiprocessor systems, and more particularly to performing a speculative request in a cache coherent multi-core microprocessor system.
- Technological advances have given rise to considerable increases in microprocessor clock speeds. Although the same advances have also resulted in improvements in memory density and access times, the disparity between microprocessor clock speeds and memory access times persists. To reduce latency, one or more levels of high-speed cache memory are often used to hold a subset of the data or instructions that are stored in the main memory. A number of techniques have been developed to increase the likelihood that the data/instructions held in the cache are repeatedly used by the microprocessor.
- microprocessors with a multitude of cores that execute instructions in parallel have been developed.
- the cores may be integrated within the same semiconductor die, or may be formed on different semiconductor dies coupled to one another within a package, or a combination of the two.
- Each core typically includes its own level-1 cache and an optional level-2 cache.
- a cache coherency protocol governs the traffic flow between the memory and the caches associated with the cores to ensure coherency between them. For example, the cache coherency protocol ensures that if a copy of a data item is modified in one of the caches, copies of the same data item stored in other caches and in the main memory are invalidated or updated in accordance with the modification.
- In order to reduce the average latency associated with a coherent read request, a technique commonly referred to as speculative read may be used. In accordance with this technique, concurrently with searching for the requested data in the caches, a speculative read request is also issued to the memory. If the requested data is stored in any of the caches, the speculative read is canceled. If the requested data is not stored in any of the caches, the speculative read is confirmed and the data identified by the confirmed request is transferred from the memory to the requesting core.
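The cancel/confirm decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Memory`, `coherent_read`, and the dictionary-based caches are hypothetical names, and the overlap between the cache search and the memory access, which is concurrent in hardware, is modeled sequentially with a flag.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Hypothetical backing store that counts the reads actually completed."""
    data: dict
    completed_reads: int = 0

    def read(self, addr):
        self.completed_reads += 1
        return self.data[addr]


def coherent_read(addr, caches, memory):
    """Return (value, source) for a coherent read with speculation.

    Assumption: each cache is a dict mapping address -> value.
    """
    speculative_pending = True            # speculative read issued alongside the search
    for cache in caches:                  # search the caches of the other cores
        if addr in cache:
            speculative_pending = False   # hit: the speculative read is canceled
            return cache[addr], "cache"
    # Miss in every cache: the speculative read is confirmed and completes.
    assert speculative_pending
    return memory.read(addr), "memory"
```

A cache hit returns the cached copy and the memory read never completes; a miss in every cache lets the memory read supply the data.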
- a multi-core microprocessor includes, in part, a cache coherence manager that maintains cache coherence among the multitude of cores and also minimizes latency associated with performing coherent requests.
- the cache coherence manager includes, in part, a request unit, an intervention unit, a response unit, and a memory interface unit.
- the request unit is configured to selectively issue a speculative request in response to a coherent request received from one of the cores.
- the intervention unit is configured to send an intervention message associated with the coherent request to the cores.
- the memory interface unit is configured to receive the speculative request and to selectively cancel or forward the speculative request to a memory.
- the memory interface unit includes at least three tables.
- An entry in the first table is an index to the second table.
- the entry in the second table is an index to the third table.
- the entry in the first table is allocated when a response to the intervention message is stored in the first table before the speculative request is stored in the memory interface unit.
- the entry in the second table is allocated when the request is stored in the memory interface unit.
- the entry in the third table is allocated when the speculative request is issued to the memory.
- the request unit includes, in part, a fourth table storing a multitude of addresses, and a logic block configured to compare an address associated with the request to the multitude of addresses stored in the fourth table.
- Each address stored in the fourth table is associated with a pending coherent request. If an address match is not detected, the logic block issues the speculative request and assigns an identifier thereto. The identifier is used as an index to the first entry in the first table. In another embodiment, the logic block issues the speculative request first, assigns a corresponding identifier, and subsequently compares the requested address to the addresses stored in the fourth table. If an address match is detected, the logic block cancels the speculative request. In one embodiment, the request unit does not issue a speculative request unless the number of unresolved speculative requests is less than the total number of entries of the third table.
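As a rough illustration of the request unit's decision, the sketch below models the fourth table as a dictionary of pending addresses and applies the two conditions named above: no address match, and fewer unresolved speculative requests than third-table entries. The class and method names are hypothetical, and assigning an identifier to every request (not only to speculative ones) is a simplifying assumption.

```python
class RequestUnit:
    """Hypothetical model; `pending` plays the role of the fourth table."""

    def __init__(self, third_table_entries):
        self.pending = {}                     # identifier -> address of a pending coherent request
        self.unresolved_speculative = 0       # speculative requests not yet confirmed/canceled
        self.third_table_entries = third_table_entries
        self.next_id = 0

    def receive(self, addr):
        """Record a coherent request; return (identifier, speculated)."""
        match = addr in self.pending.values()
        room = self.unresolved_speculative < self.third_table_entries
        ident = self.next_id
        self.next_id += 1
        self.pending[ident] = addr
        speculate = (not match) and room
        if speculate:
            self.unresolved_speculative += 1
        return ident, speculate

    def resolve(self, ident, was_speculative):
        """Retire a request once it has been confirmed or canceled."""
        del self.pending[ident]
        if was_speculative:
            self.unresolved_speculative -= 1
```

With a single third-table entry, a second request to the same address is not speculated because of the address match, and a request to a different address is not speculated until the first one resolves.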
- a method of operating a multi-core microprocessor having disposed therein a cache coherence manager includes, in part, receiving a coherent request from one of the cores, selectively issuing a speculative request in response, sending an intervention message associated with the coherent request to the cores, and selectively sending the issued speculative request to a memory.
- the memory interface unit includes at least three tables.
- An entry in the first table is an index to the second table.
- the entry in the second table is an index to the third table.
- the entry in the first table is allocated when a response to the intervention message arrives at the first table before the corresponding request is stored in the memory interface unit.
- the entry in the second table is allocated when the speculative request is stored in the memory interface unit.
- the entry in the third table is allocated when the speculative request is issued to the memory.
- the address associated with the coherent request is compared to a multitude of addresses stored in a fourth table. Each address stored in the fourth table is associated with a pending coherent request. If an address match is not detected, the request is speculatively issued and an identifier is assigned to this request. The identifier is used as an index to the first entry in the first table. In another embodiment, the speculative request is first issued and a corresponding identifier is assigned. If a match is thereafter detected between the address associated with the request and any one of the addresses stored in the fourth table, the speculative request is canceled. In one embodiment, the coherent request is not speculatively issued unless the number of unresolved speculative requests is less than the total number of entries of the third table.
- FIG. 1 shows a multi-core microprocessor, in communication with a number of I/O devices and a system memory, in accordance with one embodiment of the present invention.
- FIG. 2 is a block diagram of the cache coherence manager disposed in the microprocessor of FIG. 1 , in accordance with one embodiment of the present invention.
- FIG. 3 is a more detailed block diagram of the cache coherence manager of FIG. 2 , in accordance with one embodiment of the present invention.
- FIGS. 4A, 4B and 4C form a flowchart showing a speculative request, in accordance with one embodiment of the present invention.
- FIG. 5 shows the flow of indices and data between a number of tables disposed in the cache coherence manager of FIG. 3 .
- FIG. 6 shows the flow of speculative and non-speculative requests that may lead to a deadlock condition and which the present invention is adapted to inhibit.
- FIG. 7 shows an exemplary computer system in which the present invention may be embodied.
- a multi-core microprocessor includes, in part, a cache coherence manager that maintains coherence among the multitude of microprocessor cores, and further minimizes the latency associated with coherent read requests.
- the cache coherence manager includes, in part, a request unit, an intervention unit, a memory interface unit and a response unit.
- the cache coherence manager supports speculative reads and includes an indexing scheme that efficiently manages the processing of the speculative requests and the corresponding intervention messages forwarded to and received from the cores.
- FIG. 1 is a block diagram of a microprocessor 100 , in accordance with one exemplary embodiment of the present invention, that is in communication with system memory 600 and I/O units 610 , 620 via system bus 630 .
- Microprocessor (hereinafter alternatively referred to as processor) 100 is shown as including, in part, four cores 105 1 , 105 2 , 105 3 and 105 4 , a cache coherency manager 200 , and an optional level-2 (L2) cache 605 .
- Although the exemplary embodiment of processor 100 is shown as including four cores, it is understood that other embodiments of processor 100 may include more or fewer than four cores.
- Each processing core 110 i is adapted to perform a multitude of fixed or flexible sequences of operations in response to program instructions.
- Each processing core 110 i may conform to CISC and/or RISC architectures to process scalar or vector data types using SISD or SIMD instructions.
- Each processing core 110 i may include general purpose and specialized register files and execution units configured to perform logic, arithmetic, and any other type of data processing functions.
- the processing cores 110 1 , 110 2 , 110 3 and 110 4 which are collectively referred to as either processing cores 110 i or processing cores 110 , may be configured to perform identical functions, or may alternatively be configured to perform different functions adapted to different applications.
- Processing cores 110 may be single-threaded or multi-threaded, i.e., capable of executing multiple sequences of program instructions in parallel.
- Each core 105 i is shown as including a level-1 (L1) cache.
- each core 110 i may include more levels of cache, e.g., level 2, level 3, etc.
- Each cache 115 i may include instructions and/or data.
- Each cache 115 i is typically organized to include a multitude of cache lines, with each line adapted to store a copy of the data corresponding with one or more virtual or physical memory addresses.
- Each cache line also stores additional information used to manage that cache line. Such additional information includes, for example, tag information used to identify the main memory address associated with the cache line, and cache coherency information used to synchronize the data in the cache line with other caches and/or with the main system memory.
- the cache tag may be formed from all or a portion of the memory address associated with the cache line.
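To make the tag idea concrete, the sketch below splits an address into a tag, a set index, and a byte offset for a set-indexed cache. The line size and set count are illustrative assumptions, not values taken from the text, and `split_address` is a hypothetical helper.

```python
def split_address(addr, line_bytes=32, num_sets=128):
    """Split an address into (tag, set_index, byte_offset).

    Assumption: line_bytes and num_sets are powers of two chosen for illustration.
    """
    offset_bits = line_bytes.bit_length() - 1      # log2(line_bytes)
    index_bits = num_sets.bit_length() - 1         # log2(num_sets)
    offset = addr & (line_bytes - 1)               # position within the cache line
    index = (addr >> offset_bits) & (num_sets - 1)  # which set the line maps to
    tag = addr >> (offset_bits + index_bits)       # remaining high-order bits
    return tag, index, offset
```

Here the tag is formed from a portion of the address (the high-order bits); in a fully associative cache the index field would be absent and the tag would cover every bit above the offset.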
- Each L1 cache 115 i is coupled to its associated processing core 110 i via a bus 125 i .
- Each bus 125 i includes a multitude of signal lines for carrying data and/or instructions.
- Each core 105 i is also shown as including a cache control logic 120 i to facilitate data transfer to and from its associated cache 115 i .
- Each cache 115 i may be fully associative, set associative with two or more ways, or direct mapped. For clarity, each cache 115 i is shown as a single cache memory for storing data and instructions required by core 105 i . Although not shown, it is understood that each core 105 i may include an L1 cache for storing data, and an L1 cache for storing instructions.
- Each cache 115 i is partitioned into a number of cache lines, with each cache line corresponding to a range of adjacent locations in shared system memory 600 .
- each line of each cache includes data to facilitate coherency between, e.g., cache 115 1 , main memory 600 and any other caches 115 2 , 115 3 , 115 4 , intended to remain coherent with cache 115 1 , as described further below.
- each cache line is marked as being Modified “M”, Exclusive “E”, Shared “S”, or Invalid “I” (the MESI states), as is well known.
- Other cache coherency protocols such as MSI, MOSI, and MOESI coherency protocols, are also supported by the embodiments of the present invention.
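The four line states can be illustrated with a deliberately simplified model of one cache line shared by several cores. This is a textbook-style sketch of invalidate-on-write MESI behavior, not the patent's implementation; `State`, `local_write`, and `local_read` are hypothetical names, and write-backs of modified data are not modeled.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def local_write(states, writer):
    """Apply a write by `writer`; `states` maps core id -> State for one line."""
    for core in states:
        # The writer ends up with the only valid (modified) copy.
        states[core] = State.MODIFIED if core == writer else State.INVALID
    return states


def local_read(states, reader):
    """Apply a read by `reader`; other valid copies are downgraded to SHARED."""
    if states[reader] is not State.INVALID:
        return states                                  # read hit, no state change
    others_valid = [c for c in states
                    if c != reader and states[c] is not State.INVALID]
    for core in others_valid:
        states[core] = State.SHARED                    # an M holder would also write back
    states[reader] = State.SHARED if others_valid else State.EXCLUSIVE
    return states
```

A read by a core while no other core holds the line yields Exclusive; a later write by another core invalidates every other copy, which is the situation an intervention message is meant to detect.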
- Each core 105 i is coupled to a cache coherency manager 200 via an associated bus 135 i .
- Cache coherency manager 200 facilitates transfer of instructions and/or data between cores 105 i , system memory 600 , I/O units 610 , 620 and optional shared L2 cache 605 .
- Cache coherency manager 200 establishes the global ordering of requests, sends intervention requests, collects the responses to such requests, and sends the requested data back to the requesting core.
- Cache coherency manager 200 orders the requests so as to optimize memory accesses, load balance the requests, give priority to one or more cores over the other cores, and/or give priority to one or more types of requests over other requests.
- one or more of cores 105 i include a dedicated Level-2 (L2) cache when optional shared L2 cache 605 is not used.
- FIG. 2 is a block diagram of cache coherency manager 200 , in accordance with one embodiment of the present invention.
- Cache coherency manager 200 is shown as including, in part, a request unit 220 , an intervention unit 250 , a response unit 280 , and a memory interface unit 300 .
- Request unit 220 includes input ports 202 adapted to receive, for example, read requests, write requests, write-back requests and any other cache memory related requests from cores 105 i .
- Request unit 220 serializes the requests it receives from cores 105 i and sends non-coherent read/write requests, speculative coherent read requests, as well as explicit and implicit writeback requests of modified cache data to memory interface unit 300 via port 204 .
- Request unit 220 sends coherent requests to intervention unit 250 via port 216 .
- the read address is compared against pending coherent requests that can generate write operations. If a match is detected as a result of this comparison, the read request is not started speculatively.
- In response to the coherent intervention requests received from request unit 220 , intervention unit 250 issues an intervention message via output ports 212 . A hit will cause the data to return to the intervention unit via input ports 245 . In another embodiment, the requested data is returned to the intervention unit 208 . Intervention unit 250 subsequently forwards this data to response unit 280 via output ports 218 . Response unit 280 forwards this data to the requesting core (the core that originated the request) via output ports 212 . If there is a cache miss and the read request was not performed speculatively, intervention unit 250 requests access to this data by sending a coherent read or write request to memory interface unit 300 via output ports 206 . A read request may proceed without speculation when, for example, a request memory buffer disposed in request unit 220 and adapted to store and transfer the requests to memory interface unit 300 is full.
- Memory interface unit 300 receives non-coherent read/write requests from request unit 220 , as well as speculative requests and writeback requests from intervention unit 250 . In response, memory interface unit 300 accesses system memory 600 and/or higher level cache memories such as L2 cache 605 via input/output ports 255 to complete these requests. The data retrieved from memory 600 and/or higher level cache memories in response to such memory requests is forwarded to response unit 280 via output port 260 . Response unit 280 returns the requested data to the requesting core via output ports 265 . As is understood, the requested data may have been retrieved from an L1 cache of another core, from memory 600 , or from optional higher level cache memories.
- FIG. 3 is a more detailed view of cache coherence manager 200 disposed in a microprocessor having N cores, in accordance with one embodiment of the present invention.
- coherence manager 200 issues speculative read requests to memory 600 via memory interface unit 300 .
- the speculative read assumes that the requested data will not be found in any of the cores. If the requested data is found in response to the intervention message, the speculative read is canceled if it has not yet been issued by memory interface unit 300 , or alternatively the response is dropped when it returns from system memory 600 .
- the response to an intervention message may arrive at the intervention unit 250 at different points in time relative to the speculative request.
- the request may still be in the request unit 220 when the response to the associated intervention message is received by the intervention unit 250 .
- the request may be in the memory interface unit 300 when the response to the associated intervention message is received by intervention unit 250 .
- the request may have been issued to the memory by the time the response to the associated intervention message arrives at the intervention unit 250 .
- a number of data segments associated with the speculative read request may have been received by the memory interface unit 300 before the response to the associated intervention message is received by the intervention unit 250 .
- Coherence manager 200 is configured to handle speculative requests for all possible timing conditions described above, notwithstanding the outcome of the intervention message, i.e., cancel or confirm.
- Incoming coherent requests are serialized by serialized address register (SAR) 224 disposed in request unit 220 .
- the cache line address associated with each request is compared to the entries stored in the active address table (AAT) 222 .
- An address match indicates that a coherent request is already pending for that address and hence no speculative request is issued for that request.
- RMQ: request memory queue
- SRH: serialized request handler
- If RQU (request unit) 220 receives a coherent request that was erroneously issued due, for example, to a software error, then RQU 220 will not issue a corresponding speculative request. Similarly, if RQU 220 issues a speculative request that bypasses the RMQ 228 (via signal line 230 ) and subsequently detects an error with this request, RQU 220 will cancel this request.
- AAT 222 performs two functions. First, it keeps track of active coherent requests to inhibit read-after-write hazards. An intervention response to a coherent request may result in a read or write operation to the memory.
- AAT 222 is used to ensure that a speculative read to the same address does not occur before the updated data is written to the memory, thereby to avoid read-after-write hazards. Second, AAT 222 is used to tag the speculative requests to enable their identification as they flow between the IVU 250 , RQU 220 and MIU 300 .
- RMQ 228 stores both the speculative request as well as the AAT number associated with that request.
- SAR 224 issues the speculative request before it looks up the address in AAT 222 .
- One clock cycle later, if the look-up in AAT 222 indicates that an earlier issued speculative request is still pending for that address, the newly issued speculative request is canceled.
- a speculative request issued before an AAT 222 look-up may get stored in memory request queue (MRQ) 310 or get issued to the memory.
- the speculative request is canceled if the subsequent AAT 222 look-up results in an address match.
- the cancellation of such a request results in deallocation of any corresponding numbers that may have been assigned to that request in speculative table 302 and/or response data buffer table 306 .
- the process of allocating and deallocating numbers in the various tables disposed in memory interface unit 300 is described in detail below.
- Coherent read requests are received from SRH 226 and stored in intervention queue (IQ) 252 .
- Corresponding intervention messages are issued after these requests are stored in intervention output request queue 256 .
- Intervention messages that have been forwarded to the cores are stored in pending intervention queue (PIQ) 262 , and responses to these intervention messages are stored in intervention response queue (IRSQ) 258 .
- the AAT number associated with a request is stored in PIQ 262 .
- Memory interface unit (MIU) 300 includes, in part, a speculative table (SPT) 302 , an active address to speculation table (A2ST) 304 , and a response data buffer table (RDBT) 306 .
- SPT 302 tracks the confirm/cancel results for speculative requests that have been received by MIU 300 .
- An entry (alternatively referred to herein as number) in SPT 302 is allocated when a speculative request is received by MIU 300 from RQU 220 .
- the SPT entry is loaded into memory request queue (MRQ) 310 together with the request.
- Response data buffer table (RDBT) 306 tracks information associated with the requests that have been issued to the memory.
- An entry in RDBT 306 is allocated for every read request that is issued to the memory.
- the response to a read request is stored in the response data buffer (RDB) 316 at an address defined by the corresponding entry in RDBT 306 .
- A2ST 304 performs two functions. First, for each AAT entry, A2ST 304 supplies the corresponding SPT entry. Second, A2ST temporarily stores the confirm/cancel result for any speculative request that has not been received by MIU 300 and for which an SPT entry has not yet been allocated. In such conditions, when the request is received by MIU 300 , the confirm/cancel result is copied from A2ST 304 to the associated SPT entry newly allocated.
- the AAT number associated with that request is also stored in the newly allocated entry in SPT 302 .
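The two roles of A2ST described above (mapping an AAT number to its SPT entry, and parking a confirm/cancel result that arrives before the request) can be sketched as follows. The data layout is an assumption made for illustration: the tables are plain dictionaries and `MemoryInterfaceUnit` with its methods is a hypothetical model, not the hardware design.

```python
class MemoryInterfaceUnit:
    """Hypothetical model of the A2ST/SPT hand-off; table names follow the text."""

    def __init__(self):
        # AAT number -> ("spt", spt_no) or ("result", "confirm" / "cancel")
        self.a2st = {}
        # SPT number -> {"aat": aat_no, "result": None / "confirm" / "cancel"}
        self.spt = {}
        self._next_spt = 0

    def intervention_result(self, aat_no, result):
        """Record a confirm/cancel result delivered by the intervention unit."""
        entry = self.a2st.get(aat_no)
        if entry and entry[0] == "spt":
            self.spt[entry[1]]["result"] = result      # request already received
        else:
            self.a2st[aat_no] = ("result", result)     # park it until the request arrives

    def speculative_request(self, aat_no):
        """Accept a speculative request; allocate and return its SPT number."""
        spt_no = self._next_spt
        self._next_spt += 1
        parked = self.a2st.get(aat_no)
        result = parked[1] if parked and parked[0] == "result" else None
        self.spt[spt_no] = {"aat": aat_no, "result": result}  # copy any parked result
        self.a2st[aat_no] = ("spt", spt_no)                   # A2ST now indexes the SPT
        return spt_no
```

Either arrival order leaves the result in the SPT entry, which is the property the later stages rely on.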
- Confirm/cancel results returned in response to an intervention message and the AAT entry associated with the corresponding read request are delivered from Intervention Response Handler (IRSH) 266 to speculative handler (SPH) 312 .
- the confirm/cancel results are received in the same order as they are transmitted. This ordering ensures that the AAT number supplied by PIQ 262 and the confirm/cancel result supplied by IRSQ 258 are associated with the same request as they are delivered to IRSH 266 .
- the confirm/cancel result, and the associated AAT number are subsequently delivered from IRSH 266 to speculative handler (SPH) 312 .
- Memory output register (MOR) 318 and memory input data register (MIDR) 314 are the interfaces between MIU 300 and memory 600 . Outgoing requests are sent to memory 600 via MOR 318 , and data received from memory 600 is loaded in MIDR 314 . An entry is allocated for a request in RDBT 306 before that request is issued to memory 600 . The data loaded in MIDR 314 is stored in RDB 316 at an address defined by the entry allocated in RDBT 306 .
- the response to an intervention message may arrive at IVU 250 before the corresponding speculative request has been issued to MIU 300 .
- This condition may happen, for example, when MRQ 310 is full and cannot receive the speculative request at the time when the response to the corresponding intervention message arrives at IVU 250 .
- the AAT entry and the intervention response, i.e., the confirm/cancel result associated with that request, are delivered to SPH 312 by IRSH 266 .
- the confirm/cancel result is subsequently stored in A2ST 304 .
- a corresponding entry in SPT 302 is allocated and the intervention response, i.e., confirm/cancel result is copied from A2ST 304 to that entry in SPT 302 .
- the intervention response is a speculative confirm
- SPH 312 allocates an entry in RDBT 306 and deallocates the corresponding SPT entry.
- a read request is issued to the memory. If the intervention response is a speculative cancel, the speculative request is canceled and the corresponding SPT entry is deallocated.
- the response to an intervention message may arrive at IVU 250 after the corresponding speculative request has been received by MIU 300 . Since the request is already stored in MRQ 310 , it has an assigned entry in SPT 302 . When the request reaches the head of the queue in MRQ 310 , the SPT entry associated with that request is looked up. If the received intervention response is a speculative confirm, (i) an entry in RDBT 306 is allocated and its confirmed bit is set, (ii) the corresponding SPT entry is deallocated, and (iii) a read request is issued to the memory. If the received intervention response is a speculative cancel, the speculative request is canceled and the corresponding SPT entry is deallocated. If no intervention response is received after the request reaches the head of the queue in MRQ 310 , the speculative request is issued to the memory.
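The three outcomes at the head of the MRQ can be summarized as a small decision function. This is a sketch under assumptions: `Action` and `at_head_of_mrq` are hypothetical names, and the third case follows the behavior described elsewhere in the text, where a request issued without a known result gets an RDBT entry with both bits cleared and keeps its SPT entry.

```python
from typing import NamedTuple, Optional, Tuple


class Action(NamedTuple):
    issue_to_memory: bool
    rdbt_bits: Optional[Tuple[bool, bool]]   # (confirmed, canceled); None if no RDBT entry
    free_spt: bool


def at_head_of_mrq(spt_result: Optional[str]) -> Action:
    """Decide what happens to the speculative request at the head of the MRQ.

    spt_result is "confirm", "cancel", or None when no response has arrived yet.
    """
    if spt_result == "confirm":
        return Action(True, (True, False), True)    # allocate RDBT, free SPT, issue read
    if spt_result == "cancel":
        return Action(False, None, True)            # drop the request, free SPT
    if spt_result is None:
        return Action(True, (False, False), False)  # issue anyway, result still pending
    raise ValueError(f"unknown SPT result: {spt_result!r}")
```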
- MIU arbitration logic 308 arbitrates access to MRQ 310 between IVU 250 and RQU 220 . When no speculative request is made because RMQ 228 is full and it is subsequently determined that none of the caches contain the requested data, a corresponding request is made to the memory by IVU 250 . This request is received by MIU arbitration logic 308 and delivered to MRQ 310 . This request is thereafter delivered to MOR 318 (via MOR arbitration logic 330 ) for later submission to the memory. MRQ 310 is bypassed if it is empty, in turn causing MIU arbitration logic 308 to transfer the request directly to MOR arbitration logic 330 . SPH 312 transfers confirm/cancel results from SPT 302 to RDBT 306 . MOR arbitration logic 330 gains access to SPT 302 and RDBT 306 via SPH 312 .
- the response to an intervention message may arrive at IVU 250 after the corresponding speculative request has been issued to the memory but before the requested data has been received from the memory.
- the RDBT's corresponding confirmed and canceled bits are both cleared if the intervention response is not known at the time the speculative request is issued to the memory.
- A2ST 304 uses the AAT supplied by PIQ 262 to find the corresponding SPT entry.
- the SPT entry is then used to supply the corresponding RDBT entry.
- the RDBT entry is then updated with the confirm/cancel result of the intervention response.
- the corresponding entry in SPT 302 is then deallocated.
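The chain of look-ups described above (AAT number to SPT entry via A2ST, SPT entry to RDBT entry, then the RDBT update) can be sketched with three dictionaries. The dictionary representation and the function name `apply_late_result` are assumptions for illustration only.

```python
def apply_late_result(aat_no, result, a2st, spt, rdbt):
    """Route a confirm/cancel result to the RDBT entry of an already issued request.

    Assumptions: a2st maps AAT number -> SPT number, spt maps SPT number -> RDBT
    number, and rdbt maps RDBT number -> {"confirm": bool, "cancel": bool}.
    """
    spt_no = a2st[aat_no]            # A2ST supplies the SPT entry for this AAT number
    rdbt_no = spt[spt_no]            # the SPT entry supplies the RDBT entry
    rdbt[rdbt_no][result] = True     # record the confirm or cancel result
    del spt[spt_no]                  # the SPT entry is then deallocated
    return rdbt_no
```

For example, with `a2st = {7: 2}`, `spt = {2: 4}`, and a cleared RDBT entry 4, a late `"cancel"` for AAT number 7 sets the cancel bit of RDBT entry 4 and frees SPT entry 2.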
- Each speculative read request that is sent to the memory is allocated a corresponding entry in RDBT 306 .
- the data supplied by the memory in response to the read request is received by MIDR 314 and is subsequently stored in RDB 316 at an address defined by the corresponding entry in RDBT 306 .
- Memory response handler (MRSH) 322 looks up the status of the corresponding RDBT entry when the data is returned and stored in RDB 316 . If the speculative request has been confirmed, the data is delivered to controller 324 , which in turn, stores this data in memory read data queue (MRDQ) 326 . If the speculative request has been canceled, the RDB/RDBT entries are deallocated after all the segments of the requested data are received from the memory.
- Controller 324 may include a number of queues to accommodate the transfer of the data to MRDQ 326 .
- Data stored in MRDQ 326 is subsequently transferred to response output register (RSOR) 282 .
- RSOR 282 subsequently supplies this data to the requesting core.
- MRSH 322 is triggered to perform the look-up operation in RDBT 306 when the associated data is stored in RDB 316 .
- the response to an intervention message may arrive at IVU 250 after part or all of the data corresponding to the speculative request has been received from the memory. If the intervention response is not known by the time a segment, such as a double-word, of the requested data is received, the transaction is considered a late completion. Late completions are handled by the late speculation completion handler (LSCH) 320 . If a late completion is marked so as to cancel the speculative request, LSCH 320 retires the RDBT/RDB entry after all segments of the requested data are received from the memory. If a late completion is marked so as to confirm the speculative request, LSCH 320 waits for all segments of the requested data to arrive.
- the received data are thereafter delivered to controller 324, which, in turn, stores this data in MRDQ 326.
- the data present in MRDQ 326 is supplied to the requesting core after being stored in RSOR 282.
- LSCH 320 is triggered when it receives the late intervention response.
- An AAT entry may be deallocated if the coherent request does not result in a required memory operation, such as a memory write-back operation to the memory, only if both the intervention response and the corresponding speculative request are received by MIU 300 . If a coherent request results in a required memory access operation, the AAT entry is not deallocated until after the intervention response, the corresponding speculative request, and any required memory access operation resulting from the intervention are all received by MIU 300 . This ensures that, for example, when a write-back to the memory is required, the memory write operation is in the MIU 300 ahead of the speculative request before the AAT entry is deallocated.
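- By way of illustration only, the deallocation condition for an AAT entry described above may be sketched as follows. The class and field names are hypothetical and are not part of the described embodiment; the sketch merely restates the rule that an entry is retired only after every expected item has been received by MIU 300.

```python
from dataclasses import dataclass


@dataclass
class AatEntryStatus:
    """Hypothetical per-entry status bits; the names are illustrative only."""
    intervention_response_seen: bool = False  # intervention response received by the MIU
    speculative_request_seen: bool = False    # speculative request received by the MIU
    memory_op_required: bool = False          # e.g. a write-back caused by the intervention
    memory_op_seen: bool = False              # the required memory operation reached the MIU


def may_deallocate(entry: AatEntryStatus) -> bool:
    """Return True when the AAT entry may be deallocated under the stated rule."""
    both_seen = entry.intervention_response_seen and entry.speculative_request_seen
    if not entry.memory_op_required:
        return both_seen
    # When a memory operation is required, it must also have reached the MIU.
    return both_seen and entry.memory_op_seen
```

- In this sketch, an entry whose intervention requires a write-back stays allocated until that write-back has also been observed, which mirrors the ordering guarantee stated above.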
- FIGS. 4A , 4 B and 4 C are a flowchart 700 of steps carried out to perform a speculative request, in accordance with one embodiment of the present invention.
- the confirm/cancel result is stored 704 in the A2ST.
- the SPT is updated with the confirm/cancel result 708 .
- the speculative response is a confirm 710
- an RDBT number is allocated and used as an index to the RDBT to store the confirm result 712 .
- the SPT number is deallocated 714 and the request is thereafter sent to the memory.
- If the speculative response is a cancel 710, the request is canceled 718 and the SPT number is deallocated.
- a request stored in the MRQ has an allocated SPT number 750 . If an intervention response is available 754 after the request reaches 752 the top of the queue in the MRQ, the process moves to decision block 710 ( FIG. 4A ). If the intervention response is not available 754 after the request reaches 752 the top of the queue in the MRQ, an RDBT number is allocated to the request 758 , and its confirm/cancel bits are cleared. The request is thereafter sent to the memory 760 .
- the AAT number is used as an index to the A2ST to find the SPT number 772 .
- the SPT number is then used to find the RDBT number 774 . If the speculative response is a confirm 776 , the data received from the memory is transferred to the requesting core 778 , and the RDBT number is deallocated 782 . If the speculative response is a cancel 776 , the data received from the memory is discarded 780 and the RDBT number is deallocated.
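- The decisions of flowchart 700 may be summarized, purely as a sketch, by the two functions below. The function names and the action strings are hypothetical labels chosen for this illustration; they paraphrase the flowchart steps and do not represent any actual interface of the coherence manager.

```python
CONFIRM = "confirm"
CANCEL = "cancel"


def at_mrq_head(cc):
    """Actions taken when a speculative request reaches the top of the MRQ.

    cc is CONFIRM, CANCEL, or None when no intervention response is available yet.
    """
    if cc is None:
        # Response not yet known: issue the request and resolve it later.
        return ["allocate_rdbt", "clear_cc_bits", "send_to_memory"]
    if cc == CONFIRM:
        return ["allocate_rdbt", "store_confirm_in_rdbt", "deallocate_spt", "send_to_memory"]
    if cc == CANCEL:
        return ["cancel_request", "deallocate_spt"]
    raise ValueError(f"unknown confirm/cancel value: {cc!r}")


def on_memory_data(cc):
    """Actions taken when the data returns from memory and the response is known."""
    if cc == CONFIRM:
        return ["deliver_to_core", "deallocate_rdbt"]
    if cc == CANCEL:
        return ["discard_data", "deallocate_rdbt"]
    raise ValueError(f"unknown confirm/cancel value: {cc!r}")
```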
- FIG. 5 shows the flow of indices and entries associated with a request between AAT 222, SPT 302, A2ST 304, and RDBT 306 of coherence manager 200.
- the AAT number allocated to each coherent request and used to track the associated speculative request as described above, is delivered to MIU 300 and IVU 250 .
- An SPT number is allocated by MIU arbitration logic 308 when a slot becomes available in MRQ 310 for that request; MRQ 310 stores the SPT number.
- An RDBT number is allocated by MOR arbitration logic 330 when the request is issued to the memory via MOR 318 .
- the SPT number is used as an index to store the AAT number in SPT 302
- the AAT number is used as an index to store the SPT number in A2ST 304 .
- the confirm/cancel (CC) result is first stored in A2ST 304.
- the SPT number allocated after the request is stored in MRQ 310 , is used as an index to store the AAT number in SPT 302 .
- the AAT number stored in PIQ 262 is used as an index to A2ST 304 to look-up the CC.
- the retrieved CC is copied to SPT 302 at the index defined by the SPT number. If the CC is a speculative confirm, an RDBT 306 number is allocated. The SPT number is then used as an index to SPT 302 to retrieve and copy the CC result to RDBT 306 . If the intervention response is a speculative cancel, the speculative request is canceled and the corresponding SPT entry is deallocated.
- An SPT number exists if the request is already stored in MRQ 310 .
- the allocated SPT number is used as an index to SPT 302 to store the AAT number.
- the AAT number stored in PIQ 262 is used as an index to A2ST 304 to look-up the SPT number.
- the SPT number is then used as an index to SPT 302 to store the CC.
- the SPT number is interrogated. If the CC is a speculative confirm, a corresponding number in RDBT 306 is allocated. The RDBT number is used as an index to RDBT 306 to store the CC.
- the corresponding SPT number is deallocated, and a read request is issued to the memory. If the received intervention is a speculative cancel, the speculative request is canceled and the corresponding SPT entry is deallocated.
- the response to an intervention message may arrive at IVU 250 after the corresponding speculative request has been issued to the memory but before the response to the request has been received from the memory.
- the AAT and RDBT numbers are stored in the SPT 302 at the index defined by the SPT number.
- the SPT number is stored in A2ST 304 at an index defined by the AAT number.
- the AAT number corresponding to the received CC is used to identify the corresponding SPT number in A2ST table 304 .
- the SPT number so identified is then used to find the corresponding RDBT number.
- the RDBT number is subsequently used as an index to RDBT 306 to store the CC result; the corresponding SPT number is then deallocated.
- If the CC is a confirm, the data supplied by the memory is delivered to RSU 280.
- If the CC is a cancel the RDB/RDBT entries are deallocated after all the segments of the requested data are received from the memory.
- the response to an intervention message may arrive at IVU 250 after part or the entire data corresponding to the speculative request has been received from the memory.
- the AAT and RDBT numbers are stored in the SPT 302 at the index defined by the SPT number.
- the SPT number is stored in A2ST 304 at an index defined by the AAT number.
- the AAT number corresponding to the received CC is used to identify the corresponding SPT number in A2ST table 304 .
- the SPT number so identified is then used to find the corresponding RDBT number.
- the RDBT number is subsequently used as an index to RDBT 306 to store the CC result. If the CC is a confirm, the data supplied by the memory is delivered to RSU 280 by LSCH 320 . If the CC is a cancel, the RDB/RDBT entries are deallocated after all the segments of the requested data are received from the memory.
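- The chain of look-ups from the AAT number to the RDBT entry, for the case in which the intervention response arrives after the request has been issued to the memory, may be modeled roughly as shown below. Python dictionaries stand in for the hardware tables, and the class and method names are hypothetical; the sketch assumes one outstanding request per AAT number.

```python
class SpeculationTables:
    """Toy model of A2ST, SPT and RDBT; not the actual hardware organization."""

    def __init__(self):
        self.a2st = {}  # AAT number -> SPT number
        self.spt = {}   # SPT number -> {"aat": ..., "rdbt": ...}
        self.rdbt = {}  # RDBT number -> confirm/cancel result, None while unresolved

    def store_in_mrq(self, aat_no, spt_no):
        # The SPT number indexes the SPT to store the AAT number, and
        # the AAT number indexes the A2ST to store the SPT number.
        self.spt[spt_no] = {"aat": aat_no, "rdbt": None}
        self.a2st[aat_no] = spt_no

    def issue_to_memory(self, spt_no, rdbt_no):
        # The RDBT number is recorded in the SPT entry; the result is not yet known.
        self.spt[spt_no]["rdbt"] = rdbt_no
        self.rdbt[rdbt_no] = None

    def on_intervention_response(self, aat_no, cc):
        # AAT number -> SPT number -> RDBT number, then store the CC result.
        spt_no = self.a2st[aat_no]
        entry = self.spt.pop(spt_no)  # the SPT entry is deallocated
        self.rdbt[entry["rdbt"]] = cc
        return entry["rdbt"]
```

- For example, after `store_in_mrq(5, 2)` and `issue_to_memory(2, 9)`, a call to `on_intervention_response(5, "confirm")` records the confirm in RDBT entry 9 and frees SPT entry 2.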
- the queues, tables and the ports in coherence manager 200 are configurable to support different sizes and optimize power consumption and performance.
- a deadlock condition may thus occur when the sum of the sizes of IQ 252 and PIQ 262 is greater than the sum of the sizes of RDBT 306 and MRQ 310 .
- SAR 224 is adapted so as not to issue any speculative requests unless the number of unresolved speculative requests, i.e., speculative requests for which the associated intervention responses have not yet been completed, is less than the total number of entries in RDBT 306. For example, if RDBT 306 has a capacity to hold 16 entries, no more than 15 unresolved speculative requests may be pending at any given time.
- a first stream of requests 400 followed by a second stream of requests 402 are delivered to request unit 220 .
- RMQ 228 is full when the first stream of requests (RS) 400 is received; therefore these requests are not stored in RMQ 228 and are not speculated.
- RMQ 228 is empty when RS 402 is received; therefore these requests are stored in RMQ 228 and are speculated.
- Intervention messages (IM) 500 and 502 are forwarded to IVU 250 .
- Intervention responses (IR) 600 and 602 are assumed to be respectively associated with IMs 500 and 502 .
- RS 402 is subsequently transferred to and stored in MRQ 310 .
- a first portion 402 a of RS 402 is issued to the memory and their associated entries allocated in RDBT 306 fill all the slots in RDBT 306 .
- MRQ 310 is filled with the remaining portion 402 b of RS 402, as well as with a non-coherent stream of requests 404 that subsequently arrives.
- MRQ 310 is full and cannot accept any new requests unless there is an entry available in RDBT 306 .
- the entries in RDBT 306 cannot be cleared since the confirm/cancel results needed to clear these requests are present in IR 602 , which is stuck behind IR 600 . Accordingly, a deadlock is created where no entry can be cleared and no movement of requests can flow through the coherence manager 200 .
- SAR 224 is adapted so as not to issue any speculative requests unless the number of unresolved speculative requests is at least one less than the total number of entries in RDBT 306.
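- A minimal sketch of such a throttle is given below. It assumes the reading suggested by the 16-entry example, namely that the number of unresolved speculative requests never exceeds the number of RDBT entries minus one, so that at least one RDBT slot always remains free. The class name and its methods are hypothetical.

```python
class SpeculationThrottle:
    """Caps unresolved speculative requests at one fewer than the RDBT size."""

    def __init__(self, rdbt_entries: int):
        if rdbt_entries < 1:
            raise ValueError("the RDBT must have at least one entry")
        self.limit = rdbt_entries - 1  # assumed cap, per the 16-entry example
        self.unresolved = 0

    def try_issue(self) -> bool:
        """Return True and count the request if speculation is allowed."""
        if self.unresolved >= self.limit:
            return False  # the request proceeds without speculation
        self.unresolved += 1
        return True

    def resolve(self) -> None:
        """Called when the intervention response for a speculative request completes."""
        if self.unresolved == 0:
            raise RuntimeError("no unresolved speculative request to resolve")
        self.unresolved -= 1
```

- With `SpeculationThrottle(16)`, the sixteenth consecutive call to `try_issue()` returns False until an earlier request is resolved.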
- the cache coherence manager includes, in part, a multitude of configurable ports, a multitude of configurable tables, and a multitude of configurable queues.
- the configurability of the ports enables a user to select the number of desired ports. For example, assume that the coherence manager has 16 configurable ports. A user may, however, need only four such ports to adapt the coherence manager to a microprocessor having four cores. The user accordingly configures the coherence manager so as to use only four of the 16 ports.
- the configurability of the tables and queues enables a user to define their respective sizes to balance the competing goals of achieving optimum processor performance and minimizing the die size.
- FIG. 7 illustrates an exemplary computer system 1000 in which the present invention may be embodied.
- Computer system 1000 typically includes one or more output devices 1100 , including display devices such as a CRT, LCD, OLED, LED, gas plasma, electronic ink, or other types of displays, speakers and other audio output devices; and haptic output devices such as vibrating actuators; computer 1200 ; a keyboard 1300 ; input devices 1400 ; and a network interface 1500 .
- Input devices 1400 may include a computer mouse, a trackball, joystick, track pad, graphics tablet, touch screen, microphone, various sensors, and/or other wired or wireless input devices that allow a user or the environment to interact with computer system 1000 .
- Network interface 1500 typically provides wired or wireless communication with an electronic communications network, such as a local area network, a wide area network, for example the Internet, and/or virtual networks, for example a virtual private network (VPN).
- Network interface 1500 can implement one or more wired or wireless networking technologies, including Ethernet, one or more of the 802.11 standards, Bluetooth, and ultra-wideband networking technologies.
- Computer 1200 typically includes components such as one or more general purpose processors 1600 , and memory storage devices, such as a random access memory (RAM) 1700 and non-volatile memory 1800 .
- Non-volatile memory 1800 can include floppy disks; fixed or removable hard disks; optical storage media such as DVD-ROM, CD-ROM, and bar codes; non-volatile semiconductor memory devices such as flash memories; read-only-memories (ROMS); battery-backed volatile memories; paper or other printing mediums; and networked storage devices.
- System bus 1900 interconnects the above components.
- Processors 1600 may be a multi-processor system such as multi-processor 100 described above.
- RAM 1700 and non-volatile memory 1800 are examples of tangible media for storage of data, audio/video files, computer programs, applet interpreters or compilers, virtual machines, and embodiments of the present invention described above.
- the above described embodiments of the processors of the present invention may be represented as human-readable or computer-usable programs and data files that enable the design, description, modeling, simulation, testing, integration, and/or fabrication of integrated circuits and/or computer systems.
- Such programs and data files may be used to implement embodiments of the invention as separate integrated circuits or used to integrate embodiments of the invention with other components to form combined integrated circuits, such as microprocessors, microcontrollers, system on a chip (SoC), digital signal processors, embedded processors, or application specific integrated circuits (ASICs).
- Programs and data files expressing embodiments of the present invention may use general-purpose programming or scripting languages, such as C or C++; hardware description languages, such as VHDL or Verilog; microcode implemented in RAM, ROM, or hard-wired and adapted to control and coordinate the operation of components within a processor or other integrated circuit; and/or standard or proprietary format data files suitable for use with electronic design automation software applications known in the art.
- Programs and data files can express embodiments of the invention at various levels of abstraction, including as a functional description, as a synthesized netlist of logic gates and other circuit components, and as an integrated circuit layout or set of masks suitable for use with semiconductor fabrication processes. These programs and data files can be processed by electronic design automation software executed by a computer to design a processor and generate masks for its fabrication.
- Computer 1200 can include specialized input, output, and communications subsystems for configuring, operating, simulating, testing, and communicating with specialized hardware and software used in the design, testing, and fabrication of integrated circuits.
- processors may have more or fewer than four cores.
- the arrangement and the number of the various devices shown in the block diagrams are for clarity and ease of understanding. It is understood that combinations of blocks, additions of new blocks, re-arrangement of blocks, and the like fall within alternative embodiments of the present invention.
- any number of I/Os, coherent multi-core processors, system memories, L2 and L3 caches, and non-coherent cached or cacheless processing cores may also be used.
- a semiconductor intellectual property core such as a microprocessor core (e.g. expressed as a hardware description language description or a synthesized netlist) and transformed to hardware in the production of integrated circuits.
- the embodiments of the present invention may be implemented using combinations of hardware and software, including micro-code suitable for execution within a processor.
Abstract
Description
- The present invention relates to multiprocessor systems, and more particularly to performing a speculative request in a cache coherent multi-core microprocessor system.
- Advances in semiconductor fabrication technology have given rise to considerable increases in microprocessor clock speeds. Although the same advances have also resulted in improvements in memory density and access times, the disparity between microprocessor clock speeds and memory access times continues to persist. To reduce latency, often one or more levels of high-speed cache memory are used to hold a subset of the data or instructions that are stored in the main memory. A number of techniques have been developed to increase the likelihood that the data/instructions held in the cache are repeatedly used by the microprocessor.
- To improve performance at any given operating frequency, microprocessors with a multitude of cores that execute instructions in parallel have been developed. The cores may be integrated within the same semiconductor die, or may be formed on different semiconductor dies coupled to one another within a package, or a combination of the two. Each core typically includes its own level-1 cache and an optional level-2 cache.
- A cache coherency protocol governs the traffic flow between the memory and the caches associated with the cores to ensure coherency between them. For example, the cache coherency protocol ensures that if a copy of a data item is modified in one of the caches, copies of the same data item stored in other caches and in the main memory are invalidated or updated in accordance with the modification.
- In order to reduce the average latency associated with a coherent read request, a technique commonly referred to as speculative read may be used. In accordance with this technique, concurrently with searching for the requested data in the caches, a speculative read request is also issued to the memory. If the requested data is stored in any of the caches, the speculative read is cancelled. If the requested data is not stored in any of the caches, the speculative read is confirmed and the data identified by the confirmed request is transferred from the memory to the requesting core.
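- The confirm/cancel outcome of a speculative read may be illustrated with the following simplified model. Dictionaries stand in for the caches and the memory, the function name is hypothetical, and the concurrency between the cache search and the memory access is not modeled; only the resulting choice of data source is shown.

```python
def coherent_read(address, peer_caches, memory):
    """Return (data, outcome) for a coherent read with a speculative memory read."""
    # The speculative read is issued without waiting for the cache search.
    speculative_data = memory[address]
    for cache in peer_caches:
        if address in cache:
            # Hit in a cache: the speculative read is cancelled and its data dropped.
            return cache[address], "cancel"
    # Miss in every cache: the speculative read is confirmed.
    return speculative_data, "confirm"
```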
- In accordance with one embodiment of the present invention, a multi-core microprocessor includes, in part, a cache coherence manager that maintains cache coherence among the multitude of cores and also minimizes latency associated with performing coherent requests. The cache coherence manager includes, in part, a request unit, an intervention unit, a response unit, and a memory interface unit. The request unit is configured to selectively issue a speculative request in response to a coherent request received from one of the cores. The intervention unit is configured to send an intervention message associated with the coherent request to the cores. The memory interface unit is configured to receive the speculative request and to selectively cancel or forward the speculative request to a memory.
- In one embodiment, the memory interface unit includes at least three tables. An entry in the first table is an index to the second table. The entry in the second table is an index to the third table. The entry in the first table is allocated when a response to the intervention message is stored in the first table before the speculative request is stored in the memory interface unit. The entry in the second table is allocated when the request is stored in the memory interface unit. The entry in the third table is allocated when the speculative request is issued to the memory.
- In one embodiment, the request unit includes, in part, a fourth table storing a multitude of addresses, and a logic block configured to compare an address associated with the request to the multitude of addresses stored in the fourth table. Each address stored in the fourth table is associated with a pending coherent request. If an address match is not detected, the logic block issues the speculative request and assigns an identifier thereto. The identifier is used as an index to the first entry in the first table. In another embodiment, the logic block issues the speculative request first, assigns a corresponding identifier, and subsequently compares the requested address to the addresses stored in the fourth table. If an address match is detected, the logic block cancels the speculative request. In one embodiment, the request unit does not issue a speculative request unless the number of unresolved speculative requests is less than the total number of entries of the third table.
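- The address comparison against the fourth table may be sketched as follows. The 32-byte line size, the list-based table, and the function name are assumptions made solely for this illustration; the sketch also assumes that every coherent request is entered in the table, whether or not it is speculated.

```python
LINE_BYTES = 32  # hypothetical cache line size


def handle_coherent_request(address, table):
    """Compare the request's line address with pending entries and record it.

    table is a list in which slot k holds the line address of pending
    coherent request k, or None if the slot is free. Returns the pair
    (identifier, speculate). Raises ValueError if no slot is free.
    """
    line = address // LINE_BYTES
    # A match means a coherent request is already pending for this line,
    # so no speculative request is issued.
    speculate = all(entry != line for entry in table if entry is not None)
    identifier = table.index(None)
    table[identifier] = line
    return identifier, speculate
```

- For instance, with an empty four-entry table, a request to address 0x1000 is speculated, whereas a following request to 0x1010 falls in the same line and is not.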
- In accordance with one embodiment of the present invention, a method of operating a multi-core microprocessor having disposed therein a cache coherence manager includes, in part, receiving a coherent request from one of the cores, selectively issuing a speculative request in response, sending an intervention message associated with the coherent request to the cores, and selectively sending the issued speculative request to a memory.
- In one embodiment, the memory interface unit includes at least three tables. An entry in the first table is an index to the second table. The entry in the second table is an index to the third table. The entry in the first table is allocated when a response to the intervention message arrives at the first table before the corresponding request is stored in the memory interface unit. The entry in the second table is allocated when the speculative request is stored in the memory interface unit. The entry in the third table is allocated when the speculative request is issued to the memory.
- In one embodiment, the address associated with the coherent request is compared to a multitude of addresses stored in a fourth table. Each address stored in the fourth table is associated with a pending coherent request. If an address match is not detected, the request is speculatively issued and an identifier is assigned to this request. The identifier is used as an index to the first entry in the first table. In another embodiment, the speculative request is first issued and a corresponding identifier is assigned. If a match is thereafter detected between the address associated with the request and any one of the addresses stored in the fourth table, the speculative request is canceled. In one embodiment, the coherent request is not speculatively issued unless the number of unresolved speculative requests is less than the total number of entries of the third table.
- FIG. 1 shows a multi-core microprocessor, in communication with a number of I/O devices and a system memory, in accordance with one embodiment of the present invention.
- FIG. 2 is a block diagram of the cache coherence manager disposed in the microprocessor of FIG. 1, in accordance with one embodiment of the present invention.
- FIG. 3 is a more detailed block diagram of the cache coherence manager of FIG. 2, in accordance with one embodiment of the present invention.
- FIGS. 4A, 4B and 4C form a flowchart showing a speculative request, in accordance with one embodiment of the present invention.
- FIG. 5 shows the flow of indices and data between a number of tables disposed in the cache coherence manager of FIG. 3.
- FIG. 6 shows the flow of speculative and non-speculative requests that may lead to a deadlock condition and which the present invention is adapted to inhibit.
- FIG. 7 shows an exemplary computer system in which the present invention may be embodied.
- In accordance with one embodiment of the present invention, a multi-core microprocessor includes, in part, a cache coherence manager that maintains coherence among the multitude of microprocessor cores, and further minimizes the latency associated with coherent read requests. The cache coherence manager includes, in part, a request unit, an intervention unit, a memory interface unit and a response unit. The cache coherence manager supports speculative reads and includes an indexing scheme that efficiently manages the processing of the speculative requests and the corresponding intervention messages forwarded to and received from the cores.
- FIG. 1 is a block diagram of a microprocessor 100, in accordance with one exemplary embodiment of the present invention, that is in communication with system memory 600 and I/O units via system bus 630. Microprocessor (hereinafter alternatively referred to as processor) 100 is shown as including, in part, four cores 105 1, 105 2, 105 3 and 105 4, a cache coherency manager 200, and an optional level-2 (L2) cache 605. Each core 105 i, where i is an integer ranging from 1 to N, where in this embodiment N=4, is shown as including, in part, a processing core 110 i, an L1 cache 115 i, and a cache control logic 120 i. Although the exemplary embodiment of processor 100 is shown as including four cores, it is understood that other embodiments of processor 100 may include more or fewer than four cores.
- Each processing core 110 i is adapted to perform a multitude of fixed or flexible sequences of operations in response to program instructions. Each processing core 110 i may conform to CISC and/or RISC architectures to process scalar or vector data types using SISD or SIMD instructions. Each processing core 110 i may include general purpose and specialized register files and execution units configured to perform logic, arithmetic, and any other type of data processing functions. The processing cores 110 1, 110 2, 110 3 and 110 4, which are collectively referred to as either processing cores 110 i or processing cores 110, may be configured to perform identical functions, or may alternatively be configured to perform different functions adapted to different applications. Processing cores 110 may be single-threaded or multi-threaded, i.e., capable of executing multiple sequences of program instructions in parallel.
- Each core 105 i is shown as including a level-1 (L1) cache. In other embodiments, each core 110 i may include more levels of cache, e.g., level 2, level 3, etc. Each cache 115 i may include instructions and/or data. Each cache 115 i is typically organized to include a multitude of cache lines, with each line adapted to store a copy of the data corresponding with one or more virtual or physical memory addresses. Each cache line also stores additional information used to manage that cache line. Such additional information includes, for example, tag information used to identify the main memory address associated with the cache line, and cache coherency information used to synchronize the data in the cache line with other caches and/or with the main system memory. The cache tag may be formed from all or a portion of the memory address associated with the cache line.
- Each L1 cache 115 i is coupled to its associated processing core 110 i via a bus 125 i. Each bus 125 i includes a multitude of signal lines for carrying data and/or instructions. Each core 105 i is also shown as including a cache control logic 120 i to facilitate data transfer to and from its associated cache 115 i. Each cache 115 i may be fully associative, set associative with two or more ways, or direct mapped. For clarity, each cache 115 i is shown as a single cache memory for storing data and instructions required by core 105 i. Although not shown, it is understood that each core 105 i may include an L1 cache for storing data, and an L1 cache for storing instructions.
- Each cache 115 i is partitioned into a number of cache lines, with each cache line corresponding to a range of adjacent locations in shared system memory 600. In one embodiment, each line of each cache, for example cache 115 1, includes data to facilitate coherency between, e.g., cache 115 1, main memory 600 and any other caches 115 2, 115 3, 115 4 intended to remain coherent with cache 115 1, as described further below. For example, in accordance with the MESI cache coherency protocol, each cache line is marked as being Modified “M”, Exclusive “E”, Shared “S”, or Invalid “I”, as is well known. Other cache coherency protocols, such as MSI, MOSI, and MOESI coherency protocols, are also supported by the embodiments of the present invention.
- Each core 105 i is coupled to a cache coherency manager 200 via an associated bus 135 i. Cache coherency manager 200 facilitates transfer of instructions and/or data between cores 105 i, system memory 600, I/O units, and L2 cache 605. Cache coherency manager 200 establishes the global ordering of requests, sends intervention requests, collects the responses to such requests, and sends the requested data back to the requesting core. Cache coherency manager 200 orders the requests so as to optimize memory accesses, load balance the requests, give priority to one or more cores over the other cores, and/or give priority to one or more types of requests over other requests. Although not shown, in some embodiments, one or more of cores 105 i include a dedicated Level-2 (L2) cache when optional shared L2 cache 605 is not used.
- FIG. 2 is a block diagram of cache coherency manager 200, in accordance with one embodiment of the present invention. Cache coherency manager 200 is shown as including, in part, a request unit 220, an intervention unit 250, a response unit 280, and a memory interface unit 300. Request unit 220 includes input ports 202 adapted to receive, for example, read requests, write requests, write-back requests and any other cache memory related requests from cores 105 i. Request unit 220 serializes the requests it receives from cores 105 i and sends non-coherent read/write requests, speculative coherent read requests, as well as explicit and implicit writeback requests of modified cache data to memory interface unit 300 via port 204. Request unit 220 sends coherent requests to intervention unit 250 via port 216. In order to avoid a read-after-write hazard, the read address is compared against pending coherent requests that can generate write operations. If a match is detected as a result of this comparison, the read request is not started speculatively.
- In response to the coherent intervention requests received from request unit 220, intervention unit 250 issues an intervention message via output ports 212. A hit will cause the data to return to the intervention unit via input ports 245. In another embodiment, the requested data is returned to the intervention unit 208. Intervention unit 250 subsequently forwards this data to response unit 205 via output ports 218. Response unit 280 forwards this data to the requesting (originating the request) core via output ports 212. If there is a cache miss and the read request was not performed speculatively, intervention unit 250 requests access to this data by sending a coherent read or write request to memory interface unit 300 via output ports 206. A read request may proceed without speculation when, for example, a request memory buffer disposed in request unit 220 and adapted to store and transfer the requests to memory interface unit 300 is full.
- Memory interface unit 300 receives non-coherent read/write requests from request unit 220, as well as speculative requests and writeback requests from intervention unit 250. In response, memory interface unit 300 accesses system memory 600 and/or higher level cache memories such as L2 cache 605 via input/output ports 255 to complete these requests. The data retrieved from memory 600 and/or higher level cache memories in response to such memory requests is forwarded to response unit 215 via output port 260. The response unit 215 returns the data requested by the requesting core via output ports 265. As is understood, the requested data may have been retrieved from an L1 cache of another core, from memory 600, or from optional higher level cache memories.
- FIG. 3 is a more detailed view of cache coherence manager 200 disposed in a microprocessor having N cores, in accordance with one embodiment of the present invention. Referring to FIGS. 1 and 3 concurrently, in order to reduce average latency of a coherent read request from any of the N cores 105 i, where i is an integer varying from 1 to N, coherence manager 200 issues speculative read requests to memory 600 via memory interface unit 300. The speculative read assumes that the requested data will not be found in any of the cores. If the requested data is found in response to the intervention message, the speculative read is canceled if it has not yet been issued by memory interface unit 300, or alternatively the response is dropped when it returns from system memory 600.
- The response to an intervention message may arrive at the intervention unit 250 at different points in time relative to the speculative request. The request may still be in the request unit 220 when the response to the associated intervention message is received by the intervention unit 250. The request may be in the memory interface unit 300 when the response to the associated intervention message is received by intervention unit 250. The request may have been issued to the memory by the time the response to the associated intervention message arrives at the intervention unit 250. A number of data segments associated with the speculative read request may have been received by the memory interface unit 300 before the response to the associated intervention message is received by the intervention unit 250. Coherence manager 200 is configured to handle speculative requests for all possible timing conditions described above, notwithstanding the outcome of the intervention message, i.e., cancel or confirm.
- Incoming coherent requests are serialized by serialized address register (SAR) 224 disposed in request unit 220. In one embodiment, the cache line address associated with each request is compared to the entries stored in the active address table (AAT) 222. An address match indicates that a coherent request is already pending for that address and hence no speculative request is issued for that request. If no address match is detected and a slot is available in request memory queue (RMQ) 228, serialized request handler (SRH) 226 loads the request in RMQ 228. If a slot is not available in RMQ 228, no speculative request is issued for that request even if no speculation is detected as being in progress for that address. Furthermore, if RQU 220 receives a coherent request that was erroneously issued due, for example, to software error, then RQU 220 will not issue a corresponding speculative request. Similarly, if RQU 220 issues a speculative request that bypasses the RMQ 228, via signal line 230, and subsequently detects an error with this request, RQU 220 will cancel this request. AAT 222 performs two functions. First, it keeps track of active coherent requests to inhibit read-after-write hazards. An intervention response to a coherent request may result in a read or write operation to the memory. AAT 222 is used to ensure that a speculative read to the same address does not occur before the updated data is written to the memory, thereby to avoid read-after-write hazards. Second, AAT 222 is used to tag the speculative requests to enable their identification as they flow between the IVU 250, RQU 220 and MIU 300.
- The AAT number associated with a speculative request travels with that request to MIU 300. Accordingly, RMQ 228 stores both the speculative request as well as the AAT number associated with that request. In another embodiment, SAR 224 issues the speculative request before it looks up the address in AAT 222. One clock cycle later, if the look-up in AAT 222 indicates that an earlier issued speculative request is still pending for that address, the newly issued speculative request is canceled. A speculative request issued before an AAT 222 look-up may get stored in memory request queue (MRQ) 310 or get issued to the memory. Under both conditions, the speculative request is canceled if the subsequent AAT 222 look-up results in an address match. The cancellation of such a request results in deallocation of any corresponding numbers that may have been assigned to that request in speculative table 302 and/or request data buffer table 306. The process of allocating and deallocating numbers in various tables disposed in memory interface unit 300 is described in detail below.
- Coherent read requests are received from
SRH 226 and stored in intervention queue (IQ) 252. Corresponding intervention messages are issued after these requests are stored in interventionoutput request queue 256. Intervention messages that have been forwarded to the cores are stored in pending intervention queue (PIQ) 262, and responses to these intervention messages are stored inintervention response queue 258. The AAT number associated with a request is stored inPIQ 262. -
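The conditions under which request unit 220 issues a speculative request, as described above, can be summarized in a small software model. This is only an illustrative sketch, not the hardware design: the class and method names are hypothetical, the AAT is reduced to a set of pending cache line addresses, and the choice to record every non-erroneous coherent request as active (whether or not it is speculated) is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RequestUnitModel:
    """Toy model of the speculation decision (all names are illustrative)."""
    rmq_capacity: int
    active_addresses: set = field(default_factory=set)  # stands in for AAT 222
    rmq: list = field(default_factory=list)             # stands in for RMQ 228

    def should_speculate(self, line_addr: int, erroneous: bool = False) -> bool:
        if erroneous:
            return False  # erroneously issued requests are never speculated
        if line_addr in self.active_addresses:
            return False  # a coherent request is already pending for this line
        if len(self.rmq) >= self.rmq_capacity:
            return False  # no free RMQ slot, so no speculation
        return True

    def accept(self, line_addr: int, erroneous: bool = False) -> bool:
        """Process one coherent read; return True if it was speculated."""
        speculate = self.should_speculate(line_addr, erroneous)
        if speculate:
            self.rmq.append(line_addr)
        if not erroneous:
            # Assumption: the request is tracked as active either way.
            self.active_addresses.add(line_addr)
        return speculate
```

In this model a second read to the same line, or any read arriving while the queue is full, simply proceeds without a speculative memory access and relies on the intervention path.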
Memory interface unit 300 includes, in part, a speculative table (SPT) 302, an active address to speculation table (A2ST) 304, and a read data buffer table (RDBT) 306. As described further below,SPT 302 tracks the confirm/cancel results for speculative requests that have been received byMIU 300. An entry (alternatively referred to herein as number) inSPT 302 is allocated when a speculative request is received byMIU 300 fromRQU 220. The SPT entry is loaded into memory request queue (MRQ) 310 together with the request. Response data buffer table (RDBT) 306 tracks information associated with the requests that have been issued to the memory. An entry inRDBT 306 is allocated for every read request that is issued to the memory. The response to a read request is stored in the response data buffer (RDB) 316 at an address defined by the corresponding entry inRDBT 306.A2ST 304 performs two functions. First, for each AAT entry,A2ST 304 supplies the corresponding SPT entry. Second, A2ST temporarily stores the confirm/cancel result for any speculative request that has not been received byMIU 300 and for which an SPT entry has not yet been allocated. In such conditions, when the request is received byMIU 300, the confirm/cancel result is copied fromA2ST 304 to the associated SPT entry newly allocated. When a request stored inRMQ 228 is delivered and stored inMRQ 310, the AAT number associated with that request is also stored in the newly allocated entry inSPT 302. - Confirm/cancel results returned in response to an intervention message and the AAT entry associated with the corresponding read request are delivered from Intervention Response Handler (IRSH) 266 to speculative handler (SPH) 312. The confirm/cancel results are received in the same order as they are transmitted. This ordering ensures that the AAT number supplied by
PIQ 262 and the confirm/cancel result supplied byIRSQ 258 are associated with the same request as they are delivered toIRSH 266. The confirm/cancel result, and the associated AAT number are subsequently delivered fromIRSH 266 to speculative handler (SPH) 312. - Memory output register (MOR) 318 and memory input data register (MIDR) 314 are the interfaces between
MIU 300 and memory 600. Outgoing requests are sent to memory 600 via MOR 318, and data received from memory 600 is loaded in MIDR 314. An entry is allocated for a request in RDBT 306 before that request is issued to memory 600. The data loaded in MIDR 314 is stored in RDB 316 at an address defined by the entry allocated in RDBT 306. - The response to an intervention message may arrive at
IVU 250 before the corresponding speculative request has been issued toMIU 300. This condition may happen, for example, whenMRQ 310 is full and cannot receive the speculative request at the time when the response to the corresponding intervention message arrives atIVU 250. To handle such conditions, the AAT entry and the intervention response, i.e., confirm/cancel result, associated with that request is delivered toSPH 312 byIRSH 266. The confirm/cancel result is subsequently stored inA2ST 304. After the speculative request is stored inMRQ 310, a corresponding entry inSPT 302 is allocated and the intervention response, i.e., confirm/cancel result is copied fromA2ST 304 to that entry inSPT 302. If the intervention response is a speculative confirm,SPH 312 allocates an entry inRDBT 306 and deallocates the corresponding SPT entry. Subsequently, a read request is issued to the memory. If the intervention response is a speculative cancel, the speculative request is canceled and the corresponding SPT entry is deallocated. - The response to an intervention message may arrive at
IVU 250 after the corresponding speculative request has been received byMIU 300. Since the request is already stored inMRQ 310, it has an assigned entry inSPT 302. When the request reaches the head of the queue inMRQ 310, the SPT entry associated with that request is looked-up. If the received intervention response is a speculative confirm, (i) an entry inRDBT 306 is allocated and its confirmed bit is set, (ii) the corresponding SPT entry is deallocated, and (iii) a read request is issued to the memory. If the received intervention is a speculative cancel, the speculative request is canceled and the corresponding SPT entry is deallocated. If no intervention response is received after the request reaches the head of the queue inMRQ 310, the speculative request is issued to the memory. -
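The three outcomes for a speculative request that reaches the head of the memory request queue can be sketched as a single decision function. This is a hedged illustration: the function name and the dictionary of actions are hypothetical, and the handling of the "no response yet" case (an RDBT entry allocated with its confirmed and canceled bits cleared, and the SPT entry retained until the response arrives) follows the description elsewhere in this document of requests issued before their intervention response is known.

```python
from enum import Enum
from typing import Optional


class CC(Enum):
    """Confirm/cancel result carried by an intervention response."""
    CONFIRM = "confirm"
    CANCEL = "cancel"


def handle_head_of_mrq(cc: Optional[CC]) -> dict:
    """Return the actions taken for the request at the head of the queue.

    `cc` is None when no intervention response has been received yet.
    """
    if cc is CC.CANCEL:
        # The data was found in a core: drop the speculative read.
        return {"issue_to_memory": False, "allocate_rdbt": False,
                "rdbt_confirmed": False, "deallocate_spt": True}
    if cc is CC.CONFIRM:
        # No core holds the data: the read is needed and already confirmed.
        return {"issue_to_memory": True, "allocate_rdbt": True,
                "rdbt_confirmed": True, "deallocate_spt": True}
    # Outcome unknown: issue speculatively and keep the SPT entry so the
    # later confirm/cancel result can be routed to the RDBT entry.
    return {"issue_to_memory": True, "allocate_rdbt": True,
            "rdbt_confirmed": False, "deallocate_spt": False}
```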
MIU arbitration logic 308 arbitrates access to MRQ 310 between IVU 250 and RQU 220. When no speculative request is made because RMQ 228 is full and it is subsequently determined that none of the caches contain the requested data, a corresponding request is made to the memory by IVU 250. This request is received by MIU arbitration logic 308 and delivered to MRQ 310. This request is thereafter delivered to MOR 318, via MOR arbitration logic 330, for later submission to the memory. MRQ 310 is bypassed if it is empty, in which case MIU arbitration logic 308 transfers the request directly to MOR arbitration logic 330. SPH 312 transfers confirm/cancel results from SPT 302 to RDBT 306. MOR arbitration logic 330 gains access to SPT 302 and RDBT 306 via SPH 312. - The response to an intervention message may arrive at
IVU 250 after the corresponding speculative request has been issued to the memory but before the requested data has been received from the memory. The confirmed and canceled bits of the corresponding RDBT entry are both cleared if the intervention response is not known at the time the speculative request is issued to the memory. When the intervention response is received by SPH 312, A2ST 304 uses the AAT number supplied by PIQ 262 to find the corresponding SPT entry. The SPT entry is then used to supply the corresponding RDBT entry. The RDBT entry is then updated with the confirm/cancel result of the intervention response. The corresponding entry in SPT 302 is then deallocated. Each speculative read request that is sent to the memory is allocated a corresponding entry in RDBT 306. The data supplied by the memory in response to the read request is received by MIDR 314 and is subsequently stored in RDB 316 at an address defined by the corresponding entry in RDBT 306. Memory response handler (MRSH) 322 looks up the status of the corresponding RDBT entry when the data is returned and stored in RDB 316. If the speculative request has been confirmed, the data is delivered to controller 324, which in turn stores this data in memory read data queue (MRDQ) 326. If the speculative request has been canceled, the RDB/RDBT entries are deallocated after all the segments of the requested data are received from the memory. Controller 324 may include a number of queues to accommodate the transfer of the data to MRDQ 326. Data stored in MRDQ 326 is subsequently transferred to response output register (RSOR) 282. RSOR 282 subsequently supplies this data to the requesting core. MRSH 322 is triggered to perform the look-up operation in RDBT 306 when the associated data is stored in RDB 316. - The response to an intervention message may arrive at
IVU 250 after part or all of the data corresponding to the speculative request has been received from the memory. If the intervention response is not known by the time a segment, such as a double-word, of the requested data is received, the transaction is considered a late completion. Late completions are handled by the late speculation completion handler (LSCH) 320. If a late completion is marked so as to cancel the speculative request, LSCH 320 retires the RDBT/RDB entry after all segments of the requested data are received from the memory. If a late completion is marked so as to confirm the speculative request, LSCH 320 waits for all segments of the requested data to arrive. The received data are thereafter delivered to controller 324, which in turn stores this data in MRDQ 326. The data present in MRDQ 326 is supplied to the requesting core after being stored in RSOR 282. LSCH 320 is triggered when it receives the late intervention response. If the coherent request does not result in a required memory operation, such as a write-back to the memory, the AAT entry may be deallocated only after both the intervention response and the corresponding speculative request have been received by MIU 300. If a coherent request results in a required memory access operation, the AAT entry is not deallocated until after the intervention response, the corresponding speculative request, and any required memory access operation resulting from the intervention are all received by MIU 300. This ensures that, for example, when a write-back to the memory is required, the memory write operation is in MIU 300 ahead of the speculative request before the AAT entry is deallocated. -
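The interaction between returning data segments and a confirm/cancel result that may arrive before, during, or after them can be modeled with a small state holder. The sketch below is a simplification under stated assumptions: the class and method names are hypothetical, and delivery is assumed to wait for every segment in all cases, which the text states explicitly only for late completions and for canceled requests.

```python
from dataclasses import dataclass


@dataclass
class RdbtEntryModel:
    """Toy model of one RDBT entry for an issued speculative read."""
    segments_expected: int
    segments_received: int = 0
    confirmed: bool = False   # both bits start cleared when the response
    canceled: bool = False    # is unknown at issue time

    def on_segment(self) -> str:
        """Record one data segment returned from memory."""
        self.segments_received += 1
        return self._resolve()

    def on_intervention_response(self, confirm: bool) -> str:
        """Record the confirm/cancel result, whenever it arrives."""
        self.confirmed = confirm
        self.canceled = not confirm
        return self._resolve()

    def _resolve(self) -> str:
        if self.segments_received < self.segments_expected:
            return "wait"      # data still in flight
        if self.confirmed:
            return "deliver"   # forward the data toward the requesting core
        if self.canceled:
            return "drop"      # retire the entry and discard the data
        return "wait"          # all data present, outcome still unknown
```

A late confirm thus produces "deliver" from `on_intervention_response`, while an early cancel produces "drop" only once the final segment has been recorded.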
FIGS. 4A , 4B and 4C are aflowchart 700 of steps carried out to perform a speculative request, in accordance with one embodiment of the present invention. Referring toFIG. 4A , if the request is determined to be in the request unit when the response to the intervention message is received 702, the confirm/cancel result is stored 704 in the A2ST. After the request is stored 706 in the MRQ, the SPT is updated with the confirm/cancelresult 708. If the speculative response is aconfirm 710, an RDBT number is allocated and used as an index to the RDBT to store theconfirm result 712. The SPT number is deallocated 714 and the request is thereafter sent to the memory. If the speculative response is a cancel 710, the request is canceled 718 and the SPT number is deallocated. - Referring to
FIG. 4B , a request stored in the MRQ has an allocatedSPT number 750. If an intervention response is available 754 after the request reaches 752 the top of the queue in the MRQ, the process moves to decision block 710 (FIG. 4A ). If the intervention response is not available 754 after the request reaches 752 the top of the queue in the MRQ, an RDBT number is allocated to therequest 758, and its confirm/cancel bits are cleared. The request is thereafter sent to thememory 760. - Referring to
FIG. 4C, after the confirm/cancel result is received 770, the AAT number is used as an index to the A2ST to find the SPT number 772. The SPT number is then used to find the RDBT number 774. If the speculative response is a confirm 776, the data received from the memory is transferred to the requesting core 778, and the RDBT number is deallocated 782. If the speculative response is a cancel 776, the data received from the memory is discarded 780 and the RDBT number is deallocated. -
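The index chain of FIG. 4C (AAT number to SPT number to RDBT number) can be illustrated with plain dictionaries. This is a minimal sketch with hypothetical names; it models only the look-ups and the update of the confirm/cancel bits, and it does not model when A2ST or AAT entries are themselves released.

```python
class SpeculationTablesModel:
    """Toy model of the A2ST, SPT and RDBT look-up chain."""

    def __init__(self) -> None:
        self.a2st = {}  # AAT number -> SPT number
        self.spt = {}   # SPT number -> {"aat": ..., "rdbt": ...}
        self.rdbt = {}  # RDBT number -> {"confirmed": ..., "canceled": ...}

    def issue_unresolved(self, aat: int, spt: int, rdbt: int) -> None:
        """Record a speculative read issued before its response is known."""
        self.a2st[aat] = spt
        self.spt[spt] = {"aat": aat, "rdbt": rdbt}
        self.rdbt[rdbt] = {"confirmed": False, "canceled": False}

    def on_confirm_cancel(self, aat: int, confirm: bool) -> int:
        """Route a confirm/cancel result to its RDBT entry; return its number."""
        spt_number = self.a2st[aat]                   # AAT number indexes A2ST
        rdbt_number = self.spt.pop(spt_number)["rdbt"]  # SPT entry deallocated
        entry = self.rdbt[rdbt_number]
        entry["confirmed"] = confirm
        entry["canceled"] = not confirm
        return rdbt_number
```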
FIG. 5 shows the flow of indices and entries associated with a request between AAT 222, SPT 302, A2ST 304, and RDBT 306 of coherence manager 200. For clarity, only a few of the blocks disposed in coherence manager 200 are shown in FIG. 5. The AAT number, allocated to each coherent request and used to track the associated speculative request as described above, is delivered to MIU 300 and IVU 250. An SPT number is allocated by MIU arbitration logic 308 when a slot becomes available in MRQ 310 for that request; MRQ 310 stores the SPT number. An RDBT number is allocated by MOR arbitration logic 330 when the request is issued to the memory via MOR 318. The SPT number is used as an index to store the AAT number in SPT 302, and the AAT number is used as an index to store the SPT number in A2ST 304. - When the response to an intervention message arrives at
IVU 250 while the speculative read request is still in RMQ 228, the confirm/cancel (CC) result is first stored in A2ST 304. The SPT number, allocated after the request is stored in MRQ 310, is used as an index to store the AAT number in SPT 302. The AAT number stored in PIQ 262 is used as an index to A2ST 304 to look up the CC. The retrieved CC is copied to SPT 302 at the index defined by the SPT number. If the CC is a speculative confirm, an RDBT 306 number is allocated. The SPT number is then used as an index to SPT 302 to retrieve and copy the CC result to RDBT 306. If the intervention response is a speculative cancel, the speculative request is canceled and the corresponding SPT entry is deallocated. - An SPT number exists if the request is already stored in
MRQ 310. The allocated SPT number is used as an index toSPT 302 to store the AAT number. When the CC is received, the AAT number stored inPIQ 262 is used as an index to A2ST 304 to look-up the SPT number. The SPT number is then used as an index toSPT 302 to store the CC. When the request reaches the head of the queue inMRQ 310, the SPT number is interrogated. If the CC is a speculative confirm, a corresponding number inRDBT 306 is allocated. The RDBT number is used as an index to RDBT 306 to store the CC. The corresponding SPT number is deallocated, and a read request is issued to the memory. If the received intervention is a speculative cancel, the speculative request is canceled and the corresponding SPT entry is deallocated. - The response to an intervention message may arrive at
IVU 250 after the corresponding speculative request has been issued to the memory but before the response to the request has been received from the memory. In such conditions, the AAT and RDBT numbers are stored in SPT 302 at the index defined by the SPT number. The SPT number is stored in A2ST 304 at an index defined by the AAT number. The AAT number corresponding to the received CC is used to identify the corresponding SPT number in A2ST 304. The SPT number so identified is then used to find the corresponding RDBT number. The RDBT number is subsequently used as an index to RDBT 306 to store the CC result; the corresponding SPT number is then deallocated. If the CC is a confirm, the data supplied by the memory is delivered to RSU 280. If the CC is a cancel, the RDB/RDBT entries are deallocated after all the segments of the requested data are received from the memory. - The response to an intervention message may arrive at
IVU 250 after part or all of the data corresponding to the speculative request has been received from the memory. In such conditions, the AAT and RDBT numbers are stored in SPT 302 at the index defined by the SPT number. The SPT number is stored in A2ST 304 at an index defined by the AAT number. The AAT number corresponding to the received CC is used to identify the corresponding SPT number in A2ST 304. The SPT number so identified is then used to find the corresponding RDBT number. The RDBT number is subsequently used as an index to RDBT 306 to store the CC result. If the CC is a confirm, the data supplied by the memory is delivered to RSU 280 by LSCH 320. If the CC is a cancel, the RDB/RDBT entries are deallocated after all the segments of the requested data are received from the memory. - As described above, in accordance with one embodiment of the present invention, the queues, tables and the ports in
coherence manager 200 are configurable to support different sizes and optimize power consumption and performance. A deadlock condition may thus occur when the sum of the sizes of IQ 252 and PIQ 262 is greater than the sum of the sizes of RDBT 306 and MRQ 310. To prevent this condition from occurring, in accordance with one embodiment of the present invention, SAR 224 is adapted so as not to issue any speculative requests unless the number of unresolved speculative requests, i.e., speculative requests for which the associated intervention responses have not yet been completed, is less than the total number of entries in RDBT 306. For example, if RDBT 306 has a capacity to hold 16 entries, no more than 15 unresolved speculative requests may be pending at any given time. - Referring to
FIGS. 3 and 6 concurrently, assume that a first stream ofrequests 400 followed by a second stream ofrequests 402 are delivered to requestunit 220. Assume thatRMQ 228 is full when the first stream of requests (RS) 400 is received; therefore these requests are not stored inRMQ 228 and are not speculated. Assume thatRMQ 228 is empty whenRS 402 is received; therefore these requests are stored inRMQ 228 and are speculated. Intervention messages (IM) 500 and 502, corresponding respectively to requeststreams IVU 250. Intervention responses (IR) 600 and 602 are assumed to be respectively associated withIMs -
RS 402 is subsequently transferred to and stored inMRQ 310. Assume that afirst portion 402 a ofRS 402 is issued to the memory and their associated entries allocated inRDBT 306 fill all the slots inRDBT 306. Assume thatMRQ 310 is filled with the remainingportion 402 b ofRS 402, as well as with non-coherent stream ofrequest 404 that subsequently arrive. - Assume that one or more of the responses in
IR 600 contain cache misses. Because no speculative requests were issued for the associated RS 400, these requests must be supplied to the memory and thus must first be written in MRQ 310. However, MRQ 310 is full and cannot accept any new requests unless there is an entry available in RDBT 306. The entries in RDBT 306 cannot be cleared since the confirm/cancel results needed to clear these requests are present in IR 602, which is stuck behind IR 600. Accordingly, a deadlock is created in which no entry can be cleared and no requests can move through coherence manager 200. To prevent such deadlocks, in accordance with one embodiment of the present invention, SAR 224 is adapted so as not to issue any speculative requests unless the number of unresolved speculative requests is at least one less than the total number of entries in RDBT 306. - In accordance with one embodiment of the present invention, the cache coherence manager includes, in part, a multitude of configurable ports, a multitude of configurable tables, and a multitude of configurable queues. The configurability of the ports enables a user to select the number of desired ports. For example, assume that the coherence manager has 16 configurable ports. A user may, however, need only four such ports to adapt the coherence manager to a microprocessor having four cores. The user accordingly configures the coherence manager so as to use only four of the 16 ports. The configurability of the tables and queues enables a user to define their respective sizes to balance the competing goals of achieving optimum processor performance and minimizing the die size.
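The deadlock-avoidance rule described above, under which speculative requests are throttled so that at least one RDBT entry always remains available, can be sketched as a simple counter. The class and method names are hypothetical, and the bound follows the 16-entry example given earlier (at most 15 unresolved speculative requests); the hardware realization is not specified at this level.

```python
class SpeculationThrottle:
    """Toy model of the limit on unresolved speculative requests."""

    def __init__(self, rdbt_entries: int) -> None:
        if rdbt_entries < 1:
            raise ValueError("RDBT must have at least one entry")
        self.rdbt_entries = rdbt_entries
        self.unresolved = 0  # speculative requests awaiting confirm/cancel

    def may_issue(self) -> bool:
        # Issuing must still leave one RDBT entry free for requests that
        # were never speculated and must reach memory after a cache miss.
        return self.unresolved < self.rdbt_entries - 1

    def try_issue(self) -> bool:
        """Issue a speculative request if the limit allows it."""
        if not self.may_issue():
            return False
        self.unresolved += 1
        return True

    def resolve(self) -> None:
        """Call when an intervention response resolves one request."""
        if self.unresolved == 0:
            raise RuntimeError("no unresolved speculative request to resolve")
        self.unresolved -= 1
```

With a 16-entry table, the sixteenth call to `try_issue` is refused until `resolve` has been called at least once.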
-
FIG. 7 illustrates an exemplary computer system 1000 in which the present invention may be embodied. Computer system 1000 typically includes one ormore output devices 1100, including display devices such as a CRT, LCD, OLED, LED, gas plasma, electronic ink, or other types of displays, speakers and other audio output devices; and haptic output devices such as vibrating actuators;computer 1200; akeyboard 1300;input devices 1400; and anetwork interface 1500.Input devices 1400 may include a computer mouse, a trackball, joystick, track pad, graphics tablet, touch screen, microphone, various sensors, and/or other wired or wireless input devices that allow a user or the environment to interact with computer system 1000.Network interface 1500 typically provides wired or wireless communication with an electronic communications network, such as a local area network, a wide area network, for example the Internet, and/or virtual networks, for example a virtual private network (VPN).Network interface 1500 can implement one or more wired or wireless networking technologies, including Ethernet, one or more of the 802.11 standards, Bluetooth, and ultra-wideband networking technologies. -
Computer 1200 typically includes components such as one or moregeneral purpose processors 1600, and memory storage devices, such as a random access memory (RAM) 1700 andnon-volatile memory 1800.Non-volatile memory 1800 can include floppy disks; fixed or removable hard disks; optical storage media such as DVD-ROM, CD-ROM, and bar codes; non-volatile semiconductor memory devices such as flash memories; read-only-memories (ROMS); battery-backed volatile memories; paper or other printing mediums; and networked storage devices.System bus 1900 interconnects the above components.Processors 1600 may be a multi-processor system such asmulti-processor 100 described above. -
RAM 1700 andnon-volatile memory 1800 are examples of tangible media for storage of data, audio/video files, computer programs, applet interpreters or compilers, virtual machines, and embodiments of the present invention described above. For example, the above described embodiments of the processors of the present invention may be represented as human-readable or computer-usable programs and data files that enable the design, description, modeling, simulation, testing, integration, and/or fabrication of integrated circuits and/or computer systems. Such programs and data files may be used to implement embodiments of the invention as separate integrated circuits or used to integrate embodiments of the invention with other components to form combined integrated circuits, such as microprocessors, microcontrollers, system on a chip (SoC), digital signal processors, embedded processors, or application specific integrated circuits (ASICs). - Programs and data files expressing embodiments of the present invention may use general-purpose programming or scripting languages, such as C or C++; hardware description languages, such as VHDL or Verilog; microcode implemented in RAM, ROM, or hard-wired and adapted to control and coordinate the operation of components within a processor or other integrated circuit; and/or standard or proprietary format data files suitable for use with electronic design automation software applications known in the art. Programs and data files can express embodiments of the invention at various levels of abstraction, including as a functional description, as a synthesized netlist of logic gates and other circuit components, and as an integrated circuit layout or set of masks suitable for use with semiconductor fabrication processes. These programs and data files can be processed by electronic design automation software executed by a computer to design a processor and generate masks for its fabrication.
- Further embodiments of
computer 1200 can include specialized input, output, and communications subsystems for configuring, operating, simulating, testing, and communicating with specialized hardware and software used in the design, testing, and fabrication of integrated circuits. - Although some exemplary embodiments of the present invention are made with reference to a processor having four cores, it is understood that the processor may have more or fewer than four cores. The arrangement and the number of the various devices shown in the block diagrams are for clarity and ease of understanding. It is understood that combinations of blocks, additions of new blocks, re-arrangement of blocks, and the like fall within alternative embodiments of the present invention. For example, any number of I/Os, coherent multi-core processors, system memories, L2 and L3 caches, and non-coherent cached or cacheless processing cores may also be used.
- It is understood that the apparatus and methods described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g. expressed as a hardware description language description or a synthesized netlist) and transformed to hardware in the production of integrated circuits. Additionally, the embodiments of the present invention may be implemented using combinations of hardware and software, including micro-code suitable for execution within a processor.
- The above embodiments of the present invention are illustrative and not limitative. Various alternatives and equivalents are possible. The invention is not limited by the type of integrated circuit in which the present disclosure may be disposed. Nor is the invention limited to any specific type of process technology, e.g., CMOS, Bipolar, BICMOS, or otherwise, that may be used to manufacture the various embodiments of the present invention. Other additions, subtractions or modifications are obvious in view of the present invention and are intended to fall within the scope of the appended claims.
Claims (37)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/864,363 US20090089510A1 (en) | 2007-09-28 | 2007-09-28 | Speculative read in a cache coherent microprocessor |
US14/180,053 US8930634B2 (en) | 2007-09-28 | 2014-02-13 | Speculative read in a cache coherent microprocessor |
US14/557,715 US9141545B2 (en) | 2007-09-28 | 2014-12-02 | Speculative read in a cache coherent microprocessor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/864,363 US20090089510A1 (en) | 2007-09-28 | 2007-09-28 | Speculative read in a cache coherent microprocessor |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/180,053 Continuation US8930634B2 (en) | 2007-09-28 | 2014-02-13 | Speculative read in a cache coherent microprocessor |
Publications (1)
Publication Number | Publication Date |
---|---|
US20090089510A1 true US20090089510A1 (en) | 2009-04-02 |
Family
ID=40509691
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/864,363 Abandoned US20090089510A1 (en) | 2007-09-28 | 2007-09-28 | Speculative read in a cache coherent microprocessor |
US14/180,053 Expired - Fee Related US8930634B2 (en) | 2007-09-28 | 2014-02-13 | Speculative read in a cache coherent microprocessor |
US14/557,715 Active US9141545B2 (en) | 2007-09-28 | 2014-12-02 | Speculative read in a cache coherent microprocessor |
Country Status (1)
Country | Link |
---|---|
US (3) | US20090089510A1 (en) |
Citations (51)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5406504A (en) * | 1993-06-30 | 1995-04-11 | Digital Equipment | Multiprocessor cache examiner and coherency checker |
US5530933A (en) * | 1994-02-24 | 1996-06-25 | Hewlett-Packard Company | Multiprocessor system for maintaining cache coherency by checking the coherency in the order of the transactions being issued on the bus |
US5551005A (en) * | 1994-02-25 | 1996-08-27 | Intel Corporation | Apparatus and method of handling race conditions in mesi-based multiprocessor system with private caches |
US5715428A (en) * | 1994-02-28 | 1998-02-03 | Intel Corporation | Apparatus for maintaining multilevel cache hierarchy coherency in a multiprocessor computer system |
US5889779A (en) * | 1996-12-02 | 1999-03-30 | Rockwell Science Center | Scheduler utilizing dynamic schedule table |
US6073217A (en) * | 1996-02-14 | 2000-06-06 | Advanced Micro Devices | Method for detecting updates to instructions which are within an instruction processing pipeline of a microprocessor |
US6088771A (en) * | 1997-10-24 | 2000-07-11 | Digital Equipment Corporation | Mechanism for reducing latency of memory barrier operations on a multiprocessor system |
US6202127B1 (en) * | 1997-11-26 | 2001-03-13 | Compaq Computer Corporation | Apparatus for spatial and temporal sampling in a computer memory system |
US6216200B1 (en) * | 1994-10-14 | 2001-04-10 | Mips Technologies, Inc. | Address queue |
US20010005873A1 (en) * | 1999-12-24 | 2001-06-28 | Hitachi, Ltd. | Shared memory multiprocessor performing cache coherence control and node controller therefor |
US6266755B1 (en) * | 1994-10-14 | 2001-07-24 | Mips Technologies, Inc. | Translation lookaside buffer with virtual address conflict prevention |
US6393500B1 (en) * | 1999-08-12 | 2002-05-21 | Mips Technologies, Inc. | Burst-configurable data bus |
US6418517B1 (en) * | 1997-08-29 | 2002-07-09 | International Business Machines Corporation | Optimized function execution for a multiprocessor computer system |
US20020129029A1 (en) * | 2001-03-09 | 2002-09-12 | Warner Craig W. | Scalable transport layer protocol for multiprocessor interconnection networks that tolerates interconnection component failure |
US20020133674A1 (en) * | 2001-03-14 | 2002-09-19 | Martin Milo M.K. | Bandwidth-adaptive, hybrid, cache-coherence protocol |
US6490642B1 (en) * | 1999-08-12 | 2002-12-03 | Mips Technologies, Inc. | Locked read/write on separate address/data bus using write barrier |
US6493776B1 (en) * | 1999-08-12 | 2002-12-10 | Mips Technologies, Inc. | Scalable on-chip system bus |
US6507862B1 (en) * | 1999-05-11 | 2003-01-14 | Sun Microsystems, Inc. | Switching method in a multi-threaded processor |
US6594728B1 (en) * | 1994-10-14 | 2003-07-15 | Mips Technologies, Inc. | Cache memory with dual-way arrays and multiplexed parallel output |
US6604159B1 (en) * | 1999-08-12 | 2003-08-05 | Mips Technologies, Inc. | Data release to reduce latency in on-chip system bus |
US6651156B1 (en) * | 2001-03-30 | 2003-11-18 | Mips Technologies, Inc. | Mechanism for extending properties of virtual memory pages by a TLB |
US6681283B1 (en) * | 1999-08-12 | 2004-01-20 | Mips Technologies, Inc. | Coherent data apparatus for an on-chip split transaction system bus |
US20040019891A1 (en) * | 2002-07-25 | 2004-01-29 | Koenen David J. | Method and apparatus for optimizing performance in a multi-processing system |
US6721813B2 (en) * | 2001-01-30 | 2004-04-13 | Advanced Micro Devices, Inc. | Computer system implementing a system and method for tracking the progress of posted write transactions |
US6732208B1 (en) * | 1999-02-25 | 2004-05-04 | Mips Technologies, Inc. | Low latency system bus interface for multi-master processing environments |
US20040249880A1 (en) * | 2001-12-14 | 2004-12-09 | Martin Vorbach | Reconfigurable system |
US20050053057A1 (en) * | 1999-09-29 | 2005-03-10 | Silicon Graphics, Inc. | Multiprocessor node controller circuit and method |
US20050071722A1 (en) * | 2003-09-26 | 2005-03-31 | Arm Limited | Data processing apparatus and method for handling corrupted data values |
US6976155B2 (en) * | 2001-06-12 | 2005-12-13 | Intel Corporation | Method and apparatus for communicating between processing entities in a multi-processor |
US7003630B1 (en) * | 2002-06-27 | 2006-02-21 | Mips Technologies, Inc. | Mechanism for proxy management of multiprocessor storage hierarchies |
US7017025B1 (en) * | 2002-06-27 | 2006-03-21 | Mips Technologies, Inc. | Mechanism for proxy management of multiprocessor virtual memory |
US7047372B2 (en) * | 2003-04-15 | 2006-05-16 | Newisys, Inc. | Managing I/O accesses in multiprocessor systems |
US20060179429A1 (en) * | 2004-01-22 | 2006-08-10 | University Of Washington | Building a wavecache |
US7107567B1 (en) * | 2004-04-06 | 2006-09-12 | Altera Corporation | Electronic design protection circuit |
US20060282645A1 (en) * | 2005-06-14 | 2006-12-14 | Benjamin Tsien | Memory attribute speculation |
US7162615B1 (en) * | 2000-06-12 | 2007-01-09 | Mips Technologies, Inc. | Data transfer bus communication using single request to perform command and return data to destination indicated in context to allow thread context switch |
US7162590B2 (en) * | 2003-07-02 | 2007-01-09 | Arm Limited | Memory bus within a coherent multi-processing system having a main portion and a coherent multi-processing portion |
US20070043911A1 (en) * | 2005-08-17 | 2007-02-22 | Sun Microsystems, Inc. | Multiple independent coherence planes for maintaining coherency |
US20070043913A1 (en) * | 2005-08-17 | 2007-02-22 | Sun Microsystems, Inc. | Use of FBDIMM Channel as memory channel and coherence channel |
US20070113053A1 (en) * | 2005-02-04 | 2007-05-17 | Mips Technologies, Inc. | Multithreading instruction scheduler employing thread group priorities |
US7240165B2 (en) * | 2004-01-15 | 2007-07-03 | Hewlett-Packard Development Company, L.P. | System and method for providing parallel data requests |
US7257814B1 (en) * | 1998-12-16 | 2007-08-14 | Mips Technologies, Inc. | Method and apparatus for implementing atomicity of memory operations in dynamic multi-streaming processors |
US20080215815A1 (en) * | 2005-03-31 | 2008-09-04 | International Business Machines Corporation | System and method of improving task switching and page translation performance utilizing a multilevel translation lookaside buffer |
US20090019232A1 (en) * | 2007-07-11 | 2009-01-15 | Freescale Semiconductor, Inc. | Specification of coherence domain during address translation |
US20090083493A1 (en) * | 2007-09-21 | 2009-03-26 | Mips Technologies, Inc. | Support for multiple coherence domains |
US20090157981A1 (en) * | 2007-12-12 | 2009-06-18 | Mips Technologies, Inc. | Coherent instruction cache utilizing cache-op execution resources |
US20090248988A1 (en) * | 2008-03-28 | 2009-10-01 | Mips Technologies, Inc. | Mechanism for maintaining consistency of data written by io devices |
US20090276578A1 (en) * | 2008-04-30 | 2009-11-05 | Moyer William C | Cache coherency protocol in a data processing system |
US7739476B2 (en) * | 2005-11-04 | 2010-06-15 | Apple Inc. | R and C bit update handling |
US20100235579A1 (en) * | 2006-02-22 | 2010-09-16 | Stuart David Biles | Cache Management Within A Data Processing Apparatus |
US20100287342A1 (en) * | 2009-05-07 | 2010-11-11 | Freescale Semiconductor, Inc. | Processing of coherent and incoherent accesses at a uniform cache |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB9409148D0 (en) | 1994-05-09 | 1994-06-29 | Secr Defence | Data cache |
US7099913B1 (en) * | 2000-08-31 | 2006-08-29 | Hewlett-Packard Development Company, L.P. | Speculative directory writes in a directory based cache coherent nonuniform memory access protocol |
US7529893B2 (en) * | 2003-04-11 | 2009-05-05 | Sun Microsystems, Inc. | Multi-node system with split ownership and access right coherence mechanism |
US7644237B1 (en) | 2003-06-23 | 2010-01-05 | Mips Technologies, Inc. | Method and apparatus for global ordering to insure latency independent coherence |
US7536513B2 (en) * | 2005-03-31 | 2009-05-19 | International Business Machines Corporation | Data processing system, cache system and method for issuing a request on an interconnect fabric without reference to a lower level cache based upon a tagged cache state |
US7398360B2 (en) * | 2005-08-17 | 2008-07-08 | Sun Microsystems, Inc. | Multi-socket symmetric multiprocessing (SMP) system for chip multi-threaded (CMT) processors |
US7600078B1 (en) * | 2006-03-29 | 2009-10-06 | Intel Corporation | Speculatively performing read transactions |
US7769958B2 (en) * | 2007-06-22 | 2010-08-03 | Mips Technologies, Inc. | Avoiding livelock using intervention messages in multiple core processors |
US20090089510A1 (en) | 2007-09-28 | 2009-04-02 | Mips Technologies, Inc. | Speculative read in a cache coherent microprocessor |
US8615013B2 (en) * | 2010-05-18 | 2013-12-24 | Agere Systems Llc | Packet scheduling with guaranteed minimum rate in a traffic manager of a network processor |
US8301865B2 (en) * | 2009-06-29 | 2012-10-30 | Oracle America, Inc. | System and method to manage address translation requests |
- 2007-09-28: US application 11/864,363 (US20090089510A1), status: Abandoned (not active)
- 2014-02-13: US application 14/180,053 (US8930634B2), status: Expired - Fee Related (not active)
- 2014-12-02: US application 14/557,715 (US9141545B2), status: Active
Patent Citations (53)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5406504A (en) * | 1993-06-30 | 1995-04-11 | Digital Equipment | Multiprocessor cache examiner and coherency checker |
US5530933A (en) * | 1994-02-24 | 1996-06-25 | Hewlett-Packard Company | Multiprocessor system for maintaining cache coherency by checking the coherency in the order of the transactions being issued on the bus |
US5551005A (en) * | 1994-02-25 | 1996-08-27 | Intel Corporation | Apparatus and method of handling race conditions in mesi-based multiprocessor system with private caches |
US5715428A (en) * | 1994-02-28 | 1998-02-03 | Intel Corporation | Apparatus for maintaining multilevel cache hierarchy coherency in a multiprocessor computer system |
US6216200B1 (en) * | 1994-10-14 | 2001-04-10 | Mips Technologies, Inc. | Address queue |
US6266755B1 (en) * | 1994-10-14 | 2001-07-24 | Mips Technologies, Inc. | Translation lookaside buffer with virtual address conflict prevention |
US6594728B1 (en) * | 1994-10-14 | 2003-07-15 | Mips Technologies, Inc. | Cache memory with dual-way arrays and multiplexed parallel output |
US6073217A (en) * | 1996-02-14 | 2000-06-06 | Advanced Micro Devices | Method for detecting updates to instructions which are within an instruction processing pipeline of a microprocessor |
US5889779A (en) * | 1996-12-02 | 1999-03-30 | Rockwell Science Center | Scheduler utilizing dynamic schedule table |
US6418517B1 (en) * | 1997-08-29 | 2002-07-09 | International Business Machines Corporation | Optimized function execution for a multiprocessor computer system |
US6088771A (en) * | 1997-10-24 | 2000-07-11 | Digital Equipment Corporation | Mechanism for reducing latency of memory barrier operations on a multiprocessor system |
US6202127B1 (en) * | 1997-11-26 | 2001-03-13 | Compaq Computer Corporation | Apparatus for spatial and temporal sampling in a computer memory system |
US7257814B1 (en) * | 1998-12-16 | 2007-08-14 | Mips Technologies, Inc. | Method and apparatus for implementing atomicity of memory operations in dynamic multi-streaming processors |
US6732208B1 (en) * | 1999-02-25 | 2004-05-04 | Mips Technologies, Inc. | Low latency system bus interface for multi-master processing environments |
US6507862B1 (en) * | 1999-05-11 | 2003-01-14 | Sun Microsystems, Inc. | Switching method in a multi-threaded processor |
US6681283B1 (en) * | 1999-08-12 | 2004-01-20 | Mips Technologies, Inc. | Coherent data apparatus for an on-chip split transaction system bus |
US6493776B1 (en) * | 1999-08-12 | 2002-12-10 | Mips Technologies, Inc. | Scalable on-chip system bus |
US6490642B1 (en) * | 1999-08-12 | 2002-12-03 | Mips Technologies, Inc. | Locked read/write on separate address/data bus using write barrier |
US6604159B1 (en) * | 1999-08-12 | 2003-08-05 | Mips Technologies, Inc. | Data release to reduce latency in on-chip system bus |
US6393500B1 (en) * | 1999-08-12 | 2002-05-21 | Mips Technologies, Inc. | Burst-configurable data bus |
US20050053057A1 (en) * | 1999-09-29 | 2005-03-10 | Silicon Graphics, Inc. | Multiprocessor node controller circuit and method |
US20010005873A1 (en) * | 1999-12-24 | 2001-06-28 | Hitachi, Ltd. | Shared memory multiprocessor performing cache coherence control and node controller therefor |
US7162615B1 (en) * | 2000-06-12 | 2007-01-09 | Mips Technologies, Inc. | Data transfer bus communication using single request to perform command and return data to destination indicated in context to allow thread context switch |
US6721813B2 (en) * | 2001-01-30 | 2004-04-13 | Advanced Micro Devices, Inc. | Computer system implementing a system and method for tracking the progress of posted write transactions |
US20020129029A1 (en) * | 2001-03-09 | 2002-09-12 | Warner Craig W. | Scalable transport layer protocol for multiprocessor interconnection networks that tolerates interconnection component failure |
US20020133674A1 (en) * | 2001-03-14 | 2002-09-19 | Martin Milo M.K. | Bandwidth-adaptive, hybrid, cache-coherence protocol |
US6651156B1 (en) * | 2001-03-30 | 2003-11-18 | Mips Technologies, Inc. | Mechanism for extending properties of virtual memory pages by a TLB |
US6976155B2 (en) * | 2001-06-12 | 2005-12-13 | Intel Corporation | Method and apparatus for communicating between processing entities in a multi-processor |
US20040249880A1 (en) * | 2001-12-14 | 2004-12-09 | Martin Vorbach | Reconfigurable system |
US7577822B2 (en) * | 2001-12-14 | 2009-08-18 | Pact Xpp Technologies Ag | Parallel task operation in processor and reconfigurable coprocessor configured based on information in link list including termination information for synchronization |
US7003630B1 (en) * | 2002-06-27 | 2006-02-21 | Mips Technologies, Inc. | Mechanism for proxy management of multiprocessor storage hierarchies |
US7017025B1 (en) * | 2002-06-27 | 2006-03-21 | Mips Technologies, Inc. | Mechanism for proxy management of multiprocessor virtual memory |
US20040019891A1 (en) * | 2002-07-25 | 2004-01-29 | Koenen David J. | Method and apparatus for optimizing performance in a multi-processing system |
US7047372B2 (en) * | 2003-04-15 | 2006-05-16 | Newisys, Inc. | Managing I/O accesses in multiprocessor systems |
US7162590B2 (en) * | 2003-07-02 | 2007-01-09 | Arm Limited | Memory bus within a coherent multi-processing system having a main portion and a coherent multi-processing portion |
US20050071722A1 (en) * | 2003-09-26 | 2005-03-31 | Arm Limited | Data processing apparatus and method for handling corrupted data values |
US7240165B2 (en) * | 2004-01-15 | 2007-07-03 | Hewlett-Packard Development Company, L.P. | System and method for providing parallel data requests |
US20060179429A1 (en) * | 2004-01-22 | 2006-08-10 | University Of Washington | Building a wavecache |
US7107567B1 (en) * | 2004-04-06 | 2006-09-12 | Altera Corporation | Electronic design protection circuit |
US20070113053A1 (en) * | 2005-02-04 | 2007-05-17 | Mips Technologies, Inc. | Multithreading instruction scheduler employing thread group priorities |
US20080215815A1 (en) * | 2005-03-31 | 2008-09-04 | International Business Machines Corporation | System and method of improving task switching and page translation performance utilizing a multilevel translation lookaside buffer |
US20060282645A1 (en) * | 2005-06-14 | 2006-12-14 | Benjamin Tsien | Memory attribute speculation |
US20070043913A1 (en) * | 2005-08-17 | 2007-02-22 | Sun Microsystems, Inc. | Use of FBDIMM Channel as memory channel and coherence channel |
US7353340B2 (en) * | 2005-08-17 | 2008-04-01 | Sun Microsystems, Inc. | Multiple independent coherence planes for maintaining coherency |
US20070043911A1 (en) * | 2005-08-17 | 2007-02-22 | Sun Microsystems, Inc. | Multiple independent coherence planes for maintaining coherency |
US7739476B2 (en) * | 2005-11-04 | 2010-06-15 | Apple Inc. | R and C bit update handling |
US20100235579A1 (en) * | 2006-02-22 | 2010-09-16 | Stuart David Biles | Cache Management Within A Data Processing Apparatus |
US20090019232A1 (en) * | 2007-07-11 | 2009-01-15 | Freescale Semiconductor, Inc. | Specification of coherence domain during address translation |
US20090083493A1 (en) * | 2007-09-21 | 2009-03-26 | Mips Technologies, Inc. | Support for multiple coherence domains |
US20090157981A1 (en) * | 2007-12-12 | 2009-06-18 | Mips Technologies, Inc. | Coherent instruction cache utilizing cache-op execution resources |
US20090248988A1 (en) * | 2008-03-28 | 2009-10-01 | Mips Technologies, Inc. | Mechanism for maintaining consistency of data written by io devices |
US20090276578A1 (en) * | 2008-04-30 | 2009-11-05 | Moyer William C | Cache coherency protocol in a data processing system |
US20100287342A1 (en) * | 2009-05-07 | 2010-11-11 | Freescale Semiconductor, Inc. | Processing of coherent and incoherent accesses at a uniform cache |
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7769957B2 (en) * | 2007-06-22 | 2010-08-03 | Mips Technologies, Inc. | Preventing writeback race in multiple core processors |
US20080320232A1 (en) * | 2007-06-22 | 2008-12-25 | Mips Technologies Inc. | Preventing Writeback Race in Multiple Core Processors |
US20080320233A1 (en) * | 2007-06-22 | 2008-12-25 | Mips Technologies Inc. | Reduced Handling of Writeback Data |
US20090083493A1 (en) * | 2007-09-21 | 2009-03-26 | Mips Technologies, Inc. | Support for multiple coherence domains |
US8131941B2 (en) | 2007-09-21 | 2012-03-06 | Mips Technologies, Inc. | Support for multiple coherence domains |
US9141545B2 (en) | 2007-09-28 | 2015-09-22 | Arm Finance Overseas Limited | Speculative read in a cache coherent microprocessor |
US8032713B2 (en) | 2007-12-10 | 2011-10-04 | International Business Machines Corporation | Structure for handling data access |
US20090150618A1 (en) * | 2007-12-10 | 2009-06-11 | Allen Jr James J | Structure for handling data access |
US20090150572A1 (en) * | 2007-12-10 | 2009-06-11 | Allen Jr James J | Structure for handling data requests |
US7937533B2 (en) * | 2007-12-10 | 2011-05-03 | International Business Machines Corporation | Structure for handling data requests |
US8392663B2 (en) | 2007-12-12 | 2013-03-05 | Mips Technologies, Inc. | Coherent instruction cache utilizing cache-op execution resources |
US20090157981A1 (en) * | 2007-12-12 | 2009-06-18 | Mips Technologies, Inc. | Coherent instruction cache utilizing cache-op execution resources |
US7934059B2 (en) * | 2008-01-29 | 2011-04-26 | International Business Machines Corporation | Method, system and computer program product for preventing lockout and stalling conditions in a multi-node system with speculative memory fetching |
US20090193198A1 (en) * | 2008-01-29 | 2009-07-30 | International Business Machines Corporation | Method, system and computer program product for preventing lockout and stalling conditions in a multi-node system with speculative memory fetching |
US20090248988A1 (en) * | 2008-03-28 | 2009-10-01 | Mips Technologies, Inc. | Mechanism for maintaining consistency of data written by io devices |
US8234455B2 (en) * | 2009-04-07 | 2012-07-31 | Imagination Technologies Limited | Method and apparatus for ensuring data cache coherency |
US9703709B2 (en) | 2009-04-07 | 2017-07-11 | Imagination Technologies Limited | Method and apparatus for ensuring data cache coherency |
US20100257322A1 (en) * | 2009-04-07 | 2010-10-07 | Robert Graham Isherwood | Method and apparatus for ensuring data cache coherency |
US20180054486A1 (en) * | 2009-10-29 | 2018-02-22 | International Business Machines Corporation | Speculative Requests |
US20130185472A1 (en) * | 2012-01-17 | 2013-07-18 | Wilocity Ltd. | Techniques for improving throughput and performance of a distributed interconnect peripheral bus |
US9256564B2 (en) * | 2012-01-17 | 2016-02-09 | Qualcomm Incorporated | Techniques for improving throughput and performance of a distributed interconnect peripheral bus |
US9311265B2 (en) | 2012-01-17 | 2016-04-12 | Qualcomm Incorporated | Techniques for improving throughput and performance of a distributed interconnect peripheral bus connected to a host controller |
US20140310469A1 (en) * | 2013-04-11 | 2014-10-16 | Apple Inc. | Coherence processing with pre-kill mechanism to avoid duplicated transaction identifiers |
US9465740B2 (en) * | 2013-04-11 | 2016-10-11 | Apple Inc. | Coherence processing with pre-kill mechanism to avoid duplicated transaction identifiers |
US20150052304A1 (en) * | 2013-08-19 | 2015-02-19 | Soft Machines, Inc. | Systems and methods for read request bypassing a last level cache that interfaces with an external fabric |
US9632947B2 (en) | 2013-08-19 | 2017-04-25 | Intel Corporation | Systems and methods for acquiring data for loads at different access times from hierarchical sources using a load queue as a temporary storage buffer and completing the load early |
US9665468B2 (en) | 2013-08-19 | 2017-05-30 | Intel Corporation | Systems and methods for invasive debug of a processor without processor execution of instructions |
US9619382B2 (en) * | 2013-08-19 | 2017-04-11 | Intel Corporation | Systems and methods for read request bypassing a last level cache that interfaces with an external fabric |
US10552334B2 (en) | 2013-08-19 | 2020-02-04 | Intel Corporation | Systems and methods for acquiring data for loads at different access times from hierarchical sources using a load queue as a temporary storage buffer and completing the load early |
US10296432B2 (en) | 2013-08-19 | 2019-05-21 | Intel Corporation | Systems and methods for invasive debug of a processor without processor execution of instructions |
US10402322B2 (en) | 2013-08-30 | 2019-09-03 | Intel Corporation | Systems and methods for faster read after write forwarding using a virtual address |
US9767020B2 (en) | 2013-08-30 | 2017-09-19 | Intel Corporation | Systems and methods for faster read after write forwarding using a virtual address |
US9361227B2 (en) | 2013-08-30 | 2016-06-07 | Soft Machines, Inc. | Systems and methods for faster read after write forwarding using a virtual address |
US9606925B2 (en) * | 2015-03-26 | 2017-03-28 | Intel Corporation | Method, apparatus and system for optimizing cache memory transaction handling in a processor |
US10802968B2 (en) | 2015-05-06 | 2020-10-13 | Apple Inc. | Processor to memory with coherency bypass |
US11507510B2 (en) | 2016-05-23 | 2022-11-22 | Arteris, Inc. | Method for using victim buffer in cache coherent systems |
CN108694687A (en) * | 2017-04-07 | 2018-10-23 | 英特尔公司 | Device and method for protecting the content in virtualization and graphics environment |
US10896140B2 (en) * | 2019-04-19 | 2021-01-19 | International Business Machines Corporation | Controlling operation of multiple computational engines |
Also Published As
Publication number | Publication date |
---|---|
US20140164714A1 (en) | 2014-06-12 |
US8930634B2 (en) | 2015-01-06 |
US9141545B2 (en) | 2015-09-22 |
US20150089157A1 (en) | 2015-03-26 |
Similar Documents
Publication | Title |
---|---|
US8930634B2 (en) | Speculative read in a cache coherent microprocessor |
US8131941B2 (en) | Support for multiple coherence domains |
US11803486B2 (en) | Write merging on stores with different privilege levels |
US8001283B2 (en) | Efficient, scalable and high performance mechanism for handling IO requests |
US9513904B2 (en) | Computer processor employing cache memory with per-byte valid bits |
US7774549B2 (en) | Horizontally-shared cache victims in multiple core processors |
US6751710B2 (en) | Scalable multiprocessor system and cache coherence method |
JP6381541B2 (en) | Methods, circuit configurations, integrated circuit devices, program products for processing instructions in a data processing system (conversion management instructions for updating address translation data structures in remote processing nodes) |
US6622217B2 (en) | Cache coherence protocol engine system and method for processing memory transaction in distinct address subsets during interleaved time periods in a multiprocessor system |
US7389389B2 (en) | System and method for limited fanout daisy chaining of cache invalidation requests in a shared-memory multiprocessor system |
US6675265B2 (en) | Multiprocessor cache coherence system and method in which processor nodes and input/output nodes are equal participants |
US6640287B2 (en) | Scalable multiprocessor system and cache coherence method incorporating invalid-to-dirty requests |
US8392663B2 (en) | Coherent instruction cache utilizing cache-op execution resources |
US7769957B2 (en) | Preventing writeback race in multiple core processors |
US20090248988A1 (en) | Mechanism for maintaining consistency of data written by io devices |
US20080320233A1 (en) | Reduced Handling of Writeback Data |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: MIPS TECHNOLOGIES, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: LEE, WILLIAM; BERG, THOMAS BENJAMIN; REEL/FRAME: 019989/0322. Effective date: 20071009 |
AS | Assignment | Owner name: BRIDGE CROSSING, LLC, NEW JERSEY. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: MIPS TECHNOLOGIES, INC.; REEL/FRAME: 030202/0440. Effective date: 20130206 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
AS | Assignment | Owner name: ARM FINANCE OVERSEAS LIMITED, GREAT BRITAIN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: BRIDGE CROSSING, LLC; REEL/FRAME: 033074/0058. Effective date: 20140131 |