US6484238B1 - Apparatus and method for detecting snoop hits on victim lines issued to a higher level cache - Google Patents


Info

Publication number
US6484238B1
Authority
US
United States
Prior art keywords
address
snoop
snoop hit
victim
logic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US09/467,352
Inventor
Douglas J Cutter
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Development Co LP
Original Assignee
Hewlett Packard Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Co filed Critical Hewlett Packard Co
Priority to US09/467,352
Assigned to HEWLETT-PACKARD COMPANY reassignment HEWLETT-PACKARD COMPANY ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CUTTER, DOUGLAS J.
Application granted granted Critical
Publication of US6484238B1
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. reassignment HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HEWLETT-PACKARD COMPANY
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 — Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 — Addressing or allocation; Relocation
    • G06F 12/08 — Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806 — Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0815 — Cache consistency protocols
    • G06F 12/0831 — Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means

Abstract

Circuitry for detecting snoop hits during the propagation and storage delay when transmitting a victim address to a bus cluster in a multiprocessor system. The circuitry includes stages for detecting the snoop hits during each cycle of the propagation delay. Each stage includes logic gates for comparing the wordline address with a snoop hit and for outputting a snoop hit signal upon detection of a snoop hit relating to the victim address.

Description

FIELD OF THE INVENTION
The present invention relates to an apparatus and method for cache coherency by detecting snoop hits on victim lines issued to a higher level cache in a multiprocessor system.
BACKGROUND OF THE INVENTION
In a multiprocessor system, each processor has its own local cache for storing data. Each processor may write to and read from a shared higher-level cache. Therefore, each processor can access both its own local cache and the shared cache for the entire system. Cache coherency is required to ensure that two processors do not attempt to simultaneously access the same address space of the shared cache. In addition, due to propagation delays within the circuitry of each processor, cache coherency must ensure that attempts to access particular portions of a cache are prioritized.
In particular, when a processor attempts to replace a line in its local cache, it sends a victim address to its victim buffer in order to victimize an address space. At the same time, it transmits the victim address to a bus cluster, which is an internal on-chip interface between the processor and a system bus. The bus cluster manages prioritization of attempts to access the cache. Due to a propagation delay, the victim address transmitted to the bus cluster may require, for example, two clock cycles to reach the bus cluster. During those two clock cycles, another processor may attempt to access the same address space in the shared cache. If that occurs, the bus cluster will not be aware of the conflict resulting from attempts by both processors to access the same portion of the shared cache due to the two clock cycle delay. Therefore, circuitry must account for this type of conflict. In particular, a need exists for detecting snoop hits occurring on the same address space during a propagation delay when transmitting a victim address from a processor to a bus cluster in order to avoid conflicts while accessing the cache.
SUMMARY OF THE INVENTION
A method and apparatus consistent with the present invention includes receiving a victim address for a local cache in a multiprocessor system and transmitting the victim address to a bus cluster interfacing a processor with a system bus. A snoop is received during transmission of the victim address to the bus cluster, and it is determined if the snoop hits the victim address. If the snoop hits the victim address, a unique snoop hit signal is provided.
Another apparatus consistent with the present invention includes a plurality of wordlines corresponding to a victim address that was sent to a bus cluster and a snoop match line for detecting a snoop hit. Logic circuitry, connected to the plurality of wordlines and the snoop hit line, operates to determine if the snoop hit relates to the victim address that is being transmitted to the bus cluster interfacing a processor with a system bus. The logic circuitry also operates to provide a snoop hit signal if the snoop hits a victim address stored in the victim buffer and not yet issued to the bus cluster.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a multiprocessor system for implementing an embodiment consistent with the present invention.
FIG. 2 is a block diagram of logic circuitry within cache control circuitry in a processor for detecting snoop hits on victim lines issued to a cache during propagation delays in a decoded wordline address being transferred to a bus cluster.
FIG. 3 is a timing diagram illustrating the operation of the logic circuitry in FIG. 2.
FIGS. 4a and 4b are a flow chart illustrating the operation of the logic circuitry in FIG. 2.
DETAILED DESCRIPTION
FIG. 1 is a block diagram of a multiprocessor system 10 for implementing an embodiment consistent with the present invention. Multiprocessor system 10 includes a plurality of processors 12 and 14 coupled to a system bus 24. System 10 also includes typical components of a main memory 26 coupled to system bus 24 and an input/output (I/O) unit 28 coupled to system bus 24. A clock 22 controls, for example, operation of processor 12 and other components within system 10.
Processor 12 illustrates, for example, certain components used with a local cache. In particular, processor 12 includes cache control circuitry 16 coupled to a local cache 18 and a bus cluster 20. Cache control circuitry 16 may include conventional components for controlling writing to and reading from cache 18 by processor 12. Bus cluster 20 may include conventional components for interfacing processor 12 with system bus 24. In addition, by interfacing processor 12 with system bus 24, bus cluster 20 typically includes conventional components for handling requests by external processors, such as processor 14, to write to and read from local cache 18. Therefore, bus cluster 20 along with cache control circuitry 16 provides for cache coherency by prioritizing requests to access cache 18 and resolving conflicts between such requests. Bus clusters for resolving conflicts in accessing memory are known in the art, and a bus cluster includes any component for interfacing a processor with a system bus and for potentially resolving such conflicts.
FIG. 2 is a block diagram of cache control circuitry 16 including particular components used in controlling cache 18, in addition to other conventional components which may be used. Cache control circuitry 16 handles snoops received during transmission of a victim address to bus cluster 20. A snoop is a method for maintaining cache coherency by sending a desired snoop address onto the system bus in a multiprocessor system so that other cache controllers are able to determine whether or not they have a copy of the desired address in their local cache. In this example, cache control circuitry 16 includes a victim array 30, which receives a victim address on line 32 from processor 12 when the processor attempts to replace a line in its cache 18. A victim address specifies an address space in a cache. The victim address is also transmitted on line 34 to bus cluster 20, through a queue structure in victim array 30 having some delay, in order to notify bus cluster 20 of the access to the victim address. Bus cluster 20 may thus prioritize and resolve conflicting attempts by other processors to write to and read from that same address space.
Victim array 30 can also receive snoop addresses on line 36 from external processors, such as processor 14, in system 10 attempting to use a certain address space. When victim array 30 contains the same address as that received from an external processor on line 36, it generates a snoop hit indicating that an external processor is attempting to access the same address space that the local processor has already picked to victimize. Victim array 30 outputs the snoop hit on a snoop hit line 38.
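For illustration only (the patent contains no source code), the matching behavior just described can be sketched in Python. The class and method names below are our own, not the patent's; the comments map each piece back to the numbered lines of FIG. 2.

```python
# Hypothetical sketch of a victim array that raises a snoop hit when an
# external snoop address matches a buffered victim address.
class VictimArray:
    def __init__(self, num_entries=16):
        # One victim address slot per entry; None means the slot is empty.
        self.entries = [None] * num_entries

    def victimize(self, entry, address):
        # The processor places a victim address into the array (line 32).
        self.entries[entry] = address

    def snoop(self, snoop_address):
        # An external snoop address arrives (line 36); a match on any
        # entry asserts the snoop hit line (line 38).
        return any(addr == snoop_address for addr in self.entries)

array = VictimArray()
array.victimize(3, 0x1F40)
assert array.snoop(0x1F40) is True   # hit: same address space
assert array.snoop(0x2000) is False  # miss: no conflicting victim
```

The detection logic in the later figures builds on exactly this hit/miss signal.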
Victim array circuitry, such as victim array 30, for generating such snoop hits is known in the art, and those types of circuitry are also referred to as a victim buffer or a victim queue. A victim array includes necessary components for detecting snoop hits in a multiprocessor system. Snooping includes known techniques for cache coherency in a multiprocessor system, and a snoop hit includes any indication of an attempt to access a cache by an external processor.
When the victim address is transmitted to bus cluster 20 on line 34, it requires in this example a two clock cycle propagation delay to be stored in a queue in bus cluster 20. Therefore, circuitry must ensure that during the two clock cycle delay, snoop hits are detected in victim array 30 and accounted for in order to provide for cache coherency in attempts to access cache 18. The wordlines used to issue the victim address are transmitted on decoded wordlines 40 through two stages 50 and 56, which serve to isolate the snoop hit during each clock cycle of the propagation delay. In particular, each of the wordlines 40 is transmitted through two latches, in this example, latches 42, 44, 46, and 48. Although only four latches are shown in FIG. 2 for simplicity, there are two latches for each wordline, or thirty-two latches in this example. These latches are typically already present in the circuitry of victim array 30 and, therefore, need not in this example be added for the additional logic to detect snoop hits. The exemplary embodiment thus makes use of components already on-chip in a processor and hence reduces the number of additional gates or components otherwise added to the processor for detecting the snoop hits during the propagation delay.
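The two latches per wordline described above can be modeled as a two-deep shift pipeline, one latched copy per cycle of the two-cycle propagation delay. This is an illustrative sketch under our own naming, not circuitry from the patent:

```python
# Sketch of the two latches per wordline bit: each decoded wordline vector
# is delayed through two clocked stages, so a copy of the victim address
# is available during each cycle of the propagation delay.
class WordlinePipeline:
    def __init__(self):
        self.first = [0] * 16   # analog of latches 42/46: first-cycle copy
        self.second = [0] * 16  # analog of latches 44/48: second-cycle copy

    def clock(self, wordlines):
        # On each clock edge, shift the wordline vector one stage deeper.
        self.second = self.first
        self.first = list(wordlines)

pipe = WordlinePipeline()
one_hot = [0] * 16
one_hot[7] = 1
pipe.clock(one_hot)    # first delay cycle: stage-50 copy is valid
pipe.clock([0] * 16)   # second delay cycle: stage-56 copy is valid
assert pipe.second[7] == 1
```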
The latches (42, 44, 46, and 48) are further used in stages 50 and 56 to detect snoop hits during the two clock cycle delay in transmission of the decoded wordline address to bus cluster 20. In each of the stages, the latched wordlines 40 are logically compared with snoop hits on line 38 to detect a snoop hit during each cycle of the two clock cycle delay. In particular, first stage 50 includes a plurality of AND gates 52 and 54, each AND gate receiving as inputs the snoop hit on line 38 and one of the address wordlines 42/46. Although only two are shown for simplicity, in this example first stage 50 includes sixteen latches for latching each of the sixteen wordlines 40 as inputs, and sixteen AND gates, in addition to receiving as another input the snoop hit on line 38. The outputs of AND gates 52 and 54 are input to OR gate 62 and output on line 60 as a snoop hit 11 c.
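One detection stage, as described above, reduces to sixteen AND gates (snoop hit line against one latched wordline bit each) feeding an OR gate. A minimal sketch, with names of our choosing:

```python
# Illustrative model of one detection stage: sixteen AND gates, each
# combining the snoop hit line with one latched wordline bit, followed
# by an OR reduction (the analog of OR gate 62 or 64).
def stage_snoop_hit(snoop_hit, wordline_bits):
    # wordline_bits is the decoded 16-bit address: one-hot or all zero.
    and_outputs = [snoop_hit and bit for bit in wordline_bits]
    # At most one wordline bit is high, so at most one AND output is high.
    return any(and_outputs)

one_hot = [0] * 16
one_hot[5] = 1                       # victim address decoded to wordline 5
assert stage_snoop_hit(1, one_hot) is True    # snoop hit during this cycle
assert stage_snoop_hit(0, one_hot) is False   # no snoop, no stage output
assert stage_snoop_hit(1, [0] * 16) is False  # no wordline asserted
```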
Second stage 56 likewise includes a plurality of AND gates 58 and 60, of which only two are shown for simplicity. In this example it includes sixteen latches for latching as inputs the address lines 44/48 of the 16-bit wordline, and sixteen AND gates, each also receiving the snoop hit on line 38 as another input. An OR gate 64 receives as inputs the outputs from AND gates 58 and 60 in second stage 56 and provides an output on line 68 as a snoop hit 11 w. The snoop hits 11 c and 11 w are input to an OR gate 70 which provides a snoop hit output on line 72 to bus cluster 20. The terms 11 c and 11 w are used only as labels for the snoop detection signals in the two stages 50 and 56.
Therefore, if a snoop hit occurs in victim array 30 during the first clock cycle of the propagation delay, first stage 50 receives through each of its plurality of AND gates a high signal on line 38 for the snoop hit, along with at most one high signal on lines 42/46, because the wordline is either all zeroes or has exactly one of its sixteen bits logically high. This guarantees at most one high input to OR gate 62. Second stage 56 likewise functions to detect a snoop hit during the second clock cycle of the propagation delay. In particular, if a snoop hit is detected by victim array 30 during the second clock cycle, snoop hit line 38 receives a high signal indicating the snoop hit and that high signal is logically ANDed in second stage 56 with the decoded wordline address on address lines 44/48, thus providing for at most one high input to OR gate 64.
Therefore, OR gate 70 performs a logic OR operation of the signals from lines 60 and 68, snoop hit 11 c and snoop hit 11 w, and provides a snoop hit signal on line 72. Accordingly, a snoop hit occurring during either the first or the second clock cycle of the propagation delay in transmitting the victim address to bus cluster 20 generates a snoop hit on line 72 to bus cluster 20. In receiving the snoop hit signal, bus cluster 20 may include conventional circuitry for processing the snoop hit signal and determining prioritization of the attempts to access the same address space in cache 18. The snoop hit signal in this example is a logic one or high signal; alternatively, other signals or logic levels may be used for indicating a snoop hit occurring during the propagation delay.
FIG. 3 is a timing diagram illustrating detection of snoop hits during each of the clock cycles of the propagation delay in transmitting the victim address to bus cluster 20. For simplicity, only one bit of the wordlines 40 is shown in this diagram. The timing diagram includes three consecutive clock cycles 74 (L1 d), 76 (L1 c), and 78 (L1 w). During clock cycle 74, the victim address is transmitted from victim array 30 to bus cluster 20. During the first clock cycle 76 of the propagation delay, any snoop hit on line 38 is logically ANDed with a bit of the wordline via AND gate 52. Since at most one bit of the wordline will be a logic one, only one of the AND gates 52 will receive both a high input from the snoop hit on line 38 and a high input from a bit of the wordlines 46, if a snoop hit occurs during this clock cycle. Therefore, if the snoop hit occurs during clock cycle 76, AND gate 52 outputs a logic one or high signal on line 60 as snoop hit 11 c.
Likewise, during the second clock cycle 78 of the propagation delay, a snoop hit on line 38 is logically ANDed with a bit of the wordline via AND gate 58. If a snoop hit occurs during this clock cycle, at most one bit of the wordline will be a logic one, meaning that only one of the AND gates 58 will receive a high input from the snoop hit on line 38 and a bit of the wordlines 48, outputting a logic one or high signal on line 68 and providing for a snoop hit 11 w signal. At the next clock cycle, the victim address arrives at bus cluster 20, as well as any snoop hit signal on line 72 resulting from snoop hit 11 c or snoop hit 11 w.
FIGS. 4a and 4b are a flow chart illustrating a method 80 for operation of the logic circuitry in cache control circuitry 16, implemented in hardware modules having the exemplary components described above. In method 80, victim array 30 receives the victim address from processor 12 on line 32 (step 82). After some delay 83 in the victim buffer, an address specifying which victim address to read is decoded using a four-to-sixteen decoder to form wordlines, and the victim address is sent to bus cluster 20 (steps 84 and 85). This may take several clock cycles to propagate to and be stored in a snoopable location within bus cluster 20.
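The four-to-sixteen decoder mentioned in steps 84 and 85 can be sketched as follows; this is an illustrative model under assumed naming, not the patent's implementation:

```python
# Minimal four-to-sixteen decoder sketch: a 4-bit entry index becomes a
# one-hot 16-bit wordline vector, as in the decode step above.
def decode_4_to_16(index):
    assert 0 <= index < 16, "a 4-bit index selects one of sixteen entries"
    wordlines = [0] * 16
    wordlines[index] = 1  # exactly one wordline asserted per valid index
    return wordlines

assert decode_4_to_16(5)[5] == 1
assert sum(decode_4_to_16(5)) == 1  # one-hot: a single bit is high
```

The one-hot property is what lets each detection stage use a plain AND/OR network, since at most one AND gate can fire per cycle.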
The decoded wordline address is transmitted to snoop detection circuitry in victim array 30 (step 86), which detects whether a snoop hit occurs during the first clock cycle from a snoop address received on line 36 from an external processor (step 88). If a snoop hit occurs, the snoop hit is transmitted to first stage 50 (step 89), which determines if the snoop hit and the wordlines for 11 c are the same for the entry (step 90). If so, the logic circuitry outputs a snoop hit 11 c signal (step 92). During the second clock cycle of the propagation delay, victim array 30 determines if a snoop hit occurs (step 94). If the snoop hit occurs, the snoop hit is transmitted to second stage 56 (step 95), which determines if the snoop hit and the wordlines for 11 w are the same for the entry (step 96). If so, the logic circuitry outputs a snoop hit 11 w signal (step 98).
When the snoop address arrives at bus cluster 20 after the two-clock-cycle delay, the logic circuitry determines if a snoop hit 11 c or a snoop hit 11 w signal is present, as detected in this example by OR gates 62, 64 and 70 (step 100). If either signal is present, the OR gates transmit a snoop hit signal to bus cluster 20 on line 72 (step 102), and the victim address also arrives at bus cluster 20 after the two-clock-cycle propagation delay (step 104). Bus cluster 20 subsequently may use conventional circuitry for receiving the victim address and the snoop hit signal and for determining prioritization of signals for access to cache 18 (step 106).
More or fewer stages, similar to stages 50 and 56, may be used depending upon the propagation delay required to transmit a snoop address to a bus cluster. More or fewer AND gates may be used in stages 50 and 56, depending upon the number of address lines present in a particular embodiment. Also, aside from AND gates, other logic circuitry may be used to detect a snoop hit during the propagation delay through other types of logical comparisons between snoop addresses and snoop hits.
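The generalization to an arbitrary number of stages can be sketched as one detection stage per clock cycle of the propagation delay, with the stage outputs combined by a final OR. Again, this is a hypothetical model under assumed names; it is not circuitry from the patent.

```python
def snoop_hit_during_delay(wordlines: list, snoop_hits_per_cycle: list) -> int:
    """Model N stages (one per clock cycle of propagation delay, like
    stages 50 and 56): each stage ANDs the latched one-hot wordlines with
    that cycle's snoop-hit signal; a final OR (like gate 70) combines them."""
    stage_outputs = []
    for snoop_hit in snoop_hits_per_cycle:  # one stage per clock cycle
        stage_outputs.append(any(snoop_hit & bit for bit in wordlines))
    return int(any(stage_outputs))  # final OR over all stages

wordlines = [0] * 16
wordlines[3] = 1  # victim entry 3 (illustrative)

assert snoop_hit_during_delay(wordlines, [0, 1]) == 1     # hit in the second cycle
assert snoop_hit_during_delay(wordlines, [0, 0, 0]) == 0  # three-cycle delay, no hit
```

Lengthening the `snoop_hits_per_cycle` list corresponds to adding stages for a longer propagation delay; widening `wordlines` corresponds to adding AND gates per stage.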
While the present invention has been described in connection with an exemplary embodiment, it will be understood that many modifications will be readily apparent to those skilled in the art, and this application is intended to cover any adaptations or variations thereof. For example, different numbers of processors, capacities of the busses, types of processors, types of busses, and labels for the various entities and busses may be used without departing from the scope of the invention. This invention should be limited only by the claims and equivalents thereof.

Claims (20)

What is claimed is:
1. A method for providing an indication of address conflicts in attempts to access a local cache in a multiprocessor system, comprising:
victimizing an address for the local cache;
transmitting the victim address to a bus cluster interfacing a processor with a system bus;
receiving a snoop hit during transmission of the victim address to the bus cluster;
determining if the snoop hits the victim address by logically comparing the snoop hit with a plurality of wordlines corresponding to the victim address; and
providing a snoop hit signal if the snoop hits the victim address.
2. The method of claim 1 wherein the determining step includes latching decoded wordline addresses used to transmit the victim address to the bus cluster for multiple clock cycles.
3. The method of claim 2 wherein the determining step includes performing a logic AND operation of the decoded wordline address and the snoop hit.
4. The method of claim 3 wherein the performing step includes performing the logic AND operation for each clock cycle required to transmit the victim address to the bus cluster.
5. The method of claim 4, further including performing a logic OR operation for outputs of the logic AND operations.
6. The method of claim 1 wherein the providing step includes transmitting a snoop hit signal to the bus cluster.
7. The method of claim 1 wherein the determining step includes logically comparing the snoop hit with a decoded wordline address.
8. An apparatus for providing an indication of address conflicts in attempts to access a local cache in a multiprocessor system, comprising:
a module that receives a victim address for the local cache;
a module that transmits the victim address to a bus cluster interfacing a processor with a system bus;
a module that receives a snoop hit during transmission of the victim address to the bus cluster;
a module that determines if the snoop hits the victim address by logically comparing the snoop hit with a plurality of wordlines corresponding to the victim address; and
a module that provides a snoop hit signal if the snoop hits the victim address.
9. The apparatus of claim 8 wherein the determining module includes a module that latches a decoded wordline address used to transmit the victim address to the bus cluster for multiple clock cycles.
10. The apparatus of claim 9 wherein the determining module includes a module that performs a logic AND operation of the decoded wordline address and the snoop hit.
11. The apparatus of claim 10 wherein the performing module includes a module that performs the logic AND operation for each clock cycle required to transmit the victim address to the bus cluster.
12. The apparatus of claim 11, further including a module that performs a logic OR operation for outputs of the logic AND operations.
13. The apparatus of claim 9 wherein the determining module includes a module that logically compares the snoop hit with the decoded wordline address.
14. The apparatus of claim 8 wherein the providing module includes a module that transmits a snoop hit signal to the bus cluster.
15. An apparatus for providing an indication of address conflicts in attempts to access a local cache in a multiprocessor system, comprising:
a plurality of decoded wordlines for issuing a victim address;
a snoop hit line for indicating a snoop hit; and
logic circuitry connected to the plurality of decoded wordline address lines and the snoop hit line, the logic circuitry operating to:
determine if the snoop hit relates to the decoded wordline address while the victim address is being transmitted to a bus cluster interfacing a processor with a system bus; and
provide a snoop hit signal if the snoop hit relates to the victim address.
16. The apparatus of claim 15 wherein the logic circuitry includes a plurality of stages for logically comparing the decoded wordline address with the snoop hit during each clock cycle of transmission of the victim address to the bus cluster.
17. The apparatus of claim 16 wherein each of the plurality of stages includes a plurality of logic AND gates each having inputs coupled to receive a bit of the decoded wordline address and snoop hit.
18. The apparatus of claim 17, further including a logic OR gate having inputs coupled to receive outputs of the AND gates and having an output providing a snoop hit signal.
19. The apparatus of claim 15 wherein the logic circuitry operates to logically AND the snoop hit with the decoded wordline address.
20. The apparatus of claim 19 wherein the logic circuitry further operates to logically OR outputs of the logic AND operation.
US09/467,352 1999-12-20 1999-12-20 Apparatus and method for detecting snoop hits on victim lines issued to a higher level cache Expired - Fee Related US6484238B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/467,352 US6484238B1 (en) 1999-12-20 1999-12-20 Apparatus and method for detecting snoop hits on victim lines issued to a higher level cache

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/467,352 US6484238B1 (en) 1999-12-20 1999-12-20 Apparatus and method for detecting snoop hits on victim lines issued to a higher level cache

Publications (1)

Publication Number Publication Date
US6484238B1 true US6484238B1 (en) 2002-11-19

Family

ID=23855346

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/467,352 Expired - Fee Related US6484238B1 (en) 1999-12-20 1999-12-20 Apparatus and method for detecting snoop hits on victim lines issued to a higher level cache

Country Status (1)

Country Link
US (1) US6484238B1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040123034A1 (en) * 2002-12-23 2004-06-24 Rogers Paul L. Multiple cache coherency
US20040243768A1 (en) * 2003-05-27 2004-12-02 Dodd James M. Method and apparatus to improve multi-CPU system performance for accesses to memory
US20110113197A1 (en) * 2002-01-04 2011-05-12 Intel Corporation Queue arrays in network devices
US20150178221A1 (en) * 2010-09-28 2015-06-25 Texas Instruments Incorporated Level One Data Cache Line Lock and Enhanced Snoop Protocol During Cache Victims and Writebacks to Maintain Level One Data Cache and Level Two Cache Coherence

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4755930A (en) 1985-06-27 1988-07-05 Encore Computer Corporation Hierarchical cache memory system and method
US5228136A (en) 1990-01-16 1993-07-13 International Business Machines Corporation Method and apparatus to maintain cache coherency in a multiprocessor system with each processor's private cache updating or invalidating its contents based upon set activity
US5263144A (en) 1990-06-29 1993-11-16 Digital Equipment Corporation Method and apparatus for sharing data between processors in a computer system
US5303362A (en) 1991-03-20 1994-04-12 Digital Equipment Corporation Coupled memory multiprocessor computer system including cache coherency management protocols
US5404482A (en) 1990-06-29 1995-04-04 Digital Equipment Corporation Processor and method for preventing access to a locked memory block by recording a lock in a content addressable memory with outstanding cache fills
US5511226A (en) 1992-08-25 1996-04-23 Intel Corporation System for generating snoop addresses and conditionally generating source addresses whenever there is no snoop hit, the source addresses lagging behind the corresponding snoop addresses
US5708792A (en) 1992-04-29 1998-01-13 Sun Microsystems, Inc. Method and apparatus for a coherent copy-back buffer in a multipressor computer system
US5717898A (en) 1991-10-11 1998-02-10 Intel Corporation Cache coherency mechanism for multiprocessor computer systems
US5765196A (en) * 1996-02-27 1998-06-09 Sun Microsystems, Inc. System and method for servicing copyback requests in a multiprocessor system with a shared memory
US5860017A (en) 1996-06-28 1999-01-12 Intel Corporation Processor and method for speculatively executing instructions from multiple instruction streams indicated by a branch instruction
US5859999A (en) 1996-10-03 1999-01-12 Idea Corporation System for restoring predicate registers via a mask having at least a single bit corresponding to a plurality of registers

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A. Wolfe, "Techniques of prediction and speculation detailed", Electronic Engineering Times, Feb. 1999; pp. 43-44.

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110113197A1 (en) * 2002-01-04 2011-05-12 Intel Corporation Queue arrays in network devices
US8380923B2 (en) * 2002-01-04 2013-02-19 Intel Corporation Queue arrays in network devices
US20040123034A1 (en) * 2002-12-23 2004-06-24 Rogers Paul L. Multiple cache coherency
US7032077B2 (en) * 2002-12-23 2006-04-18 Hewlett-Packard Development Company, L.P. Multiple cache coherency
US20040243768A1 (en) * 2003-05-27 2004-12-02 Dodd James M. Method and apparatus to improve multi-CPU system performance for accesses to memory
WO2004107184A2 (en) * 2003-05-27 2004-12-09 Intel Corporation A method and apparatus to improve multi-cpu system performance for accesses to memory
WO2004107184A3 (en) * 2003-05-27 2005-01-27 Intel Corp A method and apparatus to improve multi-cpu system performance for accesses to memory
GB2416055A (en) * 2003-05-27 2006-01-11 Intel Corp A method and apparatus to improve multi-CPU system performance for accesses to memory
GB2416055B (en) * 2003-05-27 2007-03-21 Intel Corp A method and apparatus to improve multi-CPU system performance for accesses to memory
US7404047B2 (en) 2003-05-27 2008-07-22 Intel Corporation Method and apparatus to improve multi-CPU system performance for accesses to memory
US20150178221A1 (en) * 2010-09-28 2015-06-25 Texas Instruments Incorporated Level One Data Cache Line Lock and Enhanced Snoop Protocol During Cache Victims and Writebacks to Maintain Level One Data Cache and Level Two Cache Coherence
US9268708B2 (en) * 2010-09-28 2016-02-23 Texas Instruments Incorporated Level one data cache line lock and enhanced snoop protocol during cache victims and writebacks to maintain level one data cache and level two cache coherence

Similar Documents

Publication Publication Date Title
US4449183A (en) Arbitration scheme for a multiported shared functional device for use in multiprocessing systems
US4768148A (en) Read in process memory apparatus
KR100190351B1 (en) Apparatus and method for reducing interference in two-level cache memory
US6026464A (en) Memory control system and method utilizing distributed memory controllers for multibank memory
US4912632A (en) Memory control subsystem
US5623632A (en) System and method for improving multilevel cache performance in a multiprocessing system
US5809280A (en) Adaptive ahead FIFO with LRU replacement
US6078983A (en) Multiprocessor system having distinct data bus and address bus arbiters
US5265231A (en) Refresh control arrangement and a method for refreshing a plurality of random access memory banks in a memory system
US4967398A (en) Read/write random access memory with data prefetch
US5133074A (en) Deadlock resolution with cache snooping
US4513372A (en) Universal memory
US4586133A (en) Multilevel controller for a cache memory interface in a multiprocessing system
JP4866646B2 (en) How to select commands to send to memory, memory controller, computer system
US5659709A (en) Write-back and snoop write-back buffer to prevent deadlock and to enhance performance in an in-order protocol multiprocessing bus
US20030046356A1 (en) Method and apparatus for transaction tag assignment and maintenance in a distributed symmetric multiprocessor system
US5966728A (en) Computer system and method for snooping date writes to cacheable memory locations in an expansion memory device
US5737564A (en) Cache memory system having multiple caches with each cache mapped to a different area of main memory to avoid memory contention and to lessen the number of cache snoops
US6425056B2 (en) Method for controlling a direct mapped or two way set associative cache memory in a computer system
WO1998013763A2 (en) Multiport cache memory with address conflict detection
US5657055A (en) Method and apparatus for reading ahead display data into a display FIFO of a graphics controller
US6484238B1 (en) Apparatus and method for detecting snoop hits on victim lines issued to a higher level cache
US5586298A (en) Effective use of memory bus in a multiprocessing environment by controlling end of data intervention by a snooping cache
JPH05108484A (en) Cache memory
US5809534A (en) Performing a write cycle to memory in a multi-processor system

Legal Events

Date Code Title Description
AS Assignment

Owner name: HEWLETT-PACKARD COMPANY, COLORADO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CUTTER, DOUGLAS J.;REEL/FRAME:010773/0185

Effective date: 19991216

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

AS Assignment

Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD COMPANY;REEL/FRAME:026945/0699

Effective date: 20030131

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20141119