WO2001061500A1 - Processor with cache divided for processor core and pixel engine uses - Google Patents

Processor with cache divided for processor core and pixel engine uses

Info

Publication number
WO2001061500A1
Authority
WO
WIPO (PCT)
Prior art keywords
processor
memory
pixel engine
cache
processor core
Prior art date
Application number
PCT/US2000/004008
Other languages
French (fr)
Inventor
Guy Peled
Alexander D. Peleg
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation filed Critical Intel Corporation
Priority to AU2000237002A priority Critical patent/AU2000237002A1/en
Priority to PCT/US2000/004008 priority patent/WO2001061500A1/en
Publication of WO2001061500A1 publication Critical patent/WO2001061500A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0846Cache with multiple tag or data arrays being simultaneously accessible
    • G06F12/0848Partitioned cache, e.g. separate instruction and operand caches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/084Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7839Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G06F15/7842Architectures of general purpose stored program computers comprising a single central processing unit with memory on one IC chip (single chip microcontrollers)
    • G06F15/7846On-chip cache and off-chip main memory

Definitions

  • FIG. 13 illustrates how different types of data may be stored in different ways of cache 38. For example, way 2 may act as a texture tile cache, while a pixel chunk in way 3 may be directly addressed by pixel engine 34.
  • Embodiments in accordance with the present invention may include various circuits, including well known circuits, and may be constructed according to various techniques, processes, and materials, including well known techniques, processes, and materials.
  • The phrase "in one embodiment" means that the particular feature, structure, or characteristic associated with the phrase is included in at least one embodiment of the invention, and may be included in more than one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same one embodiment.

Abstract

Under one aspect of the invention, a processor includes a processor core, a pixel engine, and a cache memory including a first memory subset coupled to the processor core to hold data for the processor core and a second memory subset coupled to the pixel engine to selectively hold data for the pixel engine.

Description

PROCESSOR WITH CACHE DIVIDED FOR PROCESSOR CORE
AND PIXEL ENGINE USES
Background of the Invention
Technical Field of the Invention: The present invention relates to processors and, more particularly, to processors having a cache that is divided for processor core and pixel engine uses. Background Art: Caches hold a relatively small amount of data that can be accessed relatively quickly by a processor core. By contrast, main memory holds a relatively large amount of data that is accessed relatively slowly by the processor core. Caches may be formed of high speed static random access memory (SRAM). Processors, such as the Pentium® II processor manufactured by Intel Corporation, Santa Clara, California, include one or more levels of cache, sometimes referred to as an L1 cache and an L2 cache, for example, to hold data for the processor core. The processor core and cache may be on different dice.
A cache line, sometimes called a block, is the smallest replaceable unit in a cache. However, in some architectures, a cache controller may read from or write to all or merely part of a cache line at a time. The term "cache line" is sometimes used to mean two different, but related, things: (1) data and (2) a group of memory locations in the cache that may hold the data. To avoid confusion, the term "data cache line" refers to the data and the term "cache line locations" refers to the group of memory locations in the cache that may hold the data cache line. Different data cache lines may be held in the same cache line locations at different times.
In different architectures, there are different possible cache line locations in which a data cache line may be held. In a "direct mapped" cache, a data cache line may be held in only one cache line location in the cache. In a "fully associative" cache, a data cache line may be held anywhere in the cache. In a "set associative" cache, a data cache line may be held in a restricted set of cache line locations in the cache. A "set" is a group of two or more cache line locations in the cache. If a set may hold 2 data cache lines, it is 2-way set- associative. If a set may hold 4 data cache lines, it is 4-way set-associative.
Caches may be organized into banks, sets, and ways. For example, referring to FIG. 1, a cache 10 is organized into banks 0, 1, ..., 7; sets 0, 1, ..., 15; and ways 0, 1, 2, and 3. A bank identifies certain memory locations that have something in common. For example, in some architectures, bank 0 holds a unit 0 of the data cache line, bank 1 holds a unit 1 of the data cache line, etc. In other architectures, banks may include memory locations that are powered for reading or writing at the same time. There are various other uses of banks. For ease of illustration, cache 10 is made smaller than is typical.
A pixel engine, sometimes called a graphics accelerator, is used to generate graphics signals (images) to be displayed. Pixel engines are typically included on a card that is connected to the motherboard. Pixel engines have their own caches, which may be formed of high speed SRAMs, referred to as video RAM (VRAM).
It is expensive to provide the pixel engine on a separate add-in card from the processor core. Further, it is expensive to provide separate caches for the processor core and the pixel engine.
Accordingly, there is a need for a processor in which the processor core and pixel engine are on a common substrate, such as a printed circuit board, if not a common die. Further, there is a need for a processor in which the processor core and pixel engine share a cache.
Summary of the Invention
Under one aspect of the invention, a processor includes a processor core, a pixel engine, and a cache memory including a first memory subset coupled to the processor core to hold data for the processor core and a second memory subset coupled to the pixel engine to selectively hold data for the pixel engine.
Brief Description of the Drawings
The invention will be understood more fully from the detailed description given below and from the accompanying drawings of embodiments of the invention which, however, should not be taken to limit the invention to the specific embodiments described, but are for explanation and understanding only.
FIG. 1 is a schematic representation of a prior art cache.
FIG. 2 is a block diagram representation of a computer system according to one embodiment of the present invention.
FIG. 3 is a block diagram illustrating details of one embodiment of the system of FIG. 2.
FIG. 4 is a block diagram representation of a processor core, pixel engine, and cache, wherein the cache holds data used by only the processor core.
FIG. 5 is a block diagram representation of a processor core, pixel engine, and cache, wherein part of the cache holds data used by only the processor core and part of the cache holds data used by only the pixel engine.
FIG. 6 is a schematic representation of a first possible division of a cache into first and second memory subsets.
FIG. 7 is a schematic representation of a second possible division of a cache into first and second memory subsets.
FIG. 8 is a block diagram representation of a cache controller and first and second memory subsets according to one embodiment of the present invention.
FIG. 9 is a block diagram representation of first and second cache controllers and first and second memory subsets according to an alternative embodiment of the present invention.
FIG. 10 is a schematic representation of a third possible division of a cache into first and second memory subsets.
FIG. 11 is a schematic representation of a fourth possible division of a cache into first and second memory subsets.
FIG. 12 is a block diagram representation of a computer system according to one embodiment of the present invention.
FIG. 13 is a block diagram representation of a processor core, pixel engine, and cache memory wherein the cache holds particular types of data in particular ways of the cache.
Detailed Description of Preferred Embodiments
Referring to FIG. 2, a computer system 20 includes a processor 24, chipset/memory 26, and frame buffer 28. Processor 24 includes a processor core 32, a pixel engine 34, and a cache 38. Processor core 32 performs those functions typically performed by a processor core and communicates with chipset/memory 26 through processor bus 42. Pixel engine 34 generates graphics signals to be supplied to frame buffer 28 through a bus 46. Alternatively, the graphics signals may be passed through processor bus 42. Processor core 32 may also provide graphics signals to frame buffer 28 through bus 46 and/or bus 42. Processor core 32 and pixel engine 34 are examples of computation devices.
In contrast to prior art computer systems, processor core 32 and pixel engine 34 share cache 38. Cache 38 may be used for holding merely data that does not include instructions, or may be used for holding data that does include instructions. That is, cache 38 may be, but is not required to be, a unified cache. Processor core 32, pixel engine 34, and cache 38 may all be on the same die. The outer boundary of processor 24 in FIG. 2 may represent a die. Alternatively, cache 38 and/or pixel engine 34 may be on a separate die from the die of processor core 32. The outer boundary of processor 24 in FIG. 2 may represent a substrate on which the dice are placed. Of course, processor 24 may include other components in addition to those illustrated.
Merely as examples, FIG. 3 illustrates certain functional blocks that could be included in processor 24. There are, of course, other possibilities. In the embodiment of FIG. 3, processor core 32 includes a CPU core 54 and a floating point unit 56, each of which may include multiple execution units. Processor 24 also includes a bus unit 58. Pixel engine 34 includes a 2D video engine 64, a 2D BLT engine 66, and a 3D pixel engine 68, as examples. A cache controller 50 controls reading from and writing to cache 38. There may be snooping and other communications between chipset/memory 26 and cache controller 50 and cache 38. Cache controller 50 may be separate from processor core 32 and pixel engine 34 or included within processor core 32 and/or pixel engine 34.
There are a variety of approaches by which processor core 32 and pixel engine 34 may share cache 38. In a first approach, cache controller 50 allows writes to and reads from cache 38 without consideration as to whether the data is for processor core 32 or pixel engine 34. An advantage of this approach is that if one or the other of processor core 32 or pixel engine 34 needs a large portion of cache 38 at a particular time, that portion may be available.
A disadvantage is that one of processor core 32 or pixel engine 34 may utilize so much of cache 38 that the other of processor core 32 or pixel engine 34 will not have enough of cache 38 to perform adequately. For example, pixel engine 34 may utilize so much of cache 38 that read instructions in processor core 32 will result in an excessive number of cache misses, stalling processor core 32. In a second approach, a first cache memory subset is always dedicated to data for processor core 32 and a second cache memory subset is always dedicated to data for pixel engine 34. (Note that the term "subset" does not necessarily involve dividing "sets" as in set associativity.)
Referring to FIGS. 4 and 5, in a third approach, under certain conditions, both the first and second memory subsets are dedicated to data for processor core 32, while under other conditions, the first memory subset is dedicated to data for processor core 32 and the second memory subset is dedicated to data for pixel engine 34. As an example, in FIGS. 4 and 5, the first memory subset of cache 38 includes ways 0 and 1, and the second memory subset of cache 38 includes ways 2 and 3. As is illustrated and described herein, the cache may be divided 50/50 or by some other division. In FIG. 4, pixel engine 34 is inactive and ways 0-3 of cache 38 are used to hold data for processor core 32. In FIG. 5, pixel engine 34 is active: ways 0 and 1 of cache 38 are used to hold data for processor core 32, and ways 2 and 3 of cache 38 are used to hold data for pixel engine 34.
Under a fourth approach, the sizes of the first and second memory subsets change dynamically, depending on the needs of processor core 32 and pixel engine 34. A difference between the third and fourth approaches is that in the third approach, the second memory subset is either completely dedicated to processor core 32 or completely dedicated to pixel engine 34. In the fourth approach, the size of the memory subset dedicated to pixel engine 34 may dynamically change from zero to some maximum, including at least one intermediate value.
There are a variety of approaches to dividing a cache into two or more memory subsets. In FIG. 6, memory subsets 38A and 38B of cache 38 have the same or essentially the same size. For simplicity, cache 38 is illustrated as including banks 0, 1, ..., 7; sets 0, 1, ..., 15; and ways 0, 1, 2, and 3. Banks 0, 1, ..., 7 are further divided into banks 0A, 0B, 1A, 1B, ..., 7A, and 7B, respectively. Memory subset 38A includes banks 0A, 1A, ..., 7A, which include all sets for ways 0 and 1. Memory subset 38B includes banks 0B, 1B, ..., 7B, which include all sets for ways 2 and 3. Accordingly, in FIG. 6, cache 38 is divided according to ways. A cache does not have to be divided according to ways. Rather, the cache could be divided by banks, sets, or some other measure or dimension of the cache. In practice, cache 38 may include more than eight banks and sixteen sets. Further, a greater or lesser number of ways may be used. The structure illustrated in FIG. 6 includes word lines, bit lines, sense amplifiers, etc. The banks are not necessarily limited to holding particular bits. For example, bank 0 may hold merely a bit 0 of particular data, or may hold bits not necessarily limited to bit 0. Cache 38 does not have to include only contiguous memory cells and may be more extensive than illustrated in FIG. 6.
FIG. 7 illustrates a second possible division of cache 38 into memory subsets 38A and 38B. In FIG. 7, memory subset 38A has a greater number of ways than does memory subset 38B. Memory subset 38A includes sets 0, 1, ..., 15 for a bank 0 and ways 0, 1, 2, 3, 4, and 5. Memory subset 38B includes sets 0, 1, ..., 15 for banks 0, 1, ..., 7 and ways 6 and 7. Memory subset 38B could include more than one bank. Subsets 38A and 38B do not have to have the same number of banks.
FIGS. 6, 7, 8, and 9 illustrate two different approaches to providing address, data, and perhaps control signals to first and second memory subsets 38A and 38B. In a first approach, illustrated in FIGS. 7 and 8, conductors 80 carry address, data, and perhaps control signals to both first and second memory subsets 38A and 38B. In a second approach, illustrated in FIGS. 6 and 9, conductors 80A carry address, data, and perhaps control signals to memory subset 38A, and conductors 80B carry address, data, and perhaps control signals to memory subset 38B. In FIG. 6, conductors 80A and 80B may be first and second connection points, points A and B, between memory subsets 38A and 38B and processor core 32 and pixel engine 34, respectively. In FIG. 7, conductors 80 form a single connection point between memory subsets 38A and 38B and processor core 32 and pixel engine 34.
Referring to FIG. 8, a single cache controller 50 may control reading from and writing to both memory subsets 38A and 38B through conductors 80. Conductors 80 may include address, data, and perhaps control lines. The address and data lines may include, for example, word lines and bit and bit* lines.
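As a rough software analogy for the single-controller arrangement of FIG. 8 (all names here are invented for illustration; the patent describes hardware conductors and a hardware controller), one controller can serialize accesses from both clients and steer each request to the subset owned by the requester:

```python
class SingleCacheController:
    """One controller fronts both memory subsets over shared conductors."""

    def __init__(self):
        self.subsets = {"38A": {}, "38B": {}}       # subset -> {address: data}
        self.owner = {"core": "38A", "pixel": "38B"}  # which client owns which

    def write(self, client, addr, data):
        # A single controller sequences all accesses, so requests from the
        # two clients are implicitly serialized on the shared conductors 80.
        self.subsets[self.owner[client]][addr] = data

    def read(self, client, addr):
        return self.subsets[self.owner[client]].get(addr)

ctrl = SingleCacheController()
ctrl.write("core", 0x40, "core-data")
ctrl.write("pixel", 0x40, "pixel-data")  # same address, different subset
assert ctrl.read("core", 0x40) == "core-data"
assert ctrl.read("pixel", 0x40) == "pixel-data"
```

The sketch makes the trade-off concrete: a single controller is simpler, but every access from either client passes through the same sequencing point.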
Alternatively, referring to FIG. 9, cache controller 50 may include two different cache controllers: a cache controller 50A for processor core 32 and memory subset 38A and a cache controller 50B for pixel engine 34 and memory subset 38B. Two controllers may be able to read from and write to their respective memory subsets with a greater degree of independence than can a single controller, but may require some redundancy for purposes of changing use of memory subset 38B from pixel engine 34 to processor core 32. FIG. 10 illustrates a third possible division of cache 38 into memory subsets 38A and 38B. In FIG. 10, memory subset 38A has a greater number of ways than does memory subset 38B. Memory subset 38A includes sets 0, 1, ..., 15 for a bank 0 and ways 0, 1, 2, 3, 4, and 5, and for banks 0, 1, 2, and 3 and a way 6A. Memory subset 38B includes sets 0, 1, ..., 15 for banks 4, 5, 6, and 7 and a way 6B, and for banks 0, 1, ..., 7 and a way 7. Ways 6A and 6B are each sub-ways. FIG. 10 shows that the division does not have to be in integer numbers of ways (e.g., it can be 6.5 ways and 1.5 ways). Cache 38 of FIG. 10 may receive address, data, and perhaps control signals through conductors 80 for both memory subsets 38A and 38B, or through conductors 80A and 80B separately for memory subsets 38A and 38B. In FIG. 11, cache 38 is divided similarly to FIG. 10, except that memory subset 38B includes only banks 0 and 1, rather than multiple banks as in FIG. 10. Again, a non-integer split is shown.
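The 6.5-way/1.5-way arithmetic of the FIG. 10 split can be checked with a short illustrative calculation (the helper function and its parameters are assumptions for this sketch, not from the patent): subset 38A owns six full ways plus half of way 6 (four of its eight banks), and subset 38B owns the other half of way 6 plus all of way 7.

```python
SETS, BANKS, WAYS = 16, 8, 8
LINES_PER_WAY = SETS  # one line per set in each way

def subset_lines(full_ways, partial_way_banks=0):
    """Lines held by a subset owning some full ways plus a fraction of one
    way, where the fraction is expressed as a number of that way's banks."""
    return full_ways * LINES_PER_WAY + (partial_way_banks / BANKS) * LINES_PER_WAY

lines_a = subset_lines(6, partial_way_banks=4)  # ways 0-5 plus half of way 6
lines_b = subset_lines(1, partial_way_banks=4)  # way 7 plus half of way 6

assert lines_a + lines_b == WAYS * LINES_PER_WAY        # the split covers cache 38
assert lines_a / LINES_PER_WAY == 6.5                   # subset 38A: 6.5 ways
assert lines_b / LINES_PER_WAY == 1.5                   # subset 38B: 1.5 ways
```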
FIG. 12 illustrates additional details of one embodiment of computer system 20. Dashed line 24 shows the boundary of a die, although cache 38 could be off the die in another embodiment. Further, portions of chipset 26A and frame buffer & RAMDAC 28 (digital-to-analog converter) could be on the die. System memory 26B is controlled by chipset 26A. Display 82 receives signals from frame buffer & RAMDAC 28. In the particular illustrated embodiment, pixel engine 34 includes command processing unit 84, scanline interpolator unit 86 (e.g., for color, texture, fog, specular light), Z-buffering unit 88, texturing & fogging unit 90 (e.g., MIP-mapping, anisotropic filtering), shading unit 92 (e.g., Gouraud/Phong), and alpha blending & fog unit 94. Cache 38 is addressed to hold respective data of the different units. A 2D blit & video engine 98 may also be included in pixel engine 34. Chunk frame buffer 0 and chunk frame buffer 1 receive data from alpha blending & fog unit 94 and provide it to 2D blit & video engine 98 in a ping-pong fashion. FIG. 13 illustrates how different types of data may be stored in different ways of cache 38. For example, way 2 may act as a texture tile cache, while a pixel chunk in way 3 may be directly addressed by pixel engine 34.

Additional Information and Embodiments
The specification does not describe or illustrate various well known components, features, and conductors, a discussion of which is not necessary to understand the invention and inclusion of which would tend to obscure the invention. Furthermore, in constructing an embodiment of the invention, there are design tradeoffs and choices, which would vary depending on the embodiment. There are a variety of ways of implementing the illustrated and unillustrated components. Embodiments in accordance with the present invention, including those illustrated, may include various circuits, including well known circuits, and may be constructed according to various techniques, processes, and materials, including well known techniques, processes, and materials.
The borders of the boxes in the figures are for illustrative purposes and do not restrict the boundaries of the components, which may overlap. The relative size of the illustrated components is not intended to suggest actual relative sizes. The term "conductor" is intended to be interpreted broadly and includes devices that conduct even though they also have some insulating properties. There may be intermediate components or conductors between the illustrated components and conductors.
The phrase "in one embodiment" means that the particular feature, structure, or characteristic associated with the phrase is included in at least one embodiment of the invention, and may be included in more than one embodiment of the invention. Also, the appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same one embodiment.
If the specification states a component or feature "may", "can", "could", or "might" be included or have a characteristic, that particular component or feature is not required to be included or have the characteristic. The term "responsive" includes completely or partially responsive.
Those skilled in the art having the benefit of this disclosure will appreciate that many other variations from the foregoing description and drawings may be made within the scope of the present invention. Accordingly, it is the following claims including any amendments thereto that define the scope of the invention.

Claims

What is claimed is:
1. A processor, comprising:
a processor core;
a pixel engine; and
a cache memory including a first memory subset coupled to the processor core to hold data for the processor core and a second memory subset coupled to the pixel engine to selectively hold data for the pixel engine.
2. The processor of claim 1, wherein the processor core, pixel engine, and cache memory are on a single die.
3. The processor of claim 1, wherein the first and second memory subsets have essentially an equal number of memory locations.
4. The processor of claim 1, wherein the first and second memory subsets have substantially different numbers of memory locations.
5. The processor of claim 1, wherein the cache memory includes N ways, and the first memory subset includes a first portion of the N ways and the second memory subset includes a second portion of the N ways.
6. The processor of claim 1, wherein at least one way of the second memory subset is directly addressed for various pixel engine purposes.
7. The processor of claim 1, further comprising a cache controller to control reading from and writing to the first and second memory subsets.
8. The processor of claim 7, wherein the cache controller is included in the processor core and pixel engine.
9. The processor of claim 1, further comprising a first cache controller to control reading from and writing to the first memory subset and a second cache controller to control reading from and writing to the second memory subset.
10. The processor of claim 9, wherein the first cache controller is included within the processor core and the second cache controller is included within the pixel engine.
11. The processor of claim 1, wherein sizes of the first and second memory subsets change dynamically, depending on use requirements of the processor core and pixel engine.
12. The processor of claim 1, wherein the second memory subset is either completely dedicated to holding data for the pixel engine or dedicated to holding data for the processor core.
13. The processor of claim 1, wherein an amount of the second memory subset that is dedicated to holding data for the pixel engine may dynamically change between zero and a maximum amount.
14. The processor of claim 1, wherein an amount of the second memory subset that is dedicated to holding data for the pixel engine may dynamically change between zero and a maximum amount by at least one increment.
15. A monolithic memory apparatus, comprising:
a first memory subset including a first connection point at which the first memory subset can be coupled to a first computation device; and
a second memory subset including a second connection point at which the second memory subset can be coupled to a second computation device.
16. The apparatus of claim 15, wherein the first computation device is a processor core and the second computation device is a pixel engine.
17. A monolithic processor, comprising:
a processor core;
a pixel engine; and
a cache memory coupled to the processor core and to the pixel engine and including:
a first memory subset which is coupled only for use by the processor core, and
a second memory subset which is selectively couplable for use by the processor core in a first operating mode of the monolithic processor, and for use by the pixel engine in a second operating mode of the monolithic processor.
18. The monolithic processor of claim 17, further comprising: a unified bus unit coupled to the processor core and to the pixel engine for enabling both to access a common external bus.
19. The monolithic processor of claim 17, wherein, in the second operating mode of the monolithic processor:
the second memory subset is further coupled for use by the processor core;
there is general purpose microprocessor data stored in the first memory subset; and
there is pixel engine data stored in the second memory subset.
PCT/US2000/004008 2000-02-16 2000-02-16 Processor with cache divided for processor core and pixel engine uses WO2001061500A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2000237002A AU2000237002A1 (en) 2000-02-16 2000-02-16 Processor with cache divided for processor core and pixel engine uses
PCT/US2000/004008 WO2001061500A1 (en) 2000-02-16 2000-02-16 Processor with cache divided for processor core and pixel engine uses


Publications (1)

Publication Number Publication Date
WO2001061500A1 true WO2001061500A1 (en) 2001-08-23

Family

ID=21741066

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/004008 WO2001061500A1 (en) 2000-02-16 2000-02-16 Processor with cache divided for processor core and pixel engine uses

Country Status (2)

Country Link
AU (1) AU2000237002A1 (en)
WO (1) WO2001061500A1 (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5553262A (en) * 1988-01-21 1996-09-03 Mitsubishi Denki Kabushiki Kaisha Memory apparatus and method capable of setting attribute of information to be cached
US5553262B1 (en) * 1988-01-21 1999-07-06 Mitsubishi Electric Corp Memory apparatus and method capable of setting attribute of information to be cached
US5761720A (en) * 1996-03-15 1998-06-02 Rendition, Inc. Pixel engine pipeline processor data caching mechanism
US5860158A (en) * 1996-11-15 1999-01-12 Samsung Electronics Company, Ltd. Cache control unit with a cache request transaction-oriented protocol

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HANSEN C: "MICROUNITY'S MEDIA PROCESSOR ARCHITECTURE", IEEE MICRO,US,IEEE INC. NEW YORK, vol. 16, no. 4, 1 August 1996 (1996-08-01), pages 34 - 41, XP000596511, ISSN: 0272-1732 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002025447A2 (en) * 2000-09-22 2002-03-28 Intel Corporation Cache dynamically configured for simultaneous accesses by multiple computing engines
WO2002025447A3 (en) * 2000-09-22 2002-11-28 Intel Corp Cache dynamically configured for simultaneous accesses by multiple computing engines
GB2383868A (en) * 2000-09-22 2003-07-09 Intel Corp Cache dynamically configured for simultaneous accesses by multiple computing engines
US6665775B1 (en) 2000-09-22 2003-12-16 Intel Corporation Cache dynamically configured for simultaneous accesses by multiple computing engines
GB2383868B (en) * 2000-09-22 2005-02-02 Intel Corp Cache dynamically configured for simultaneous accesses by multiple computing engines
WO2004102376A2 (en) * 2003-05-09 2004-11-25 Intel Corporation (A Delaware Corporation) Apparatus and method to provide multithreaded computer processing
WO2004102376A3 (en) * 2003-05-09 2005-07-07 Intel Corp Apparatus and method to provide multithreaded computer processing
WO2006004875A1 (en) * 2004-06-30 2006-01-12 Sun Microsystems, Inc. Multiple-core processor with support for multiple virtual processors
US7685354B1 (en) 2004-06-30 2010-03-23 Sun Microsystems, Inc. Multiple-core processor with flexible mapping of processor cores to cache banks
US7873776B2 (en) 2004-06-30 2011-01-18 Oracle America, Inc. Multiple-core processor with support for multiple virtual processors
FR2993378A1 (en) * 2012-07-12 2014-01-17 Univ Bretagne Sud Data processing system, has cache memory connected to programmable processor through communication bus, and coprocessor directly connected to cache memory such that programmable processor and coprocessor exchange data through cache memory

Also Published As

Publication number Publication date
AU2000237002A1 (en) 2001-08-27


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase