PROCESSOR WITH CACHE DIVIDED FOR PROCESSOR CORE
AND PIXEL ENGINE USES
Background of the Invention
Technical Field of the Invention: The present invention relates to processors and, more particularly, to processors having a cache that is divided for processor core and pixel engine uses.
Background Art: Caches hold a relatively small amount of data that can be accessed relatively quickly by a processor core. By contrast, main memory holds a relatively large amount of data that is accessed relatively slowly by the processor core. Caches may be formed of high speed static random access memory (SRAM). Processors, such as the Pentium® II processor manufactured by Intel Corporation, Santa Clara, California, include one or more levels of cache, sometimes referred to as an L1 cache and an L2 cache, for example, to hold data for the processor core. The processor core and cache may be on different dice.
A cache line, sometimes called a block, is the smallest replaceable unit in a cache. However, in some architectures, a cache controller may read from or write to all or merely part of a cache line at a time. The term "cache line" is sometimes used to mean two different, but related, things: (1) data and (2) a group of memory locations in the cache that may hold the data. To avoid confusion, the term "data cache line" refers to the data and the term "cache line locations" refers to the group of memory locations in the cache that may hold the data cache line. Different data cache lines may be held in the same cache line locations at different times.
In different architectures, there are different possible cache line locations in which a data cache line may be held. In a "direct mapped" cache, a data cache line may be held in only one cache line location in the cache. In a "fully associative" cache, a data cache line may be held anywhere in the cache. In a "set associative" cache, a data cache line may be held in a restricted set of cache line locations in the cache. A "set" is a group of two or
more cache line locations in the cache. If a set may hold 2 data cache lines, the cache is 2-way set-associative. If a set may hold 4 data cache lines, the cache is 4-way set-associative.
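The placement rules above can be sketched in a short example; the function name and parameter values here are illustrative assumptions, not part of the specification.

```python
# Hypothetical sketch of set-associative placement. LINE_SIZE is an assumed
# value; NUM_SETS matches the small 16-set cache of FIG. 1.
LINE_SIZE = 32   # bytes per data cache line (assumed)
NUM_SETS = 16    # number of sets

def set_index(address):
    """A data cache line may be held only in the set selected by these
    address bits; any way within that set may hold it."""
    return (address // LINE_SIZE) % NUM_SETS
```

A direct-mapped cache is the 1-way degenerate case of this scheme, and a fully associative cache is the case of a single set spanning every cache line location.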
Caches may be organized into banks, sets, and ways. For example, referring to FIG. 1, a cache 10 is organized into banks 0, 1, ..., 7; sets 0, 1, ..., 15; and ways 0, 1, 2, and 3. A bank identifies certain memory locations that have something in common. For example, in some architectures, bank 0 holds a unit 0 of the data cache line, bank 1 holds a unit 1 of the data cache line, etc. In other architectures, banks may include memory locations that are powered for reading or writing at the same time. There are various other uses of banks. For ease of illustration, cache 10 is made smaller than is typical.
A pixel engine, sometimes called a graphics accelerator, is used to generate graphics signals (images) to be displayed. Pixel engines are included on a card that is connected to the motherboard. Pixel engines have their own caches, which may be formed of high speed SRAMs, referred to as video RAM (VRAM).
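The bank organization described above, in which bank 0 holds unit 0 of the data cache line, bank 1 holds unit 1, and so on, can be sketched as follows; the unit size is an assumed value for illustration.

```python
# Illustrative only: with 8 banks each holding one unit of a data cache
# line, the bank for a given byte follows from its offset within the line.
NUM_BANKS = 8
UNIT_SIZE = 4  # bytes per unit (assumed)

def bank_for_offset(byte_offset):
    """Return the bank holding the unit that contains this byte offset."""
    return (byte_offset // UNIT_SIZE) % NUM_BANKS
```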
It is expensive to provide the pixel engine on a separate add-in card from the processor core. Further, it is expensive to provide separate caches for the processor core and the pixel engine.
Accordingly, there is a need for a processor in which the processor core and pixel engine are on a common substrate such as a printed circuit board, if not a common die. Further, there is a need for a processor in which the processor core and pixel engine share a cache.
Summary of the Invention
Under one aspect of the invention, a processor includes a processor core, a pixel engine, and a cache memory including a first memory subset coupled to the processor core to hold data for the processor core and a second memory subset coupled to the pixel engine to selectively hold data for the pixel engine.
Brief Description of the Drawings
The invention will be understood more fully from the detailed description given below and from the accompanying drawings of embodiments of the invention which, however, should not be taken to limit the invention to the specific embodiments described, but are for explanation and understanding only.
FIG. 1 is a schematic representation of a prior art cache.
FIG. 2 is a block diagram representation of a computer system according to one embodiment of the present invention.
FIG. 3 is a block diagram illustrating details of one embodiment of the system of FIG. 2.
FIG. 4 is a block diagram representation of a processor core, pixel engine, and cache, wherein the cache holds data used by only the processor core.
FIG. 5 is a block diagram representation of a processor core, pixel engine, and cache, wherein part of the cache holds data used by only the processor core and part of the cache holds data used by only the pixel engine.
FIG. 6 is a schematic representation of a first possible division of a cache into first and second memory subsets.
FIG. 7 is a schematic representation of a second possible division of a cache into first and second memory subsets.
FIG. 8 is a block diagram representation of a cache controller and first and second memory subsets according to one embodiment of the present invention.
FIG. 9 is a block diagram representation of first and second cache controllers and first and second memory subsets according to an alternative embodiment of the present invention.
FIG. 10 is a schematic representation of a third possible division of a cache into first and second memory subsets.
FIG. 11 is a schematic representation of a fourth possible division of a cache into first and second memory subsets.
FIG. 12 is a block diagram representation of a computer system according to one embodiment of the present invention.
FIG. 13 is a block diagram representation of a processor core, pixel engine, and cache memory, wherein the cache holds particular types of data in particular ways of the cache.
Detailed Description of Preferred Embodiments
Referring to FIG. 2, a computer system 20 includes a processor 24, chipset/memory 26, and frame buffer 28. Processor 24 includes a processor core 32, a pixel engine 34, and a cache 38. Processor core 32 performs those functions typically performed by a processor core and communicates with chipset/memory 26 through processor bus 42. Pixel engine 34 generates graphics signals to be supplied to frame buffer 28 through a bus 46. Alternatively, the graphics signals may be passed through processor bus 42. Processor core 32 may also provide graphics signals to frame buffer 28 through bus 46 and/or bus 42. Processor core 32 and pixel engine 34 are examples of computation devices.
In contrast to prior art computer systems, processor core 32 and pixel engine 34 share cache 38. Cache 38 may be used for holding merely data that does not include instructions, or may be used for holding data that does include instructions. That is, cache 38 may be, but is not required to be, a unified cache. Processor core 32, pixel engine 34, and cache 38 may all be on the same die. The outer boundary of processor 24 in FIG. 2 may represent a die. Alternatively, cache 38 and/or pixel engine 34 may be on a separate die from the die of processor core 32. The outer boundary of processor 24 in FIG. 2 may represent a substrate on which the dice are placed. Of course, processor 24 may include other components in addition to those illustrated.
Merely as examples, FIG. 3 illustrates certain functional blocks that could be included in processor 24. There are, of course, other possibilities. In the embodiment of FIG. 3, processor core 32 includes a CPU core 54 and a floating point unit 56, each of which may include multiple execution units. Processor 24 also includes a bus unit 58. Pixel engine 34 includes a 2D video engine 64, a 2D BLT engine 66, and a 3D pixel engine 68, as examples. A cache controller 50 controls reading from and writing to cache 38. There may be snooping and other communications between chipset/memory 26 and cache controller 50 and cache 38. Cache controller 50 may be separate from processor core 32 and pixel engine 34 or included within processor core 32 and/or pixel engine 34.
There are a variety of approaches by which processor core 32 and pixel engine 34 may share cache 38. In a first approach, cache controller 50 allows writes to and reads from cache 38 without consideration as to whether the data is for processor core 32 or pixel engine 34. An advantage of this approach is that if one or the other of processor core 32 or pixel engine 34 needs a large portion of cache 38 at a particular time, that portion may be available. A disadvantage is that one of processor core 32 or pixel engine 34 may utilize so much of cache 38 that the other will not have enough of cache 38 to perform adequately. For example, pixel engine 34 may utilize so much of cache 38 that read instructions in processor core 32 will result in an excessive number of cache misses, stalling processor core 32.
In a second approach, a first cache memory subset is always dedicated to data for processor core 32 and a second cache memory subset is always dedicated to data for pixel engine 34. (Note that the term "subset" does not necessarily involve dividing "sets" as in set associativity.)
Referring to FIGS. 4 and 5, in a third approach, under certain conditions, both the first and second memory subsets are dedicated to data for processor core 32, while under other conditions, the first memory subset is dedicated to data for processor core 32 and the second memory subset is dedicated to data for pixel engine 34. As an example, in FIGS. 4 and 5, the first memory subset of cache 38 includes ways 0 and 1, and the second memory subset of cache 38 includes ways 2 and 3. As is illustrated and described herein, the cache may be divided 50/50 or by some other division. In FIG. 4, pixel engine 34 is inactive and ways 0-3 of cache 38 are used to hold data for processor core 32. In FIG. 5, pixel engine 34 is active. Ways 0 and 1 of cache 38 are used to hold data for processor core 32, and ways 2 and 3 of cache 38 are used to hold data for pixel engine 34.
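A minimal sketch of this third approach follows, using the way assignments of FIGS. 4 and 5; the function and requester names are hypothetical, and an actual controller would enforce the restriction in allocation and replacement hardware rather than software.

```python
# Sketch: which ways of cache 38 a requester may allocate into under the
# third approach. When pixel engine 34 is inactive, processor core 32 may
# use all four ways; when pixel engine 34 is active, ways 2 and 3 are
# reserved for the pixel engine.
def allowed_ways(requester, pixel_engine_active):
    if not pixel_engine_active:
        return {0, 1, 2, 3} if requester == "core" else set()
    return {0, 1} if requester == "core" else {2, 3}
```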
Under a fourth approach, the size of the first and second memory subsets dynamically changes, depending on the needs of processor core 32 and pixel engine 34. A difference between the third and fourth approaches is that in the third approach, the second memory subset is either completely dedicated to processor core 32 or completely dedicated to pixel engine 34. In the fourth approach, the size of the memory subset dedicated to pixel engine 34 may dynamically change from zero to some maximum, including at least one intermediate value.
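The fourth approach can be sketched as follows; the demand-driven policy, way counts, and names are assumptions for illustration only, as the specification does not prescribe a particular resizing policy.

```python
# Sketch of dynamic repartitioning: the number of ways dedicated to pixel
# engine 34 varies from zero to a maximum, with intermediate values allowed.
TOTAL_WAYS = 8       # assumed, as in the 8-way cache of FIG. 7
MAX_PIXEL_WAYS = 4   # assumed upper bound for the pixel engine subset

def partition(pixel_demand):
    """pixel_demand in [0.0, 1.0]; returns (core_ways, pixel_ways)."""
    pixel_ways = min(MAX_PIXEL_WAYS, round(pixel_demand * MAX_PIXEL_WAYS))
    return TOTAL_WAYS - pixel_ways, pixel_ways
```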
There are a variety of approaches to dividing a cache into two or more memory subsets. In FIG. 6, memory subsets 38A and 38B of cache 38 have the same or essentially the same size. For simplicity, cache 38 is illustrated as including banks 0, 1, ..., 7; sets 0, 1, ..., 15; and ways 0, 1, 2, and 3. Banks 0, 1, ..., 7 are further divided into banks 0A, 0B, 1A, 1B, ..., 7A, and 7B, respectively. Memory subset 38A includes banks 0A, 1A, ..., 7A, which include all sets for ways 0 and 1. Memory subset 38B includes banks 0B, 1B, ..., 7B, which include all sets for ways 2 and 3. Accordingly, in FIG. 6, cache 38 is divided according to ways. A cache does not have to be divided according to ways. Rather, the cache could be divided by banks, sets, or some other measure or dimension of the cache. In practice, cache 38 may include more than eight banks and sixteen sets. Further, a greater or lesser number of ways may be used. The structure illustrated in FIG. 6 includes word lines, bit lines, sense amplifiers, etc. The banks are not necessarily limited to holding particular bits. For example, bank 0 may hold merely a bit 0 of particular data, or may hold bits not necessarily limited to bit 0. Cache 38 does not have to include only contiguous memory cells and may be more extensive than illustrated in FIG. 6.
FIG. 7 illustrates a second possible division of cache 38 into memory subsets 38A and 38B. In FIG. 7, memory subset 38A has a greater number of ways than does memory subset 38B. Memory subset 38A includes sets 0, 1, ..., 15 for a bank 0 and ways 0, 1, 2, 3, 4, and 5. Memory subset 38B includes sets 0, 1, ..., 15 for banks 0, 1, ..., 7 and ways 6 and 7. Memory subset 38B could include more than one bank. Subsets 38A and 38B do not have to have the same number of banks.
FIGS. 6, 7, 8, and 9 illustrate two different approaches to providing address, data, and perhaps control signals to first and second memory subsets 38A and 38B. In a first approach, illustrated in FIGS. 7 and 8, conductors 80 carry address, data, and perhaps control signals to both first and second memory subsets 38A and 38B. In a second approach, illustrated in FIGS. 6 and 9, conductors 80A carry address, data, and perhaps control signals to memory subset 38A and conductors 80B carry address, data, and perhaps control signals to memory subset 38B. In FIG. 6, conductors 80A and 80B may be first and second connection points, points A and B, between memory subsets 38A and 38B and processor core 32 and pixel engine 34, respectively. In FIG. 7, conductors 80 are a connection point between memory subsets 38A and 38B and processor core 32 and pixel engine 34.
Referring to FIG. 8, a single cache controller 50 may control reading from and writing to both memory subsets 38A and 38B through conductors 80. Conductors 80 may include address, data, and perhaps control lines. The address and data lines may include, for example, word lines and bit and bit* lines.
Alternatively, referring to FIG. 9, cache controller 50 may include two different cache controllers: a cache controller 50A for processor core 32 and memory subset 38A and a cache controller 50B for pixel engine 34 and memory subset 38B. Two controllers may be able to write to and read from different memory subsets with a greater degree of independence than can one controller, but may require some redundancy for purposes of changing use of memory subset 38B from pixel engine 34 to processor core 32.
FIG. 10 illustrates a third possible division of cache 38 into memory subsets 38A and 38B. In FIG. 10, memory subset 38A has a greater number of ways than does memory subset 38B. Memory subset 38A includes sets 0, 1, ..., 15 for a bank 0 and ways 0, 1, 2, 3, 4, and 5, and for a bank 0A for a way 6A. Memory subset 38B includes sets 0, 1, ..., 15 for banks 4, 5, 6, and 7 and a way 6B, and for banks 0, 1, ..., 7 and a way 7. Ways 6A and 6B are each sub-ways. FIG. 10 shows that the division does not have to be in integer ways (e.g., the division can be 6.5 and 1.5 ways). Cache 38 of FIG. 10 may receive address, data, and perhaps control signals through conductors 80 for both memory subsets 38A and 38B, or through conductors 80A and 80B separately for memory subsets 38A and 38B. In FIG. 11, cache 38 is similar to FIG. 10, except that memory subset 38B includes only banks 0 and 1, rather than multiple banks as in FIG. 10. Again, a non-integer split is shown.
FIG. 12 illustrates additional details of one embodiment of computer system 20. Dashed line 24 shows the boundary of a die, although cache 38 could be off the die in another embodiment. Further, portions of chipset 26A and frame buffer & RAMDAC 28 (digital-to-analog converter) could be on the die. System memory 26B is controlled by chipset 26A. Display 82 receives signals from frame buffer & RAMDAC 28. In the particular illustrated embodiment, pixel engine 34 includes command processing unit 84, scanline interpolator unit 86 (e.g., for color, texture, fog, specular light), Z-buffering unit 88, texturing & fogging unit 90 (e.g., MIP-mapping, anisotropic filtering), shading unit 92 (e.g., Gouraud/Phong), and alpha blending & fog unit 94. Cache 38 is addressed to hold respective data of the different units. A 2D BLT & video engine 98 may also be included in pixel engine 34. Chunk frame buffer 0 and chunk frame buffer 1 receive and provide data in a ping-pong fashion from alpha blending & fog unit 94 and to 2D BLT & video engine 98.
FIG. 13 illustrates how different types of data may be stored in different ways of cache 38. For example, way 2 may act as a texture tile cache, while a pixel chunk in way 3 may be directly addressed by pixel engine 34.
Additional Information and Embodiments
The specification does not describe or illustrate various well known components, features, and conductors, a discussion of which is not necessary to understand the invention and inclusion of which would tend to obscure the invention. Furthermore, in constructing an embodiment of the invention, there are design tradeoffs and choices, which would vary depending on the embodiment. There are a variety of ways of implementing the illustrated and unillustrated components. Embodiments in accordance with the present invention, including those illustrated, may include various circuits, including well known circuits, and be constructed according to various techniques, processes, and materials, including well known techniques, processes, and materials.
The borders of the boxes in the figures are for illustrative purposes and do not restrict the boundaries of the components, which may overlap. The relative size of the illustrated components is not intended to suggest actual relative sizes. The term "conductor" is intended to be interpreted broadly and includes devices that conduct although they also have some insulating properties. There may be intermediate components or conductors between the illustrated components and conductors.
The phrase "in one embodiment" means that the particular feature, structure, or characteristic associated with the phrase is included in at least one embodiment of the invention, and may be included in more than one embodiment of the invention. Also, the appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same one embodiment.
If the specification states a component or feature "may", "can", "could", or "might" be included or have a characteristic, that particular component or feature is not required to be included or have the characteristic. The term "responsive" includes completely or partially responsive.
Those skilled in the art having the benefit of this disclosure will appreciate that many other variations from the foregoing description and drawings may be made within
the scope of the present invention. Accordingly, it is the following claims including any amendments thereto that define the scope of the invention.