DE19527592C2

DE19527592C2 - Cache arrangement for a processor and method for entering data into a cache memory

Info

Publication number: DE19527592C2
Application number: DE19527592A
Authority: DE
Inventors: Hans Werner Tast; Klaus Joerg Getzlaff; Udo Wille; Hans-Juergen Muenster
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1995-07-28
Filing date: 1995-07-28
Publication date: 2003-07-17
Anticipated expiration: 2015-07-29
Also published as: DE19527592A1; JPH0944400A

Description

The invention relates to a cache arrangement for a processor, having a cache memory for the processor, an address directory belonging to the cache memory, and a buffer memory for the temporary storage of data lines. The invention further relates to a method for entering data into a cache memory.

Processor systems with one or more microprocessors that use so-called cache memories are known from the prior art. Cache memories are fast data stores, i.e. stores with a short access time. They are generally arranged close to a processor and supplement the main memory of the system. A cache holds a subset of the contents of the main memory of the processor system, namely the data used particularly often by the processor to which that cache belongs. As a result, the overall access time to data is reduced and the system bus is used more efficiently.

It is also known from the prior art to provide two or more levels of cache memory, e.g. a so-called L1 cache and an L2 cache. It is further known to store data line by line for addressing purposes. A data line may then contain, e.g., 128 bytes of data. When a data line is requested from the main memory of the processor system, the data belonging to the requested line are generally transferred in several "packets" from main memory to the L2 cache and from the L2 cache to the L1 cache, e.g. in 8 "packets" of 16 bytes each.

The processor that requested the data line typically needs only a few of the line's 128 bytes at first, and there is a high statistical probability that these first-needed byte positions are adjacent within the data line. The sixteen-byte "packet" of the data line that contains the required byte positions is therefore transferred first, so that the processor can continue working after the shortest possible wait. The remaining "packets" are then transferred in the subsequent data transfers over the system bus.
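The "requested packet first" ordering can be sketched as below. This is an illustrative model, not taken from the patent. The names `transfer_order`, `LINE_SIZE` and `PACKET_SIZE` are hypothetical. The wrap-around order is an assumption, chosen to match the sequence S3, S4, S5, S6, S7, S0, S1, S2 given for the embodiment of Fig. 3.

```python
LINE_SIZE = 128    # bytes per data line (example value from the text)
PACKET_SIZE = 16   # bytes per "packet" (one 16-tuple)
NUM_PACKETS = LINE_SIZE // PACKET_SIZE  # 8 packets per line


def transfer_order(requested_offset: int) -> list[int]:
    """Return packet indices in transfer order: the packet holding the
    requested byte comes first, then the others, wrapping around the line."""
    if not 0 <= requested_offset < LINE_SIZE:
        raise ValueError("offset must lie within the data line")
    first = requested_offset // PACKET_SIZE
    return [(first + k) % NUM_PACKETS for k in range(NUM_PACKETS)]


# Byte 0x35 (53) of the line lies in packet S3.
print(transfer_order(0x35))  # [3, 4, 5, 6, 7, 0, 1, 2]
```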

When the "packets" are transferred from the L2 cache to the L1 cache, the seven further "packets" that follow the first one are initially buffered in a buffer memory. This is necessary in particular under two conditions: the processor processes the data of the first packet immediately after it has been transferred, and the L1 cache is a so-called single-port array, meaning that only one system participant can access the L1 cache at a time. As long as the processor is communicating with its L1 cache, the L1 cache is therefore blocked for storing the further seven "packets" of 16 bytes each. These data remain in the buffer memory until the L1 cache becomes available for storing these "packets".
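The single-port constraint can be pictured as a toy cycle model. Packets wait in the buffer and are written into the L1 cache only in cycles where the processor is not using the port. The class name `SinglePortL1Fill` is hypothetical. Two simplifications are assumptions made for illustration only: the processor always wins the port, and all seven follow-on packets are already in the buffer.

```python
from collections import deque


class SinglePortL1Fill:
    """Toy model: one L1 port shared by processor accesses and line-fill
    writes; pending packets wait in a FIFO buffer (the line fetch buffer)."""

    def __init__(self) -> None:
        self.lfb = deque()            # packets waiting for the L1 port
        self.written: list[int] = []  # packet indices stored in L1, in order

    def arrive(self, packet_index: int) -> None:
        """A packet arrives from the L2 cache and is buffered."""
        self.lfb.append(packet_index)

    def cycle(self, processor_uses_port: bool) -> None:
        """One system cycle. Assumption: the processor has priority, so a
        buffered packet is written only when the port is otherwise idle."""
        if not processor_uses_port and self.lfb:
            self.written.append(self.lfb.popleft())


fill = SinglePortL1Fill()
for p in (4, 5, 6, 7, 0, 1, 2):  # packets following S3, in arrival order
    fill.arrive(p)
for busy in (True, True, False, True, False):
    fill.cycle(processor_uses_port=busy)
print(fill.written)  # [4, 5]: only the two idle cycles drained the buffer
```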

The first "packet", which contains the byte positions initially required by the processor, can be both stored in the L1 cache and output from the L1 cache to the processor within one system cycle. This is possible in particular when the L1 cache is a so-called "write-through" cache. In such a cache, the input data at the port of the L1 cache are available at the end of the same system cycle in which the first "packet" was stored in the L1 cache.

The following disadvantage can arise when operating such a processor system. After the processor has received the initially requested byte positions with the first "packet", it may need further byte positions of the same data line immediately afterwards, located in one or more of the other seven "packets". At that point, however, these further "packets" are not yet in the L1 cache but still in the buffer memory. Accordingly, the address of the relevant packets of the data line is also registered in the address directory of the buffer-memory control. This registration means that the data line is already entered as valid in the cache directory, but the packet in question has not yet been stored in the L1 cache.

In the prior art, the address or addresses of the packets are registered in the address directory of the L1 cache only once the relevant data line has been completely stored in the L1 cache. This ensures the consistency of the data in the processor system.

Requests for data contained in the other seven "packets" therefore cannot be satisfied by the L1 cache until the corresponding packet has been stored in the L1 cache. A corresponding request by the processor before the packet is stored therefore leads to a so-called line fetch buffer (LFB) conflict.

In this case the processor must delay its access to the L1 cache until the corresponding packet has been stored. As a result, several system cycles are lost for the execution of instructions by the processor. In a so-called "pipelined" architecture, the processing of instructions already in the "pipeline" is also interrupted or delayed, which causes further losses. Refilling the "pipeline" likewise takes several system cycles.

US Patent No. 4,654,778 discloses a "fast path" that serves to bypass intermediate levels of the storage hierarchy. The fast path is used in that system to reduce peak loads on the normal data path through the storage hierarchy.

US Patent No. 4,445,174 shows a multiprocessor system with a main memory, L1 and L2 caches that each belong to one of the processors, and a "shared cache". The shared cache is not assigned "privately" to any processor; all processors can access it equally. It serves to further reduce access times.

US Patent No. 5,214,766 discloses a multiprocessor system that likewise has a "shared cache" to optimize access time. In addition, this system has a table that records how a data line is used by one of the processors. This, too, is intended to optimize access times.

The prior art cited above neither recognizes nor solves the problem described above: accessing "packets" that are already in the buffer memory but not yet in the L1 cache.

US Patent 5,367,660 shows an improved cache memory for use in a microprocessor system. A buffer memory stores tag and offset fields together with the associated data line. So-called valid bits are associated with different parts of the data line. The device is restricted to read access to processor instructions. While the data line is being filled with instructions, it allows those instructions to be read from the buffer memory before the line has been completely filled from main memory.

The invention is therefore based on the object of making the use of a buffer memory even more efficient.

This object is achieved by the features set out in the independent claims.

The main advantage of the invention concerns the case where the processor wants to access data within a data line that is not yet completely stored in the cache memory, because some of the line's "packets" are still in the buffer memory of the processor system. In that case the processor no longer has to wait until the line has been completely stored. A further advantage is that the "pipeline" then does not need to be interrupted either.

The invention is also particularly advantageous when the processor is to store data into the L1 cache. In the prior art, such a store requires the corresponding data line to be already present in the L1 cache, so that the affected byte positions of the line can be overwritten there. If this data line is not yet completely present in the L1 cache, the teaching of the invention now makes it possible to use the data still held in the buffer memory to carry out the store operation. For this purpose, the "packet" in the buffer memory that contains the affected byte positions is output from the buffer memory to a multiplexer. The multiplexer also receives the modified data to be stored from the processor, together with a control signal.

The control signal switches the multiplexer so that the byte positions to be overwritten replace the corresponding byte positions output from the buffer memory. The remaining byte positions of the "packet" output from the buffer memory remain unchanged. The output of the multiplexer is therefore a "packet" that already contains the modified data, and this modified "packet" is then stored in the L1 cache as usual. The prior art first had to store the unmodified "packet", as buffered in the buffer memory, into the L1 cache, and only then could overwrite the byte positions to be changed in the relevant data line of the L1 cache. This extra step is avoided.
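The multiplexer's byte-wise merge can be modelled in software as a masked select. This sketch rests on one assumption: the control signal is represented as a 16-bit byte-enable mask `byte_enable`, where bit i selects the processor's byte at position i. The function name `merge_store` and this encoding are illustrative only. The patent does not specify the format of the control signal.

```python
PACKET_SIZE = 16  # bytes per packet


def merge_store(lfb_packet: bytes, store_data: bytes, byte_enable: int) -> bytes:
    """Model of the multiplexer: for each byte position, take the store byte
    from the processor if its enable bit is set, else keep the buffered byte."""
    if len(lfb_packet) != PACKET_SIZE or len(store_data) != PACKET_SIZE:
        raise ValueError("packets are 16 bytes wide")
    return bytes(
        store_data[i] if (byte_enable >> i) & 1 else lfb_packet[i]
        for i in range(PACKET_SIZE)
    )


buffered = bytes(range(16))   # packet as held in the line fetch buffer
new_data = bytes([0xFF] * 16) # processor store data (only bytes 4-5 valid)
merged = merge_store(buffered, new_data, 0b110000)
print(merged.hex())  # 00010203ffff060708090a0b0c0d0e0f
```

The merged packet can then be written into the L1 cache in a single write, instead of writing the unmodified packet first and overwriting it afterwards.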

The invention is also advantageous for fetching operands for the processor. If, for example, an instruction has two operands, the operand addresses can be used to check whether the second operand requires data from a different "packet" of the same data line than the first operand. This is the case in particular when a so-called carry occurs in the addresses. If such a carry occurs, the data of the second operand can be output from the buffer memory together with the corresponding "packet" and passed via the L1 cache to the processor. Unlike in the prior art, the processor does not have to wait for the corresponding data line to be completely stored before it can also access the second operand. This requires a so-called write-through L1 cache, so that within one system cycle the "packet" can be stored and the data it contains can be output to the processor via the single port of the L1 cache.
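The address check described above can be sketched with the line and packet geometry of Fig. 1 and Fig. 2 (128-byte lines, 16-byte packets). The interpretation of "carry" here is an assumption: an address increment that carries into the packet-index bits, so that the operand's last byte falls in the next packet. The function names are hypothetical.

```python
LINE_BITS = 7    # log2(128): bits addressing a byte within a data line
PACKET_BITS = 4  # log2(16):  bits addressing a byte within a packet


def same_line_other_packet(addr1: int, addr2: int) -> bool:
    """True if both addresses fall in the same data line but in different packets."""
    same_line = (addr1 >> LINE_BITS) == (addr2 >> LINE_BITS)
    other_packet = (addr1 >> PACKET_BITS) != (addr2 >> PACKET_BITS)
    return same_line and other_packet


def carry_into_packet_bits(addr: int, length: int) -> bool:
    """True if an operand of `length` bytes starting at `addr` ends in a
    different packet than it starts in (a carry into the packet-index bits)."""
    last = addr + length - 1
    return (addr >> PACKET_BITS) != (last >> PACKET_BITS)


print(same_line_other_packet(0x1234, 0x1244))  # True: packets 3 and 4 of one line
print(carry_into_packet_bits(0x123C, 8))       # True: crosses from packet 3 into 4
```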

An embodiment of the invention is shown in the drawing and is described in more detail below. The figures show:

Fig. 1 the line-by-line organization of the address space;

Fig. 2 the addressing of individual byte positions within a data line;

Fig. 3 a simplified block diagram of a processor system according to the invention;

Fig. 4 a signal diagram for a store operation by the processor into the L1 cache;

Fig. 5 a signal diagram for the input of two operands located in different "packets".

Fig. 1 shows the address space 1 of the processor system. The address space 1 consists of the data lines Z0, Z1, Z2, Z3, ..., Zn-1. Each of the data lines Z is 128 bytes wide. Each data line Z is also logically divided into eight 16-tuples, i.e. into the 16-byte-wide tuples S0, S1, S2, S3, S4, S5, S6 and S7.

Fig. 2 shows how a single byte within a data line Zi is addressed. The variables X, Y and Z denote 3 address bits that specify one of the 16-tuples within the data line Zi. These 3 bit positions suffice to address the 8 different 16-tuples within the data line Zi. Since each 16-tuple is 16 bytes wide, four further address bits 2 are needed for byte-accurate addressing within a 16-tuple.
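The bit split of Fig. 2 can be expressed as a small helper. It assumes that the three bits X, Y, Z sit directly above the four byte-offset bits and that the remaining high-order bits select the data line. This is consistent with 128-byte lines, but the exact bit placement and the function name `split_address` are assumptions for illustration.

```python
def split_address(addr: int) -> tuple[int, int, int]:
    """Split a byte address into (line number i, 16-tuple index XYZ, byte offset)."""
    byte_offset = addr & 0xF          # four low-order bits: byte within a 16-tuple
    tuple_index = (addr >> 4) & 0x7   # bits X, Y, Z: selects S0..S7
    line_number = addr >> 7           # remaining bits: selects data line Zi
    return line_number, tuple_index, byte_offset


print(split_address(0x1235))  # (36, 3, 5): line Z36, tuple S3, byte 5
```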

Fig. 3 shows a block diagram of a processor system according to the invention. The microprocessor PU of the system is itself not shown in Fig. 3. The processor PU has an L1 cache memory 3 (L1 CACHE). The port of the L1 cache memory 3 is connected to an interface 4, a so-called FETCH ALIGNER, which outputs data from the L1 cache memory 3 onto the FETCH bus 5 in an ordered manner. The FETCH bus 5 is a data bus. It connects the interface 4, and thus indirectly also the L1 cache memory 3, to the processor PU.

Output data from the L1 cache memory 3 reach the processor PU via the FETCH bus 5, where they are input into the processor. The L1 cache memory 3 has an associated address directory 6, a so-called cache directory. The address directory 6 consists of registers for storing addresses. If the address Zi of a data line is registered in one of the registers of the address directory 6, this tells the processor PU that the data line with address Zi is present in the L1 cache memory 3. This is explained in more detail below.

The address directory 6 is connected to the L1 cache memory 3 via a signal line 7. As soon as a data line Zi is to be stored in the L1 cache memory 3, this is signalled to the address directory 6 via the signal line 7, so that the address i of the data line Zi can be registered in the address directory 6. According to the invention, the address i of this data line is registered in the address directory 6 before the data line Zi has been completely stored in the L1 cache memory 3.

The processor system further has an L2 cache memory 8 (L2 CACHE), which is connected to the main memory of the processor system via the system bus. Neither the main memory nor the system bus is shown in Fig. 3. The L2 cache memory 8 is connected to a buffer memory 10 via a bus 9. In the embodiment of Fig. 3, the bus 9 is sixteen bytes wide, which corresponds to the width of one 16-tuple of a data line (cf. Fig. 1 and Fig. 2). When a data line Zi is to be transferred from the L2 cache memory 8 to the L1 cache memory 3, the eight 16-tuples of the data line Zi are transmitted sequentially over the bus 9 in 8 "packets" of 16 bytes each.

For example, the data line Zi may be output from the L2 cache memory 8 because the processor PU needs data contained in the 16-tuple S3 of the data line Zi. The 16-tuple S3 is then output from the L2 cache memory 8 as the first "packet" and transmitted via the bus 9. This first "packet" is not buffered in the buffer memory 10, the so-called line fetch buffer. Instead, it is transferred directly via the bus 11, the multiplexer 12 and the bus 13 to the port of the L1 cache memory 3. The L1 cache memory 3 is a so-called write-through cache. This means that data applied for storage to the "DataIn" of the single port of the L1 cache memory are already available at the "DataOut" of that same port within the same cycle.

Therefore the first "packet", i.e. the 16-tuple S3 containing the data first needed by the processor PU, can be both stored in the L1 cache memory 3 and transferred to the processor PU via the interface 4 and the FETCH bus 5 within one system cycle.

The L1 cache memory 3 is a so-called single-port array, i.e. only one port is available for accessing it. The further "packets", which correspond to the remaining 16-tuples of the data line Zi, therefore cannot be stored in the L1 cache memory 3 while the processor PU is communicating with it. After the first "packet", which contains the 16-tuple S3 of the data line Zi, the further 16-tuples S4, S5, S6, S7, S0, S1 and S2 are transmitted packet by packet over the bus 9 in this order and buffered in the buffer memory 10.

Once the port of the L1 cache 3 is free again, the 16-tuples remaining in the buffer memory 10 are transferred one after another via bus 11, multiplexer 12 and bus 13 into the L1 cache 3. When the last "packet", containing the 16-tuple S2, has been transferred, the entire data line Zi is present in the L1 cache 3.

In the prior art, the address i of this data line would be registered in the address directory 6 only at this point, after the data line Zi has been completely stored in the L1 cache 3. According to the invention, the registration happens earlier, while the data line is not yet completely stored in the L1 cache 3. In the embodiment of Fig. 3, the address i is registered as soon as the first "packet" (in this example the 16-tuple S3) has been stored in the L1 cache 3. At that point the remaining 16-tuples S4, S5, S6, S7, S0, S1 and S2 are still being transmitted from the L2 cache 8 via bus 9 or are buffered in the buffer memory 10.
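The timing effect of early registration can be illustrated with a cycle-level toy model. The model makes the following assumptions, none of which comes from the source:

- one packet arrives per cycle on bus 9;
- at most one buffered tuple per cycle can be written through the single L1 port;
- the processor occupies that port during a given set of cycles.

The function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple

def simulate_fill(order: List[int], port_busy: Set[int],
                  early: bool = True) -> Tuple[int, Dict[int, int]]:
    """Return (registration cycle of address i, {tuple: cycle written to L1}).

    order     -- tuple indices in arrival order on bus 9, one packet per cycle
    port_busy -- cycles in which the processor occupies the single L1 port
    early     -- True: register with the first packet (invention);
                 False: register only once the whole line is in L1 (prior art)
    """
    pending = list(order)
    lfb: List[int] = []            # line fetch buffer 10, modeled as a FIFO
    landed: Dict[int, int] = {}    # tuple index -> cycle it reached the L1 array
    cycle = 0
    while len(landed) < len(order):
        if cycle == 0:
            # First packet goes straight to the L1 port (and the processor).
            landed[pending.pop(0)] = cycle
        else:
            if cycle not in port_busy and lfb:
                landed[lfb.pop(0)] = cycle   # drain one buffered tuple
            if pending:
                lfb.append(pending.pop(0))   # next packet arrives from L2
        cycle += 1
    registration = 0 if early else max(landed.values())
    return registration, landed

line = [3, 4, 5, 6, 7, 0, 1, 2]
print(simulate_fill(line, port_busy={1, 2, 3}))
# (0, {3: 0, 4: 4, 5: 5, 6: 6, 7: 7, 0: 8, 1: 9, 2: 10})
```

In this run the directory reports the whole line as present from cycle 0, but S7 does not reach the array until cycle 7. Closing that window is the job of the logic 14 described next.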

The processor system also has a logic 14, which is connected to the address directory 6 via a signal line 15 and to the buffer memory 10 via a signal line 16. The logic 14 is needed for the following reason. Because the address i is stored in the address directory 6 before the data line Zi has been completely stored in the L1 cache 3, the processor PU may try to access data in the L1 cache 3 that is not yet there. The address i of the data line Zi to which this data belongs is nevertheless already registered in the address directory 6.

If the logic 14 did not intervene in this situation, the L1 cache 3 would not return a current copy of the data to the processor PU. It would instead return whatever data happened to be at the corresponding byte positions of its memory array. This follows from the processor's point of view: as soon as the address i is registered in the address directory 6, the processor PU treats the data line Zi as completely stored in the L1 cache 3. The logic 14 therefore monitors the accesses of the processor PU to the L1 cache 3, using signal lines (not shown in Fig. 3) that connect the logic 14 to the processor PU.

Via the signal lines 15 and 16, the logic 14 learns whether an address i in the address directory 6 belongs to a data line Zi that is not yet completely stored in the L1 cache 3. This is the case while 16-tuples of the data line Zi are still buffered in the buffer memory 10.
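The check made by the logic 14 can be sketched as a predicate over a byte address. The sketch assumes a 128-byte line made of eight 16-tuples and a buffer memory that holds tuples of at most one line at a time. Both are assumptions, as are the function names.

```python
from typing import Optional, Set, Tuple

TUPLE_BYTES = 16                               # one n-tuple (16-tuple)
TUPLES_PER_LINE = 8                            # S0..S7
LINE_BYTES = TUPLE_BYTES * TUPLES_PER_LINE     # assumed 128-byte data line

def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a byte address into (line address i, tuple index, byte offset)."""
    return addr // LINE_BYTES, (addr % LINE_BYTES) // TUPLE_BYTES, addr % TUPLE_BYTES

def pending_in_lfb(addr: int, directory: Set[int],
                   lfb_line: Optional[int], lfb_tuples: Set[int]) -> bool:
    """Hypothetical model of the check logic 14 makes on an L1 access.

    True when the line is already registered in directory 6 (so the
    processor sees a hit) but the addressed 16-tuple is still waiting in
    the line fetch buffer 10.
    """
    line, tup, _ = split_address(addr)
    return line in directory and line == lfb_line and tup in lfb_tuples

# Line 5 is registered; S3 is already in L1, the other tuples are buffered.
print(pending_in_lfb(5 * 128 + 7 * 16 + 2, {5}, 5, {4, 5, 6, 7, 0, 1, 2}))  # True
```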

Suppose the processor PU requests data contained in a 16-tuple that is still buffered in the buffer memory 10. The logic 14 then uses signal line 16 to make the buffer memory 10 output this 16-tuple via bus 11, multiplexer 12 and bus 13 into the L1 cache 3. Because the L1 cache 3 is a write-through cache, the data can, as explained above, be transferred to the processor PU at essentially the same time. To the processor PU, the system therefore behaves as if this 16-tuple, which was still buffered in the buffer memory 10, had already been present in the L1 cache 3 when the access request began.
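The read path with and without the intervention of the logic 14 can be sketched as a toy Python model. The class and method names are hypothetical. Stale array contents are assumed to read as zero bytes. The same-cycle behaviour of the write-through port is reduced to "write the tuple, then return it" within a single call.

```python
from typing import Dict, Tuple

TUPLE_BYTES = 16
Key = Tuple[int, int]              # (line address i, tuple index)

class L1WithLineFetchBuffer:
    """Toy model of L1 cache 3, address directory 6, buffer 10 and logic 14."""

    def __init__(self) -> None:
        self.array: Dict[Key, bytes] = {}   # contents of the L1 memory array
        self.directory: set = set()         # registered line addresses
        self.lfb: Dict[Key, bytes] = {}     # tuples still in buffer 10

    def start_fill(self, line: int, first: int, first_data: bytes,
                   rest: Dict[int, bytes]) -> None:
        self.array[(line, first)] = first_data  # first packet bypasses buffer 10
        self.directory.add(line)                # address i registered early
        for t, data in rest.items():
            self.lfb[(line, t)] = data

    def read(self, line: int, t: int, logic14: bool = True) -> bytes:
        if line not in self.directory:
            raise KeyError("L1 miss")
        key = (line, t)
        if logic14 and key in self.lfb:
            # Logic 14 forces the tuple out of buffer 10 into the array; the
            # write-through port returns it on DataOut in the same step.
            self.array[key] = self.lfb.pop(key)
        # Without logic 14 the caller gets whatever the array holds (stale).
        return self.array.get(key, bytes(TUPLE_BYTES))

cache = L1WithLineFetchBuffer()
cache.start_fill(5, 3, b"\x33" * 16, {t: bytes([t]) * 16 for t in (4, 5, 6, 7, 0, 1, 2)})
stale = cache.read(5, 7, logic14=False)   # zero bytes: not the real S7
fresh = cache.read(5, 7)                  # S7, drained from buffer 10
```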

The processor PU is connected via a store bus 17 to alignment means 18, a so-called store aligner. The alignment means 18 are connected to a second input of the multiplexer 12 via a bus 19, which is also 16 bytes wide. The multiplexer 12 also has a control input 20 that is 16 control lines wide, so the multiplexer 12 can be controlled with byte granularity.

Fig. 4 schematically shows how the multiplexer 12 works. The first input of the multiplexer 12 is connected to bus 11 and the second input to bus 19 (cf. Fig. 3). The output of the multiplexer 12 is connected to bus 13. Fig. 4 shows only the byte positions B1, B2, B3 and B4 of each bus. Each byte position consists of 9 signal lines (8 data bits and one parity bit), which are not drawn individually in Fig. 4.

Each byte position has its own switch in the multiplexer 12: M1 belongs to B1, M2 to B2, M3 to B3 and M4 to B4. The further switches M5 to M16, belonging to byte positions B5 to B16, are also not shown in Fig. 4. The control input 20 sets, for each of the switches M1 to M16, whether the byte position of bus 11 or of bus 19 is connected to bus 13.

Data that the processor PU wants to store in the L1 cache 3 is transmitted over the store bus 17, which is 16 bytes wide. For this purpose, the alignment means 18 (the so-called store aligner) align the data to the corresponding byte positions for transmission via bus 19. Store aligners as such are known from the prior art.
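The store aligner and the 16 control lines of input 20 can be sketched together. The aligner places the store data at its byte offset within the 16-byte tuple, and the set bits of a 16-bit mask mark the byte positions that should come from bus 19. The mask encoding, in which bit k drives switch M(k+1), is an assumption. Stores that cross a 16-byte boundary are simply rejected in this sketch.

```python
BUS_BYTES = 16  # store bus 17 and bus 19 are 16 bytes wide

def align_store(addr: int, data: bytes) -> tuple[bytes, int]:
    """Hypothetical store aligner: return (bus 19 contents, control mask).

    Bit k of the mask set means switch M(k+1) should route byte
    position B(k+1) from bus 19 (store data) instead of bus 11.
    """
    offset = addr % BUS_BYTES
    if not data or offset + len(data) > BUS_BYTES:
        raise ValueError("store must be non-empty and stay within one 16-tuple")
    bus19 = bytearray(BUS_BYTES)
    bus19[offset:offset + len(data)] = data
    mask = ((1 << len(data)) - 1) << offset
    return bytes(bus19), mask

bus19, control20 = align_store(0x101, b"\xaa\xbb")  # lands on B2 and B3
print(bin(control20))  # 0b110
```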

Before the processor PU can output data for storage on the store bus 17, it must first be determined whether the data line Zi to which the output data belongs is present in the L1 cache 3. Only then can the corresponding byte positions in the data line Zi be overwritten with the data output by the processor PU.

To do this, the processor PU checks whether the address i of the data line Zi is registered in the address directory 6. If it is, the processor PU outputs the corresponding data via the store bus 17. However, the address i may already be registered in the address directory while the data line Zi is not yet completely stored in the L1 cache 3. In particular, the data output by the processor PU on the store bus 17 for storage may belong to a 16-tuple that is still buffered in the buffer memory 10. The logic 14 monitors such store accesses of the processor PU to the L1 cache 3 in the same way.

Suppose data to be stored is output on bus 17 and belongs to a 16-tuple that is still buffered in the buffer memory 10, for example the tuple S1. The logic 14 then uses signal line 16 to make the buffer memory 10 output a "packet" containing the tuple S1. In the prior art, the data line Zi containing the address range of the store would always have had to be stored completely in the L1 cache 3 before the store data could be written.

According to the invention, by contrast, the 16-tuple containing the address range of the data to be stored can be fed from the buffer memory 10 via bus 11 into the multiplexer 12, while the data to be stored is fed into the multiplexer 12 via bus 19. Within the multiplexer 12, only the byte positions of the buffered 16-tuple that are to remain unchanged are connected to the corresponding byte positions of bus 13. The byte positions carrying the store data on bus 19 replace the corresponding byte positions of bus 11. This selection is made by the switches M1 to M16, which are set accordingly via the control input 20.

For example, suppose byte positions B2 and B3 of a 16-tuple are to be overwritten. The switches M2 and M3 then connect byte positions B2 and B3 of bus 19 to byte positions B2 and B3 of bus 13. The remaining byte positions of bus 13, namely B1 and B4 to B16, are connected via switches M1 and M4 to M16 to byte positions B1 and B4 to B16 of bus 11. As a result, the 16-tuple output from the buffer memory 10 via bus 11 is updated with the new store data inside the multiplexer 12, and the updated 16-tuple is already available on bus 13 at the multiplexer output. There is no need to first store the 16-tuple in the L1 cache 3 and then immediately overwrite the corresponding byte positions with the updated data. The updated 16-tuple is transmitted via bus 13 to the L1 cache 3 and stored there.
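The per-byte selection by the switches M1 to M16 is a masked merge of two 16-byte words. The Python sketch below reproduces the B2/B3 example. It assumes that bit k of the control value drives switch M(k+1), and it leaves out the parity bits. The function name is hypothetical.

```python
def mux12(bus11: bytes, bus19: bytes, control20: int) -> bytes:
    """Model of multiplexer 12: one 2:1 switch per byte position.

    Bit k of control20 set   -> byte B(k+1) on bus 13 comes from bus 19 (store data)
    Bit k of control20 clear -> byte B(k+1) on bus 13 comes from bus 11 (buffered tuple)
    """
    if len(bus11) != 16 or len(bus19) != 16:
        raise ValueError("both inputs must be 16 bytes wide")
    return bytes(bus19[k] if (control20 >> k) & 1 else bus11[k] for k in range(16))

old = bytes(range(16))                       # tuple from buffer 10: 00 01 ... 0f
new = bytes(1) + b"\xaa\xbb" + bytes(13)     # store data aligned to B2 and B3
print(mux12(old, new, 0b0110).hex())  # 00aabb030405060708090a0b0c0d0e0f
```

The merged tuple on bus 13 is written into the L1 cache in a single pass. The original tuple is never written to the array just to be overwritten immediately afterwards.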

During operation of the processor PU, an instruction sometimes needs operands that extend over more than one 16-tuple. This situation can already be recognized from the operand addresses (cf. Fig. 1 and Fig. 2). In general, the operands required to execute an instruction are stored immediately adjacent to one another. In the following, assume that the processor PU needs operands contained in two consecutive 16-tuples of a data line Zi, for instance S6 and S7. If the address i of the data line Zi containing S6 and S7 is not registered in the address directory 6, the L2 cache 8 outputs the data line Zi via bus 9 in the tuple order S6, S7, S0, S1, S2, S3, S4, S5. As described above, the first 16-tuple sent as a "packet" over bus 9 is forwarded directly to the processor PU. The following 16-tuples S7 and S0 to S5 are first buffered in the buffer memory 10.
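Whether an operand spans more than one 16-tuple follows from its address and length alone. The sketch assumes byte addressing and a data line that starts at address 0, so that the tuple numbers equal the indices S0 to S7. It does not handle operands that also cross into the next data line. The function names are hypothetical.

```python
TUPLE_BYTES = 16

def tuples_touched(addr: int, length: int) -> list[int]:
    """16-tuple numbers covered by an operand occupying [addr, addr + length)."""
    if length <= 0:
        raise ValueError("operand length must be positive")
    first = addr // TUPLE_BYTES
    last = (addr + length - 1) // TUPLE_BYTES
    return list(range(first, last + 1))

def crosses_16_bytes(addr: int, length: int) -> bool:
    """True when the operand spans more than one 16-tuple."""
    return len(tuples_touched(addr, length)) > 1

# An 8-byte operand starting at 0x6C lies in S6 (0x60-0x6F) and S7 (0x70-0x7F).
print(tuples_touched(0x6C, 8))  # [6, 7]
```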

The processor PU can already tell, however, that the data in the 16-tuple S6 is not sufficient to execute the instruction and that it also needs data from the following 16-tuple S7. After the 16-tuple S6 has been loaded into the processor PU, the processor therefore accesses the L1 cache 3 again to obtain the 16-tuple S7. At this point S7 is still buffered in the buffer memory 10, even though the address i of the data line Zi is already registered in the address directory 6. The logic 14 therefore again makes the buffer memory 10 output the 16-tuple S7. Because of the write-through function of the L1 cache 3, it appears to the processor PU as if S7 had already been present in the L1 cache 3 when the access request began.

Fig. 5 shows a signal diagram for the case in which the processor PU wants to store data in the L1 cache 3 via the store bus 17. It is assumed that, at the time of this access, the address i of the data line Zi is already registered in the address directory 6, while the 16-tuple to which the store data belongs is still buffered in the buffer memory 10. The signal diagram of Fig. 5 shows the system cycles Cycle 1 to Cycle 5 of the pipelined processor PU.

In stage 0 (Stage 0) of the pipeline, the processor fetches the next instruction from its program memory (Instruction Fetch). In the example of Fig. 5, the processor fetches a store instruction ST in Cycle 1. This instruction outputs data from the processor PU for storage in the L1 cache 3. In Cycle 2 the instruction ST is decoded (OP Decode, Stage 1). In keeping with the pipelined architecture of the processor PU, the processor also fetches the next instruction NSI (next sequential instruction) from the program memory in Cycle 2.

In Cycle 3, in stage 2 (Stage 2) of the pipeline, the processor PU accesses the address directory 6 (L1 Cache Dir Access). Because the address i of the data line Zi is already registered there, the processor PU assumes that the addressed data is already stored in the L1 cache 3. In this example the data lies in the 16-tuple S5 of the data line Zi. The logic 14, however, detects that this is not the case and that S5 is still buffered in the buffer memory 10. This detection also takes place in Cycle 3 (Compare LFB Addr with L1 CACHE Address Register). In Cycle 4 (ST with LF), the 16-tuple S5 is output from the buffer memory 10 and updated with the store data in the multiplexer 12. A particular advantage is that the pipeline is not disturbed: the instruction NSI has already reached pipeline stage 2 in Cycle 4.
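The "no stall" claim for Fig. 5 can be checked with a small stage-occupancy model of an in-order pipeline. In the model, each instruction enters stage 0 one cycle after its predecessor and advances one stage per cycle. The source names stages 0 to 2. Placing "ST with LF" in a stage 3 is an assumption of this sketch, as is the function name.

```python
from typing import Dict, List

def pipeline(instrs: List[str], cycles: int, stages: int = 4) -> Dict[int, Dict[int, str]]:
    """Stage occupancy per cycle for an in-order pipeline without stalls.

    Instruction k enters stage 0 in cycle k + 1 and advances one stage per cycle.
    Returns {cycle: {stage: instruction}}.
    """
    table: Dict[int, Dict[int, str]] = {}
    for c in range(1, cycles + 1):
        row: Dict[int, str] = {}
        for k, name in enumerate(instrs):
            stage = c - (k + 1)
            if 0 <= stage < stages:
                row[stage] = name
        table[c] = row
    return table

for cycle, row in pipeline(["ST", "NSI"], 5).items():
    print(f"Cycle {cycle}: " + ", ".join(f"stage {s}={n}" for s, n in sorted(row.items())))
```

Under these assumptions, ST is in stage 2 in Cycle 3 (directory access and LFB compare) and in stage 3 in Cycle 4 (ST with LF), while NSI moves into stage 2 in Cycle 4 as stated above.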

Fig. 6 shows a signal diagram corresponding to Fig. 5 for the case in which the processor PU requests operands for an instruction and those operands are stored in adjacent 16-tuples. The instruction that fetches the data is a so-called fetch instruction F. In Cycle 3 the processor PU detects that the second operand lies in the 16-tuple immediately following the one that contains the first operand (16 Bytes Crossing Condition).

As a result, the next instruction NSI stays in pipeline stage 1 during the following Cycle 4, because a further fetch instruction F must first be executed in Cycle 4 to fetch the next 16-tuple. The first operand is already loaded into the processor PU in Cycle 3 (L1 Cache Access, Read). Also in Cycle 3, the logic 14 determines that the tuple containing the second operand is still buffered in the buffer memory 10 (Compare LFB Addr with L1 CACHE Address Register + 16). That tuple is then output from the buffer memory 10 and, because of the write-through function of the L1 cache 3, loaded into the processor PU immediately afterwards. Here again, the advantage is that the pipeline did not have to be interrupted.
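The two comparisons labeled in Fig. 6 can be sketched as follows: "LFB Addr" against the L1 cache address register, and against the register plus 16. The sketch assumes that the buffer memory is tracked as a set of 16-byte-aligned tuple addresses. If the next tuple falls in a different data line, the compare simply reports no match. The function name is hypothetical.

```python
from typing import Set, Tuple

TUPLE_BYTES = 16

def lfb_compare(addr_reg: int, lfb_tuple_addrs: Set[int]) -> Tuple[bool, bool]:
    """Hypothetical model of logic 14's compares in Cycle 3.

    addr_reg        -- L1 cache address register (first operand address)
    lfb_tuple_addrs -- 16-byte-aligned addresses of tuples still in buffer 10
    Returns (addressed tuple pending, next tuple pending).
    """
    base = addr_reg & ~(TUPLE_BYTES - 1)   # align down to the 16-tuple boundary
    return base in lfb_tuple_addrs, (base + TUPLE_BYTES) in lfb_tuple_addrs

# Line at 0x1000: S6 (0x1060) went straight to L1; S7 and S0..S5 are buffered.
buffered = {0x1070} | {0x1000 + 16 * t for t in range(6)}
print(lfb_compare(0x106C, buffered))  # (False, True)
```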

The first and second operands can also be regarded as parts of a single operand, which then extends over two of the n-tuples of the same data line.

Claims

1. Cache arrangement for a processor, with a cache memory (3) for the processor, with an address directory (6) belonging to the cache memory and with a buffer memory (10) for the intermediate storage of data lines,
wherein the address is registered in the address directory (6) of the cache memory (3) before the relevant data line has been completely stored in the cache memory (3);
characterized by
a buffer cache control logic (14) for controlling the buffer memory (10), which causes data of the data line to be output from the buffer memory (10) and stored in the cache memory (3) independently of any further cache miss and in parallel with the activity of the processor; and
in that, after this data has been output from the buffer memory (10) to the cache memory (3), it is made available from the cache memory (3) to the requesting processor essentially simultaneously.

2. Cache arrangement for a processor according to claim 1, characterized in that each of the data lines is logically divided into n-tuples and the buffer memory (10) is organized in units of n-tuples.

3. Cache arrangement for a processor according to claim 1 or 2, characterized in that the processor system has a multiplexer (12) connected between the buffer memory (10) and the cache memory (3), wherein a first input (11) of the multiplexer (12) is connected to the buffer memory (10), a second input of the multiplexer (12) is connected to a memory bus (19) via which the processor transmits the data to be stored in the cache memory (3), and a control input (20) of the multiplexer is connected to the buffer cache control logic.

4. Cache arrangement for a processor according to one of claims 1 to 3, characterized in that data is transmitted from the buffer memory (10) to the cache memory (3), and from the processor to the cache memory (3), in units of one n-tuple.

5. Cache arrangement for a processor according to one of claims 1 to 4, characterized in that the cache memory (3) is a write-through cache.

6. Method for the input/output of data in the cache arrangement according to claim 1, wherein
the address of a data line containing a data element requested by the processor is registered in the address directory (6) of the cache memory (3) before the relevant data line has been completely stored in the cache memory (3),
characterized in that
a data element of a data line whose address is registered in the address directory (6) of the cache memory (3) is output in response to a corresponding cache read request by the processor in that
in a first step it is analyzed whether the data element of the cache read request is still in the buffer memory (10), and, if this is the case,
in a second step at least the part of the data line comprising the data element is transferred from the buffer memory (10) to the cache memory (3) and, at the same time, the data element of the cache read request is transferred to the processor, and
in the case of a cache store request by the processor, by which a data element of a data line whose address is registered in the address directory (6) of the cache memory (3) is to be stored in the cache memory (3), a multiplexer (12) is controlled such that the part of the data line addressed by the processor's request that is still stored in the buffer memory (10) is transferred to the cache memory (3) and, at the same time, replaced to that extent by the data elements transferred by the processor.

7. Method according to claim 6, characterized in that the data line, divided into logical n-tuples and stored in the buffer memory (10), is transferred into the cache memory (3) in parallel with the activity of the processor.

8. Method according to claim 6 or 7, characterized in that, if it is determined in the first step that the address of the data line of the requested data element is not yet registered in the address directory (6), the n-tuple containing the data element requested by the processor is transmitted as the first n-tuple, is stored directly in the cache memory (3) without being buffered in the buffer memory (10), and the requested data element is made available directly to the processor.