DE10137457A1

DE10137457A1 - Polynomial filter algorithm, using a finite impulse response filter (FIR) for use in hardware encoders with polynomial division uses wider bit width words to increase processing speed

Info

Publication number: DE10137457A1
Application number: DE2001137457
Authority: DE
Inventors: Wolfram Drescher
Original assignee: Systemonic AG
Current assignee: NXP BV
Priority date: 2001-08-02
Filing date: 2001-08-02
Publication date: 2003-02-20
Anticipated expiration: 2021-08-03
Also published as: DE10137457B4

Abstract

Method for polynomial calculation using 1-bit coefficients, in which the calculation is carried out in the multiplier-accumulator (MAC) of a processor whose arithmetic unit is divided into slices, with data paths assigned to each slice. The method processes data words of larger bit width to increase the processing speed of the processor.

Description

The invention relates to a method for polynomial calculation with 1-bit coefficients, in which the calculation is carried out in a multiplier-accumulator (MAC) of a processor having an arithmetic unit that is divided into slices with data paths implemented in them.

Polynomial calculations with 1-bit coefficients are used preferably in implementing the algorithms of finite impulse response (FIR) filters.

Such a filter algorithm is used most frequently in hardware encoders with polynomial division. This is the best-known application in the prior art. The algorithm generates a systematic code by appending n-k parity check symbols to the sequence of data symbols.

The polynomial notation of the code word c(x) is therefore:

c(x) = p(x) + x^{n-k} d(x)

where the data symbols d_0, d_1, ..., d_{k-1} are regarded as the coefficients of a polynomial. The parity symbols are offset from the data word by n-k positions.

p(x) is precomputed such that a generator polynomial divides the code word c(x) without remainder (residue class 0).
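The relationship between d(x), p(x) and c(x) can be illustrated with a short Python sketch. It assumes the usual construction for systematic cyclic codes over GF(2), p(x) = x^{n-k} d(x) mod g(x), which the text does not spell out. The function names and the example generator polynomial g(x) = x^3 + x + 1 of a (7,4) code are illustrative assumptions, not values taken from the patent.

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of polynomial division over GF(2).

    Polynomials are stored as integers: bit i holds the coefficient of x^i.
    """
    if divisor == 0:
        raise ZeroDivisionError("generator polynomial must be non-zero")
    degree = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= degree:
        shift = dividend.bit_length() - 1 - degree
        dividend ^= divisor << shift  # subtraction over GF(2) is XOR
    return dividend


def encode_systematic(data: int, n: int, k: int, gen: int) -> int:
    """Build c(x) = p(x) + x^(n-k) d(x) for a k-bit data word (assumed scheme)."""
    shifted = data << (n - k)        # x^(n-k) * d(x)
    parity = gf2_mod(shifted, gen)   # p(x), assumed to be the division remainder
    return shifted ^ parity          # addition over GF(2) is XOR


# Hypothetical example: (7,4) code with g(x) = x^3 + x + 1
GEN = 0b1011
codeword = encode_systematic(0b1001, n=7, k=4, gen=GEN)
```

With these assumptions the generator polynomial divides the resulting code word without remainder, and the upper k bits of the code word still carry the data symbols unchanged, which is what makes the code systematic.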

In the prior art, polynomial calculations are carried out in processors that provide data paths of a certain word width for this purpose and that are configured in slices as a multiplier-accumulator (MAC). These slices, with their implemented data paths, contain logical shift registers (shifters), adders and multipliers, the latter operating either in integer mode or in a mode with separable carry.

In the prior art, the separable-carry mode is preferred for the multipliers contained in the data paths, in order to speed up the calculations and save hardware. This means that, when the separable-carry mode is set in the data paths, the carry information produced by the multipliers and adders of the individual data paths does not have to be evaluated in the course of the calculation.

If data words of large bit width are to be processed for fast polynomial calculation, then with slices whose data paths are configured in fixed processing widths the polynomial calculation has to be split into sequential steps. This is unavoidable whenever the bit width of the data word to be processed exceeds the processing width of the slices.

This sequential partial processing of the data words, organized in parallel in independent slices and carried out within each slice, demands a great deal of control effort from the processor in managing and assigning the word parts that are processed individually per slice and subsequently joined together.

This is a fundamental disadvantage prevailing in the prior art, and it weighs heavily when the polynomial calculation is executed in the processor with coefficients of small bit width, e.g. 1-bit coefficients. As a result, the processing speed is limited and the hardware and software outlay is high.

The object of the invention is therefore to increase the processing speed in the processor for polynomial calculation with 1-bit coefficients on data words of large bit width, and to reduce the hardware and software outlay required.

According to the method, this object is achieved in that the polynomial calculation, together with the associated 1-bit coefficients, is carried out in parallel in the data paths over a data word width that spans the slices. The calculation begins with a first processing stage, in which the entire data word is shifted by 1 bit toward its higher-order bits. This is followed by a second processing stage, which comprises a bitwise multiplication of the data word by the 1-bit coefficients present and the subsequent accumulation of the product onto the value previously stored in the accumulator.
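The two processing stages can be sketched at word level in Python. This is a minimal model under stated assumptions: the data paths operate without carry evaluation, so that accumulation reduces to XOR, and bits shifted beyond the word width are simply dropped. The function and parameter names are hypothetical.

```python
def mac_step(acc: int, coeff_bits: int, data_word: int, width: int) -> int:
    """One two-stage step across the full (cross-slice) word width.

    acc        -- value currently held in the accumulator
    coeff_bits -- one 1-bit coefficient per bit position
    data_word  -- data word bits, one per bit position
    width      -- total word width in bits (assumed parameter)
    """
    mask = (1 << width) - 1
    # Stage 1: shift the whole word by 1 bit toward the higher-order bits.
    shifted = (acc << 1) & mask
    # Stage 2: bitwise multiplication (AND for 1-bit operands) ...
    product = coeff_bits & data_word & mask
    # ... followed by accumulation without carries (assumed: XOR).
    return shifted ^ product


# Hypothetical 4-bit example
result = mac_step(acc=0b0101, coeff_bits=0b0011, data_word=0b0110, width=4)
```

Because every bit position is handled at once, a single call corresponds to one shift and one multiply-accumulate over the entire word, rather than one pass per slice.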

This solution makes clear that, in contrast to the prior art, the data words are processed across slices and therefore simultaneously and in parallel in the data paths that make up the MAC.

In a development of the method, the data paths belonging to a slice are configured either in a separable-carry mode, in which the carries of the multiplier/adder are not evaluated, or in an integer mode, in which they are.

This development makes the data paths programmable. The hardware can thus be used optimally and adapted to the calculation task at hand.

In an advantageous embodiment of the method, the switchable 1-bit shift function present in the data paths is extended by additional switchable interconnections that span the slices. These switchable additional connections run from the outputs of the slices, each realized by the slice's most significant data path, to the inputs of the designated least significant data paths of the at least indirectly adjacent higher-order slices.

The solution according to the invention aims to extend the 1-bit shift function that exists between adjacent data paths within each slice by additional switchable connections between the slices, in the form of multiplexers, so that a cross-slice 1-bit shift function is provided from the lower-order slices to the at least indirectly adjacent higher-order slices.
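The effect of the switchable connections between slices can be illustrated with a small Python model. It is only a sketch: the slice width of 16 bits is an assumption suggested by the 2¹⁵ bit position named for slice m-1, and the names `SLICE_BITS`, `shift_slices` and `link_enabled` are hypothetical.

```python
from typing import List

SLICE_BITS = 16  # assumed slice width


def shift_slices(slices: List[int], link_enabled: bool = True) -> List[int]:
    """Shift every slice left by 1 bit; slices[0] is the lowest-order slice.

    With link_enabled, the most significant bit leaving a slice enters the
    least significant bit of the next higher-order slice. Without it, each
    slice shifts in isolation and the outgoing bit is lost.
    """
    mask = (1 << SLICE_BITS) - 1
    shifted = []
    carry_in = 0
    for value in slices:
        msb = (value >> (SLICE_BITS - 1)) & 1
        shifted.append(((value << 1) & mask) | carry_in)
        carry_in = msb if link_enabled else 0
    return shifted


linked = shift_slices([0x8001, 0x0001], link_enabled=True)
isolated = shift_slices([0x8001, 0x0001], link_enabled=False)
```

With the link enabled, the two slices behave like one 32-bit word shifted by one position; with it disabled, the bit leaving the lower slice never reaches the upper one.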

Exploiting the 1-bit shift function already present within the slices contributes to the desired minimization of hardware outlay and also to minimizing cycles in software.

In one variant of the method, the two-stage polynomial calculation with 1-bit coefficients in the data paths of a slice begins with a first processing stage, which corresponds to a basic position of the first and second multiplexers. In this position, a first Tor1MUX1 input of the first multiplexer and a first Tor1MUX2 input of the second multiplexer are switched through. As a result, the output value of the accumulator is applied to the first input of the multiplier and, via the interconnection, the multiplier output value is applied to the first input of the adder of an at least indirectly adjacent least significant data path of a higher-order slice m. A first Tor2MUX1 input of the first multiplexer and a first Tor2MUX2 input of the second multiplexer are likewise switched through, so that the arithmetic value ONE is present at the second input of the multiplier.

In addition, the switched-through first Tor2MUX2 input of the second multiplexer applies an arithmetic ZERO to the second input of the adder. Furthermore, the second Tor1MUX1 and Tor1MUX2 inputs and the second Tor2MUX1 and Tor2MUX2 inputs of the first and second multiplexers are blocked, i.e. they are in the complementary (antivalent) state.

The first processing stage is followed by a second processing stage, in which the settings of the first and second multiplexers correspond to a follow-up position. The bitwise multiplication begins in that the 1-bit coefficient provided is applied via the second Tor1MUX1 input to the first input of the multiplier, while the data word bit present on the CBUS reaches the second input of the multiplier via the second Tor2MUX1 input. The multiplier multiplies the 1-bit coefficient by the data word bit that was input.

The product generated at the output of the multiplier is then applied, through the switched-through second Tor2MUX2 input of the second multiplexer, which is likewise in the follow-up position, to the second input of the adder belonging to the MAC. Furthermore, via the second Tor1MUX2 input, which is in the follow-up position, the result of a preceding polynomial calculation provided at the output of the accumulator is applied to the first input of the adder. After the addition in the adder, the new result is stored in the accumulator via the accumulator input.

In addition, in the follow-up position the first Tor1MUX1 and Tor1MUX2 inputs and the first Tor2MUX1 and Tor2MUX2 inputs are blocked, complementary to their switching states in the basic position.

A further embodiment of the method provides that the polynomial calculation is carried out with only some of the available slices.

A particular further embodiment of the method provides that the polynomial calculation is carried out with only part of the processing width of a slice, using a certain number of data paths.

In one implementation of this further embodiment, a partial-processing-width logic detects overflows beyond the intended processing width that occur during the polynomial calculation and ensures that the polynomial calculation continues in the permitted slices within the intended processing ranges.
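How a reduced processing width might interact with the 1-bit shift can be sketched as follows. The text does not specify how the logic reacts to an overflow, so this Python model only flags the overflow and confines the result to the selected width; the function name and return convention are assumptions.

```python
from typing import Tuple


def shift_within_width(acc: int, active_width: int) -> Tuple[int, bool]:
    """Shift acc left by 1 bit inside a reduced processing width.

    Returns the value confined to active_width bits and a flag that is
    True when a set bit was shifted beyond the selected width.
    """
    mask = (1 << active_width) - 1
    shifted = (acc & mask) << 1
    overflow = bool(shifted >> active_width)  # any bit above the active range
    return shifted & mask, overflow


value, overflowed = shift_within_width(0b1001, active_width=4)
```

Here only the lower `active_width` bit positions take part in the calculation, which mirrors the idea of using just part of the available data paths or slices.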

The invention is explained in more detail below with reference to an exemplary embodiment. In the accompanying drawings:

Fig. 1 shows a partial structure of the multiplier-accumulator in the processor;

Fig. 2 shows a block diagram of a region of the multiplier-accumulator with implemented partial-processing-width logic.

Fig. 1 shows a partial structure of the multiplier-accumulator (MAC) present in the processor, the division into slices being illustrated by way of example by slice m 23 and slice m-1 24.

These slices are in turn organized in data paths, represented here by the least significant bit strip 4 of slice m and the designated most significant bit strip 5 of slice m-1.

The choice between the separable-carry mode and the integer mode of the multipliers/adders is preset by the way their carry outputs are handled for input into the carry inputs of the multipliers/adders of the next-higher-order data path of the slice. In integer mode, the carry-output multiplexers are switched through for the carry output signals.

When the separable-carry mode is preselected, the carry-output multiplexers are instead blocked for the carry output signals of the multipliers/adders and a zero signal is switched through in each case, i.e. no further processing of the carry outputs of the multipliers/adders takes place.
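The difference between the two modes can be made concrete with a bit-serial Python model of the adders in neighbouring data paths. It is a sketch under the assumption that each data path handles one bit and that the carry-output multiplexer either forwards the carry or substitutes zero; the names are hypothetical.

```python
def add_data_paths(a: int, b: int, width: int, integer_mode: bool) -> int:
    """Add two words bit strip by bit strip.

    integer_mode=True  -- carry outputs are passed to the next data path
    integer_mode=False -- a zero is passed instead (separable-carry mode)
    """
    result = 0
    carry_in = 0
    for i in range(width):
        total = ((a >> i) & 1) + ((b >> i) & 1) + carry_in
        result |= (total & 1) << i
        carry_out = total >> 1
        # Model of the carry-output multiplexer: forward the carry or a zero.
        carry_in = carry_out if integer_mode else 0
    return result


integer_sum = add_data_paths(0b0111, 0b0001, width=4, integer_mode=True)
carryless_sum = add_data_paths(0b0111, 0b0001, width=4, integer_mode=False)
```

In this model the separable-carry mode reduces the addition to a bitwise XOR, which is the addition needed for polynomials with 1-bit coefficients, while integer mode yields the ordinary sum modulo 2^width.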

The partial structure of the multiplier-accumulator shown in Fig. 1 is in the separable-carry mode. The multiplier carry-output MUX 25 and the adder carry-output MUX 26 are therefore each switched so that the ZERO present is applied to the carry inputs of the multiplier 1 and the adder 2.

The individual bit positions of the 1-bit coefficient and of the data word that are to be processed are provided at the inputs of the first multiplexers of the bit strips. Thus, in slice m-1 24, the 2¹⁵ bit position of the 1-bit coefficient for slice m-1 19 is applied to the second Tor1MUX1 input 14, and the data value of the 2¹⁵ bit position of the data word for slice m-1 21 is applied to the second Tor2MUX1 input 15. The two-stage polynomial calculation of the data word with the 1-bit coefficient in the data paths of a slice begins with a first processing stage, which corresponds to a basic position of the first multiplexer 6 and the second multiplexer 7.

In this stage, a first Tor1MUX1 input 10 of the first multiplexer 6 and a first Tor1MUX2 input 12 of the second multiplexer 7 are switched through. As a result, the output value of the accumulator 3 is applied to the first input of the multiplier 1. In addition, via the interconnection 27, the output value of the multiplier 1 is applied to the first input of the adder of the at least indirectly adjacent least significant data path of a higher-order slice m 23.

A first Tor2MUX1 input 11 of the first multiplexer 6 and a first Tor2MUX2 input 13 of the second multiplexer 7 are likewise switched through, so that the arithmetic value ONE is present at the second input of the multiplier 1. The switched-through first Tor2MUX2 input 13 of the second multiplexer 7 also applies an arithmetic ZERO to the second input of the adder 2.

Furthermore, in the first processing stage, the basic position set for the first and second multiplexers 6, 7 ensures that the second Tor1MUX1 and Tor1MUX2 inputs 14, 16 and the second Tor2MUX1 and Tor2MUX2 inputs 15, 17 are blocked, i.e. they are in the complementary state.

In a second processing stage following the first, which corresponds to a follow-up position of the first and second multiplexers 6, 7 that is complementary to the basic position, the 1-bit coefficient is multiplied bitwise by the input data word bit, here for slice m-1.

This is done, on the one hand, by feeding the 2¹⁵ bit position of the 1-bit coefficient for slice m-1 19 via the now switched-through second Tor1MUX1 input 14 to the first input of the multiplier 1. On the other hand, the data value of the 2¹⁵ bit position of the data word for slice m-1 21, provided by the CBUS 22, is fed via the switched-through second Tor2MUX1 input 15 to the second input of the multiplier 1.

The product generated at the output of the multiplier 1 is then applied, through the switched-through second Tor2MUX2 input 17 of the second multiplexer 7, which is likewise in the follow-up position, to the second input of the adder 2 belonging to the MAC. Via the second Tor1MUX2 input 16, which is in the follow-up position, the result of a preceding polynomial calculation provided at the output of the accumulator 3 is applied to the first input of the adder 2.

After the addition in the adder 2, the new result is stored in the accumulator 3 via the input of the accumulator 3.

Furthermore, the first and second multiplexers 6, 7 ensure that the first Tor1MUX1 and Tor1MUX2 inputs 10, 12 and the first Tor2MUX1 and Tor2MUX2 inputs 11, 13 each assume the blocked switching state. In the follow-up position, these switching states are complementary to those of the basic position.
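The two multiplexer positions can be modelled per bit strip in Python. The sketch below represents each bit strip as one list element (index 0 is the least significant strip) and assumes the separable-carry mode, so that every adder result is reduced to a single bit. What enters the lowest strip during the basic position is not stated in the text; a zero is assumed here. All function names are hypothetical.

```python
from typing import List


def basic_position(acc: List[int]) -> List[int]:
    """First stage: multiplier gets (accumulator, ONE), adder gets (product, ZERO).

    The multiplier output of strip i feeds the adder of strip i + 1,
    which amounts to a 1-bit shift toward the higher-order strips.
    """
    products = [bit * 1 for bit in acc]       # second multiplier input is ONE
    new_acc = [0] * len(acc)                   # assumed: lowest strip receives 0
    for i in range(len(acc) - 1):
        new_acc[i + 1] = (products[i] + 0) & 1  # second adder input is ZERO
    return new_acc


def follow_up_position(acc: List[int], coeff: List[int], data: List[int]) -> List[int]:
    """Second stage: multiplier gets (coefficient bit, data bit),
    adder gets (accumulator, product); the carry is discarded."""
    return [(a + c * d) & 1 for a, c, d in zip(acc, coeff, data)]


# Hypothetical 4-strip example, least significant strip first
state = basic_position([1, 0, 1, 0])
state_after = follow_up_position(state, coeff=[1, 1, 0, 0], data=[0, 1, 1, 0])
```

Running the two functions in sequence reproduces one shift followed by one bitwise multiply-accumulate, with the strips of all slices updated in the same step.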

The block diagram in Fig. 2 of a region of the multiplier-accumulator with implemented partial-processing-width logic 36 shows a partial structure of a MAC consisting of slice k 30, slice k-1 31 and slice 0 32, together with the associated coefficient register k 33, coefficient register k-1 34 and coefficient register 0 35.

The values of the bit positions of the 1-bit coefficient are stored slice by slice in the associated coefficient registers and, together with the values of the bit positions of the data word provided by the CBUS 22, are present at the respective slices for processing.

The polynomial calculation is carried out in the slices over a processing width in which not all of the data paths belonging to the slices are used for the calculation.

Nor are all slices used in the processing, in that the interconnection 27 bypasses adjacent slices. The partial-processing-width logic 36 detects overflows that occur and uses the bus control signal 37 to set up the previously selected processing width on the CBUS 22.

List of reference numerals

1 Multiplier
2 adders
3 Accumulator
4 least significant bit strips of the slice m
5 agreed most significant bit stripe of the slice m-1
6 first multiplexer
7 second multiplexer
8 first MUX1 gate
9 second MUX1 gate
10 first Tor1MUX1 input
11 first Tor2MUX1 input
12 first Tor1MUX2 input
13 first Tor2MUX2 input
14 second Tor1MUX1 input
15 second Tor2MUX1 input
16 second Tor1MUX2 input
17 second Tor2MUX2 input
18 2 ⁰ -bit position of the 1-bit coefficient for slice m
19 2 ¹⁵ bit location of the 1-bit coefficient for slice m-1
20 Data value of the 2 ⁰ bit position of the data word for slice m
21 Data value of the 2 ¹⁵ bit position of the data word for slice m-1
22 CBUS
23 slice m
24 slice m-1
25 multiplier carry output MUX
26 Adder carry output MUX
27 Interconnection
28 first MUX2-TOR
29 second MUX2-TOR
30 slice k
31 Slice k-1
32 slice 0
33 coefficient registers k
34 coefficient registers k-1
35 Coefficients register 0
36 part processing width logic
37 bus control signal

Claims

1. A method for polynomial calculation with 1-bit coefficients, the calculation being carried out in a multiplier-accumulator (MAC) of a processor with an arithmetic unit which is divided into slices with data paths implemented therein, characterized in that the polynomial calculation, together with the associated 1-bit coefficients, is carried out in parallel in the data paths over a cross-slice data word width, this calculation being carried out in a first processing stage, in which the entire data word is shifted by 1 bit in the direction of its higher-order bits, and in a second processing stage, which comprises a bit-wise multiplication of the data word by the 1-bit coefficients present and the subsequent accumulation of the product onto the value previously stored in the accumulator.

2. The method according to claim 1, characterized in that the data paths, each of which belongs to a slice, are configured selectively either in an operating mode with disconnectable carry, without evaluation of the carries of the multiplier, or in an integer mode with evaluation of the carries of the multiplier.

3. The method according to claims 1 and 2, characterized in that the switchable 1-bit shift function present in the data paths is extended by an additional switchable, cross-slice interconnection (27), wherein from the output of each slice, which is formed by the output of its most significant data path, the switchable interconnections (27) are routed to the input of the agreed least significant data path of each slice, so that the 1-bit shift function is ensured to at least indirectly adjacent, higher-order slices.

4. The method according to claims 1 to 3, characterized in that the two-stage polynomial calculation of the data word with the 1-bit coefficient in the data paths of a slice is started such that, in the first processing stage, which corresponds to a basic position of the first multiplexer (6) and the second multiplexer (7), a first Tor1MUX1 input (10) and a first Tor1MUX2 input (12) of the first and second multiplexers (6, 7) are switched through, so that on the one hand the output value of the accumulator (3) is applied to the first input of the multiplier (1) and on the other hand, via the interconnection (27), the output value of the multiplier (1) is applied to the first input of the adder of the at least indirectly adjacent least significant data path of a higher-order slice m (23); that a first Tor2MUX1 input (11) and a first Tor2MUX2 input (13) of the first and second multiplexers (6, 7) are likewise switched through, so that the arithmetic value ONE is present at the second input of the multiplier (1) and, with the first Tor2MUX2 input (13) of the second multiplexer (7) switched through, an arithmetic ZERO is applied to the second input of the adder (2); that the second Tor1MUX1 and Tor1MUX2 inputs (14, 16) and the second Tor2MUX1 and Tor2MUX2 inputs (15, 17) of the first and second multiplexers (6, 7) are blocked in antivalent fashion; that in a second processing stage following the first processing stage, which corresponds to the follow-up position of the respective first and second multiplexers (6, 7), the bit-wise multiplication is carried out in that the 1-bit coefficient provided is applied via the second Tor1MUX1 input (14) to the first input of the multiplier (1) and the data word bit supplied by the CBUS (22) is applied via the second Tor2MUX1 input (15) to the second input of the multiplier (1); that the multiplier (1) multiplies the 1-bit coefficient by the applied data word bit; that the product generated at the output of the multiplier (1) is subsequently applied, through the switched-through second Tor2MUX2 input (17) of the second multiplexer (7), which is likewise in the follow-up position, to the second input of the adder (2) belonging to the MAC; that through the second Tor1MUX2 input (16) the value of a preceding polynomial calculation provided at the output of the accumulator (3) is applied to the first input of the adder (2) and, after the addition in the adder (2), the new calculation value is stored in the accumulator (3) via the input of the accumulator (3); and that the respective first Tor1MUX1 and Tor1MUX2 inputs (10, 12) and first Tor2MUX1 and Tor2MUX2 inputs (11, 13) are blocked in the follow-up position, antivalent to the switching states of the basic position.

5. The method according to claims 1 to 4, characterized in that the polynomial calculation is executed with a part of the slices, this selection of slices being variably grouped.

6. The method according to claims 1 to 5, characterized in that the polynomial calculation is executed with only a part of the processing width of the slice and with a certain number of data paths.

7. The method according to claim 6, characterized in that, during the polynomial calculation, a partial processing width logic (36) detects overflows beyond the intended processing width, and that the continued polynomial calculation in the permitted slices within their processing widths is ensured by the bus control signal (37).