WO1982003481A1

WO1982003481A1 - A bit slice microprogrammable processor for signal processing applications

Info

Publication number: WO1982003481A1
Application number: PCT/US1982/000359
Authority: WO
Inventors: Advanced Micro Devices Inc; Bernard James New
Original assignee: Advanced Micro Devices Inc
Priority date: 1981-03-26
Filing date: 1982-03-18
Publication date: 1982-10-14
Also published as: EP0075593A1; EP0075593B1; JPS58500424A; US4393468A; JPH0230538B2; EP0075593A4; DE3279776D1

Abstract

A programmable device (Fig. 1, 22) for signal processing applications in which short loops of digital data are processed repetitively and in parallel. The device consists of five independently programmable subsystems (Fig. 3) whose functions can operate simultaneously. The apparatus is intended for use in connection with a digital multiplier device (Fig. 1, 32) and a digital memory device (Fig. 1, 34) for such signal processing applications as fast Fourier transforms and time-domain filtering in real time or near real time. The five parallel functions are: 1. to move data in and out of an external memory device (Fig. 1, 34) between selected registers (Fig. 3, 60, 62, 64, 66, 68 and 70); 2. to move data in and out of an external multiplier (Fig. 1, 32) between selected registers and an arithmetic logic unit (ALU) (Fig. 3, 84); 3. to move data from the output of the multiplier to selected registers and the ALU; 4. to propagate data selectively through a chain of registers (Fig. 3, 60, 62, 64, 66, 68 and 70), the chain being of preselectable length; and 5. to perform selected arithmetic and logic operations. The device is provided with an instruction set (Fig. 4) capable of completely defining any of the five simultaneously allowable functions. The device structure is modular (Fig. 2) to permit expansion of data word length at the ALU. Internally generated bit signals can explicitly force or inhibit a carry, thereby permitting either independent parallel operation or extended word length operation under program control. The entire apparatus is intended to be embodied as an integrated circuit on a single chip of semiconductor material (Fig. 3).

Description

A BIT SLICE MICROPROGRAMMABLE PROCESSOR FOR SIGNAL PROCESSING APPLICATIONS

BACKGROUND OF THE INVENTION

1. Field of Invention

This invention relates to digital signal processing, and more particularly to a device capable of performing the specific arithmetic and logic functions required for various types of waveform signal processing tasks, including the transforms known collectively as fast Fourier transforms (FFT).

Fast Fourier transforms are a class of processes capable of performing Fourier transformation of signals with considerably fewer multiplication operations than are normally required. For example, direct evaluation of a discrete Fourier transform on $N$ points requires $N^2$ complex multiplications and additions, whereas a fast Fourier transform requires only $\frac{N}{2}\log_2 N$ computations. For $N = 1024$ points, this represents a computational savings of ninety-nine percent.
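The operation counts quoted above can be checked with a few lines of Python. This is only a sketch that assumes a radix-2 FFT with N a power of two; the function name `dft_vs_fft_cost` is hypothetical.

```python
def dft_vs_fft_cost(n: int) -> tuple[int, int, float]:
    """Compare direct-DFT and radix-2 FFT operation counts for n points.

    Returns (direct_ops, fft_ops, fractional_savings), using N**2 for direct
    evaluation and (N/2) * log2(N) for the FFT, as quoted in the text.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    log2_n = n.bit_length() - 1          # exact log2 for a power of two
    direct_ops = n * n
    fft_ops = (n // 2) * log2_n
    savings = 1.0 - fft_ops / direct_ops
    return direct_ops, fft_ops, savings


if __name__ == "__main__":
    direct_ops, fft_ops, savings = dft_vs_fft_cost(1024)
    print(direct_ops, fft_ops, f"{savings:.1%}")  # 1048576 5120 99.5%
```

For 1024 points the FFT needs 5120 operations against 1,048,576, a savings of about 99.5 percent.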

The fast Fourier transform is characterized by a large number of repetitive sequential operations on complex (real and imaginary) numbers in short loops. It is desirable to perform such computations as rapidly as possible to accommodate a broad spectrum of frequencies in real-time applications where manipulation of the information in the transform domain is particularly convenient.

2. Description of the Prior Art

In the past, general purpose computers and bit slice machines have been adapted for waveform signal processing applications. General purpose computers, however, are generally expensive and have many functions that are unnecessary or of limited use in signal processing applications, particularly those which approach real-time speeds. The number of multiplications, data transfers and the like which must be performed during each sample period is extremely large, and the processing time is generally limited by the critical path of the central processing unit. For example, a typical bit slice-type central processing unit is the Am2903 Arithmetic Processor manufactured by Advanced Micro Devices, Inc. of Sunnyvale, California. The central element, the Am2903, contains an arithmetic logic unit and a 16-word scratch-pad memory with special multiply functions. However, the Am2903 has a critical path which permits efficient implementation of a microprogrammed multiplication only as a multi-cycle operation. It is therefore necessarily slower than a device capable of parallel hardware multiplication. Still further, the architecture of the Am2903 has only one arithmetic logic unit and one data bus. Thus a critical-path time constraint exists when complex numbers must be manipulated, since two cycles are required for each such operation. Still further, the Am2903 does not easily permit simultaneous memory access and arithmetic operation, thus establishing another critical path.

What is needed is a device capable of a high degree of parallel processing. Indeed, really high processing throughput can be achieved only by performing many operations in parallel. The input/output structures of existing devices, including the Am2903, simply do not provide the interconnection, or flexibility of interconnection, between computational elements, storage elements and external devices that is necessary to minimize the critical path.

SUMMARY OF THE INVENTION

According to the invention, a programmable device is provided for digital signal processing applications in which short loops of digital data are processed repetitively and in parallel. The device is a structure, typically on a single silicon chip, comprising storage registers and an arithmetic logic unit with suitable multiplexers to permit a high degree of flexibility in direct interconnection of the registers, the arithmetic logic unit, and external devices such as a multiplier and external memory. The device consists of five independently programmable subsystems whose functions can operate simultaneously in conjunction with a multiplier device, a memory device, and a source of program instructions. The five functions are: 1) to move data in and out of an external memory device between preselected registers; 2) to move data in and out of an external multiplier between the preselected registers and an arithmetic logic unit (ALU); 3) to move data from the output of an external multiplier to preselected registers and to the ALU; 4) to propagate data selectively through a chain of registers, the chain being of preselectable length; and 5) to perform selected arithmetic and logic operations.

An instruction set is defined for the device according to the invention which is capable of completely specifying any of the five simultaneously allowable operations. The apparatus is constructed in a manner permitting modular expansion of data word length at the arithmetic logic unit through control bits capable of explicitly forcing or inhibiting a carry, thereby permitting either independent parallel operation of the arithmetic logic unit or extended word length operation under program control.

The invention will be best understood by reference to the following detailed description taken in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of a digital computer system to which is coupled a signal processing system.

Figure 2 is a block diagram of an arithmetic processor and a multiplier in which modular devices according to the invention are employed.

Figure 3 is a schematic diagram of a device according to the invention.

Figure 4 is a chart illustrating the control word of the device according to the invention.

Figure 5 is a set of six tables defining the instructions for six independent registers of the device according to the invention.

Figure 6 is a set of two tables defining the arithmetic logic unit operand select instructions for the apparatus according to the invention.

Figure 7 is a table defining the arithmetic logic unit operation instructions for the device according to the invention.

Figure 8 is a table defining the instructions for the Multiplier Output (MO) multiplexer for the device according to the invention.

Figure 9 is a table defining the data input/output (DIO) instructions for the device according to the invention.

Figure 10 is a table illustrating a single computation cycle of one type of fast Fourier transform butterfly operation for devices according to the invention arranged in the structure shown in Figure 2.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

The subject invention is intended for use as a bit slice arithmetic logic unit and register stack for fast Fourier transform generators which process digital signals representative of analog waveforms. One representative environment is a computer system 10 (Fig. 1). In the computer system 10, there is typically a main system bus 12 which interconnects various system elements, such as an input controller 14 with its associated input devices (not shown) coupled thereto by a suitable connection 15, an output controller with its associated output devices (not shown) coupled thereto by an associated connection 17, a central system controller 18, which incorporates a central processing unit, and a main memory 20 with its control interface.

According to the invention, there is at least one special function device, such as a signal processor 22, which is coupled to the main system bus 12. The signal processor 22 may for example perform fast Fourier transforms in response to a specific instruction word signal applied to the system bus 12.

An interface 24 connects the signal processor 22 with the main system bus 12 and provides all necessary data, address and control information transfer functions for the signal processor 22. The signal processor 22 is itself a small special purpose computing machine which is capable of rapidly generating digital signals representative of the results of its special purpose computation.

The signal processor 22 typically includes a microprogram sequencer 26, a read only memory (ROM) 28, a random access memory (RAM) 34, an address control device or address sequencer 36, and a special purpose complex number processor 33, as hereinafter explained, which comprises an arithmetic processor with a dedicated high-speed parallel multiply function device.

The microprogram sequencer 26 is coupled to the interface 24, the address control 36, the ROM 28 and the number processor 33. The microprogram sequencer 26 provides the microcode instructions, initial parameters and clock to the address control 36 and to the other devices of the signal processor 22. The ROM 28 contains data representing at least a portion of the constants used in the signal processor 22. The RAM 34 is for storage of input and output data and for so-called "scratch pad" storage of data generated during computation by the number processor 33.

The RAM 34 is coupled to the address control 36, as well as to and from the interface 24 and to and from the number processor 33.

Referring to Figure 2, there is shown a block diagram of a number processor 33 in which a plurality of devices according to the invention, hereinafter bit-slice processors 40, are employed. The architecture of Figure 2 is but one example of the use of the bit-slice processors 40 according to the invention. The number processor 33 (Fig. 2) comprises a real processor portion 42 and an imaginary processor portion 44, with the data input/output terminals of the real processor 42 coupled to an interconnection for real data 46 and the data input/output terminals of the imaginary processor 44 coupled to a bus for imaginary data 48. The number processor 33 further includes, for example, a 16-bit high-speed multiplier 32 having one operand input coupled to a programmable read only memory (PROM) 50 and the other operand input coupled to a multiplier input/output terminal of both the real processor 42 and the imaginary processor 44. The multiplier 32 may, for example, be a type MPY-16HJ 16 x 16 bit parallel multiplier manufactured by TRW, Inc. of Los Angeles, California. Other high-speed parallel array multipliers may also be used. The product of the multiplier 32 is fed through a bus 52 to the multiplier inputs of the real processor 42 and the imaginary processor 44.

Specifically, of the 16-bit most significant product, one 8-bit portion (the less significant eight bits) is coupled to one 8-bit input of one bit-slice processor 40 of the imaginary processor 44 and to the corresponding 8-bit input of a bit-slice processor 40 of the real processor 42, while the other 8-bit portion (the most significant eight bits) is coupled to one 8-bit input of the more significant bit-slice processor 40 of the imaginary processor 44 and also to the same bit inputs of the more significant bit-slice processor 40 of the real processor 42. The bit-slice processors 40 are cascaded by a carry flag line 54 (which also carries the other flags) between adjacent bit-slice processors 40, together forming a processor unit. Each bit-slice processor is under independent or interdependent control of external microcode through control buses 41A, 41B, 41C and 41D. The bit-slice processor 40 is thus the basic building block of the number processor 33. The architecture of Figure 2 is one example of a parallel processor which can be used with a complex number system; other architectures are suggested by the modular structure and capabilities of the bit-slice processor 40.

Turning to Figure 3, each bit-slice processor 40 consists of a register/arithmetic logic unit module in integrated circuit form whose internal architecture is specifically intended to provide a high degree of parallelism and flexibility. The bit-slice processor 40 comprises six registers 60, 62, 64, 66, 68 and 70, each of which has an associated three-channel register multiplexer 72, 74, 76, 78, 80 and 82. Two of the registers, 60 and 66, serve as input registers. The registers 60, 62, 64, 66, 68 and 70 are arranged in a stack forming a loop which can be entered or exited at any register. The input registers 60 and 66 are spaced equidistant from one another, allowing the register stack to be programmed as two independent parallel stacks, as one loop of registers, or as a single stack of registers with one input and one output.
In addition, there is provided an arithmetic logic unit 84 with two operand input terminals 86 and 88 called, respectively, the S input and the R input. An S input multiplexer 90, an R input multiplexer 92, a data output multiplexer 94, a multiplier output multiplexer 96 and output drivers 98 and 100 constitute the interconnections. In addition, the bit-slice processor 40, which is structured as an 8-bit slice, includes the following external connections: an 8-bit-wide data I/O terminal (DIO) 102, an 8-bit-wide multiplier I/O terminal (MIO) 104, an 8-bit-wide multiplier input terminal (MI) 106, which is intended for use in parallel with the MIO 104 to carry the most significant product of the 16-bit line, a sign extend input (SE_IN) 108, a carry input (C_IN) 110, a sign extend output (SE_OUT) 112, and five selected flag bits used for control of a parallel bit-slice device, namely Carry, Propagate, Generate, Zero and Overflow. The data output multiplexer 94 has four inputs, the multiplier output multiplexer 96 has eight inputs, the S multiplexer 90 has eight inputs, and the R multiplexer 92 has eight inputs.
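The way a 16 x 16 product reaches two cascaded 8-bit slices through their MI and MIO terminals can be illustrated as follows. This sketch assumes unsigned 16-bit operands (the two's-complement handling of a real multiplier such as the MPY-16HJ is not modeled) and assumes that the less significant slice receives the low byte of each 16-bit half; `split_product` is a hypothetical helper, not part of the device.

```python
def split_product(a: int, b: int) -> dict[str, tuple[int, int]]:
    """Split an unsigned 16 x 16 product into the bytes seen by two 8-bit slices.

    Returns {"MSP": (low_slice, high_slice), "LSP": (low_slice, high_slice)}:
    the MSP bytes arrive on each slice's MI terminal, the LSP bytes on its MIO terminal.
    """
    for x in (a, b):
        if not 0 <= x <= 0xFFFF:
            raise ValueError("operands must be unsigned 16-bit values")
    product = a * b                  # full 32-bit product
    msp = product >> 16              # most significant product (upper 16 bits)
    lsp = product & 0xFFFF           # least significant product (lower 16 bits)
    return {
        "MSP": (msp & 0xFF, msp >> 8),
        "LSP": (lsp & 0xFF, lsp >> 8),
    }


if __name__ == "__main__":
    print(split_product(0x1234, 0x5678))  # {'MSP': (38, 6), 'LSP': (96, 0)}
```

In the Figure 2 arrangement the same byte lanes are presented to the corresponding slices of both the real processor 42 and the imaginary processor 44.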

The instruction control lines to the multiplexers and the ALU 84, as well as the clock lines for the registers, are not shown. The device is subject to external instructional control as defined by a 29-bit instruction word which allows all registers and the ALU to be explicitly controlled by external means with each clock cycle. Various specific implementations of logic circuits according to the invention will be apparent to those of ordinary skill in the design of logic circuits once the control states and interconnections have been defined as herein disclosed. The internal interconnections of the bit-slice processor 40 are intended to provide maximum flexibility of interconnection between the registers and the ALU 84, as well as to the external access terminals. Specifically referring to Figure 3, all interconnections between elements are 8-bit-wide bus connections. The DIO terminal 102 has a data input (DI) bus 120 which is coupled to the DI input of the MO multiplexer 96 and also to one input of the first multiplexer 72 for the first register 60 (hereinafter the A1 register) and to one input of the fourth multiplexer 78 of the fourth register 66 (hereinafter the B1 register). The MIO terminal 104 has a multiplier input (LSP) bus 122 which is coupled to the least significant product (LSP) terminal of the S multiplexer 90, to one input of the A2 multiplexer 74 of the A2 register 62 and to one input of the B2 multiplexer 80 of the B2 register 68.

The MI terminal 106 has its MSP bus 124 coupled to the most significant product (MSP) terminal of the S multiplexer 90, to one input of the A3 multiplexer 76 of the A3 register 64, to one input of the B3 multiplexer 82 of the B3 register 70, to one input of the A1 multiplexer 72 of the A1 register 60, and to one input of the B1 multiplexer 78 of the B1 register 66. The output bus of the ALU 84, designated the ALU bus 126, is coupled to the ALU input of the MO multiplexer 96, to one input of the A3 multiplexer 76 of the A3 register 64, to one input of the B3 multiplexer 82 of the B3 register 70, to one input of the A2 multiplexer 74 of the A2 register 62 and to one input of the B2 multiplexer 80 of the B2 register 68.

The output of the A1 register 60, designated the A1 bus, is coupled to one input of the A2 multiplexer 74 of the A2 register 62, to the A1 input of the MO multiplexer 96 and to the A1 inputs of both the S multiplexer 90 and the R multiplexer 92. The output of the A2 register 62, designated the A2 bus, is coupled to one input of the A3 multiplexer 76 of the A3 register 64, and to the A2 inputs of the DO multiplexer 94, the MO multiplexer 96, the S multiplexer 90 and the R multiplexer 92. The output of the A3 register 64, designated the A3 bus, is coupled to one input of the B1 multiplexer 78 of the B1 register 66, and to the A3 inputs of the DO multiplexer 94, the MO multiplexer 96, the S multiplexer 90 and the R multiplexer 92. The output of the B1 register 66 is coupled to one input of the B2 multiplexer 80 of the B2 register 68, and to the B1 inputs of the MO multiplexer 96, the S multiplexer 90 and the R multiplexer 92. The output of the B2 register 68, designated the B2 bus, is coupled to one input of the B3 multiplexer 82 of the B3 register 70 and to the B2 inputs of the DO multiplexer 94, the MO multiplexer 96, the S multiplexer 90 and the R multiplexer 92. The output of the B3 register 70 is coupled to one input of the A1 multiplexer 72 of the A1 register 60 and to the B3 inputs of the DO multiplexer 94, the MO multiplexer 96, the S multiplexer 90 and the R multiplexer 92. The SE_IN input terminals 108 are coupled to an input of the R multiplexer 92, as are Force Zero input lines 130. The output of the S multiplexer 90 is not only provided to the S input 86 of the ALU 84, but is also provided as the MSB S operand output to the SE_OUT terminals 112. The output of the DO multiplexer 94 is coupled through the driver 98 to the DIO terminals 102. The output of the MO multiplexer 96 is coupled through the driver 100 to the MIO terminals 104.

As will be readily apparent, the multiplexers and buses of the bit-slice processor 40 allow, in one device, typically on a chip of silicon, direct interconnection between any register and the ALU 84, a continuously recirculating register stack of six registers which can be entered and exited at virtually any point, and a structure which allows two groups of three registers to be queued up independently. Each register may be independently controlled to load data from one of its three sources. Two of the registers, the A1 register 60 and the B1 register 66, are specifically intended for use as input registers, while the other four registers are intended as accumulator registers.
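The connections enumerated above can be collected into a small table and checked mechanically. The following Python sketch transcribes the multiplexer inputs from this description (the dictionary and function names are hypothetical) and confirms two of the properties just claimed: every register reaches both ALU operand multiplexers, and the register-to-register links close into a single six-register loop.

```python
# Inputs of each three-channel register multiplexer, transcribed from the text.
REGISTER_MUX_SOURCES = {
    "A1": ("MSP", "DI", "B3"),
    "A2": ("LSP", "ALU", "A1"),
    "A3": ("MSP", "ALU", "A2"),
    "B1": ("MSP", "DI", "A3"),
    "B2": ("LSP", "ALU", "B1"),
    "B3": ("MSP", "ALU", "B2"),
}
REGISTERS = tuple(REGISTER_MUX_SOURCES)

# Inputs of the operand and output multiplexers.
S_MUX = ("A1", "A2", "A3", "B1", "B2", "B3", "MSP", "LSP")
R_MUX = ("A1", "A2", "A3", "B1", "B2", "B3", "SE_IN", "FORCE_ZERO")
MO_MUX = ("A1", "A2", "A3", "B1", "B2", "B3", "ALU", "DI")
DO_MUX = ("A2", "A3", "B2", "B3")


def register_loop(start: str = "A1") -> list[str]:
    """Follow register-to-register links to recover the recirculating stack."""
    successor = {
        src: dst
        for dst, sources in REGISTER_MUX_SOURCES.items()
        for src in sources
        if src in REGISTER_MUX_SOURCES
    }
    loop = [start]
    while successor[loop[-1]] != start:
        loop.append(successor[loop[-1]])
    return loop


if __name__ == "__main__":
    assert all(r in S_MUX and r in R_MUX for r in REGISTERS)
    print(" -> ".join(register_loop()))  # A1 -> A2 -> A3 -> B1 -> B2 -> B3
```

Because the input registers A1 and B1 sit three positions apart in this loop, breaking it at the A3-to-B1 and B3-to-A1 links leaves the two independent three-register queues described above.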

The ALU 84 is provided with eight definable arithmetic and logic functions as follows: R + S, R - S, S - R, Pass R, R OR S, R AND S, R XOR S, and NOT R (R inverted). Two ports are provided to communicate with an external multiplier. The MIO terminal 104 is designed to be used with the Y port of a multiplier to load operands and to recover the least significant product (LSP). The MI terminal 106 is intended to recover the most significant product (MSP) output of the external multiplier. The third port, the DIO terminal 102, is intended for communication with an external memory. The bit-slice processor 40 is able to execute instructions simultaneously and independently in five areas. These areas are as follows:

1. exchanging data between an external memory and specified internal registers as established by register multiplexer instruction signals;

2. loading of a multiplier operand from either an external multiplier or an internal register as defined by operand multiplexer instruction signals;

3. retrieving a multiplier product from an external multiplier;

4. performing an arithmetic or logic operation; and

5. moving data within a stack of registers.

Turning to Figure 4, there is shown the structure of a 29-bit binary instruction word which, when applied at twenty-nine external terminals, executes the instructions explained hereinafter. The instruction is composed of thirteen independent, disjoint microcode fields which preset the multiplexers and define the ALU operation for each microcode cycle. Specifically, Field 1, consisting of bit 0, is the Data Out Enable field. Field 2, defined by bit 1, is the Multiplier Out Enable field. Field 3, defined by bits 2 and 3, is the Data Out Select field. Field 4, defined by bits 4, 5 and 6, is the Multiplier Out Select field. Field 5, defined by bits 7 and 8, is the Store A1 field for storing data in register A1. Similarly, Fields 6, 7, 8, 9 and 10, defined by bits 9 and 10, 11 and 12, 13 and 14, 15 and 16, and 17 and 18, respectively, are the Store A2, Store A3, Store B1, Store B2 and Store B3 fields. Bits 19, 20, 21 and 22 together define all of the ALU operations and are called the ALU Operator field. Bits 23, 24 and 25 define the ALU S Operand Select field, and bits 26, 27 and 28 define the ALU R Operand Select field.
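The field layout of Figure 4 lends itself to simple shift-and-mask packing. In the Python sketch below the bit positions come from the description above, while the field names, the `pack`/`unpack` helpers, and the convention that the lowest-numbered bit of each field is its least significant bit are assumptions made for illustration.

```python
# (field name, lowest bit, width) for the 29-bit control word of Figure 4.
FIELDS = (
    ("DO_ENABLE", 0, 1),
    ("MO_ENABLE", 1, 1),
    ("DO_SELECT", 2, 2),
    ("MO_SELECT", 4, 3),
    ("STORE_A1", 7, 2),
    ("STORE_A2", 9, 2),
    ("STORE_A3", 11, 2),
    ("STORE_B1", 13, 2),
    ("STORE_B2", 15, 2),
    ("STORE_B3", 17, 2),
    ("ALU_OP", 19, 4),
    ("S_SELECT", 23, 3),
    ("R_SELECT", 26, 3),
)
WORD_BITS = 29

# The thirteen fields are disjoint and together cover all 29 bits.
assert sum(width for _, _, width in FIELDS) == WORD_BITS


def pack(**values: int) -> int:
    """Build a control word; unspecified fields default to zero."""
    unknown = set(values) - {name for name, _, _ in FIELDS}
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, low, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << low
    return word


def unpack(word: int) -> dict[str, int]:
    """Split a 29-bit control word back into its named fields."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("not a 29-bit control word")
    return {name: (word >> low) & ((1 << width) - 1) for name, low, width in FIELDS}
```

For example, `unpack(pack(STORE_A1=2, R_SELECT=5))` returns those two values with every other field zero. Since each field drives its multiplexer or the ALU directly, no further decoding stage appears in this model.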

Figure 5 defines the four states of each pair of bits I₇ through I₁₈, which specify which of the three multiplexer inputs associated with each of the registers A1, A2, A3, B1, B2 and B3 is activated. The fourth state is a Hold state, which prevents data from being propagated from the previous register or input into the register.

The A1 register instruction set selects the most significant product (MSP), the DI bus or the B3 bus, as well as Hold, in response to bits I₇ and I₈. The A2 register, responding to instruction bits I₉ and I₁₀, gates the LSP, the ALU and the A1 bus, as well as Hold. The A3 register responds to bits I₁₁ and I₁₂ to enable its multiplexer for the MSP, the ALU or the A2 bus, as well as Hold. The B1 register, instructed by bits I₁₃ and I₁₄, is the mirror image of the A1 register in that it responds to the MSP, the DI bus, the A3 bus and Hold as its instruction set. Likewise, the B2 register, responding to bits I₁₅ and I₁₆, gates the LSP, the ALU, the B1 bus and Hold, and the B3 register responds to bits I₁₇ and I₁₈ to gate the MSP, the ALU, the B2 bus and Hold.
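The per-register source selections of Figure 5 can be modeled as one synchronous clock step in which every register either loads from one of its three sources or holds. In this Python sketch the source lists follow the text, but the numeric codes (0, 1 and 2 for the sources in the order listed, 3 for Hold) are an assumed encoding, since the actual code assignments of Figure 5 are not reproduced here; `clock` is a hypothetical name.

```python
HOLD = 3  # assumed code for the Hold state

# Sources selectable by each register multiplexer, in the order listed in the text;
# the assumed select codes are 0, 1 and 2 respectively.
SOURCES = {
    "A1": ("MSP", "DI", "B3"),
    "A2": ("LSP", "ALU", "A1"),
    "A3": ("MSP", "ALU", "A2"),
    "B1": ("MSP", "DI", "A3"),
    "B2": ("LSP", "ALU", "B1"),
    "B3": ("MSP", "ALU", "B2"),
}


def clock(regs: dict[str, int], selects: dict[str, int], buses: dict[str, int]) -> dict[str, int]:
    """Return register contents after one clock edge.

    All registers sample values present before the edge, so data moves one
    position per cycle. Registers missing from `selects` hold.
    """
    nxt = {}
    for name in SOURCES:
        code = selects.get(name, HOLD)
        if code == HOLD:
            nxt[name] = regs[name]
        else:
            src = SOURCES[name][code]
            nxt[name] = regs[src] if src in SOURCES else buses[src]
    return nxt


if __name__ == "__main__":
    # Queue data from the DI bus through A1 -> A2 -> A3 while the B side holds.
    regs = dict.fromkeys(SOURCES, 0)
    for value in (1, 2, 3):
        regs = clock(regs, {"A1": 1, "A2": 2, "A3": 2}, {"DI": value})
    print(regs)  # {'A1': 3, 'A2': 2, 'A3': 1, 'B1': 0, 'B2': 0, 'B3': 0}

    # Selecting code 2 everywhere recirculates all six registers as one loop.
    regs = clock(regs, dict.fromkeys(SOURCES, 2), {})
    print(regs)  # {'A1': 0, 'A2': 3, 'A3': 2, 'B1': 1, 'B2': 0, 'B3': 0}
```

The same step function covers the three stack configurations: two three-register queues, one six-register loop, or a single chain, depending only on the select codes applied in each cycle.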

Instruction bits I₂₆, I₂₇ and I₂₈ define the gating of the R operand multiplexer, and bits I₂₃, I₂₄ and I₂₅ define the gating of the S operand multiplexer (Fig. 6). On the R operand side, the eight gating instructions are, respectively, A1 bus, A2 bus, A3 bus, B1 bus, B2 bus, B3 bus, SE_IN and Force Zero. On the S operand side, the gating instructions are A1 bus, A2 bus, A3 bus, B1 bus, B2 bus, B3 bus, MSP bus and LSP bus. Bits I₄, I₅ and I₆ define the gating instructions for the MO multiplexer 96, which are, respectively, A1 bus, A2 bus, A3 bus, B1 bus, B2 bus, B3 bus, ALU and DI (Fig. 8). Bits I₂ and I₃ define the multiplexer instructions for the DO multiplexer 94, which are, respectively, A2, A3, B2 and B3.

Bits I₁₉, I₂₀, I₂₁ and I₂₂ define the operations of the ALU 84. The states of bits I₁₉ and I₂₀ also determine whether flag bits are set to communicate relevant information to an adjacent or parallel bit-slice processor. If bits I₁₉ and I₂₀ are both zero, the Carry Out bit issues a Carry signal depending on the result of the ALU operation. Similarly, the Propagate and Generate bits are set according to the result of the ALU whenever bits I₁₉ and I₂₀ are both zero. However, when bit I₁₉ is set, the Carry Out is always inhibited (set to zero) and the Propagate and Generate bits are locked in a state complementary to the Carry Out signal. (In the particular design shown, the Propagate and Generate bits are negated, so that they are enabled only when set to binary zero.) When bit I₂₀ is set with bit I₁₉ at zero, the Carry Out bit is invariably enabled, while the Propagate and Generate bits are again set to the complementary state; in this case they are locked in the enabled state.
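The effect of these carry controls on two cascaded slices can be sketched as follows. The model covers only addition with a ripple carry between slices and ignores the active-low Propagate and Generate outputs that a lookahead arrangement would use; the function names are hypothetical. Clearing both I₁₉ and I₂₀ in the less significant slice chains the two slices into one 16-bit adder, while setting I₁₉ there splits them into two independent 8-bit adders.

```python
def slice_add(r: int, s: int, carry_in: int, i19: int, i20: int) -> tuple[int, int]:
    """One 8-bit slice computing R + S, with carry-out controlled by I19/I20.

    (I19, I20) = (0, 0): carry-out follows the sum
    (I19, I20) = (1, 0): carry-out inhibited (forced to 0)
    (I19, I20) = (0, 1): carry-out forced to 1
    (1, 1) selects the logical operations, which are not modeled here.
    """
    if i19 and i20:
        raise ValueError("I19 = I20 = 1 selects a logical operation")
    total = (r & 0xFF) + (s & 0xFF) + carry_in
    carry_out = total >> 8
    if i19:
        carry_out = 0
    elif i20:
        carry_out = 1
    return total & 0xFF, carry_out


def two_slice_add(r: int, s: int, i19_low: int = 0, i20_low: int = 0) -> int:
    """Add two 16-bit words on a pair of slices; the low slice's carry feeds the high one."""
    low, carry = slice_add(r & 0xFF, s & 0xFF, 0, i19_low, i20_low)
    high, _ = slice_add(r >> 8, s >> 8, carry, 0, 0)
    return (high << 8) | low


if __name__ == "__main__":
    print(hex(two_slice_add(0x12F0, 0x3420)))             # 0x4710: one 16-bit sum
    print(hex(two_slice_add(0x12F0, 0x3420, i19_low=1)))  # 0x4610: two 8-bit sums
```

Because the split is selected by an instruction bit rather than by external gating, the same pair of slices can switch between the two modes from one microcycle to the next.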

Bits I₂₁ and I₂₂, in conjunction with bits I₁₉ and I₂₀, define the operations of the ALU. When bits I₁₉ and I₂₀ are both zero, bits I₂₁ and I₂₂ select, according to their four possible states, the four arithmetic operations R + S, R - S, Pass R, and S - R. The same arithmetic operations are defined when bits I₁₉ and I₂₀ are set at one and zero, respectively, or at zero and one, respectively. However, when bits I₁₉ and I₂₀ are both set to one, bits I₂₁ and I₂₂ define the logical operations R XOR S, R AND S, NOT R, and R OR S.
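The operation selection just described can be written as a lookup. The Python sketch below reproduces the grouping in the text (arithmetic unless I₁₉ and I₂₀ are both one), but the order in which the four (I₂₁, I₂₂) states map onto the listed operations, taken here as index I₂₁ + 2 × I₂₂, is an assumption; `alu` is a hypothetical name, and results are truncated to one 8-bit slice.

```python
MASK = 0xFF  # width of one bit slice

ARITHMETIC = (
    ("R + S", lambda r, s: r + s),
    ("R - S", lambda r, s: r - s),
    ("Pass R", lambda r, s: r),
    ("S - R", lambda r, s: s - r),
)
LOGICAL = (
    ("R XOR S", lambda r, s: r ^ s),
    ("R AND S", lambda r, s: r & s),
    ("NOT R", lambda r, s: ~r),
    ("R OR S", lambda r, s: r | s),
)


def alu(i19: int, i20: int, i21: int, i22: int, r: int, s: int) -> tuple[str, int]:
    """Decode the ALU Operator field and apply it to 8-bit operands R and S."""
    table = LOGICAL if (i19 and i20) else ARITHMETIC
    name, op = table[(i22 << 1) | i21]  # assumed state-to-operation order
    # Masking keeps the two's-complement result within the slice width.
    return name, op(r & MASK, s & MASK) & MASK


if __name__ == "__main__":
    print(alu(0, 0, 1, 0, 5, 3))        # ('R - S', 2)
    print(alu(0, 0, 1, 1, 5, 3))        # ('S - R', 254)
    print(alu(1, 1, 0, 1, 0x0F, 0x00))  # ('NOT R', 240)
```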

In this manner all of the states of the ALU, of the multiplexers and of the registers are fully defined by external instructions. The implementation of a structure capable of executing these logical instructions will be apparent to a designer of ordinary skill in this art, and many such structures fulfill these criteria. The order of the bits is of course irrelevant so long as the fields are independent of one another within the constraints hereinabove defined.

Figure 10 is a table illustrating the instructional microcode which may be applied to a basic signal processor architecture of the type shown in Figure 2 for a single fast Fourier transform butterfly operation. The specific operation implemented is the butterfly operation of the form $A' = A + BW$ and $B' = A - BW$, where $A$, $A'$, $B$, $B'$ and $W$ are complex numbers. The microcode is structured such that ten cycles are required to complete an entire read and write cycle. However, by implementing code with the programming structure herein disclosed, a new operation can be initiated and completed every fourth cycle. This is possible because the code is interleaved to take full advantage of the parallel structure of the bit-slice processor 40. Sixteen cycles are shown, numbered 0 through 15. The multiplication microcycle shown begins at system cycle 3 with a read of the real and imaginary parts of the value B through the DIO terminal 102. An address has been presented to the external memory, and the DO Enable field has been set to pass data into the bit-slice processor 40, to the B1 register of the real processor 42 (Fig. 2) and to the B1 register of the imaginary processor 44. The computation process continues as indicated by the instructions shown in the table. The spacing of the instructions is instructive. For example, the first ALU instruction, which occurs at system cycle 6, may recur at system cycle 10 and system cycle 14. Four instruction cycles interleave in such a manner that each parallel operation follows the critical path for the butterfly multiplication as closely as possible.
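For reference, the arithmetic of the butterfly can be expressed with real operations only, which suggests how the work might divide between the real processor 42 and the imaginary processor 44 of Figure 2, with the multiplier 32 supplying the four real products of BW. This Python sketch shows only that arithmetic; it does not reproduce the cycle-by-cycle microcode schedule of Figure 10, and the function name is hypothetical.

```python
def butterfly(a: complex, b: complex, w: complex) -> tuple[complex, complex]:
    """Radix-2 butterfly A' = A + B*W, B' = A - B*W, using real arithmetic only."""
    # The complex product B*W needs four real multiplications.
    t_re = b.real * w.real - b.imag * w.imag
    t_im = b.real * w.imag + b.imag * w.real
    # Real parts: one sum and one difference; likewise for the imaginary parts.
    a_out = complex(a.real + t_re, a.imag + t_im)
    b_out = complex(a.real - t_re, a.imag - t_im)
    return a_out, b_out


if __name__ == "__main__":
    print(butterfly(1 + 2j, 3 + 4j, 1j))  # ((-3+5j), (5-1j))
```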

A valuable feature of the invention, particularly with the instruction set herein defined, in which an express command is provided to each register to accept data from the previous register, is the ability to push data through a stack, to overwrite data in the next subsequent register, and to pull data out of any register at any time. A still further convenient and important feature is the explicit instruction for inhibiting carry.

This instruction eliminates costly and unnecessary AND gates between bit-slice modules. Accordingly, the system can be changed quickly under program control from one large extended-precision computing device to a plurality of smaller, independently operable computing devices. A still further valuable feature of the invention is the elimination of any need for an instruction decoder, because the instructions are explicitly defined. In fact, more than 500,000,000 combinations of instructions ($2^{29}$) are possible in a device with 29-bit instruction words and these explicit capabilities. The instruction set is such that there are no undefined states. The embodiment of the invention herein described defines a bit-slice computing device which is optimized for signal processing applications and also for vector computation applications. It is capable of carrying on five functions in parallel, has been shown to be able to perform characteristic butterfly functions (with the exception of the specific call to memory and the multiplication) with a minimum of wasted operations, and provides a fully explicit instruction set without undefined states. The device can be used as a module in expandable computing machines or, by a simple instruction, it can be caused to operate as an independent device in parallel with other similar computing machines, depending upon the microcode input. The bit-slice device herein described is intended to be embodied on a single chip of silicon semiconductor, thereby permitting its use as a component in a wide variety of applications in larger systems. For example, the device can be used in a hardware FFT processor. With a typical device cycle of 100 nanoseconds, the device is capable of completing a typical FFT butterfly operation once every 400 nanoseconds.
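The figures quoted above can be related with simple arithmetic. The Python sketch below assumes a radix-2 FFT, a 100 ns cycle, a new butterfly started every fourth cycle and the ten-cycle read-to-write latency mentioned for Figure 10, and it treats all butterflies as one uninterrupted pipeline, ignoring address generation, loop control and stage-to-stage data dependencies; the resulting FFT time is a rough estimate, not a figure from the text, and the names are hypothetical.

```python
WORD_BITS = 29             # width of the explicit instruction word
CYCLE_NS = 100             # typical device cycle
INITIATION_INTERVAL = 4    # a new butterfly every fourth cycle
LATENCY_CYCLES = 10        # cycles from reading operands to writing results

instruction_combinations = 2 ** WORD_BITS
butterfly_period_ns = CYCLE_NS * INITIATION_INTERVAL


def fft_estimate_ns(n: int) -> int:
    """Rough radix-2 FFT time: pipelined butterflies plus one pipeline fill."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    butterflies = (n // 2) * (n.bit_length() - 1)
    cycles = LATENCY_CYCLES + (butterflies - 1) * INITIATION_INTERVAL
    return cycles * CYCLE_NS


if __name__ == "__main__":
    print(instruction_combinations)  # 536870912
    print(butterfly_period_ns)       # 400
    print(fft_estimate_ns(1024))     # 2048600
```

Under these assumptions a 1024-point transform takes roughly 2 ms per pair of real and imaginary processors.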

The invention has now been described with reference to a specific embodiment. Other embodiments will be apparent to those of ordinary skill in the art in light of this disclosure. Accordingly, it is not intended that this invention be limited except as indicated by the appended claims.


Claims

WHAT IS CLAIMED IS:

1. An integrated circuit device for processing digital data in response to instruction signals in connection with a digital memory means and multiplier means, said device comprising: a plurality of storage registers for storing said data; an arithmetic logic unit for performing preselectable arithmetic and logic operations on said data; first means for moving data between said digital memory means and preselectable ones of said storage registers; second means for moving data between said multiplier means and preselectable ones of said storage registers and said arithmetic logic unit; third means for moving data from said multiplier means to preselectable ones of said storage register means and said arithmetic logic unit; and means for propagating data through preselectable ones of said storage registers, wherein said first, second and third moving means, said propagating means and said arithmetic logic unit are preselectably interconnected with one another and are simultaneously operable in response to said instruction signals.

2. The device according to Claim 1 wherein said propagating means is responsive to said instruction signals for selectively holding data currently therein against erasure.

3. The device according to Claim 2 wherein said arithmetic logic unit is responsive to said instruction signals to selectively issue or inhibit a carry signal indicating overflow of a most significant bit for selectively coupling or decoupling the most significant bit value of said data to a second one of said integrated circuit devices which is operative in parallel with said device.

4. The device according to Claim 3 wherein said arithmetic logic unit is further operative in response to an external instruction to produce a zero value data output.

5. An integrated circuit device for processing digital data in response to instruction signals for use in connection with digital memory means for storing data and external multiplier means, said device comprising: a plurality of storage registers for storing said data, each one of said storage registers having associated therewith a register multiplexer for controlling access thereto from at least one other of said storage registers and at least one other source of data, said plurality of storage registers being interconnected with one another through said associated register multiplexers in a manner capable of forming a sequence of storage registers, said sequence being of a length which is preselectable according to said instruction signal; an arithmetic logic unit which is operative to perform preselectable arithmetic and logic operations on data applied at a first operand input terminal and at a second operand input terminal, said arithmetic logic unit having coupled thereto a first operand multiplexer for controlling access to said first operand input terminal and a second operand multiplexer for controlling access to said second operand input terminal, said first operand multiplexer and said second operand multiplexer being coupled to the output of each one of said storage registers, and at least one of said operand multiplexers being coupled to receive input from said multiplier means; first output multiplexer means for preselectably multiplexing output values from each one of said storage registers, from said arithmetic logic unit, and from said memory means to said multiplier means; and second output multiplexer means for preselectably multiplexing output values from at least two of said storage registers to said memory means, wherein said plurality of storage registers with associated register multiplexers are simultaneously operable independent of one another, and wherein said storage registers with associated register multiplexers, said arithmetic logic unit with associated operand multiplexers, said first output multiplexing means and said second output multiplexing means are all simultaneously operable in response to said instruction signal.

6. The device according to Claim 5 wherein each one of said storage registers is responsive to said instruction signals for selectively holding data currently therein against erasure.

7. The device according to Claim 6 wherein said arithmetic logic unit is responsive to said instruction signals to selectively issue or inhibit a carry signal indicating overflow of a most significant bit of said arithmetic logic unit in order to selectively signal a second arithmetic logic unit of a second said device which is operative in parallel.

8. The device according to Claim 6 wherein said arithmetic logic unit further includes means operative to generate signals indicative of Propagate, Generate, Zero, and Overflow depending on the result of arithmetic and logic operations in said arithmetic logic unit.

9. The device according to Claim 5 wherein a first one of said register multiplexers is coupled to receive data from a first external source into a first one of said storage registers, wherein a second one of said register multiplexers is coupled to receive data from a second external source into a second one of said storage registers, wherein a third one of said register multiplexers is coupled to receive data from said first one of said storage registers, wherein a fourth one of said register multiplexers is coupled to receive data from said second one of said storage registers, and wherein means coupled to receive data from said third storage register is coupled to an input of said first register multiplexer, and wherein means coupled to receive data from said fourth storage register is coupled to an input of said second register multiplexer, thereby to provide a data loop for recirculating data having two external inputs.

10. An integrated circuit device for storing and propagating digital data in response to instruction signals in connection with digital memory means, said device comprising: a plurality of storage registers for storing data, each storage register having associated therewith a register multiplexer having at least two preselectable input channels for controlling access to its associated storage register, said storage registers being interconnectable through said register multiplexers in at least the following manner: a) in a single sequence of serial registers having an input and an output; and b) in a serial loop of storage registers wherein the output of the last one of said storage registers is coupled to the associated multiplexer of the first one of said storage registers.

11. The device according to Claim 10 wherein said storage registers are each operative in response to said instruction signals for selectively holding data currently in said storage register against erasure.
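Claims 10 and 11 describe storage registers linked through per-register multiplexers so that the same registers form either an open serial chain or a closed loop, with any register able to hold its contents. The following is a minimal behavioral sketch under stated assumptions: the `RegisterChain` class, its `clock` method and the six-register size are hypothetical, the preselectable chain length is modeled as the first `length` registers, and all registers are assumed to update together on one clock edge.

```python
from typing import List, Optional


class RegisterChain:
    """Hypothetical model of storage registers joined through per-register multiplexers."""

    def __init__(self, n_registers: int) -> None:
        self.regs: List[int] = [0] * n_registers

    def clock(self, length: int, loop: bool, data_in: int = 0,
              hold: Optional[List[bool]] = None) -> int:
        """Advance the chain formed by regs[0..length-1] by one clock.

        Serial mode: regs[0] loads data_in.  Loop mode: regs[0] loads the
        last register of the chain.  Registers marked in `hold` keep their
        value.  Returns the value presented at the chain output before the clock.
        """
        if not 1 <= length <= len(self.regs):
            raise ValueError("chain length out of range")
        if hold is None:
            hold = [False] * len(self.regs)
        old = list(self.regs)          # every register samples the pre-clock values
        out = old[length - 1]
        for i in range(length):
            if hold[i]:
                continue               # mux selects the register's own output (hold)
            if i == 0:
                self.regs[0] = out if loop else data_in
            else:
                self.regs[i] = old[i - 1]
        return out


if __name__ == "__main__":
    chain = RegisterChain(6)
    for x in (1, 2, 3):                # serial mode, chain length 3
        chain.clock(length=3, loop=False, data_in=x)
    print(chain.regs[:3])              # [3, 2, 1]

    for _ in range(3):                 # loop mode: three clocks recirculate fully
        chain.clock(length=3, loop=True)
    print(chain.regs[:3])              # [3, 2, 1]
```

Registers beyond the selected length are simply left untouched in this model, which is one way to read "a sequence ... of a length which is preselectable"; the actual device may treat unused registers differently.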