CA2248711A1

CA2248711A1 - Scaleable double parallel digital signal processor

Info

Publication number: CA2248711A1
Application number: CA002248711A
Authority: CA
Inventors: Harold Blount; Alexander Tulai
Original assignee: Individual
Current assignee: Microsemi Semiconductor ULC
Priority date: 1996-03-11
Filing date: 1997-03-10
Publication date: 1997-09-18
Also published as: WO1997034226A1; DE69719221T2; US5960209A; EP0976037B1; EP0976037A1; DE69719221D1

Abstract

A distributed architecture parallel processing apparatus includes a central microprocessor having at least one external interface connected to a similar interface of a neighboring parallel processor. The processors exchange data and control signals through the interfaces to cooperatively share in the execution of a program. An inter-processor status register in each processor maintains the current status of the processors.

Description

SCALEABLE DOUBLE PARALLEL DIGITAL SIGNAL PROCESSOR
This invention relates to digital processing apparatus, and more particularly to a digital processing apparatus having a distributed architecture.
A classical Digital Signal Processor (DSP) has two major parts, namely a core architecture and the peripherals. The major blocks of the core architecture are the Program / Data Memory; the Arithmetic / Logic Unit (ALU); the Multiplier / Accumulator (MAC); the Barrel Shifter (BS); the Data Address Generator (DAG); the Program Address Generator (PAG); the Registers (used to hold intermediate results and addresses, and to speed up access to the previous five blocks); and the buses.
Some of the peripheral blocks are the Serial Port(s); the Host Interface Port (parallel port); and the Timer(s). Somewhere between these two groups are the DMA controller and the Interrupt controller(s). Various DSPs may use distinct ALU, MAC and BS computational blocks or may blend them into multifunctional units.
The new generation of DSPs takes advantage of newer technologies that allow faster clocking of old architectures and consequently higher processing power, faster memories that allow improvements in the internal architecture of various blocks, multiple internal buses, and new peripherals.
One of the common problems associated with the traditional DSP architectures is the uneven loading of the processors in a multiprocessor design. To cope with this problem, more recently, new DSP architectures have been proposed and implemented that have parallel processing capabilities.
At the heart of their design is the concept of inter-processor communication via external interface ports, globally shared memory, and shared buses. The complexity of these designs, however, translates into extremely high cost IC implementations.
Parallel Computing (PC) increases processing power by permitting parallel processing at the routine (task) level. When a program has to execute two different routines that are independent at the data level (i.e. the data written by one routine is not read by the other routine), the two routines can be executed in parallel. This is referred to herein as macro parallelism.
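The data-independence condition for macro parallelism can be illustrated with a short sketch. The following Python fragment is only an illustration under stated assumptions: the `Routine` type, the routine names and the variable names are hypothetical and do not appear in the specification. It checks that neither routine reads data written by the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Routine:
    """Hypothetical description of a routine by the data it reads and writes."""
    name: str
    reads: frozenset
    writes: frozenset


def data_independent(a: Routine, b: Routine) -> bool:
    """True when the data written by one routine is not read by the other."""
    return not (a.writes & b.reads) and not (b.writes & a.reads)


# Illustrative routines (names are assumptions, not taken from the source).
fir = Routine("fir", reads=frozenset({"x", "coeffs"}), writes=frozenset({"y"}))
agc = Routine("agc", reads=frozenset({"level"}), writes=frozenset({"gain"}))
mix = Routine("mix", reads=frozenset({"y", "gain"}), writes=frozenset({"out"}))
```

Under this check, `fir` and `agc` could be started as parallel tasks, whereas `mix` reads the output of `fir` and would have to wait for it.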
Congestion can also occur at the instruction level. When a program has to execute a sequence of instructions that are independent at the data level, these instructions could be executed in parallel. Executing these instructions in parallel (herein referred to as micro parallelism) on the same processor, however, would require multiple buses and instruction words large enough to handle multiple operands.
US patent no. 5,239,654 discloses a multiprocessor system typical of the background art referred to above, wherein a plurality of processors can execute instructions in synchronism. This patent does not disclose permitting multiple processors to co-operatively share in the execution of a single instruction word.
An object of the invention is to alleviate this problem.
According to the present invention there is provided digital processing apparatus comprising a microprocessor, said microprocessor comprising at least one external interface for connection to a respective parallel like microprocessor having a similar interface, a plurality of internal registers, an internal bus accessing said internal registers, and an external bus connectable to each said parallel like microprocessor through said at least one external interface to permit the exchange of data and control signals. The internal registers include internal registers shareable with each said parallel like microprocessor. A multiplexer connects the internal bus and the or each said external bus to the or each said shareable internal register so that said microprocessor and the or each said external like microprocessor can co-operatively share in the execution of a single instruction represented by a large instruction word. An inter-processor status register maintains the current status of the microprocessor and the at least one parallel like microprocessor.
The invention handles macro parallelism by allowing a processor to start a task (and be notified on its completion) on a neighboring parallel processor.
The invention can also handle parallel processing of single instruction words (micro parallelism) without the need for multiple buses and the like. Instead of requiring a complex processor, the invention locks together multiple simpler processors to achieve a similar result, and at the same time obtain the benefit of the power of multiple processing units. When multiple processors are locked together, the instructions they execute can be seen as the equal-length segments of a Large Instruction Word (LIW). Depending on how many processors are locked together, the length of the Large Instruction Word can vary.
The invention thus permits the handling of micro parallelism through LIW, as well as macro parallelism through Parallel Computing.
The invention thus employs a processor interface and changes to the architecture of a DSP that make both Parallel Computing and Large Instruction Word possible. The new distributed processing architecture is particularly suited for the case when the processors share the silicon space of a single integrated circuit.

The invention still further provides a method of executing a program wherein at least two parallel processors are interconnected through an external interface so that they can exchange data and control signals to cooperatively share in the execution of a program, each processor having internal registers, characterized in that at least one said internal register is shareable with a said parallel processor, and that in each said processor an internal bus and an external bus are connected to a said parallel processor through said interface whereby two said processors can access said shareable registers by multiplexing said internal and external buses so that said parallel processors can co-operatively share in the execution of a single instruction represented by a large instruction word, and the status of the cooperating processors is maintained in an inter-processor status register provided therein.
It should be understood that each processor in a multi-processor configuration has the potential to be a master and/or a slave. For example, if processor A starts a job on processor B, A and B are in a master-slave relationship. However, B can "sub-contract" some part of the job to C, in which case B and C are in a master-slave relationship. B is a slave to A, but a master to C. At a different moment in time, which is software dependent, this relationship can totally reverse itself.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will now be described in more detail, by way of example only, with reference to the accompanying drawings, in which:-
Figure 1 is a diagrammatic illustration of a microprocessor with an external interface in accordance with the invention;
Figure 2 shows the organization of the inter-processor status register;
Figure 3 shows the control and status lines of the interface in more detail;

CA 02248711 1998-09-10

Figure 4 shows the internal registers and bus structure of a processor in accordance with the invention;
Figure 5 illustrates conflict resolution in a multiple processor system; and
Figure 6 is a more detailed diagram explaining the architecture of a processor in accordance with the invention.

Referring to Figure 1, the central digital signal processor 1 includes a program / data memory; an arithmetic / logic unit (ALU); a multiplier / accumulator (MAC); a barrel shifter (BS); a data address generator (DAG); a program address generator (PAG); registers (for holding intermediate results and addresses, and for speeding up access to the previous five blocks); and buses. As these components are conventional, they are not illustrated in the drawings and will not be described in detail.
The processor 1 also includes an inter-processor status register (IPSR) 2, described in more detail with reference to Figure 2, right and left register banks 3, 4, and a central register 13.
Right and left dual port data memories 12, 13 provide a memory window accessible both to the central processor and to the associated neighboring parallel processor.
The central processor 1 has right and left external interfaces 5, 6 for communicating with respective parallel processors 7, 8 in a symmetrical fashion, referred to as the Right processor and Left processor. The external interface is presented in Figure 1. The Left and Right Processors are similar microprocessors to the central processor and are not illustrated in detail.
In the above scheme, the processor 1 is viewed as the 'Middle processor', having a similar left and a right neighbor presenting and controlling an identical interface.
The external signals are separated into three main groups of signals 9, 10, 11, as shown in more detail in Figure 3, namely the Control and Status Lines, eight lines (six outgoing and two bi-directional); the bi-directional Data Bus Lines, the number of which is implementation dependent (16 in one embodiment); and the bi-directional Register Select Lines, the number of which is implementation dependent (3 in one embodiment).

WO 97/34226 - PCT/CA97/00164
As shown in Figure 1, two adjacent processors share data through a dual port RAM 12, 13, mapped in the data memory space of both processors, and via two banks of dual port registers (accessed from both the internal Data Bus and the external Left or Right Data Bus), each processor with its own set (see Figure 3).
The central processor has an Inter-Processor Status register (IPSR) 2 that describes its state and functional mode with respect to the left and right processors. The IPSR register is shown in Figure 2.
There are four possible states, and thus two bits are needed to describe them:
1. Independent
2. Parallel Computing (PC)
3. Large Instruction Word (LIW)
4. Suspended
There are two possible modes (one bit needed):
• Master
• Slave
A central processor can be in a Master mode with respect to both neighboring processors, or in a Master mode with respect to one and a Slave mode with respect to the other, but it can never be in a Slave mode with respect to both (left and right) processors simultaneously.
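The two state bits and one mode bit per side can be pictured with a small encoding sketch. The actual bit positions of the IPSR are defined in Figure 2 and are not reproduced in the text, so the layout below (three bits per side, left side in the upper bits) and the numeric codes assigned to the states are assumptions made purely for illustration.

```python
from enum import IntEnum


class State(IntEnum):
    # Numeric codes are an assumed encoding, not taken from Figure 2.
    INDEPENDENT = 0
    PC = 1
    LIW = 2
    SUSPENDED = 3


class Mode(IntEnum):
    MASTER = 0
    SLAVE = 1


def pack_side(state: State, mode: Mode) -> int:
    """Pack one side's status as 2 state bits followed by 1 mode bit."""
    return (int(state) << 1) | int(mode)


def unpack_side(bits: int):
    """Inverse of pack_side for a 3-bit field."""
    return State((bits >> 1) & 0b11), Mode(bits & 0b1)


def pack_ipsr(left, right) -> int:
    """Assumed layout: left side in bits 5..3, right side in bits 2..0."""
    return (pack_side(*left) << 3) | pack_side(*right)


def unpack_ipsr(word: int):
    return unpack_side((word >> 3) & 0b111), unpack_side(word & 0b111)
```

For example, `pack_ipsr((State.PC, Mode.MASTER), (State.LIW, Mode.SLAVE))` yields a six-bit word from which `unpack_ipsr` recovers both sides. Other fields of the IPSR are omitted from this sketch.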
Any central processor can interrupt a left and/or right processor (status and interface line condition permitting) and bring it or them into a Master-Slave mode in which the Slave does work on behalf of the Master.
Depending on the state and mode bits in the status register 2, a processor has various access rights to the dual port data memory window and to the register bank of the neighboring processor(s). Table 1 describes the access rights and the functionality of a processor based on the state and mode bits configuration. In Table 1, the 'Symmetry state' column is used to label those situations where a symmetric situation could occur.

Table 1: Access rights and functionality based on status bits.
Left Side Bits (State Mode) | Right Side Bits (State Mode) | Access | Executed programs | Symm. state
Indep. Master | Indep. Master | Restricted to its own regs. and data space | Executes its own job | NO
Indep. Master | PC Master | Restricted to its own regs. and data space | Executes its own job. Started a job on right processor | YES
Indep. Master | PC Slave | Own regs. and data space + RDMWA(1) | Executes a job on behalf of Right proc. | YES
Indep. Master | LIW Master | Own regs. and data + RRA(2) + RDMWA | Executes own job locking Right proc. | YES
Indep. Master | LIW Slave | Own regs. and data + RRA + RDMWA | Executes locked by Right proc. | YES
Indep. Master | Suspend Slave | Own regs. and data + RRA + RDMWA | PC is frozen while NOPs are executed | YES
PC Master | PC Master | Restricted to its own regs. and data space | Executes its own job. Started jobs on left & right processors. | NO
PC Master | PC Slave | Own regs. and data + RDMWA | Executes job on behalf of Right proc. Started a job on left. | YES
PC Master | LIW Master | Own regs. and data + RRA | Executes its own job locking Right proc. Started job on Left proc. | YES
PC Master | LIW Slave | Own regs. and data + RRA + RDMWA | Executes locked by Right proc. Started job on Left proc. | YES
PC Master | Suspend Slave | Own regs. and data + RRA + RDMWA | Started job on Left proc. Suspended while locked by right | YES
PC Slave | LIW Master | Own regs. and data + LDMWA + RRA | Executes job on behalf of Left proc. + locking Right proc. | YES
PC Slave | Suspend Master | Own regs. and data + LDMWA + RRA | Suspended while locking Right proc. Now executes a job for Left processor. | YES
LIW Master | LIW Master | Own regs. and data + LRA(3) + RRA | Executes its own job locking both Left and Right procs. | NO
LIW Master | LIW Slave | Own regs. and data + LRA + RRA + RDMWA | Executes on behalf of and locked by Right, locking Left. | YES
Suspend Master | Suspend Slave | Own regs. and data + LRA + RRA + RDMWA | While in the above state has received (and passed to Left) the Suspend command | YES
1. RDMWA - Right processor Data Memory Window Access
2. RRA - Right processor Register Access
3. LRA - Left processor Register Access

The state and mode bits in the IPSR 2 uniquely determine the condition of the external interface status lines. The mapping of the state and mode bits onto external status lines is given in Table 2.

Table 2: Internal status bits to external status line mapping
Left Side Bits (State Mode) | Right Side Bits (State Mode) | Left state lines (State Mode) | Right state lines (State Mode) | Symm. states
Indep. Master | Indep. Master | Indep. Master | Indep. Master | NO
Indep. Master | PC Master | PC Master | PC Master | YES
Indep. Master | PC Slave | PC Slave | PC Slave | YES
Indep. Master | LIW Master | LIW Master | LIW Master | YES
Indep. Master | LIW Slave | LIW Slave | LIW Slave | YES
Indep. Master | Suspend Slave | Suspend Slave | Suspend Slave | YES
PC Master | PC Master | PC Master | PC Master | NO
PC Master | PC Slave | PC Slave | PC Slave | YES
PC Master | LIW Master | LIW Master | LIW Master | YES
PC Master | LIW Slave | LIW Slave | LIW Slave | YES
PC Master | Suspend Slave | Suspend Slave | Suspend Slave | YES
PC Slave | LIW Master | LIW Slave | LIW Slave | YES
PC Slave | Suspend Master | Suspend Slave | Suspend Slave | YES
LIW Master | LIW Master | LIW Master | LIW Master | NO
LIW Master | LIW Slave | LIW Slave | LIW Slave | YES
Suspend Master | Suspend Slave | Suspend Slave | Suspend Slave | YES
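The mapping of Table 2 lends itself to a simple lookup. The sketch below transcribes the sixteen rows of Table 2 into a Python dictionary; since every row of the table shows the same value on the left and right state lines, a single (state, mode) pair is stored per row. The function name and the handling of mirrored cases (swapping left and right for the rows marked as symmetric) are assumptions of this sketch, not something the table states explicitly.

```python
I, PC, LIW, SUS = "Indep", "PC", "LIW", "Suspend"
M, S = "Master", "Slave"

# (left side bits, right side bits) -> value shown on both state lines (Table 2).
TABLE_2 = {
    ((I, M), (I, M)): (I, M),
    ((I, M), (PC, M)): (PC, M),
    ((I, M), (PC, S)): (PC, S),
    ((I, M), (LIW, M)): (LIW, M),
    ((I, M), (LIW, S)): (LIW, S),
    ((I, M), (SUS, S)): (SUS, S),
    ((PC, M), (PC, M)): (PC, M),
    ((PC, M), (PC, S)): (PC, S),
    ((PC, M), (LIW, M)): (LIW, M),
    ((PC, M), (LIW, S)): (LIW, S),
    ((PC, M), (SUS, S)): (SUS, S),
    ((PC, S), (LIW, M)): (LIW, S),
    ((PC, S), (SUS, M)): (SUS, S),
    ((LIW, M), (LIW, M)): (LIW, M),
    ((LIW, M), (LIW, S)): (LIW, S),
    ((SUS, M), (SUS, S)): (SUS, S),
}


def status_lines(left_bits, right_bits):
    """Look up the external state lines for a pair of IPSR side settings.

    Mirrored configurations are resolved by swapping the sides (an assumption
    based on the 'Symm.' column). Raises KeyError for unlisted combinations.
    """
    key = (left_bits, right_bits)
    if key not in TABLE_2:
        key = (right_bits, left_bits)
    return TABLE_2[key]
```

A processor that is a PC Slave on its left and an LIW Master on its right, for instance, would present `status_lines((PC, S), (LIW, M))`, i.e. LIW Slave, on its state lines.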

The possible actions of a processor with respect to the left/right processors, based on its left/right status bits and external status lines and the left/right processor status lines, are given in Table 3.

Table 3: Possible actions of a processor based on its status bits and external status lines
Right Side Bits (State Mode) | Right Side Lines (State Mode) | Right Status Lines (State Mode) | Possible actions
Indep. Master | Indep. Master | Indep. Master | Force Right to PC
    PC | PC | Force Right to LIW
    LIW
Indep. Master | Indep. Master | LIW Master | Force Right to PC
    PC
Indep. Master | PC Slave | Indep. Master | Force Right to PC
    PC | Force Right to LIW
Indep. Master | LIW Slave | Indep. Master | Force Right to PC
    PC | Force Right to LIW
PC Slave | PC Slave | PC Master | Report task completed
LIW Master | LIW Master | LIW Slave | Exit LIW state (unlock)

As will be apparent, there are four possible states and two possible modes. Of all eight possible combinations only one is invalid, namely the (Independent, Slave) combination.
The two pairs of status bits in the IPSR 2 determine the relation of the processor with respect to the processor on each side. Only the combination of both sides' status bits determines the real state of the processor.
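The constraints on the status bits stated so far can be gathered into a small validity check. The following sketch, with hypothetical function names, encodes only two rules from the text: the (Independent, Slave) combination is invalid on any side, and a processor can never be a Slave with respect to both sides at once. It does not claim that every pair it accepts is reachable in practice.

```python
STATES = ("Indep", "PC", "LIW", "Suspend")
MODES = ("Master", "Slave")


def side_is_valid(state: str, mode: str) -> bool:
    """A single side is valid unless it is the (Independent, Slave) combination."""
    if state not in STATES or mode not in MODES:
        return False
    return not (state == "Indep" and mode == "Slave")


def ipsr_is_valid(left, right) -> bool:
    """Both sides must be valid, and the processor may not be Slave to both."""
    if not (side_is_valid(*left) and side_is_valid(*right)):
        return False
    return not (left[1] == "Slave" and right[1] == "Slave")
```

Counting the single-side combinations accepted by `side_is_valid` gives seven of the eight possible (state, mode) pairs, in line with the statement that only one combination is invalid.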
Whenever a processor enters a Slave mode, almost all its registers get saved, such that the work can be resumed when the Master mode is re-entered. This can occur quickly with the use of shadow registers in this embodiment.
The situation that arises in various valid combinations will now be described, although it will be apparent to one skilled in the art that other valid combinations are possible.
1. (Independent, Master)
A processor is in this state when the status bits on both sides of the IPSR 2 show it in this state. In this case the external status lines will show the same thing (see Table 2).
In this state a processor executes code on behalf of itself and can access only its own registers and data memory.
2. (Parallel Computing, Master)
When one side of the IPSR register 2 shows this configuration and the other side shows the Independent-Master case, the central processor 1 is in a Master-Slave relationship with the processor on that side, has already started a parallel task on the processor on that side, and can check on the state of that task by polling the corresponding Task Completed bit in IPSR 2 or by executing a Wait until Task Completed on Left/Right instruction. In this last case the processor will stay idle until the corresponding bit is set.
In this state the processor has the same access rights as in the (Independent, Master) state.
3. (Large Instruction Word, Master)
When one side of the IPSR register 2 shows this configuration (while the other side shows the Independent-Master case), the central processor 1 is in a Master-Slave relation with the processor on that side, and has already locked to that processor so as to process Large Instruction Words in parallel. The processor that has been locked can, in turn, lock to another one, and so on in cascade. Whenever the LIW-Master processor jumps as a result of a control instruction (conditional/unconditional branches or looping instructions), the take-the-branch condition is passed as a signal through the interfaces to all the processors locked in the chain. In this way, synchronized jumps are ensured, which makes loop execution across the locked processors possible. When the processor executes a Release Left/Right processor instruction, the locked processor becomes unlocked and the Master can enter a state dependent on the status bits on the other side of IPSR 2.
In this state, the processors have access not only to the dual port data memory window separating them from the Slave but also to the corresponding register bank of the locked processor. The instruction set will be extended with instructions capable of accessing the left or right processor.
4. (Suspended, Master)
Only one side of a processor can show this combination of state and mode bits. However, the status bits on the opposite side of the IPSR determine what the processor really does.
If the opposite status bits show (PC, Slave), the processor in fact is not suspended but is rather executing a parallel task forced by the processor on that side. Before being forced into a (PC, Slave) situation the processor was in a (LIW, Master) situation. When the switch occurred the processor had to suspend its own LIW activity and that of the processors locked up with it.
If the opposite status bits show (LIW, Slave), the processor is in fact suspended. In this situation the processor has frozen its own PC and executes NOP instructions. Before being in this state the processor was in a (LIW, Slave) situation with one of its sides and in a (LIW, Master) situation with the other side. The processor has received a SUSPEND signal from the Slave side, which it has passed to the processor on the Master side.
In this way, when the head of the LIW link is suspended, all the processors in the chain will get suspended.
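The propagation of the SUSPEND signal along a chain of locked processors can be modeled in a few lines. This is a behavioral sketch only: the `Proc` class, its attributes and its method names are hypothetical, and the hardware signalling itself is abstracted into a method call. The chain is assumed to be acyclic.

```python
class Proc:
    """Hypothetical model of one processor in an LIW chain."""

    def __init__(self, name: str):
        self.name = name
        self.locked = None      # the processor this one has locked (its LIW Slave)
        self.suspended = False

    def lock(self, other: "Proc") -> "Proc":
        """Lock a neighbor; returns the neighbor so chains can be built fluently."""
        self.locked = other
        return other

    def suspend(self) -> None:
        """Freeze this processor and pass SUSPEND on to the processor it has locked."""
        self.suspended = True
        if self.locked is not None:
            self.locked.suspend()


# Build a three-processor chain: a locks b, b locks c.
a, b, c = Proc("a"), Proc("b"), Proc("c")
a.lock(b).lock(c)
a.suspend()
```

After `a.suspend()`, all three processors in the chain report `suspended == True`, mirroring the behavior described for the head of an LIW link.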

5. (Parallel Computing, Slave)
When one side of the IPSR register 2 shows this configuration (while the other side shows the Independent-Master case), the processor is in a Slave-Master relation with the processor on that side, on behalf of which it executes a task. The starting address of the task is passed to the processor when the Slave-Master relation has been established. At the end of the task, the processor executes an End-Of-Task instruction that gets locked in the corresponding status bits of the Master. When the End-Of-Task instruction is executed, the processor enters a state that is dependent on the status bits on the other side of the IPSR 2.
In this state, a processor has access to its own registers and data memory space and to the dual port memory window into the data space of the Master processor.

6. (Large Instruction Word, Slave)
When one side of the IPSR register shows this configuration (while the other side shows the Independent-Master case), the processor is in a Slave-Master relation with the processor on that side. In this situation, the processor still has the ability to put itself into a Master situation with respect to the processor on the other side.
As mentioned before, when multiple processors run in a locked state, synchronism is essential. All processors should have the same master clock and they all should take (or not take) a conditional branch based on the decision of the Master processor. In this case, the Master drives the Jump interface line and all the Slaves in the chain execute a Branch on External Decision instruction that takes the jump based on the state of the line.
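The synchronized branch mechanism can be sketched as follows. In this simplified model, which is an assumption for illustration and not the hardware implementation, the Master alone evaluates the branch condition and drives the Jump line, and every processor in the locked chain either takes its own branch target or falls through to the next instruction. The function name and argument layout are hypothetical.

```python
def step_locked_chain(pcs, master_condition, targets):
    """Advance the program counters of a locked chain across one branch.

    pcs:              current program counters, index 0 being the Master
    master_condition: branch decision evaluated by the Master only
    targets:          branch target of each processor's instruction segment
    """
    jump_line = bool(master_condition)  # driven by the Master, read by all Slaves
    return [target if jump_line else pc + 1 for pc, target in zip(pcs, targets)]
```

Because the decision comes from a single line, all processors in the chain branch together or not at all, which is the synchronism the text requires.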
A processor locked in a Slave mode has access not only to its own registers and data memory space but also to the register banks of the neighboring processor it is running locked with and to the dual port data memory windows into their data space.

7. (Suspended, Slave)
In this case the processor that was locked executes only NOP instructions, freezes the Program Counter (PC), and waits for the Release signal.
The internal register access and structure of a central processor will now be described with reference to Figure 4.
Data memory bus 20 is connected through multiplexers 21 to Left, Middle and Right registers 22, 23, 24, which in turn are connected through multiplexer 25 to processing unit 26 including the ALU/MAC, BS, and DAG. Because any processor in this architecture is interruptible, almost all internal registers except for the IPSR 2 should be shadowed.
The MAC/ALU (Multiplier/Accumulator) architecture is shown in more detail in Figure 6, in which for brevity only the input data flow is shown. Left DMD bus 21 is connected through the interface to the corresponding bus in the left processor 8. In operation, data flows from the left-hand processor through Mux 22 to registers ALH, ALL (Accumulator Left High, Accumulator Left Low), from where it passes through Mux 23 to the Multiplier and Accumulator and logic circuit 24, which is connected to the right barrel shifter 25.
Similarly, data from the right processor 7 arrives over the right DMD bus 26 and passes through Mux 27, registers ARH, ARL, and Mux 28 to MAC unit 24. Internal bus 29 is connected through Mux units 30, 31, 32, 33 to pairs of registers ALH, ALL; ARH, ARL; AAH, AAL; ABH, ABL connected through Mux 34 and the left barrel shift register to MAC unit 24. It will be apparent that this arrangement allows instruction words to be shared between the adjacent processors.
When a processor becomes slave to another processor, it uses the shadow registers to preserve the last contents of its registers as a Master. The shadow registers are back-propagated to the main registers when the processor re-enters a master mode (with respect to both left and right processor).
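The role of the shadow registers can be illustrated with a minimal model. The class and method names below are hypothetical, and the sketch ignores which registers are excluded from shadowing (the text says almost all registers are shadowed, the IPSR being an exception); it only shows the save on entering Slave mode and the back-propagation on re-entering Master mode.

```python
class RegisterFile:
    """Hypothetical register file with one shadow copy per register."""

    def __init__(self, names):
        self.main = {name: 0 for name in names}
        self.shadow = {name: 0 for name in names}

    def enter_slave(self) -> None:
        """Preserve the Master-mode contents before working for another processor."""
        self.shadow = dict(self.main)

    def reenter_master(self) -> None:
        """Back-propagate the shadow registers to the main registers."""
        self.main = dict(self.shadow)


regs = RegisterFile(["AAH", "AAL"])
regs.main["AAH"] = 0x1234      # state of the processor's own job
regs.enter_slave()
regs.main["AAH"] = 0xFFFF      # overwritten while serving the Master
regs.reenter_master()
```

After `reenter_master()`, `regs.main["AAH"]` again holds the value it had before the processor became a Slave, so the interrupted work can be resumed.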
For all three computational units (ALU, MAC and BS) a register relationship as presented in Figure 4 is valid.
The ALU and the MAC (usually) require two operands, while the BS requires only one. Depending on the architecture, the DAG requires 1 to 3 input registers. The set of registers available to a computational block is symmetrically divided into three groups, namely a set of n registers that can be loaded from their own DMD bus or some other local bus, and two sets/banks of m registers that can be accessed not only from the local buses but also from the adjacent (left or right) processors.
The access to an internal register from the left or the right processor, in a symmetrical arrangement, is a significant aspect of the present invention. This change makes it easier to take advantage of the Large Instruction Word functional state. When one DSP can perform an operation on the already existent registers, the neighboring DSPs can use the additional buses to read/write access other internal registers. The dual port memory is used in this case to enhance the access of the neighboring DSPs to the data space of the middle processor.
The m and n values should be relatively small (1 and 2 in one embodiment) because otherwise the propagation delays through various levels of multiplexing could add up to significant values. The totality of all registers accessible from the left (or right) processor forms the bank of registers used for communicating with the left (or right) processor.
Because of the symmetry of the register distribution, similar banks of registers are available in the left and right processor, and as such, in any two-processor LIW interaction two banks of registers will always be available for communication and for speeding up each other's computations when needed.
The instruction set of a processor will be enhanced with instructions capable of addressing the left or right processor. These instructions are operational and useful only when a processor functions locked with another processor (in LIW state).
Tables 4 to 19 present the state and mode transitions. It should be noted that due to the symmetrical properties of the architecture, the cases that are not covered can be derived from those that are given.

In Tables 4 to 19, each status bits and state lines entry is given as State Mode.

Table 4: Initial status bits Left: Indep. Master, Right: Indep. Master
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: force Right to PC | Indep. Master | PC Master | Saved | PC Master | PC Master
Int.: force Right to LIW | Indep. Master | LIW Master | Saved | LIW Master | LIW Master
Right: enter PC | Indep. Master | PC Slave | Saved | PC Slave | PC Slave
Right: enter LIW | Indep. Master | LIW Slave | Saved | LIW Slave | LIW Slave

Table 5: Initial status bits Left: Indep. Master, Right: PC Master
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: force Left to PC | PC Master | PC Master | Saved | PC Master | PC Master
Int.: force Left to LIW | LIW Master | PC Master | Saved | LIW Master | LIW Master
Right: task completed | Indep. Master | Indep. Master | Saved | Indep. Master | Indep. Master
Left: enter PC | PC Slave | PC Master | Saved | PC Slave | PC Slave
Left: enter LIW | LIW Slave | PC Master | Saved | LIW Slave | LIW Slave

Table 6: Initial status bits Left: Indep. Master, Right: PC Slave
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: force Left to LIW | LIW Master | PC Slave | Saved | LIW Slave | LIW Slave
Int.: force Left to PC | PC Master | PC Slave | Saved | PC Slave | PC Slave
Int.: task completed | Indep. Master | Indep. Master | Saved | Indep. Master | Indep. Master

Table 7: Initial status bits Left: Indep. Master, Right: LIW Master
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: force Left to LIW | LIW Master | LIW Master | Saved | LIW Master | LIW Master
Int.: force Left to PC | PC Master | LIW Master | Saved | PC Master | LIW Master
Right: exit LIW | Indep. Master | Indep. Master | Saved | Indep. Master | Indep. Master
Left: enter PC | PC Slave | Susp. Master | Saved | PC Slave | Susp. Slave

Table 8: Initial status bits Left: Indep. Master, Right: LIW Slave
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: force Left to LIW | LIW Master | LIW Slave | Saved | LIW Slave | LIW Slave
Int.: force Left to PC | PC Master | LIW Slave | Saved | LIW Slave | LIW Slave
Right: exit LIW | Indep. Master | Indep. Master | Saved | Indep. Master | Indep. Master
Right: suspend | Indep. Master | Susp. Slave | Saved | Susp. Slave | Susp. Slave

Table 9: Initial status bits Left: Indep. Master, Right: Suspend Slave
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Right: exit Suspend | Indep. Master | LIW Slave | Saved | LIW Slave | LIW Slave

Table 10: Initial status bits Left: PC Master, Right: PC Master
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Left: task completed | Indep. Master | PC Master | Saved | PC Master | PC Master
Right: task completed | PC Master | Indep. Master | Saved | PC Master | PC Master

Table 11: Initial status bits Left: PC Master, Right: PC Slave
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Left: task completed | Indep. Master | PC Slave | Saved | PC Slave | PC Slave
Int.: task completed | PC Master | Indep. Master | Saved | PC Master | PC Master

Table 12: Initial status bits Left: PC Master, Right: LIW Master
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Left: task completed | Indep. Master | LIW Master | Saved | LIW Master | LIW Master
Int.: exit LIW (unlock) | PC Master | Indep. Master | Saved | PC Master | PC Master

Table 13: Initial status bits Left: PC Master, Right: LIW Slave
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Left: task completed | Indep. Master | LIW Slave | Saved | LIW Slave | LIW Slave
Right: suspend | PC Master | Susp. Slave | Saved | Susp. Slave | Susp. Slave
Right: exit LIW | PC Master | Indep. Master | Saved | PC Master | PC Master

Table 14: Initial status bits Left: PC Master, Right: Suspend Slave
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Right: exit Suspend | PC Master | LIW Slave | Saved | LIW Slave | LIW Slave
Left: task completed | Indep. Master | Susp. Slave | Saved | Susp. Slave | Susp. Slave

Table 15: Initial status bits Left: PC Slave, Right: LIW Master
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: task completed | Indep. Master | Indep. Master | Saved | Indep. Master | Indep. Master
Int.: exit LIW (unlock) | PC Slave | Indep. Master | Saved | PC Slave | PC Slave

Table 16: Initial status bits Left: PC Slave, Right: Suspend Master
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: task completed | Indep. Master | LIW Master | Saved | LIW Master | LIW Master

Table 17: Initial status bits Left: LIW Master, Right: LIW Master
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: exit LIW Left | Indep. Master | LIW Master | Saved | LIW Master | LIW Master
Int.: exit LIW Right | LIW Master | Indep. Master | Saved | LIW Master | LIW Master

Table 18: Initial status bits Left: LIW Master, Right: LIW Slave
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Int.: exit LIW Left | Indep. Master | LIW Slave | Saved | LIW Slave | LIW Slave
Right: exit LIW | Indep. Master | Indep. Master | Saved | Indep. Master | Indep. Master
Right: suspend | Susp. Slave | Susp. Master | Saved | Susp. Slave | Susp. Slave

Table 19: Initial status bits Left: Suspend Master, Right: Suspend Slave
Action | Left status bits | Right status bits | Regs state | Left state lines | Right state lines
Right: exit Suspend | LIW Master | LIW Slave | Saved | LIW Slave | LIW Slave

The following table presents all the software commands required to perform the various actions described in the previous tables.

Table 20
Command | Description
XTR address | eXecute Task starting at 'address' on Right processor
XTL address | eXecute Task starting at 'address' on Left processor
LCKR address | LoCK Right processor (force right to LIW state) starting at 'address'
LCKL address | LoCK Left processor (force left to LIW state) starting at 'address'
EOT | End Of Task (reported to the processor on the slave side)
RELR | RELease (unlock) Right processor
RELL | RELease (unlock) Left processor
BED address | Branch on External Decision
WTCL | Wait for Task Completed on Left processor
WTCR | Wait for Task Completed on Right processor

In one embodiment, the first four instructions in Table 20 (XTR, XTL, LCKR, LCKL) are blocking. This ensures that if the processor they are trying to bring to a Master-Slave relation is in a state that does not permit the desired state transition, then the processor will enter a state where it will keep on trying to execute the mentioned instructions. In a different embodiment, these instructions can be made non-blocking. In this situation, the program needs code that is compatible with a successful attempt and code that is compatible with a failed attempt.
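The difference between the blocking and non-blocking embodiments can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Neighbor` class, the functions `try_xtr` and `xtr_blocking`, and in particular the rule that only a neighbor in the "Indep." state accepts the transition, which is an assumption made for illustration and is not stated in the text. The retry limit exists only so that the sketch terminates.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Neighbor:
    """Hypothetical model of the status bits of the right-hand processor."""
    state: str = "Indep."   # assumed values: "Indep.", "PC", "LIW", "Susp."
    mode: str = "Master"    # "Master" or "Slave"
    pc: int = 0             # address at which the neighbor will start


def try_xtr(right: Neighbor, address: int) -> bool:
    """Non-blocking XTR: make one attempt and report whether it succeeded."""
    if right.state != "Indep.":   # assumption: only an independent neighbor accepts
        return False
    right.state, right.mode, right.pc = "PC", "Slave", address
    return True


def xtr_blocking(right: Neighbor, address: int,
                 wait_one_cycle: Callable[[], None],
                 max_cycles: int = 1000) -> int:
    """Blocking XTR: keep retrying until accepted; return the number of retries."""
    for retries in range(max_cycles):
        if try_xtr(right, address):
            return retries
        wait_one_cycle()          # stand-in for the processor stalling one cycle
    raise TimeoutError("neighbor never became available")
```

With the non-blocking form, the caller must branch on the returned flag and provide both a success path and a failure path, as the text notes; with the blocking form, only the success path is needed.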
Besides the specific instructions given in the table, some of the usual instructions of a DSP are extended to handle external register bank access rights.
The instructions XTR, XTL, LCKR and LCKL require at least two cycles to execute. During the first cycle, the processor executing one of these instructions will try, based on its own status bits and the other processor's status lines, to force a neighboring processor into a Slave situation. If this attempt is successful, during the second cycle an address will be passed over the Data Bus lines to the other processor. In many cases, a third cycle is required for the second processor to fetch the instruction found at the address passed.
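The cycle sequence described above can be laid out as a small timeline. This is only an illustrative sketch: the function `xtr_timeline`, its parameters, and the dictionary used as the slave's program memory are hypothetical names, and the optional refused attempts model the blocking behavior rather than anything specified for this passage.

```python
def xtr_timeline(address: int, slave_program: dict, failed_attempts: int = 0):
    """Return (cycle, event) pairs for one XTR/XTL/LCKR/LCKL execution."""
    events = []
    cycle = 0
    # Each refused attempt costs one cycle (blocking embodiment, assumed).
    for _ in range(failed_attempts):
        cycle += 1
        events.append((cycle, "attempt to force neighbor into Slave: refused"))
    # First cycle of a successful execution: the neighbor becomes a Slave.
    cycle += 1
    events.append((cycle, "attempt to force neighbor into Slave: accepted"))
    # Second cycle: the start address travels over the Data Bus lines.
    cycle += 1
    events.append((cycle, f"address {address:#06x} driven on Data Bus lines"))
    # Third cycle (needed in many cases): the slave fetches at that address.
    cycle += 1
    events.append((cycle, f"slave fetches {slave_program[address]!r}"))
    return events
```

Calling `xtr_timeline(0x20, {0x20: "MAC"})` yields three events, one per cycle; each refused attempt adds one more cycle in front of them.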
A conflict arises when two processors attempt to put each other in a Master-Slave relation simultaneously. One solution to this situation is to always give priority to the processor on the right side of the couple. To solve this conflict, in one embodiment, an extra interface line is added (the ACKnowledgment line) and an Arbitration block that is biased to the right. This arrangement is shown in Figure 5, where central processor 1 is shown connected to Right and Left processors 7, 8. The IPSR 2 of each processor has an arbitration block 30.
Where the software can guarantee that such conflicts do not occur, the Arbitration block and the additional interface line are not required.
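A right-biased arbitration rule of the kind described can be sketched as a simple priority function. The function name `arbitrate` and the modelling of the acknowledgment line as a return value are assumptions made for illustration; the text does not give the internal logic of the Arbitration block.

```python
from typing import Optional


def arbitrate(left_requests: bool, right_requests: bool) -> Optional[str]:
    """Decide which processor of a couple receives the acknowledgment.

    Returns "right", "left", or None when neither side is requesting.
    The right-hand processor always wins a simultaneous request.
    """
    if right_requests:
        return "right"
    if left_requests:
        return "left"
    return None
```

The left processor is acknowledged only when the right one is not requesting at the same time, so two simultaneous requests can never both succeed.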
The present invention thus offers a powerful technique for evenly distributing the processing power of complex applications over multiple DSPs, using Parallel Computing and Large Instruction Word methods, which can be of variable length.

Because of the processing power and additional buses made available by multiple processors through this new distributed architecture method, it can be used with slower master clocks or slower memories.
The new distributed architecture is particularly suited for the case where the processors are sharing the silicon space of the same integrated circuit.
Due to its symmetrical properties, the distributed architecture can be easily scaled up to provide the necessary computational power for very complex DSP tasks even at low master clock rates or slow memory access time.

Claims

1. Digital processing apparatus comprising a microprocessor, said microprocessor comprising at least one external interface for connection to a respective parallel like microprocessor having a similar interface, a plurality of internal registers, an internal bus accessing said internal registers, and an external bus connectable to each said parallel like microprocessor through said at least one external interface to permit the exchange of data and control signals, characterized in that said internal registers (22, 24) include internal registers shareable (3, 4, 22, 24) with each said parallel like microprocessor (7, 8), a multiplexer (21) connects said internal bus (20) and the or each said external bus (Left DMD bus, Right DMD bus) to the or each said shareable internal register (3, 4, 22) so that said microprocessor (1) and the or each said external like microprocessor (7, 8) can co-operatively share in the execution of a single instruction represented by a large instruction word, and an inter-processor status register (2) maintains the current status of said microprocessor (1) and said at least one parallel like microprocessor (7, 8).

2. Digital processing apparatus as claimed in claim 1, characterized in that said microprocessor (1) includes dual-ported memory (12, 13) that can be mapped into the data memory space of an adjacent said parallel microprocessor (7, 8) to provide a window between said adjacent processors (1, 7, 8).

3. Digital processing apparatus as claimed in claim 2, characterized in that said interface includes control and status lines (9), data bus lines (10), and register select lines (11).

4. Digital processing apparatus as claimed in any one of claims 1 to 3, characterized in that said inter-processor status register (2) includes for each said parallel processor a memory cell storing the processing state of the processor (State), a memory cell storing the current mode of operation (Mode), and a memory cell storing the state of completion of a current task (Task Completed).

5. Digital processing apparatus as claimed in claim 4, characterized in that said interface (9, 10, 11) is operative to permit the exchange of control and data signals to permit the parallel execution in each processor of sequences of separate instructions forming independent routines.

6. Digital processing apparatus as claimed in claim 5, characterized in that said interface (9, 10, 11) includes a jump line to send a signal to the or each cooperating parallel processor (7, 8) so that when said microprocessor (1) encounters a jump instruction, the or each said parallel processor (7, 8) also executes a jump so as to make loop executions possible.

7. Digital processing apparatus as claimed in claim 6, characterized in that said microprocessors (1, 7, 8) include an arbitration unit (30) and said interface (9, 10, 11) includes an acknowledgment line so as to permit conflict resolution between cooperating microprocessors (7, 8).

8. Digital processing apparatus as claimed in claim 7, characterized in that said microprocessor is provided on a common integrated circuit with said parallel microprocessors (7, 8).

9. Digital processing apparatus as claimed in claim 8, characterized in that said microprocessors (1, 7, 8) include internal registers that are shadowed, and arranged such that when a master processor becomes a slave to another processor the last contents of the register in the master mode are preserved in shadow memory.

10. A method of executing a program wherein at least two parallel processors are interconnected through an external interface so that they can exchange data and control signals to cooperatively share in the execution of a program, each processor having internal registers, characterized in that at least one said internal register is shareable with a said parallel processor, and that in each said processor an internal bus and an external bus are connected to a said parallel processor through said interface whereby two said processors can access said shareable registers by multiplexing said internal and external buses so that said parallel processors can co-operatively share in the execution of a single instruction represented by a large instruction word, and the status of the cooperating processors is maintained in an inter-processor status register provided therein.

11. A method as claimed in claim 10, characterized in that the execution of a single instruction defined by a large instruction word is shared between the cooperating microprocessors.

12. A method as claimed in claim 10 or claim 11, characterized in that said cooperating processors are further capable of sharing the execution of a program task, each executing an independent sequence of program instructions.

13. A method as claimed in claim 12, characterized in that neighboring said processors share a common address space through a dual-ported memory.

14. A method as claimed in claim 13, characterized in that one of said microprocessors serves as a master and the or each parallel microprocessor serves as a slave.

15. A method as claimed in claim 14, characterized in that said microprocessors are synchronized over a jump line through said interface so that when the master executes a program jump, the or each slave processor executes a program jump in synchronism therewith to permit assisted loop executions.

16. A distributed architecture parallel processing apparatus as claimed in claim 15, characterized in that said processors include internal registers that are shadowed, and arranged such that when a master processor becomes a slave to another processor the last contents of the register in the master mode are preserved in shadow memory.

17. A method of executing a program characterized in that it comprises the steps of:
a) providing at least two parallel processors, one said processor being a master and the or each remaining processor being a slave;
b) interconnecting said processors through an external interface so that they can exchange data and control signals to cooperatively share in the execution of a program; and
c) maintaining the status of the cooperating processors in an inter-processor status register provided therein.

18. A method as claimed in claim 17, characterized in that the execution of a single instruction defined by a large instruction word is shared between the cooperating processors.

19. A method as claimed in claim 17 or claim 18, characterized in that said cooperating processors are further capable of sharing the execution of a program task, each executing an independent sequence of program instructions.

20. A method as claimed in claim 18, characterized in that neighboring said processors share a common address space through a dual-ported memory.

21. A method as claimed in claim 20, characterized in that one of said processors serves as a master and the or each parallel processor serves as a slave.

22. A method as claimed in claim 21, characterized in that said processors are synchronized over a jump line through said interface so that when the master executes a program jump, the or each slave processor executes a program jump in synchronism therewith to permit assisted loop executions.

23. Digital processing apparatus comprising:
a) a microprocessor having at least one external interface for connection to a respective parallel processor having a similar interface, said interface permitting the exchange of data and control signals to permit said central processor and one or more parallel processors to cooperatively share in the execution of a program; and
b) means for maintaining the current status of said processors and said at least one parallel processor.