APPARATUS AND METHOD TO EFFICIENTLY IMPLEMENT A SWITCH ARCHITECTURE FOR A MULTIPROCESSOR SYSTEM
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to multiprocessor architectures, and relates more particularly to an apparatus and method to efficiently implement a switch architecture for a multiprocessor system.
2. Description of the Background Art
An effective and efficient method for implementing a multiprocessor system architecture is a significant consideration for designers, manufacturers, and users of many modern electronic systems. As system applications and demands increase in complexity, a single processor often becomes insufficient to perform the substantial variety of tasks required by many system users. Multiprocessor system architectures of various descriptions have thus become a significant area of technological development in the field of electronic systems design.
Referring now to FIG. 1, a block diagram illustrating an architecture for a multiprocessor system 110 is shown. In the FIG. 1 embodiment, multiprocessor system 110 includes processor 114, processor 116, processor 118, and processor 120. Each of the FIG. 1 processors 114, 116, 118, and 120 is coupled to, and communicates through, a system bus 112. Therefore, if a particular task performed by system 110 is relatively complicated or extensive, system 110 may divide and allocate portions of the task among processors 114, 116, 118, and 120 to facilitate and expedite performance of the task.
Modern integrated circuit fabrication techniques have progressively reduced the individual component size and corresponding physical circuit block dimensions for manufactured integrated circuits. Since the physical dimensions of the entire integrated circuit device remain relatively unchanged, the smaller components must frequently communicate over system buses that have not been correspondingly reduced in length.
Driving these long system buses using modern components (with reduced physical dimensions) often results in bus loading and delay problems for processors 114, 116, 118, and 120. In fact, in some cases, the wire delay of system bus 112 may become greater than one system clock period, thereby inducing a critical path.
Due to the foregoing problems, system bus 112 becomes very slow and inefficient when servicing multiple processors. The limiting factor for system 110 may thus become the speed of system bus 112, rather than the speed of individual processors 114, 116, 118, and 120. Therefore, for all the foregoing reasons, an improved apparatus and method are needed to efficiently implement a switch architecture for a multiprocessor system.
SUMMARY OF THE INVENTION
In accordance with the present invention, an apparatus and method to efficiently implement a switch architecture for a multiprocessor system is disclosed. In one embodiment, the invention includes a host processor, a digital signal processor 1 (DSP1), and a digital signal processor 2 (DSP2) that preferably communicate through a switch. In operation, the host processor, the DSP1, and the DSP2 each communicate to the switch through corresponding interface sockets. In accordance with the present invention, the host processor, the
DSP1, or the DSP2 may function as a master processor to initiate either a data read cycle or a data write cycle by generating a data transfer request. Any of the remaining processors (host processor, DSP1, or DSP2) may similarly act as a slave processor to service the data transfer request. In the data write cycle, a master processor preferably sends a write request and a slave unit number to the switch, which responsively arbitrates the write request, and generates a grant signal to the master processor to authorize a write data transfer. Next, the switch creates a data transfer bridge to pass the write data to the slave processor. Then, the master processor sends an address and a data count to the switch, which responsively stores the data count in an internal data counter.
The switch then receives the write data from the master processor and temporarily stores the received write data into an internal FIFO. If the slave processor is ready to accept the write data, then the switch initially sends the address, and subsequently sends one unit of the temporarily stored write data from the FIFO to the slave processor. The switch then decrements the data counter to monitor the amount of write data that remains to be transferred to the slave processor. The switch then continues to transmit units of data from the FIFO to the slave processor. When the current data count stored in the data counter becomes equal to zero, then all the write data has been transferred to the slave processor, and the switch generates a termination signal to end the data write cycle.
In the data read cycle, the master processor preferably sends a read request and a slave unit number to the switch, which responsively arbitrates the read request and generates a grant signal to the master processor to authorize a read data transfer. Next, the master processor sends an address and a data count to the switch, which responsively stores the data count in the internal data counter.
When the slave processor is ready, the switch then receives the read data from the slave processor and temporarily stores the received read data into the internal FIFO. When the master processor is ready to accept the read data, then the switch sends one unit of the temporarily stored read data from the FIFO to the master processor. The switch then decrements the data counter to monitor the amount of read data that remains to be transferred to the master processor. The switch then continues to transmit units of data from the FIFO to the master processor. When the current data count stored in the data counter becomes equal to zero, then all the read data has been transferred to the master processor, and the switch generates a termination signal to end the data read cycle. The present invention thus efficiently and effectively implements a switch architecture for a multiprocessor system.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating an architecture for a multiprocessor system;
FIG. 2 is a block diagram for one embodiment of a multiprocessor system, in accordance with the present invention;
FIG. 3 is a block diagram for one embodiment of the switch of FIG. 2, in accordance with the present invention;
FIG. 4 is a signal table corresponding to the switch of FIG. 3, in accordance with the present invention;
FIG. 5 is a detailed block diagram for one embodiment of the switch of FIG. 2, in accordance with the present invention;
FIG. 6 is a block diagram for one embodiment of the host interface socket of FIG. 3, in accordance with the present invention;
FIG. 7 is a signal table corresponding to the host interface socket of FIG. 6, in accordance with the present invention;
FIG. 8 is a block diagram for one embodiment of the DSP interface sockets of FIG. 3, in accordance with the present invention;
FIG. 9 is a signal table corresponding to the DSP interface sockets of FIG. 8, in accordance with the present invention;
FIG. 10 is a block diagram tracing a basic write pipeline for a data write cycle, in accordance with the present invention;
FIG. 11 is a timing diagram showing exemplary waveforms for a data write cycle, in accordance with the present invention;
FIG. 12 is a flowchart of method steps for one embodiment to perform a data write cycle, in accordance with the present invention;
FIG. 13 is a block diagram tracing a basic read pipeline for a data read cycle, in accordance with the present invention;
FIG. 14 is a timing diagram showing exemplary waveforms for a data read cycle, in accordance with the present invention;
FIG. 15 is a flowchart of method steps for one embodiment to perform a data read cycle, in accordance with the present invention; and
FIG. 16 is a block diagram for one embodiment of a multiprocessor system, in accordance with the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention relates to an improvement in electronic processor architectures. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the preferred embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. Thus, the present invention is not intended to be limited to the embodiment shown, but is to be accorded the widest scope consistent with the principles and features described herein.
The present invention comprises an apparatus and method to efficiently implement a switch architecture for a multiprocessor system including a switch device, a plurality of processors, and corresponding interface sockets. Each system processor communicates with the other system processors through the switch device to perform data write operations from a master processor to a slave processor, and also to perform data read operations from a slave processor to a master processor. In multiprocessor systems having more than three processors, the present invention permits simultaneous multiple data transfers between any two pairs of system processors.
Referring now to FIG. 2, a block diagram for one embodiment of a multiprocessor system 210 is shown, in accordance with the present invention. Although multiprocessor system 210 may readily be implemented in any appropriate and compatible electronic device, in the preferred embodiment, multiprocessor system 210 is preferably part of an encoder device that encodes data, including audio data, received from a data source. The encoder device (multiprocessor system 210) may then provide the encoded data to a program destination, such as a recordable
digital video disc device for storage and subsequent playback by a system user.
The FIG. 2 embodiment includes a host processor 216, a digital signal processor 1 (DSP1) 224, and a digital signal processor 2 (DSP2) 232 that preferably communicate through a switch 212. In operation, host processor 216, DSP1 224, and DSP2 232 each communicate directly to switch 212. For example, host processor 216 and switch 212 communicate bidirectionally through bus 218, host interface socket 214, and bus 220. Similarly, DSP1 224 and switch 212 communicate through bus 226, DSP 1 interface socket 222, and bus 228. Further, DSP2 232 and switch 212 communicate bidirectionally through bus 234, DSP 2 interface socket 230, and bus 236. Switch 212 may thus advantageously receive various data from a source processor (host processor 216, DSP1 224, or DSP2 232), and then responsively relay the received data to a selected destination processor (host processor 216, DSP1 224, or DSP2 232).
The FIG. 2 embodiment thus provides an architecture that avoids the problems discussed above in conjunction with multiprocessor system 110 of FIG. 1. Design independence is achieved by connecting each FIG. 2 processor to an independent port on switch 212 via a separate interface socket. System designers may thus design and test each processor independently. Furthermore, no direct connections exist between any of the FIG. 2 processors. Therefore, system timing analysis is significantly facilitated since only a single timing check is typically required for each FIG. 2 processor. The bus speed of the FIG. 1 system bus 112 is substantially increased in the FIG. 2 embodiment due to the shortened bus length and reduced bus loading found in multiprocessor system 210. The FIG. 2 architecture also advantageously exhibits increased circuit modularity. Each FIG. 2 processor circuit block is connected to an independent port on switch 212. Therefore, any of the FIG. 2 processor circuit blocks may readily be replaced or removed from multiprocessor system 210 without affecting the operation of the remaining FIG. 2 processor circuit blocks.
Referring now to FIG. 3, a block diagram for one embodiment of the FIG. 2 switch 212 is shown, in accordance with the present invention. FIG. 3 depicts switch 212, host interface socket (HSK) 214, DSP 1 interface socket (DSK1) 222, and DSP 2 interface socket (DSK2) 230. Also shown in the FIG. 3 embodiment are the respective interface signals that pass between switch 212 and HSK 214, DSK1 222, and DSK2 230.
In the FIG. 3 embodiment, HSK 214, DSK1 222, and DSK2 230 are preferably implemented using an identical or substantially similar configuration to simplify and facilitate interfacing various processors with switch 212. Each respective set of interface signals between switch 212 and HSK 214, DSK1 222, and DSK2 230 is therefore also preferably identical or substantially similar. The interface signals shown in FIG. 3 are further discussed below in conjunction with FIGS. 4-5 and 11-15.
Referring now to FIG. 4, a signal table 410 corresponding to the FIG.
3 switch 212 is shown, in accordance with the present invention. The FIG.
4 signal table 410 describes a set of interface signals 412 through 432, including an interface signal name (corresponding to FIG. 2), an interface signal direction (input or output), and an interface signal description. The timing and functionality of the interface signals described in FIG. 4 are further discussed below in conjunction with FIGS. 5 and 11-15.
Referring now to FIG. 5, a detailed block diagram for one embodiment of the FIG. 2 switch 212 is shown, in accordance with the present invention. In the FIG. 5 embodiment, switch 212 includes switch arbitration logic 510, switch control logic 512, and switch data path 514. Also depicted are data counter 516 in switch control logic 512, and first-in first-out memory (FIFO) 518 in switch data path 514.
In operation, switch arbitration logic 510 receives a request 414 (FIG. 4) for either a read data transfer or a write data transfer between a master processor (host processor 216, DSP1 224, or DSP2 232) and a slave processor (host processor 216, DSP1 224, or DSP2 232). The master
processor initiates the data transfer by generating the request 414 to switch 212 and by also sending a unit number 412 to identify the slave processor. Switch 212 then checks a busy signal 430 (FIG. 4) to determine whether the designated slave processor is free to perform the requested transfer. If the slave processor is available, then switch 212 sends a grant signal 416 (FIG. 4) to the master processor to authorize the data transfer. The grant signal 416 may also be generated if an ignore signal 432 is asserted by the master processor.
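The grant decision just described can be sketched as a small predicate. This is a hedged illustration only: the function name and the Boolean encoding of the request 414, busy 430, and ignore 432 signals are hypothetical, not the disclosed arbitration hardware.

```python
def arbitrate(request_asserted, slave_busy, ignore_asserted):
    """Sketch of the grant decision attributed to switch arbitration
    logic 510: grant when a request is pending and the designated
    slave's busy signal is deasserted, or when the master asserts
    the ignore signal (an illustrative reading of the text above)."""
    if not request_asserted:
        return False  # no request 414 pending: no grant 416
    return (not slave_busy) or ignore_asserted


# A pending request to a free slave is granted.
assert arbitrate(True, slave_busy=False, ignore_asserted=False)
# A busy slave blocks the grant unless ignore 432 is asserted.
assert not arbitrate(True, slave_busy=True, ignore_asserted=False)
assert arbitrate(True, slave_busy=True, ignore_asserted=True)
```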
Switch control logic 512 preferably receives a data count from the master processor to indicate the number of data units being transferred. The data count is loaded into data counter 516, which is then decremented as each data unit is transferred. Switch control logic 512 may then generate a transfer termination signal when the data count in data counter 516 reaches zero. Implementing data counter 516 centrally within switch 212 significantly reduces the complexity of the switch interfaces required within each processor (host processor 216, DSP1 224, and DSP2 232). Switch control logic 512 also preferably generates control signals for operating FIFO 518.
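The load-decrement-terminate behavior of the central data counter can be modeled in a few lines of Python. This is a minimal behavioral sketch under stated assumptions (the class and method names are hypothetical; the disclosed counter is a hardware register, not software):

```python
class DataCounter:
    """Behavioral model of data counter 516 in switch control logic 512.

    The master processor's data count is loaded once; the switch then
    decrements the count as each data unit is transferred and signals
    termination when the count reaches zero.
    """
    def __init__(self):
        self.count = 0

    def load(self, data_count):
        self.count = data_count

    def transfer_unit(self):
        """Decrement after one transferred unit; return True when the
        termination signal should be generated (count reached zero)."""
        self.count -= 1
        return self.count == 0


counter = DataCounter()
counter.load(3)
results = [counter.transfer_unit() for _ in range(3)]
# Termination is signaled only on the final transferred unit.
assert results == [False, False, True]
```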
In response to the grant signal 416, switch data path 514 creates a data transfer bridge connecting the master processor and the slave processor. FIFO 518 temporarily stores the transferred data to maximize performance of switch 212. For example, during a write cycle, if the slave processor is slow or delayed while saving the transfer data into its internal memory, then switch 212 may advantageously store the transfer data into FIFO 518 until the slave processor becomes ready to accept more transfer data. In addition, if FIFO 518 becomes filled to capacity with transfer data, then switch 212 may notify the master processor to temporarily halt the transmission of additional transfer data until FIFO 518 regains storage capacity. Switch 212 may thus utilize FIFO 518 to effectively coordinate the data transfer process by dividing the data transfer process into two separate steps. Furthermore, the physical distance and the corresponding signal propagation time between the master processor and the slave
processor are divided in half to permit a system clock to function at twice the rate of similar data transfers performed directly from master processor to slave processor.
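The FIFO coordination described above, buffering data when the slave is slow and halting the master when the buffer fills, can be sketched as follows. The capacity value and method names are illustrative assumptions, not the disclosed FIFO interface:

```python
from collections import deque

class SwitchFIFO:
    """Behavioral model of FIFO 518 in switch data path 514."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = deque()

    def master_can_send(self):
        # When this is False, switch 212 notifies the master to
        # temporarily halt transmission until capacity is regained.
        return len(self.buf) < self.capacity

    def push_from_master(self, unit):
        assert self.master_can_send(), "master must halt: FIFO full"
        self.buf.append(unit)

    def pop_to_slave(self):
        # Units drain one at a time, in first-in first-out order.
        return self.buf.popleft()


fifo = SwitchFIFO(capacity=2)
fifo.push_from_master("d0")
fifo.push_from_master("d1")
assert not fifo.master_can_send()    # full: master is halted
assert fifo.pop_to_slave() == "d0"   # slave accepts one unit
assert fifo.master_can_send()        # master may resume sending
```

The two-step store-then-forward structure is exactly what lets the switch split the master-to-slave path into the two shorter, faster segments described above.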
Referring now to FIG. 6, a block diagram for one embodiment of the
FIG. 3 host interface socket (HSK) 214 is shown, in accordance with the present invention. In the FIG. 6 embodiment, host interface socket 214 preferably includes electronic circuitry that receives interface signals generated from switch 212 (FIGS. 3 and 4) and responsively generates a set of corresponding host processor signals to host processor 216. The functionality of the host processor signals is further described below in conjunction with FIG. 7.
Referring now to FIG. 7, a signal table 710 corresponding to the FIG. 6 host interface socket 214 is shown, in accordance with the present invention. The FIG. 7 signal table 710 describes a set of host processor signals 712 through 748, including a host processor signal name (corresponding to FIG. 6), a host processor signal direction (input or output), and a host processor signal description. Many of the host processor signals of FIG. 7 directly correspond to the interface signals (FIG. 4) between host interface socket 214 and switch 212.
The data transfer handshaking protocol between host processor 216 and host interface socket 214 is designed so that host processor 216 may advantageously be implemented without requiring complicated interface circuitry. In practice, host interface socket 214 preferably generates a master enable signal 724 to host processor 216 (when host processor 216 functions as the master processor) and a unit of data is responsively transferred. Similarly, host interface socket 214 preferably generates a slave enable signal 736 to host processor 216 (when host processor 216 functions as the slave processor) and a unit of data is responsively transferred. Host processor 216 thus may assume a relatively passive role in the data transfer process.
Referring now to FIG. 8, a block diagram for one embodiment of the FIG. 3 DSP interface sockets (DSK1 222 and DSK2 230) is shown, in accordance with the present invention. In the FIG. 8 embodiment, DSP interface sockets 222 and 230 preferably each include electronic circuitry that receives interface signals generated from switch 212 (FIGS. 3 and 4) and responsively generates similar sets of corresponding DSP signals to DSP1 224 and to DSP2 232. The functionality of the DSP signals is further described below in conjunction with FIG. 9.
Referring now to FIG. 9, a signal table 810 corresponding to the FIG.
8 DSP interface sockets 222 and 230 is shown, in accordance with the present invention. The FIG. 9 signal table 810 describes a set of DSP signals 812 through 848, including a DSP signal name (corresponding to FIG. 8), a DSP signal direction (input or output), and a DSP signal description. Many of the DSP signals of FIG. 9 directly correspond to the interface signals (FIG. 4) between DSPl interface socket 222 or DSP2 interface socket 230 and switch 212.
As similarly discussed above, the data transfer handshaking protocol between DSPs 224 and 232 and DSP interface sockets 222 and 230 is designed so that DSPs 224 and 232 may advantageously be implemented without requiring complicated interface circuitry. In practice, the DSP interface sockets 222 or 230 preferably generate a master enable signal 824 to their corresponding DSP 224 or 232 (when that DSP functions as the master processor) and a unit of data is responsively transferred. Similarly, the DSP interface sockets 222 or 230 preferably generate a slave enable signal 836 to their corresponding DSP 224 or 232 (when that DSP functions as the slave processor) and a unit of data is responsively transferred. The DSPs 224 and 232 may thus assume relatively passive roles during the data transfer process.
Referring now to FIG. 10, a block diagram tracing a basic write pipeline 1010 for a data write cycle is shown, in accordance with the present invention. In the following FIGS. 10 through 12, for the sake of
illustration, DSP1 224 is described as the master processor that initiates a write request to transfer write data to slave processor DSP2 232. In other uses of the present invention, any system 210 processor (host processor 216, DSP1 224, or DSP2 232) may function as a master processor to request a write data transfer, and likewise, any system 210 processor may operate as the slave processor to service the write request.
In the FIG. 10 example, DSP1 224 sends an address and data count through DSK1 222 to switch 212, which temporarily latches the address in latch 1012. DSP1 224 then sends write data through DSK1 222 to switch 212, which temporarily latches the write data in latch 1012. At the appropriate time, switch 212 stores the address into latch 1014 of DSK2 230, and also stores the write data into latch 1016 of DSK2 230. When DSP2 232 is ready, DSK2 230 provides the address and the write data to DSP2 232, and the write cycle is complete. The operation of the write cycle is further illustrated and discussed below in conjunction with FIGS. 11 and 12.
Referring now to FIG. 11, a timing diagram 1110 showing exemplary waveforms for a data write cycle is shown, in accordance with the present invention. The waveforms of FIG. 11 correspond to the signals discussed above in conjunction with FIGS. 3 through 5, and are presented to illustrate the operation of one embodiment of the present invention. In alternate embodiments, multiprocessor system 210 may readily generate and operate with various other appropriate timing waveforms. One embodiment for generating and utilizing the FIG. 11 waveforms is further discussed below in conjunction with FIG. 12.
Referring now to FIG. 12, a flowchart of method steps for one embodiment to perform a data write cycle is shown, in accordance with the present invention. In the FIG. 12 embodiment, DSP1 224 functions as the master processor and DSP2 232 functions as the slave processor. However, in alternate embodiments, any system 210 processor (host processor 216,
DSP1 224, or DSP2 232) may initiate a write request as the master processor, or service a write request as the slave processor.
Initially, in step 1212, the master processor sends a write request 414 (FIG. 11) and a slave unit number 412 to switch 212 via DSK1 222. Then, in step 1214, switch 212 responsively arbitrates the write request 414, and generates a grant signal 416 to the master processor to authorize a write data transfer, as discussed above in conjunction with FIG. 5.
Next, in step 1216, switch 212 creates a data transfer bridge through switch data path 514 (FIG. 5) to temporarily store and then pass the write data to the slave processor. Then, in step 1218, the master processor sends an address and a data count to switch 212 via DSK1 222, and switch 212 responsively stores the data count in data counter 516 and latches the address in latch 1012.
In step 1220, switch 212 receives the write data from the master processor and temporarily stores the received write data into FIFO 518 (FIG. 5). Then, in step 1222, switch 212 determines whether the slave processor is ready to receive the write data stored in FIFO 518. In the FIG. 12 embodiment, switch 212 preferably checks a busy signal 430 to determine the status of the slave processor. If the slave processor is ready to accept the write data, then switch
212, in step 1224, initially sends the latched address, and subsequently sends one unit of the temporarily stored write data from FIFO 518 to the slave processor via DSK2 230. In step 1226, switch 212 then decrements the data counter 516 to monitor the amount of write data that remains to be transferred to the slave processor.
In step 1228, switch 212 determines whether the current data count stored in data counter 516 is equal to zero. If the current data count stored in data counter 516 is not equal to zero, then the FIG. 12 process returns to step 1224 to continue transferring the remaining units of write data. However, if the current value stored in data counter 516 is equal to zero, then all the write data has been transferred to DSP2 232, and switch 212 generates a termination signal 426 to end the FIG. 12 data write cycle.
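The method steps above can be sketched end to end in Python. The request/grant and readiness handshakes of steps 1212-1222 are elided here, and the assumption that the slave stores successive units at incrementing addresses is for illustration only; the function and variable names are hypothetical:

```python
from collections import deque

def data_write_cycle(write_data, address, slave_memory):
    """Sketch of the FIG. 12 data write cycle (steps 1218-1228)."""
    # Step 1218: switch 212 stores the data count in data counter 516.
    data_counter = len(write_data)
    fifo = deque()
    # Step 1220: write data from the master is staged in FIFO 518.
    for unit in write_data:
        fifo.append(unit)
    # Steps 1224-1228: forward units to the slave one at a time,
    # decrementing the counter after each unit.
    offset = 0
    while data_counter != 0:
        slave_memory[address + offset] = fifo.popleft()
        offset += 1
        data_counter -= 1
    # Step 1228: count reached zero; termination signal 426 ends the cycle.
    return "termination"


slave_mem = {}
assert data_write_cycle(["a", "b", "c"], 0x10, slave_mem) == "termination"
assert slave_mem == {0x10: "a", 0x11: "b", 0x12: "c"}
```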
Referring now to FIG. 13, a block diagram tracing a basic read pipeline 1310 for a data read cycle is shown, in accordance with the present invention. In the following FIGS. 13 through 15, for the sake of illustration, DSP1 224 is described as the master processor that initiates a read request to transfer read data from slave processor DSP2 232. In other uses of the present invention, any system 210 processor (host processor 216, DSP1 224, or DSP2 232) may operate as the master processor to request a data read operation, and likewise, any system 210 processor may operate as the slave processor to service the data read request. In the FIG. 13 embodiment, DSP1 224 sends an address and data count through DSK1 222 to switch 212, which temporarily latches the address in latch 1316. When DSP2 232 is ready, DSK2 230 provides the address to DSP2 232 via latch 1318 of DSK2 230. DSP2 232 responsively transfers the requested read data to latch 1320 in switch 212. When DSP1 224 is ready to accept the requested read data, switch 212 transfers the read data to DSP1 224 via latch 1322 in DSK1 222. The operation of the data read cycle is further illustrated and discussed below in conjunction with FIGS. 14 and 15.
Referring now to FIG. 14, a timing diagram 1410 showing exemplary waveforms for a data read cycle is shown, in accordance with the present invention. The waveforms of FIG. 14 correspond to the signals discussed above in conjunction with FIGS. 3 through 5, and are presented to illustrate the operation of one embodiment of the present invention. In alternate embodiments, multiprocessor system 210 may readily generate and function using various other appropriate timing waveforms. One embodiment for generating and utilizing the FIG. 14 waveforms is further discussed below in conjunction with FIG. 15.
Referring now to FIG. 15, a flowchart of method steps for one embodiment to perform a data read cycle is shown, in accordance with the present invention. In the FIG. 15 embodiment, DSP1 224 functions as the master processor and DSP2 232 functions as the slave processor. However,
in alternate embodiments, any system 210 processor (host processor 216, DSP1 224, or DSP2 232) may initiate a read request as the master processor, or service a read request as the slave processor.
Initially, in step 1512, the master processor sends a read request 414 and a slave unit number 412 to switch 212 via DSK1 222. In one embodiment, the master processor utilizes a write/read signal 418 (FIG. 14) to indicate whether the request is for a write operation or a read operation. Then, in step 1514, switch 212 responsively arbitrates the read request 414, and generates a grant signal 416 to the master processor to authorize a read data transfer, as discussed above in conjunction with FIG. 5.
Next, in step 1516, switch 212 creates a data transfer bridge through switch data path 514 (FIG. 5) to temporarily store and then pass the read data from the slave processor to the master processor. Then, in step 1518, the master processor sends an address and a data count to switch 212 via DSK1 222, and switch 212 responsively stores the data count in data counter 516 and latches the address in latch 1316.
In step 1520, switch 212 preferably uses a handshaking protocol to determine whether the slave processor is ready to begin the read data transfer. If the slave processor is ready, then switch 212, in step 1522, receives the read data from the slave processor and temporarily stores the received read data into FIFO 518 (FIG. 5). Then, in step 1524, switch 212 preferably uses another handshaking protocol to determine whether the master processor is ready to receive the read data stored in FIFO 518.
If the master processor is ready to accept the read data, then switch 212, in step 1526, sends one unit of the temporarily stored read data from
FIFO 518 to the master processor via DSK1 222. In step 1528, switch 212 then decrements the data counter 516 to monitor the amount of read data that remains to be transferred to the master processor.
In step 1530, switch 212 determines whether the current data count stored in data counter 516 is equal to zero. If the current data count stored in data counter 516 is not equal to zero, then the FIG. 15 process returns to step 1526 to continue transferring the remaining units of read data. However, if the current data count stored in data counter 516 is equal to
zero, then all the read data has been transferred to DSP1 224, and switch 212 generates a termination signal 426 to end the FIG. 15 data read cycle.
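The read cycle can be sketched in the same style as the write cycle. The request/grant and readiness handshakes of steps 1512-1524 are elided, and the incrementing-address access pattern in the slave is an illustrative assumption:

```python
from collections import deque

def data_read_cycle(address, data_count, slave_memory):
    """Sketch of the FIG. 15 data read cycle (steps 1518-1530)."""
    # Step 1518: switch 212 stores the data count in data counter 516.
    counter = data_count
    fifo = deque()
    # Step 1522: read data from the slave is staged in FIFO 518.
    for offset in range(data_count):
        fifo.append(slave_memory[address + offset])
    # Steps 1526-1530: drain one unit at a time to the master,
    # decrementing the counter after each unit.
    read_data = []
    while counter != 0:
        read_data.append(fifo.popleft())
        counter -= 1
    # Step 1530: count reached zero; termination signal 426 would end
    # the cycle here.
    return read_data


slave_mem = {0x20: "x", 0x21: "y"}
assert data_read_cycle(0x20, 2, slave_mem) == ["x", "y"]
```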
Referring now to FIG. 16, a block diagram for one embodiment of a multiprocessor system 1610 is shown, in accordance with the present invention. The FIG. 16 embodiment includes a switch module 1612 that individually communicates with a processor 1 1614, a processor 2 1622, a processor 3 1638, and a processor 4 1630. In alternate embodiments, system 1610 may readily be configured to include more or fewer than the four processors 1614, 1622, 1638, and 1630 that are illustrated in the FIG. 16 embodiment.
In operation, data read cycles and data write cycles may be performed by the FIG. 16 multiprocessor system 1610 using the same or similar techniques as those discussed above in conjunction with FIGS. 1 through 15. However, the FIG. 16 system 1610 may advantageously also perform simultaneous data transfers between multiple pairs of processors 1630, 1614, 1622, and 1638, in accordance with the present invention.
For example, processor 1 1614 and processor 2 1622 may perform a data transfer, while processor 3 1638 and processor 4 1630 simultaneously perform another separate data transfer. Similarly, processor 1 1614 and processor 3 1638 may perform a data transfer, while processor 2 1622 and processor 4 1630 simultaneously perform another data transfer. Further, processor 1 1614 and processor 4 1630 may perform a data transfer, while processor 3 1638 and processor 2 1622 simultaneously perform a data transfer.
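One way to see why exactly these pairings may proceed simultaneously is a greedy grant check over disjoint processor pairs. This is purely illustrative; the patent does not specify the arbitration scheme at this level, and the function and processor names below are hypothetical:

```python
def grant_concurrent(transfer_requests):
    """Grant each (master, slave) request whose two processors are not
    already engaged in a previously granted transfer. In a FIG. 16-style
    four-processor switch, two transfers may run concurrently whenever
    they share no processor."""
    busy = set()
    granted = []
    for master, slave in transfer_requests:
        if master not in busy and slave not in busy:
            busy.update((master, slave))
            granted.append((master, slave))
    return granted


requests = [("proc1", "proc2"), ("proc3", "proc4"), ("proc1", "proc3")]
# The first two pairs are disjoint and proceed simultaneously; the
# third shares processors with granted transfers and must wait.
assert grant_concurrent(requests) == [("proc1", "proc2"), ("proc3", "proc4")]
```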
The FIG. 16 system 1610 advantageously creates multiple read or write pipelines to provide powerful simultaneous multiple data transfer capabilities for system 1610. The ability to concurrently process and perform multiple data transfers therefore allows multiprocessor system 1610 to significantly expedite and facilitate complex processing tasks.
The invention has been explained above with reference to a preferred embodiment. Other embodiments will be apparent to those skilled in the
art in light of this disclosure. For example, the present invention may readily be implemented using configurations and techniques other than those described in the preferred embodiment above. Additionally, the present invention may effectively be used in conjunction with systems other than the one described above as the preferred embodiment. Therefore, these and other variations upon the preferred embodiments are intended to be covered by the present invention, which is limited only by the appended claims.