CN101625673B - Method for mapping task of network on two-dimensional grid chip

Info

Publication number: CN101625673B (granted patent; published application CN101625673A)
Application number: CN2008101162455A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 刘祥, 陈曦, 黄毅, 张金龙, 任菲
Assignee (original and current): Institute of Computing Technology of CAS
Application filed by Institute of Computing Technology of CAS; filing and priority date: 2008-07-07
Publication of application CN101625673A: 2010-01-13
Grant and publication of CN101625673B: 2012-06-27
Legal status: Active (granted)

Abstract

The invention discloses a method for mapping tasks onto a two-dimensional mesh network-on-chip. The method comprises the following steps: 1) pre-allocate an expected position on the two-dimensional mesh for every thread, where the threads include common threads that may be mapped to any position; 2) for each common thread, compute the change Com_diff of the total communication power-consumption factor that would result from swapping it with a common thread or an idle position near its expected position, and perform the swap that minimizes Com_diff; repeat until every possible swap of a common thread with a nearby common thread or idle position yields Com_diff greater than or equal to zero; 3) output the mapping file according to the final positions of all threads. The method achieves a high degree of optimization, lets users tune a threshold parameter to control the time complexity, and also handles partial mapping, in which some threads must be placed at fixed positions.

Description

A task mapping method for a two-dimensional mesh network-on-chip
Technical field
The present invention relates to a method for applying multi-core processors, and in particular to a task mapping method for a network-on-chip (Network-on-Chip, NoC) with a two-dimensional mesh (2-D Mesh) topology.
Background art
With the development of semiconductor and integrated-circuit technology, the integration level of systems-on-chip (SoC) keeps rising: hundreds of IP cores such as microprocessors, memories, and I/O interfaces can be integrated on a single chip. At the same time, the functionality of embedded electronic products is becoming more and more complex, and single-processor SoCs can no longer satisfy the growing functional and performance requirements of embedded systems, so the emergence of multi-processor SoCs (Multi-Processor SoC, MPSoC) has become inevitable. MPSoCs place higher demands on on-chip communication, and the network-on-chip was proposed to solve the global-communication problem of nanometer-era MPSoCs. The network-on-chip borrows design ideas from parallel computing and computer networks: it builds a packet-switched micro-network on a single silicon die, interconnects the IP cores through switches, and uses a globally asynchronous, locally synchronous (Global Asynchronous Local Synchronous, GALS) mechanism to realize efficient communication among the many processing units, storage units, and other computing modules of an MPSoC.
NoC topologies vary widely; among them, the two-dimensional mesh is simple in structure, scalable, and easy to implement and analyze, and has therefore been widely adopted in the NoC field. As the number of transistors on a chip grows toward the billion scale, power consumption has gradually become the primary constraint in chip design. Many power-aware methods exist for mapping threads onto the multiple processing units of a NoC. Among them, Jingcao Hu and R. Marculescu, in "Energy- and performance-aware mapping for regular NoC architectures", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 24, Issue 4, April 2005, pp. 551-562 (hereinafter Document 1), describe a branch-and-bound approach: while searching for the next feasible solution, an upper-bound function UBC (upper bound cost) and a lower-bound function LBC (lower bound cost) prune early those branches that cannot lead to an optimal solution, steering the method toward the "branch" containing the optimum. However, every step of that method must compute UBC and LBC, which inevitably increases the time complexity. The method may also produce multiple "optimal solutions" simultaneously, so the final solution is not highly optimized.
Summary of the invention
The object of the present invention is to overcome the long execution time of prior-art mapping methods and the lack of any guarantee on the optimization degree of the final solution, and to provide a task mapping method for a two-dimensional mesh network-on-chip.
According to one aspect of the present invention, a task mapping method for a two-dimensional mesh network-on-chip is provided, comprising the following steps:
1) pre-allocating an expected position on the two-dimensional mesh for every thread, wherein the threads comprise common threads that can be mapped to any position;
2) for each common thread, calculating the change Com_diff of the total communication power-consumption factor that results from exchanging the common thread with a common thread or an idle position near its expected position; performing, for the common thread, the exchange with the common thread or idle position that minimizes Com_diff; repeating until the exchange of every common thread with every common thread or idle position near its expected position makes Com_diff greater than or equal to 0; wherein the distance between the expected position and a common thread or idle position near the expected position is less than a predetermined threshold;
3) outputting a mapping file according to the positions of all the threads.
Wherein, step 1) comprises:
11) placing the common threads into a queue in order of the size of the traffic of each common thread;
12) allocating the first common thread in the queue to the center of the two-dimensional mesh;
13) calculating the expected position of a common thread to be allocated from the expected positions of the threads already allocated.
Wherein, the threads may further comprise special threads that need to be mapped to specific positions.
Wherein, step 1) may instead comprise:
11') placing the special threads into a queue;
12') adding the common threads to the queue in order of the size of the traffic of each common thread;
13) calculating the expected position of a common thread to be allocated from the expected positions of the threads already allocated.
Wherein, step 13) calculates the expected position of the common thread to be allocated from the expected positions of the threads already allocated according to the following formulas:

$$x_i = \left[\frac{\sum_{k \in \Omega} Com_{i,k}\, x_k}{\sum_{k \in \Omega} Com_{i,k}}\right], \qquad y_i = \left[\frac{\sum_{k \in \Omega} Com_{i,k}\, y_k}{\sum_{k \in \Omega} Com_{i,k}}\right]$$

where Ω is the set of threads already allocated, [·] denotes rounding, Com_{i,k} denotes the total amount of data communicated between threads i and k, x_k and y_k are the x- and y-axis coordinates of thread k, and x_i and y_i are the x- and y-axis coordinates of thread i.
Wherein, step 2) comprises:
21) forming all the common threads in the queue into a circular queue and taking any one of the common threads;
22) assuming that the common thread belongs to the unmapped threads, calculating the change Com_diff of the total communication power-consumption factor after exchanging the common thread with each common thread or idle position near its expected position, and performing the exchange of the common thread with the common thread or idle position that minimizes Com_diff;
23) repeating step 22) until the exchange of every common thread with every common thread or idle position near its expected position makes Com_diff greater than or equal to 0.
Wherein, the distance to the expected position is a Manhattan distance.
The present invention provides a power-consumption-first NoC mapping method that continually adjusts the position of each thread so that the final solution reaches the highest possible degree of optimization. At the same time, the user can set a threshold to trade off execution time against the quality of the final solution; when a small threshold is chosen, each iteration of the method only needs to compare the values at a few key positions, which significantly reduces the time complexity. The invention also considers and solves the partial-mapping case, i.e. the special case in which some threads of a NoC system must be mapped onto specific processing units (PUs).
Description of drawings
Fig. 1 is a schematic diagram of a NoC with a 2-D Mesh structure;
Fig. 2 is a flowchart of an embodiment of the task mapping method for a two-dimensional mesh network-on-chip according to the present invention;
Fig. 3 is a mapping result produced by the task mapping method of the present invention for the H.264 decoder data, with S=0 and d=1;
Fig. 4 is another mapping result produced by the task mapping method of the present invention for the H.264 decoder data, with S=1 and d=2.
Embodiment
Specific embodiments of the invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a specific embodiment of a NoC with a 4 x 3 2-D Mesh structure, in which S denotes a switch (crosspoint), LR denotes a local resource, Adapt denotes an adapter, PU denotes a processing unit, and (0,0), (0,1), (0,2), ..., (3,2) denote the position coordinates (x, y) to which threads are mapped.
Suppose thread i is mapped to position (x_i, y_i) in the 2-D Mesh and thread j is mapped to (x_j, y_j). This embodiment uses the Manhattan distance $D_{i,j} = |x_i - x_j| + |y_i - y_j|$ to represent the number of hops data travels when thread i communicates with thread j; those skilled in the art will understand that other distances may also be used without departing from the inventive concept. The communication power consumed when transmitting one bit of data from thread i to thread j is then:
$$E_{bit}^{i,j} = (D_{i,j} + 1)\,E_{Sbit} + D_{i,j}\,E_{Lbit} \qquad (1)$$
where E_Sbit denotes the power consumed by each switch to receive and forward one bit of data, and E_Lbit denotes the link power consumed to transmit one bit of data between two adjacent processing units. Letting E_Sbit / E_Lbit = θ, we have
$$E_{bit}^{i,j} = [(\theta + 1)\,D_{i,j} + \theta]\,E_{Lbit} \qquad (2)$$
The total communication power consumption of the whole system is then
$$E_{com} = \sum_{i=0}^{T-1} \sum_{j=i+1}^{T-1} (C_{i,j} + C_{j,i})\,[(\theta + 1)\,D_{i,j} + \theta]\,E_{Lbit} \qquad (3)$$
where T denotes the total number of threads and C_{i,j} denotes the volume of data sent from thread i to thread j. Writing Com_{i,j} = C_{i,j} + C_{j,i} for the total amount of data communicated between threads i and j, we have
$$E_{com} = \sum_{i=0}^{T-1} \sum_{j=i+1}^{T-1} [(\theta + 1)\,D_{i,j} + \theta]\,Com_{i,j}\,E_{Lbit} \qquad (4)$$
The starting point of the present invention is to optimize the communication power-consumption factor of the system, thereby determining the mapping between threads and the multiple processing units of the NoC that minimizes the system power consumption. From formula (4), the communication power-consumption factor of the system is:
$$W = \sum_{i=0}^{T-1} \sum_{j=i+1}^{T-1} [(\theta + 1)\,D_{i,j} + \theta]\,Com_{i,j} \qquad (5)$$
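For illustration only (this sketch is not part of the patent; the function names, the traffic matrix C, and the coordinate list are assumptions), the communication power-consumption factor W of formula (5) can be computed directly from the traffic data and the current thread positions:

```python
# Illustrative sketch of formula (5); theta = E_Sbit / E_Lbit.

def manhattan(p, q):
    """Manhattan hop distance D between two mesh positions (x, y)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def power_factor(C, coords, theta):
    """Communication power-consumption factor W of formula (5).

    C[i][j]   -- data volume sent from thread i to thread j (as in Table 1 below)
    coords[i] -- (x, y) mesh position of thread i
    """
    T = len(C)
    W = 0.0
    for i in range(T):
        for j in range(i + 1, T):
            com_ij = C[i][j] + C[j][i]                 # Com_{i,j}
            d_ij = manhattan(coords[i], coords[j])     # D_{i,j}
            W += ((theta + 1) * d_ij + theta) * com_ij
    return W
```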
All threads are divided into two classes: common threads, which may be mapped to any position, and special threads, which must be mapped to specific positions. Special threads may or may not exist in a given system; during mapping they simply need to be placed at their specific positions. For any common thread i, we wish to map it to the position that minimizes the communication power-consumption factor of the system described above. This is achieved as follows: compute the change of the total communication power-consumption factor that would result from exchanging common thread i with each thread (other than the special threads) or idle position near its expected position, and exchange thread i with the thread or idle position for which this change is minimal and less than 0. The change of the total communication power-consumption factor after exchanging thread i with another thread or with an idle position is computed as follows.
The coefficient of the sum of the communication power consumption between thread i and all other threads is
$$w_i = \sum_{k=0,\,k \neq i}^{T-1} [(\theta + 1)\,D_{i,k} + \theta]\,Com_{i,k} \qquad (6)$$
The coefficient of the sum of the communication power consumption between thread j and all other threads is
$$w_j = \sum_{k=0,\,k \neq j}^{T-1} [(\theta + 1)\,D_{j,k} + \theta]\,Com_{j,k} \qquad (7)$$
After threads i and j exchange mapping positions, the coefficients of the sums of the communication power consumption between thread i (respectively thread j) and all other threads become

$$w_i' = \sum_{k=0,\,k \neq i,j}^{T-1} [(\theta + 1)\,D_{j,k} + \theta]\,Com_{i,k} + [(\theta + 1)\,D_{i,j} + \theta]\,Com_{i,j} \qquad (8)$$

$$w_j' = \sum_{k=0,\,k \neq i,j}^{T-1} [(\theta + 1)\,D_{i,k} + \theta]\,Com_{j,k} + [(\theta + 1)\,D_{i,j} + \theta]\,Com_{i,j} \qquad (9)$$

Therefore, after threads i and j are swapped, the change of the total communication power-consumption factor of the system is

$$Com\_diff = (w_i' + w_j') - (w_i + w_j) = (\theta + 1) \sum_{k=0,\,k \neq i,j}^{T-1} (D_{j,k} - D_{i,k})(Com_{i,k} - Com_{j,k}) \qquad (10)$$
If thread i is exchanged with an idle position (s, t), that is, thread i is mapped to the idle position (s, t) and thread i's original position becomes idle, then after the exchange the coefficient of the sum of the communication power consumption between thread i and all other threads is:
$$w_{new} = \sum_{k=0,\,k \neq i}^{T-1} [(\theta + 1)(|s - x_k| + |t - y_k|) + \theta]\,Com_{i,k} \qquad (11)$$
Therefore, after thread i is exchanged with the idle position (s, t), the change of the total communication power-consumption factor of the system is:
$$Com\_diff = w_{new} - w_i = \sum_{k=0,\,k \neq i}^{T-1} (\theta + 1)(|s - x_k| + |t - y_k| - D_{i,k})\,Com_{i,k} \qquad (12)$$
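As an illustrative sketch only (not the patent's own code; it reuses the manhattan helper from the sketch after formula (5), and Com is assumed to be the precomputed matrix Com[i][k] = C[i][k] + C[k][i]), formulas (10) and (12) can be evaluated as follows:

```python
def com_diff_swap_threads(Com, coords, theta, i, j):
    """Change of the power factor when threads i and j exchange positions (formula (10))."""
    diff = 0.0
    for k in range(len(Com)):
        if k in (i, j):
            continue                      # the mutual term Com_{i,j} is unchanged by the swap
        d_ik = manhattan(coords[i], coords[k])
        d_jk = manhattan(coords[j], coords[k])
        diff += (theta + 1) * (d_jk - d_ik) * (Com[i][k] - Com[j][k])
    return diff

def com_diff_move_to_idle(Com, coords, theta, i, s, t):
    """Change of the power factor when thread i moves to the idle position (s, t) (formula (12))."""
    diff = 0.0
    for k in range(len(Com)):
        if k == i:
            continue
        d_new = abs(s - coords[k][0]) + abs(t - coords[k][1])
        d_old = manhattan(coords[i], coords[k])
        diff += (theta + 1) * (d_new - d_old) * Com[i][k]
    return diff
```

A negative return value means the exchange lowers the communication power-consumption factor, which is the condition the method uses to accept a swap.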
If the set Ω contains all the threads that have already been mapped, then the expected position (x_i, y_i) of a thread i still to be mapped is computed with formula (13), in which [·] denotes rounding the enclosed expression to the nearest integer; those skilled in the art will appreciate that it can also be computed in other ways.

$$x_i = \left[\frac{\sum_{k \in \Omega} Com_{i,k}\, x_k}{\sum_{k \in \Omega} Com_{i,k}}\right], \qquad y_i = \left[\frac{\sum_{k \in \Omega} Com_{i,k}\, y_k}{\sum_{k \in \Omega} Com_{i,k}}\right] \qquad (13)$$
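The expected position of formula (13) is the traffic-weighted centroid of the already-mapped threads, rounded to grid coordinates. A minimal sketch (assumed names, not the patent's code):

```python
def expected_position(Com, placed, i):
    """Formula (13): traffic-weighted, rounded centroid of the threads already placed.

    placed -- dict {k: (x_k, y_k)} for the threads in the set Omega
    """
    total = sum(Com[i][k] for k in placed)
    if total == 0:
        return None   # thread i has no traffic with the placed threads; any position will do
    x = round(sum(Com[i][k] * pos[0] for k, pos in placed.items()) / total)
    y = round(sum(Com[i][k] * pos[1] for k, pos in placed.items()) / total)
    return (x, y)
```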
Based on the above analysis, an embodiment of the present invention proceeds as follows.
Suppose a given task has S special threads and T - S common threads. Place all the special threads in an ordering queue, then add all the common threads to the same queue; preferably a common thread is added according to the size of the traffic between that thread and the threads already in the queue. Take the pair of threads with the largest traffic between them, at least one of which is a common thread not yet in the queue. If neither of the two threads is in the queue, append both to the tail of the queue; their relative order is determined by the maximum traffic each has with a thread already in the queue, the one with the larger value going first. If one of them is already in the queue, simply append the other to the tail. Then take the pair of threads with the next-largest traffic and repeat this operation until all threads are in the queue. Finally, number the T threads 0 to T-1 in queue order. If the task has no special threads, the order of the first pair of threads in the queue is arbitrary.
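The ordering heuristic just described might be sketched as follows (an illustrative reading with assumed names, not the patent's reference implementation); the worked H.264 example below then follows this procedure by hand:

```python
def build_order_queue(Com, special):
    """Order threads: special threads first, then common threads by descending pairwise traffic."""
    T = len(Com)
    queue = list(special)
    # all thread pairs, heaviest mutual traffic first
    pairs = sorted(((Com[a][b], a, b) for a in range(T) for b in range(a + 1, T)),
                   reverse=True)
    for _, a, b in pairs:
        if a in queue and b in queue:
            continue
        if a not in queue and b not in queue:
            # both are new: the one with the larger peak traffic to a queued thread goes first
            peak = lambda t: max((Com[t][q] for q in queue), default=0)
            first, second = (a, b) if peak(a) >= peak(b) else (b, a)
            queue.extend([first, second])
        else:
            queue.append(b if a in queue else a)
        if len(queue) == T:
            break
    queue += [t for t in range(T) if t not in queue]   # threads with no traffic at all
    return queue                                        # thread number = index in this queue
```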
Take the H.264 decoder as an example: it involves 12 modules (tasks) that must be mapped onto a 4 x 4 2-D Mesh processor array, and the communication traffic between the modules is shown in Table 1.
Table 1: inter-module communication traffic in the H.264 decoder
From/To IP0 IP1 IP2 IP3 IP4 IP5 IP6 IP7 IP8 IP9 IP10 IP11
IP0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 7,098.7
IP1 4,465.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 344.5
IP2 0.0 0.0 0.0 0.0 62.7 4,791.9 0.0 0.0 0.0 0.0 0.0 13,197.0
IP3 0.0 5,936.1 0.0 0.0 0.0 0.0 0.0 641.0 0.0 0.0 0.0 0.0
IP4 0.0 0.0 0.0 6,577.1 0.0 0.0 406.6 0.0 494.7 0.0 0.0 0.0
IP5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
IP6 324.9 321.4 0.0 186.0 232.0 11.6 0.0 6.9 990.2 59.2 11.6 0.0
IP7 320.5 13.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 145.0 0.0 26.7
IP8 0.0 0.0 0.0 0.0 0.0 0.0 826.3 0.0 0.0 0.0 0.0 0.0
IP9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 320.5 0.0 0.0 0.0 0.0
IP10 0.0 0.0 62.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
IP11 2,644.3 10,628.0 7,470.4 0.0 0.0 0.0 0.0 39.6 0.0 0.0 0.0 0.0
Suppose S=1 and IP5 is the special thread, so the initial ordering queue is {IP5}. The pair of threads with the largest traffic is IP2 and IP11, neither of which is in the queue; since the traffic between IP2 and the queued thread IP5 is greater than the traffic between IP11 and IP5, IP2 is appended to the tail first. The new ordering queue is therefore {IP5, IP2, IP11}. The pair with the next-largest traffic is IP1 and IP11; since IP11 is already in the queue, IP1 is simply appended to the tail, giving {IP5, IP2, IP11, IP1}. IP0 is appended in the same way, and the queue becomes {IP5, IP2, IP11, IP1, IP0}. The pair with the next-largest traffic is IP3 and IP4, neither of which is in the queue; since the maximum traffic between IP3 and a queued thread (its traffic with IP1) is greater than the traffic between IP4 and every queued thread, IP3 is appended first. The new ordering queue is therefore {IP5, IP2, IP11, IP1, IP0, IP3, IP4}. Continuing in this way, after all threads have been added the queue is {IP5, IP2, IP11, IP1, IP0, IP3, IP4, IP6, IP8, IP7, IP9, IP10}. Finally, the 12 threads are numbered 0 to 11 in this queue order.
The mapping that minimizes the communication power-consumption factor of the system is then computed according to the formulas above; the concrete steps are as follows:
Step 1: Initialize a two-dimensional array A[M][N] and set all its elements to -1, indicating that every position is idle, where M is the number of rows of the NoC's two-dimensional mesh and N is the number of columns. If S=0, allocate thread No. 0 to the center position of the mesh; otherwise, allocate all special threads to the positions of their particular processing units. Then use formula (13) to compute the expected position of the next thread and allocate that thread to the unallocated position with the smallest Manhattan distance to this expected position; continue with the next thread until all threads have been allocated. When thread i is allocated to position (a, b), set A[a][b]=i, x[i]=a, y[i]=b.
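Step 1 might look roughly like the following sketch (illustrative only; the grid array, the pinned dictionary of special threads, and the helpers manhattan and expected_position from the earlier sketches are assumptions):

```python
def initial_placement(Com, order, pinned, M, N):
    """Step 1: pin special threads, then place each thread of `order` greedily.

    order  -- thread numbering from build_order_queue
    pinned -- dict {special thread: (row, col)}; empty when S == 0
    """
    A = [[-1] * N for _ in range(M)]            # -1 marks an idle position
    coords = dict(pinned)                       # special threads go to their fixed positions
    if not pinned:                              # S == 0: put the first thread at the center
        coords[order[0]] = (M // 2, N // 2)
    for t, (a, b) in coords.items():
        A[a][b] = t
    for t in order:                             # place the remaining threads in queue order
        if t in coords:
            continue
        target = expected_position(Com, coords, t) or (M // 2, N // 2)
        free = [(a, b) for a in range(M) for b in range(N) if A[a][b] == -1]
        a, b = min(free, key=lambda p: manhattan(p, target))   # closest idle cell
        A[a][b] = t
        coords[t] = (a, b)
    return A, coords
```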
Step 2: Set unextimes=0, form the T-S common threads of the ordering queue into a circular queue, and take the first thread of this circular queue. In the present embodiment the queue is concretely implemented as a linked list.
Step 3: Assume this thread does not belong to Ω and use formula (13) to compute its expected position. Using formulas (10) and (12), compute the Com_diff of exchanging this thread's original position with every position in A whose Manhattan distance from the expected position is at most the threshold d (excluding the positions of special threads and infeasible positions). The threshold d can be set by the user; it determines the size of the diamond-shaped region of candidate positions (the larger d, the larger the region), must not exceed max(M, N), and preferably takes the value 1 or 2; this threshold affects both the complexity of the whole method and the optimization degree of the final solution. Compare the Com_diff values, take the minimum, and record its corresponding position (x, y). If this minimum Com_diff >= 0, go to Step 4; otherwise go to Step 6.
Step 4: Increment unextimes; if unextimes = T-S, go to Step 7, otherwise go to Step 5.
Step 5: Take the next thread and go to Step 3.
Step 6: Set unextimes=0. If A[x][y] = -1, allocate this thread to position (x, y) and set its original position to -1; otherwise exchange this thread with the thread at (x, y). Go to Step 5.
Step 7: Output the mapping file according to the position of each thread, and finish.
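Putting Steps 2 through 7 together, the iterative improvement could look roughly like the sketch below (assumed names; it reuses the helpers from the earlier sketches, and the positions of special threads are simply excluded from the candidate set):

```python
def optimize(Com, A, coords, order, pinned, theta, d):
    """Steps 2-7: cycle over common threads and apply the best local swap within distance d."""
    common = [t for t in order if t not in pinned]
    M, N = len(A), len(A[0])
    unextimes = 0                     # consecutive threads for which no improving swap exists
    idx = 0
    while unextimes < len(common):
        i = common[idx % len(common)]
        idx += 1
        others = {k: v for k, v in coords.items() if k != i}    # treat i as not yet mapped
        target = expected_position(Com, others, i) or coords[i]
        best, best_pos = 0.0, None
        for a in range(M):
            for b in range(N):
                if manhattan((a, b), target) > d or (a, b) == coords[i]:
                    continue
                j = A[a][b]
                if j in pinned:                                  # never displace a special thread
                    continue
                diff = (com_diff_move_to_idle(Com, coords, theta, i, a, b) if j == -1
                        else com_diff_swap_threads(Com, coords, theta, i, j))
                if diff < best:
                    best, best_pos = diff, (a, b)
        if best_pos is None:          # Com_diff >= 0 for every candidate: no improvement
            unextimes += 1
            continue
        unextimes = 0
        a, b = best_pos
        j = A[a][b]
        xi, yi = coords[i]
        A[a][b], A[xi][yi] = i, j     # swap with the thread there, or move to the idle cell
        coords[i] = (a, b)
        if j != -1:
            coords[j] = (xi, yi)
    return A, coords

# A possible end-to-end use on the H.264 example (values as in Table 1) would be:
#   order = build_order_queue(Com, special=[5])
#   A, coords = initial_placement(Com, order, pinned={5: (3, 1)}, M=4, N=4)
#   A, coords = optimize(Com, A, coords, order, pinned={5: (3, 1)}, theta=..., d=2)
```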
Taking the above H.264 decoder as an example again, suppose no special thread needs to be fixed at a particular processing unit and set d=1. The result produced by the method is: IP10 maps to (0,0), IP9 to (0,2), IP7 to (0,3), IP2 to (1,0), IP11 to (1,1), IP1 to (1,2), IP3 to (1,3), IP5 to (2,0), IP0 to (2,1), IP6 to (2,2), IP4 to (2,3), and IP8 to (3,2).
Now suppose thread IP5 must be mapped to (3,1) in advance and set d=2. The result produced by the method is then: IP10 maps to (3,3), IP9 to (1,3), IP7 to (0,3), IP2 to (3,2), IP11 to (2,2), IP1 to (1,2), IP3 to (0,2), IP5 to (3,1), IP0 to (2,1), IP6 to (1,1), IP4 to (0,1), and IP8 to (1,0).
As a further instance, the encoding and decoding of two short video clips containing only a box and a hand was taken as a concrete example, mapping 20 threads onto a 5 x 5 NoC; the power-consumption and time comparisons between the method of Document 1 and the method of the invention are shown in Table 2 and Table 3, respectively.
Table 2: power-consumption comparison between the method of Document 1 and the method of the invention [table values provided as an image in the original]
Table 3: execution-time comparison between the method of Document 1 and the method of the invention [table values provided as an image in the original]
The data in Tables 2 and 3 show that the optimal solution computed by the method of the present invention consumes less power than the optimal solution described in Document 1, and that when d=1 the solution time is also shorter.
It should be noted and understood that various modifications and improvements can be made to the invention described in detail above without departing from the spirit and scope of the invention as defined by the appended claims. Accordingly, the scope of the claimed technical solution is not limited by any particular exemplary teaching given.

Claims (7)

1. A task mapping method for a two-dimensional mesh network-on-chip, comprising the following steps:
1) pre-allocating an expected position on the two-dimensional mesh for every thread, wherein the threads comprise common threads that can be mapped to any position;
2) for each common thread, calculating the change Com_diff of the total communication power-consumption factor that results from exchanging the common thread with a common thread or an idle position near its expected position; performing, for the common thread, the exchange with the common thread or idle position that minimizes Com_diff; repeating until the exchange of every common thread with every common thread or idle position near its expected position makes Com_diff greater than or equal to 0; wherein the distance between the expected position and a common thread or idle position near the expected position is less than a predetermined threshold;
3) outputting a mapping file according to the positions of all the threads.
2. The method according to claim 1, characterized in that step 1) comprises:
11) placing the common threads into a queue in order of the size of the traffic of each common thread;
12) allocating the first common thread in the queue to the center of the two-dimensional mesh;
13) calculating the expected position of a common thread to be allocated from the expected positions of the threads already allocated.
3. The method according to claim 1, characterized in that the threads further comprise special threads that need to be mapped to specific positions.
4. The method according to claim 3, characterized in that step 1) comprises:
11') placing the special threads into a queue;
12') adding the common threads to the queue in order of the size of the traffic of each common thread;
13) calculating the expected position of a common thread to be allocated from the expected positions of the threads already allocated.
5. The method according to claim 2 or 4, characterized in that step 13) calculates the expected position of the common thread to be allocated from the expected positions of the threads already allocated according to the following formulas:

$$x_i = \left[\frac{\sum_{k \in \Omega} Com_{i,k}\, x_k}{\sum_{k \in \Omega} Com_{i,k}}\right], \qquad y_i = \left[\frac{\sum_{k \in \Omega} Com_{i,k}\, y_k}{\sum_{k \in \Omega} Com_{i,k}}\right]$$

where Ω is the set of threads already allocated, [·] denotes rounding, Com_{i,k} denotes the total amount of data communicated between threads i and k, x_k and y_k are the x- and y-axis coordinates of thread k, and x_i and y_i are the x- and y-axis coordinates of thread i.
6. The method according to claim 2 or 4, characterized in that step 2) comprises:
21) forming all the common threads in the queue into a circular queue and taking any one of the common threads;
22) assuming that the common thread belongs to the unmapped threads, calculating the change Com_diff of the total communication power-consumption factor after exchanging the common thread with each common thread or idle position near its expected position, and performing the exchange of the common thread with the common thread or idle position that minimizes Com_diff;
23) repeating step 22) until the exchange of every common thread with every common thread or idle position near its expected position makes Com_diff greater than or equal to 0.
7. The method according to claim 6, characterized in that the distance to the expected position is calculated as a Manhattan distance.

Priority Applications (1)

Application number: CN2008101162455A; priority and filing date: 2008-07-07; title: Method for mapping task of network on two-dimensional grid chip

Publications (2)

CN101625673A (application), published 2010-01-13
CN101625673B (granted patent), published 2012-06-27

Family ID: 41521524

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103428804B * 2013-07-31 2016-03-30 University of Electronic Science and Technology of China Method for finding mapping schemes between network-on-chip tasks and nodes and for placing network coding
CN103885842B * 2014-03-19 2017-08-25 Zhejiang University Globally optimized task mapping method for a network-on-chip with accelerator nodes
CN104079439B * 2014-07-18 2017-02-22 Hefei University of Technology NoC (network-on-chip) mapping method based on the discrete firefly algorithm
CN104270308A * 2014-10-15 2015-01-07 Chongqing University Mapping method for radio-frequency network-on-chip applications with unbalanced communication characteristics
WO2018014300A1 * 2016-07-21 2018-01-25 张升泽 Power implementation method and system for multi-core chip
CN106254254B * 2016-09-19 2020-05-26 Fudan University Network-on-chip communication method based on a mesh topology structure
CN107391247B * 2017-07-21 2020-06-26 Tongji University Breadth-first greedy mapping method for network-on-chip applications

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6112085A (en) * 1995-11-30 2000-08-29 Amsc Subsidiary Corporation Virtual network configuration and management system for satellite communication system
CN101075961A (en) * 2007-06-22 2007-11-21 清华大学 Self-adaptable package for designing on-chip network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yue Peipei et al., "An enumerated path allocation algorithm for the NoC mapping problem", Journal of University of Electronic Science and Technology of China, 2008, Vol. 37, No. 1. *



Legal Events

Code Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant