CN104391821A - System-level model construction method for a multi-core shared SIMD coprocessor

Info

Publication number: CN104391821A
Application number: CN201410669796.XA
Authority: CN
Inventors: 郭炜; 崔鲁平; 魏继增
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2014-11-20
Filing date: 2014-11-20
Publication date: 2015-03-04

Abstract

Disclosed is a system-level model construction method for a multi-core shared SIMD coprocessor. The model comprises a system on chip (SoC) on which n cores and n vector coprocessors are arranged, where n is a positive even number, and the n vector coprocessors are connected to the n cores through a crossbar switch. The model further comprises a scheduler connected to the n cores, the n vector coprocessors, and the crossbar switch, which schedules the vector coprocessors to communicate with the n cores through the crossbar switch according to the current state of each vector coprocessor. Through this sharing mechanism, the method significantly improves the resource utilization of the SIMD vector coprocessors and reduces system power consumption; compared with the prior art, tasks are completed more efficiently when the quantity of resources is fixed.

Description

System-level model construction method for a multi-core shared SIMD coprocessor

Technical field

The present invention relates to a system-level model of a processor, and in particular to a system-level model construction method for a multi-core shared SIMD coprocessor.

Background art

SIMD (Single Instruction, Multiple Data) is a technique for realizing data-level parallelism: the same operation is applied to multiple data elements at once. The key to SIMD is that multiple arithmetic operations are performed within a single instruction, which increases processor throughput and makes SIMD particularly suitable for data-intensive computation such as multimedia applications. Current mainstream processors each have their own SIMD instruction subset, such as MMX or SSE on x86, NEON on ARM, and AltiVec on PowerPC. In a modern multi-core processor, each core is usually equipped with its own exclusive SIMD coprocessor, also called a vector coprocessor (VP). Because of this exclusive ownership, however, when a core executes a program that lacks data-level parallelism, its SIMD coprocessor sits idle, while other cores that may be executing programs with data-level parallelism can use only their own SIMD coprocessor and cannot use the idle ones. This wastes resources and increases power consumption.

Fig. 1 shows the traditional architecture, assuming an SoC with 4 cores and 4 VPs. In this structure, each VP is exclusive to one core and cannot be shared by other cores. When a core is not executing a data-intensive program, its VP is idle, which wastes resources and power.

Summary of the invention

The technical problem to be solved by the present invention is to provide a system-level model of a multi-core shared SIMD coprocessor that improves the resource utilization of the vector coprocessors and reduces system power consumption.

The technical solution adopted by the present invention is a system-level model construction method for a multi-core shared SIMD coprocessor, comprising a system on chip (SoC) provided with n cores and n vector coprocessors, where n is a positive even number. The n vector coprocessors are connected to the n cores through a crossbar switch. A scheduler, connected to the n cores, the n vector coprocessors, and the crossbar switch, schedules the vector coprocessors to communicate with the cores through the crossbar switch, and does so according to the current state of each vector coprocessor.

Each vector coprocessor describes its current state through 3 status registers, wherein:

The first status register describes which of the n cores is currently using the vector coprocessor, or that no core is using it. When no core is currently using the vector coprocessor, it is set to the idle state and can be scheduled by the scheduler;

The second status register describes whether the vector coprocessor is currently in the shared state or in the exclusive state. A vector coprocessor in the shared state can be scheduled by the scheduler, whereas one in the exclusive state cannot;

The third status register describes the index of the vector coprocessor among all the vector coprocessors belonging to the core that holds it.
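The three status registers can be pictured as one small record per vector coprocessor. The Python sketch below is only an illustration of that idea: the class name `VPStatus`, the field names `owner`, `shared` and `index`, and the use of `None` for "no core" are assumptions made for the example and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VPStatus:
    """Hypothetical software view of one vector coprocessor's three status registers."""
    owner: Optional[int]  # first register: number of the core using this VP, or None if no core uses it
    shared: bool          # second register: True = shared state, False = exclusive state
    index: int            # third register: index among the VPs held by the same owner

    def is_idle(self) -> bool:
        # A VP that no core is currently using is in the idle state.
        return self.owner is None

    def schedulable(self) -> bool:
        # Only VPs in the shared state may be scheduled by the scheduler.
        return self.shared


dedicated = VPStatus(owner=0, shared=False, index=0)    # exclusive to core 0
released = VPStatus(owner=None, shared=True, index=2)   # idle, managed by the scheduler
print(dedicated.schedulable(), released.schedulable())
```

A record like this keeps the scheduling decision local: the scheduler only needs to read `shared` and `owner` to know whether it may touch a given coprocessor.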

When a core is currently using multiple vector coprocessors, exactly one of them is in the exclusive state and all the others are in the shared state.

In the initial state of the SoC, every vector coprocessor is in the exclusive state: the first vector coprocessor is exclusive to the first core, the second vector coprocessor is exclusive to the second core, and so on, and the index of every vector coprocessor is 0. A vector coprocessor changes from the exclusive state to the shared state when the core using it voluntarily gives up the right to use it; thereafter that vector coprocessor is scheduled by the scheduler.
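The initial state and the exclusive-to-shared transition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `initial_state` and `release`, the dictionary layout, and the rule used to assign the index after a release (position among the coprocessors already managed by the scheduler) are all hypothetical, since the patent does not say how that index is chosen.

```python
def initial_state(n):
    """State at reset: VP i is exclusive to core i and every index is 0."""
    return [{"owner": i, "shared": False, "index": 0} for i in range(n)]


def release(vps, vp_id):
    """The owning core voluntarily gives up VP `vp_id` to the scheduler."""
    vp = vps[vp_id]
    if vp["owner"] is None:
        raise ValueError("VP is already managed by the scheduler")
    # Assumption: the new index is the VP's position among scheduler-managed VPs.
    vp["index"] = sum(1 for v in vps if v["owner"] is None)
    vp["owner"] = None    # no core uses it any more: idle
    vp["shared"] = True   # exclusive -> shared, so the scheduler may schedule it


vps = initial_state(4)
for vp_id in (2, 1, 0):   # cores 2, 1 and 0 give up their VPs, in this assumed order
    release(vps, vp_id)
print(vps)
```

With this assumed release order, VP0 ends up with index 2 among the scheduler-managed coprocessors, while VP3 stays exclusive to core 3.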

When any of the n cores needs more vector coprocessors because it is executing a program with data-level parallelism, that core must apply to the scheduler for them. The scheduler runs a load-balancing scheduling algorithm. If there are currently vector coprocessors in the idle state, the scheduler allocates them to the requesting core. If there are no idle vector coprocessors but some are in the shared state, the scheduler reallocates vector coprocessor resources according to the load-balancing policy and the situation at that time.
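The two cases handled by the scheduler can be illustrated with a short Python sketch. The patent does not specify the load-balancing algorithm, so everything here is an assumption made for illustration: the function name `request`, handing all idle coprocessors to the requester, the "move one shared VP from the best-supplied core until the counts differ by less than 2" rule, and choosing the first allocated coprocessor as the exclusive one.

```python
from collections import Counter


def request(vps, core):
    """Hypothetical scheduler entry point: `core` applies for more VPs."""
    idle = [v for v in vps if v["owner"] is None]
    if idle:
        # Case 1: idle VPs exist, so they are allocated to the requesting core.
        for v in idle:
            v["owner"] = core
    else:
        # Case 2: no idle VP; rebalance shared VPs held by other cores (assumed policy).
        while True:
            held = Counter(v["owner"] for v in vps)
            donors = [v for v in vps if v["shared"] and v["owner"] != core]
            if not donors:
                break
            donor = max(donors, key=lambda v: held[v["owner"]])
            if held[donor["owner"]] - held[core] < 2:
                break  # already balanced
            donor["owner"] = core
    mine = [v for v in vps if v["owner"] == core]
    # A core that uses VPs holds exactly one exclusive VP (first one picked, by assumption).
    if mine and all(v["shared"] for v in mine):
        mine[0]["shared"] = False
    # Renumber the index register within each owner's group.
    seen = Counter()
    for v in vps:
        v["index"] = seen[v["owner"]]
        seen[v["owner"]] += 1
    return len(mine)


# Three VPs released to the scheduler, VP3 still exclusive to core 3.
vps = [{"owner": None, "shared": True, "index": i} for i in range(3)]
vps.append({"owner": 3, "shared": False, "index": 0})
got_core3 = request(vps, 3)   # idle VPs exist
got_core0 = request(vps, 0)   # no idle VPs left, shared ones are rebalanced
print(got_core3, got_core0)
```

Only coprocessors in the shared state are ever moved, so the exclusive coprocessor of the donor core is never taken away.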

Through its sharing mechanism, the system-level model construction method for a multi-core shared SIMD coprocessor of the present invention significantly improves the resource utilization of the SIMD vector coprocessors and reduces system power consumption; for a fixed amount of resources, tasks are completed more efficiently.

Brief description of the drawings

Fig. 1 shows the traditional system-on-chip structure;

Fig. 2 shows a system-level model, built with the method of the present invention, in which 4 cores share 4 VPs through a crossbar switch;

Fig. 3 shows a scheduling instance.

In the figures:

1: core 2: vector coprocessor

3: crossbar switch 4: scheduler

Detailed description of the embodiments

The system-level model construction method for a multi-core shared SIMD coprocessor of the present invention is described in detail below with reference to the embodiments and the accompanying drawings.

The system-level model construction method for a multi-core shared SIMD coprocessor of the present invention comprises a system on chip (SoC) provided with n cores and n vector coprocessors, where n is a positive even number. The n vector coprocessors are connected to the n cores through a crossbar switch. A scheduler, connected to the n cores, the n vector coprocessors, and the crossbar switch, schedules the vector coprocessors to communicate with the cores through the crossbar switch, and does so according to the current state of each vector coprocessor.

The first status register describes which of the n cores is currently using the vector coprocessor, which may be core 0, core 1, ..., core (n-1), or that no core is using it. When no core is currently using the vector coprocessor, it is set to the idle state and can be scheduled by the scheduler;

The second status register describes whether the vector coprocessor is currently in the shared state or in the exclusive state. When a core is currently using multiple vector coprocessors, exactly one of them is in the exclusive state and all the others are in the shared state. A vector coprocessor in the shared state can be scheduled by the scheduler, whereas one in the exclusive state cannot;

In the system-level model construction method for a multi-core shared SIMD coprocessor of the present invention, every vector coprocessor is in the exclusive state when the SoC is in its initial state: the first vector coprocessor is exclusive to the first core, the second vector coprocessor is exclusive to the second core, and so on, and the index of every vector coprocessor is 0. A vector coprocessor changes from the exclusive state to the shared state when the core using it voluntarily gives up the right to use it; thereafter that vector coprocessor is scheduled by the scheduler.

To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

Fig. 2 shows a system-level model, designed according to the system-level model construction method for a multi-core shared SIMD coprocessor of the present invention, in which 4 cores share 4 VPs through a crossbar switch. The scheduler 4 can communicate with the 4 cores (Core) 1 and the 4 vector coprocessors (VP) 2, and it allocates vector coprocessor 2 resources and configures the crossbar switch 3 according to the requests of the cores 1.

In the initial system state, the 4 vector coprocessors 2 are exclusive to the 4 cores 1 respectively and cannot be scheduled by the scheduler 4. When one of the cores 1 executes a program without data-level parallelism, it can apply to the scheduler 4 to give up the right to use its vector coprocessor 2; thereafter that vector coprocessor 2 is managed by the scheduler 4, and its state changes from exclusive to shared. Because programs behave dynamically, at some later time the operating system may schedule a program with data-level parallelism onto the core 1 that gave up its vector coprocessor 2. That core 1 then needs to apply to the scheduler 4 for a number of vector coprocessor 2 resources. If there are vector coprocessors 2 in the idle state, the scheduler 4 allocates a certain number of them to the core 1 and changes the state of one of them from shared to exclusive; the other vector coprocessors 2 allocated to the core remain in the shared state. In every case, a core that has given up its vector coprocessor and applies again for vector coprocessor resources obtains at least one vector coprocessor, which guarantees that no core is "starved".

Fig. 3 shows a scheduling instance that illustrates how the sharing model of the present invention works. The figure is divided into two parts: the left side shows the time state and the right side shows the 4 vector coprocessors (VPs). In the initial system state, State 0, each of the 4 VPs belongs to one core. Each VP describes its current state with three parameters; taking VP0 as an example, the three parameters are C0/0/0. C0 indicates that VP0 is currently used by core 0. The second parameter, if 0, indicates that the VP is in the exclusive state, that is, exclusive to core 0 and not schedulable by the scheduler; if 1, the VP is in the shared state and can be scheduled by the scheduler. The third parameter, 0, indicates that the index of this VP among all the VPs belonging to core 0 is 0.

After the system starts running, core 0, core 1, and core 2 are not executing programs with data-level parallelism, so they voluntarily apply to the scheduler to give up the right to use their VPs, and the system enters State 1. In State 1, the three parameters of VP0 are s/1/2. The first parameter, s, indicates that this VP is managed by the scheduler; the second parameter, 1, indicates that VP0 is currently in the shared state; the third parameter, 2, indicates that VP0 has index 2 among all the VPs managed by the scheduler.
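The three-parameter labels used in Fig. 3, such as C0/0/0 and s/1/2, map directly onto the three status registers. The Python sketch below shows one way to read and write such labels; the function names `parse_vp_state` and `format_vp_state` and the dictionary layout are hypothetical and are introduced only for this example.

```python
def parse_vp_state(text):
    """Parse a 'C0/0/0' or 's/1/2' style label into its three fields."""
    owner, shared, index = text.split("/")
    return {
        # 'C<k>' means used by core k; 's' means managed by the scheduler.
        "owner": None if owner == "s" else int(owner.lstrip("C")),
        # 0 = exclusive state, 1 = shared state.
        "shared": shared == "1",
        # Index among the VPs held by the same owner.
        "index": int(index),
    }


def format_vp_state(state):
    """Inverse of parse_vp_state: build the label from the three fields."""
    owner = "s" if state["owner"] is None else f"C{state['owner']}"
    return f"{owner}/{int(state['shared'])}/{state['index']}"


print(parse_vp_state("C0/0/0"))
print(format_vp_state(parse_vp_state("s/1/2")))
```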

The system continues to run. During a certain period, core 3 executes a large number of data-intensive tasks and therefore applies to the scheduler for VP resources. Because the scheduler is managing 3 VPs that are idle and in the shared state, it allocates all 3 of these VPs to core 3, and the system enters State 2. In this state, all 4 VPs are used by core 3: 3 VPs (VP0 to VP2) remain in the shared state, one VP (VP3) is in the exclusive state, and their indices are 0 to 3 respectively.

The system continues to run. During a certain period, the operating system schedules a program with data-level parallelism onto core 0, so core 0 applies to the scheduler for VP resources while core 3 is still using all the VP resources. Because the scheduler runs a load-balancing scheduling algorithm, it allocates to core 0 two of the 3 shared-state VPs being used by core 3 and sets the status registers of these two VPs accordingly: one of the two, chosen at random, is changed from the shared state to the exclusive state, and the other remains in the shared state. The system then enters State 3. In State 3, core 0 and core 3 each use 2 VPs, and of the two VPs that each core uses, one is in the exclusive state and one is in the shared state. Depending on how the system runs, the VPs in the shared state can still be rescheduled by the scheduler, so the whole process is one of continuous dynamic adjustment.
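The rule that a core using several VPs holds exactly one in the exclusive state can be checked mechanically against the states of this walkthrough. The Python sketch below is a hypothetical checker: the function name `one_exclusive_per_core` and the tuple encoding are assumptions, and the State 3 assignment shown is only one possible outcome, because the text does not say which two VPs move to core 0.

```python
from collections import defaultdict


def one_exclusive_per_core(vps):
    """True if every core that uses at least one VP holds exactly one exclusive VP."""
    exclusive = defaultdict(int)
    cores = set()
    for owner, shared in vps:
        if owner is None:
            continue  # managed by the scheduler, not by a core
        cores.add(owner)
        if not shared:
            exclusive[owner] += 1
    return all(exclusive[core] == 1 for core in cores)


# Each entry is (owner core, in shared state?) for VP0..VP3.
state2 = [(3, True), (3, True), (3, True), (3, False)]
# One possible State 3 (assumed): VP0 and VP1 moved to core 0, VP0 made exclusive.
state3 = [(0, False), (0, True), (3, True), (3, False)]
# A deliberately invalid state: core 0 holds two shared VPs and no exclusive one.
broken = [(0, True), (0, True), (3, True), (3, False)]

print(one_exclusive_per_core(state2), one_exclusive_per_core(state3), one_exclusive_per_core(broken))
```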

Those skilled in the art will appreciate that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the serial numbers of the embodiments above are for description only and do not indicate the relative merit of the embodiments. The foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A system-level model construction method for a multi-core shared SIMD coprocessor, comprising a system on chip (SoC) provided with n cores and n vector coprocessors, where n is a positive even number, characterized in that the n vector coprocessors are connected to the n cores through a crossbar switch, and a scheduler, connected to the n cores, the n vector coprocessors, and the crossbar switch, is provided for scheduling the vector coprocessors to communicate with the cores through the crossbar switch, wherein the scheduler schedules the vector coprocessors according to the current state of each vector coprocessor.

2. The system-level model construction method for a multi-core shared SIMD coprocessor according to claim 1, characterized in that each vector coprocessor describes its current state through 3 status registers, wherein,

3. The system-level model construction method for a multi-core shared SIMD coprocessor according to claim 2, characterized in that, when a core is currently using multiple vector coprocessors, exactly one of them is in the exclusive state and all the others are in the shared state.

4. The system-level model construction method for a multi-core shared SIMD coprocessor according to claim 2, characterized in that, in the initial state of the SoC, every vector coprocessor is in the exclusive state, wherein the first vector coprocessor is exclusive to the first core, the second vector coprocessor is exclusive to the second core, and so on, and the index of every vector coprocessor is 0; and a vector coprocessor changes from the exclusive state to the shared state when the core using it voluntarily gives up the right to use it, after which that vector coprocessor is scheduled by the scheduler.

5. The system-level model construction method for a multi-core shared SIMD coprocessor according to claim 4, characterized in that, when any of the n cores needs more vector coprocessors because it is executing a program with data-level parallelism, that core must apply to the scheduler for more vector coprocessors; the scheduler runs a load-balancing scheduling algorithm; if there are currently vector coprocessors in the idle state, the scheduler allocates them to the requesting core; and if there are no idle vector coprocessors but there are vector coprocessors in the shared state, the scheduler reallocates vector coprocessor resources according to the load-balancing policy and the situation at that time.