CN104391821A - System level model building method of multiple core sharing SIMD coprocessor - Google Patents

System level model building method of multiple core sharing SIMD coprocessor

Info

Publication number
CN104391821A
Authority
CN
China
Prior art keywords
coprocessor
vector
core
state
vector coprocessor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410669796.XA
Other languages
Chinese (zh)
Inventor
郭炜
崔鲁平
魏继增
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201410669796.XA priority Critical patent/CN104391821A/en
Publication of CN104391821A publication Critical patent/CN104391821A/en
Pending legal-status Critical Current

Abstract

Disclosed is a system-level model construction method for a multi-core shared SIMD coprocessor. The model comprises a system on chip (SoC) provided with n cores and n vector coprocessors, where n is a positive even number, the n vector coprocessors being connected to the n cores through a crossbar switch. A scheduler, connected to the n cores, the n vector coprocessors and the crossbar switch, schedules the vector coprocessors to communicate with the cores through the crossbar switch according to the current state of each vector coprocessor. Through this sharing mechanism the method significantly improves the resource utilization of the SIMD vector coprocessors and reduces system power consumption; compared with the prior art, tasks are completed more efficiently with a fixed amount of resources.

Description

System-level model construction method for a multi-core shared SIMD coprocessor
Technical field
The present invention relates to a system-level model of a processor, and in particular to a system-level model construction method for a multi-core shared SIMD coprocessor.
Background art
SIMD (Single Instruction, Multiple Data) is a technique for data-level parallelism in which the same operation is performed on multiple data elements. The key to SIMD is that a single instruction performs multiple arithmetic operations simultaneously, which increases processor throughput; this makes SIMD particularly suitable for data-intensive workloads such as multimedia applications. Mainstream processors today have their own SIMD instruction subsets, such as MMX and SSE on x86, NEON on ARM, and AltiVec on PowerPC. In a modern multi-core processor, each core is usually equipped with its own exclusive SIMD coprocessor, also called a vector coprocessor (VP). Because of this exclusive ownership, however, when a core runs a program that lacks data-level parallelism its SIMD coprocessor sits idle, while another core running a program with data-level parallelism can use only its own SIMD coprocessor and not the idle ones. The result is wasted resources and increased power consumption.
Fig. 1 shows the conventional architecture, assuming a system on chip with 4 cores and 4 VPs. In this structure each VP is dedicated to one core and cannot be shared with the other cores. When a core is not running a data-intensive program, its VP is idle, wasting resources and power.
Summary of the invention
The technical problem to be solved by the present invention is to provide a system-level model construction method for a multi-core shared SIMD coprocessor that improves the resource utilization of the vector coprocessors and reduces system power consumption.
The technical solution adopted by the present invention is a system-level model construction method for a multi-core shared SIMD coprocessor, comprising a system on chip (SoC) provided with n cores and n vector coprocessors, where n is a positive even number. The n vector coprocessors are connected to the n cores through a crossbar switch. A scheduler, connected to the n cores, the n vector coprocessors and the crossbar switch, schedules the vector coprocessors to communicate with the cores through the crossbar switch, and does so according to the current state of each vector coprocessor.
Each vector coprocessor describes its current state with 3 status registers:
The first status register records which of the n cores is currently using the vector coprocessor, or that no core is using it. A vector coprocessor that is not being used by any core is in the idle state and can be scheduled by the scheduler.
The second status register records whether the vector coprocessor is currently in the shared state or the dedicated state. A vector coprocessor in the shared state can be scheduled by the scheduler; one in the dedicated state cannot.
The third status register records the index of the vector coprocessor among all the vector coprocessors held by the core that is using it.
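The three status registers can be pictured as a small record per vector coprocessor. The following is a minimal sketch only; the names VPState, Mode, owner, mode and index are illustrative assumptions and do not appear in the patent, and the numeric encoding (0 for dedicated, 1 for shared) follows the notation used in the scheduling example of Fig. 3.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    DEDICATED = 0  # cannot be scheduled by the scheduler
    SHARED = 1     # can be scheduled by the scheduler


@dataclass
class VPState:
    """The three status registers of one vector coprocessor (VP)."""
    owner: Optional[int]  # register 1: index of the using core, None if no core uses it
    mode: Mode            # register 2: dedicated or shared
    index: int            # register 3: position among the VPs of the same owner

    def is_idle(self) -> bool:
        return self.owner is None

    def is_schedulable(self) -> bool:
        # Idle VPs and shared VPs are available to the scheduler.
        return self.is_idle() or self.mode is Mode.SHARED


vp0 = VPState(owner=0, mode=Mode.DEDICATED, index=0)
print(vp0.is_idle(), vp0.is_schedulable())
```

Keeping the owner and the mode as separate fields mirrors the text: a VP can be in use by a core and still be schedulable, provided it is in the shared state.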
When a core is using several vector coprocessors, exactly one of them is in the dedicated state and all the others are in the shared state.
In the initial state of the SoC, every vector coprocessor is in the dedicated state: the first vector coprocessor is dedicated to the first core, the second to the second core, and so on, and the index of every vector coprocessor is 0. A vector coprocessor changes from the dedicated state to the shared state when the core using it voluntarily gives up its right to use the coprocessor; from then on the coprocessor is scheduled by the scheduler.
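The initial state and the dedicated-to-shared transition can be sketched as follows. This is an illustration under stated assumptions: the function names initial_state and release and the dictionary keys are hypothetical, and the rule used to assign the index of a released VP (its arrival order among the scheduler-managed VPs) is an assumption, since the text does not specify that order.

```python
from typing import Dict, List, Optional

DEDICATED, SHARED = 0, 1  # encoding of the second status register

VP = Dict[str, Optional[int]]


def initial_state(n: int) -> List[VP]:
    """VP i starts dedicated to core i, and every index is 0."""
    return [{"owner": i, "mode": DEDICATED, "index": 0} for i in range(n)]


def release(vps: List[VP], vp_id: int, core: int) -> None:
    """The using core gives up a VP, which becomes shared and scheduler-managed."""
    vp = vps[vp_id]
    if vp["owner"] != core:
        raise ValueError("only the core using a VP can give it up")
    # Assumed rule: index = number of VPs the scheduler already manages.
    vp["index"] = sum(1 for v in vps if v["owner"] is None)
    vp["owner"] = None  # None stands for "managed by the scheduler"
    vp["mode"] = SHARED


vps = initial_state(4)
release(vps, vp_id=0, core=0)
print(vps[0])
```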
When any of the n cores needs more vector coprocessors because it is executing a program with data-level parallelism, that core must apply to the scheduler for them. The scheduler runs a load-balancing scheduling algorithm. If there are vector coprocessors in the idle state, the scheduler allocates them to the requesting core. If there are no idle vector coprocessors but there are vector coprocessors in the shared state, the scheduler redistributes vector coprocessor resources according to the load-balancing policy and the situation at that time.
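The two cases handled by the scheduler can be sketched as below. The patent does not define the load-balancing algorithm, so the policy here is an assumption: shared VPs are moved from the core holding the most VPs until the two counts differ by less than two. The function name request and the list-based representation are likewise hypothetical, and the VP that becomes dedicated is simply the first one granted.

```python
from typing import List, Optional

DEDICATED, SHARED = 0, 1  # encoding of the second status register


def request(owner: List[Optional[int]], mode: List[int], core: int) -> List[int]:
    """Handle a request from `core`; return the ids of the VPs granted."""
    vps = range(len(owner))
    # Case 1: idle VPs (owner None) exist, so allocate them to the core.
    granted = [v for v in vps if owner[v] is None]
    for v in granted:
        owner[v] = core
    if not granted:
        # Case 2: no idle VP. Move shared VPs from the best-provisioned
        # core until the counts differ by less than two (assumed policy).
        while True:
            donors = sorted({owner[v] for v in vps
                             if mode[v] == SHARED and owner[v] != core})
            if not donors:
                break
            donor = max(donors, key=owner.count)
            if owner.count(donor) - owner.count(core) < 2:
                break
            v = next(v for v in vps if owner[v] == donor and mode[v] == SHARED)
            owner[v] = core
            granted.append(v)
    # A core holding VPs keeps exactly one of them in the dedicated state.
    if granted and not any(owner[v] == core and mode[v] == DEDICATED for v in vps):
        mode[granted[0]] = DEDICATED
    return granted


owner = [None, None, None, 3]
mode = [SHARED, SHARED, SHARED, DEDICATED]
print(request(owner, mode, core=3), owner)
print(request(owner, mode, core=0), owner, mode)
```

Only VPs in the shared state are ever taken from a donor core, so the dedicated VP of each core is never reassigned by this sketch.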
Through its sharing mechanism, the system-level model construction method for a multi-core shared SIMD coprocessor of the present invention significantly improves the resource utilization of the SIMD vector coprocessors and reduces system power consumption; with a fixed amount of resources, tasks are completed more efficiently.
Brief description of the drawings
Fig. 1 shows the conventional system-on-chip structure;
Fig. 2 shows a system-level model, built with the method of the present invention, in which 4 cores share 4 VPs through a crossbar switch;
Fig. 3 shows a scheduling example.
In the figures:
1: core 2: vector coprocessor
3: crossbar switch 4: scheduler
Detailed description
The system-level model construction method for a multi-core shared SIMD coprocessor of the present invention is described in detail below with reference to an embodiment and the drawings.
The system-level model construction method for a multi-core shared SIMD coprocessor of the present invention comprises a system on chip (SoC) provided with n cores and n vector coprocessors, where n is a positive even number. The n vector coprocessors are connected to the n cores through a crossbar switch. A scheduler, connected to the n cores, the n vector coprocessors and the crossbar switch, schedules the vector coprocessors to communicate with the cores through the crossbar switch, and does so according to the current state of each vector coprocessor.
Each vector coprocessor describes its current state with 3 status registers:
The first status register records which of the n cores is currently using the vector coprocessor (core 0, core 1, ..., core (n-1)), or that no core is using it. A vector coprocessor that is not being used by any core is in the idle state and can be scheduled by the scheduler.
The second status register records whether the vector coprocessor is currently in the shared state or the dedicated state. When a core is using several vector coprocessors, exactly one of them is in the dedicated state and all the others are in the shared state. A vector coprocessor in the shared state can be scheduled by the scheduler; one in the dedicated state cannot.
The third status register records the index of the vector coprocessor among all the vector coprocessors held by the core that is using it.
In the initial state of the SoC, every vector coprocessor is in the dedicated state: the first vector coprocessor is dedicated to the first core, the second to the second core, and so on, and the index of every vector coprocessor is 0. A vector coprocessor changes from the dedicated state to the shared state when the core using it voluntarily gives up its right to use the coprocessor; from then on the coprocessor is scheduled by the scheduler.
When any of the n cores needs more vector coprocessors because it is executing a program with data-level parallelism, that core must apply to the scheduler for them. The scheduler runs a load-balancing scheduling algorithm. If there are vector coprocessors in the idle state, the scheduler allocates them to the requesting core. If there are no idle vector coprocessors but there are vector coprocessors in the shared state, the scheduler redistributes vector coprocessor resources according to the load-balancing policy and the situation at that time.
To make the objects, technical solutions and advantages of the present invention clearer, the embodiments are described in further detail below with reference to the drawings.
Fig. 2 shows a system-level model, designed according to the method of the present invention, in which 4 cores share 4 VPs through a crossbar switch. The scheduler 4 can communicate with the 4 cores (Core) 1 and the 4 vector coprocessors (VP) 2, and it allocates vector coprocessor 2 resources and configures the crossbar switch 3 according to requests from the cores 1.
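One way to picture the crossbar configuration is as a connection matrix derived from the current allocation. The sketch below is an assumption for illustration only: the patent does not describe how the crossbar switch is programmed, and the function name configure_crossbar and the matrix layout conn[core][vp] are hypothetical.

```python
from typing import List, Optional


def configure_crossbar(owner: List[Optional[int]], n_cores: int) -> List[List[bool]]:
    """Return conn[core][vp], True when the crossbar links that core and VP."""
    conn = [[False] * len(owner) for _ in range(n_cores)]
    for vp, core in enumerate(owner):
        if core is not None:  # VPs not used by any core stay disconnected
            conn[core][vp] = True
    return conn


# Example allocation: core 0 uses VP0 and VP1, core 3 uses VP2 and VP3.
for row in configure_crossbar([0, 0, 3, 3], n_cores=4):
    print(row)
```

Because each VP has at most one owner, every column of the matrix contains at most one True entry, while a row may contain several.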
In the initial state of the system, the 4 vector coprocessors 2 are dedicated to the 4 cores 1 respectively and cannot be scheduled by the scheduler 4. When a core 1 executes a program without data-level parallelism, it can apply to the scheduler 4 to give up its right to use its vector coprocessor 2; from then on that vector coprocessor 2 is managed by the scheduler 4, and its state changes from dedicated to shared. Because program behavior is dynamic, operating-system scheduling may later cause a core 1 that has given up its vector coprocessor 2 to execute a program with data-level parallelism again. That core 1 then needs to apply to the scheduler 4 for some vector coprocessor 2 resources. If there are vector coprocessors 2 in the idle state, the scheduler 4 allocates a certain number of them to the core 1 and changes the state of one of them from shared to dedicated, while the other vector coprocessors 2 allocated to the core remain in the shared state. In all cases, a core that has given up its vector coprocessor and applies again is granted at least one vector coprocessor, which ensures that no core is "starved".
Fig. 3 shows a scheduling example that illustrates how the sharing model of the present invention works. The figure is divided into two parts: the left side shows the state at each point in time, and the right side shows the 4 vector coprocessors (VPs). In the initial state of the system, State 0, each of the 4 VPs belongs to one core. Each VP describes its current state with three parameters; for VP0, for example, they are C0/0/0. C0 indicates that VP0 is currently used by core 0. The second parameter is 0 when the VP is in the dedicated state, here dedicated to core 0 and not schedulable by the scheduler, and 1 when it is in the shared state and schedulable by the scheduler. The third parameter, 0, indicates that the index of this VP among all the VPs belonging to core 0 is 0.
After the system starts running, core 0, core 1 and core 2 are not executing programs with data-level parallelism, so they voluntarily apply to the scheduler to give up their VPs, and the system enters State 1. In State 1 the three parameters of VP0 are s/1/2. The first parameter, s, indicates that this VP is managed by the scheduler; the second, 1, indicates that VP0 is in the shared state; the third, 2, indicates that VP0 has index 2 among all the VPs managed by the scheduler.
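The three-parameter notation used in Fig. 3 (for example C0/0/0 and s/1/2) can be produced and read back with a pair of small helpers. This is a sketch only; the function names format_vp and parse_vp are hypothetical, and None is used here to stand for a VP managed by the scheduler.

```python
from typing import Optional, Tuple


def format_vp(owner: Optional[int], mode: int, index: int) -> str:
    """Render the three parameters, e.g. owner=0, mode=0, index=0 as 'C0/0/0'."""
    who = "s" if owner is None else f"C{owner}"
    return f"{who}/{mode}/{index}"


def parse_vp(text: str) -> Tuple[Optional[int], int, int]:
    """Inverse of format_vp: 's/1/2' becomes (None, 1, 2)."""
    who, mode, index = text.split("/")
    owner = None if who == "s" else int(who[1:])
    return owner, int(mode), int(index)


print(format_vp(0, 0, 0))   # VP0 in State 0
print(parse_vp("s/1/2"))    # VP0 in State 1
```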
The system continues to run. At some point core 3, which is executing a large number of data-intensive tasks, applies to the scheduler for VP resources. Because the scheduler is managing 3 VPs that are idle and in the shared state, it allocates all 3 of them to core 3, and the system enters State 2. In this state all 4 VPs are used by core 3: 3 VPs (VP0 to VP2) remain in the shared state and one VP (VP3) is in the dedicated state, with indices 0 to 3.
The system continues to run. Later, operating-system scheduling causes core 0 to run a program with data-level parallelism, so core 0 applies to the scheduler for VP resources while core 3 is still using all of them. Because the scheduler runs a load-balancing scheduling algorithm, it reassigns to core 0 two of the three shared-state VPs being used by core 3 and updates their status registers: one of the two, chosen at random, changes from shared to dedicated, and the other remains shared. The system enters State 3, in which core 0 and core 3 each use 2 VPs, one dedicated and one shared. Depending on how the system runs, VPs in the shared state can still be rescheduled by the scheduler; the whole process is one of continuous dynamic adjustment.
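The four states of the scheduling example can be written down as data and checked against the rules stated earlier: a core that holds VPs has exactly one in the dedicated state, and VPs managed by the scheduler are in the shared state. The sketch below is illustrative; the helper name check_invariants is hypothetical, and the choice of VP0 and VP1 as the two VPs moved to core 0 in State 3 (with VP0 as the dedicated one) is an assumption, since the text says only that two of the three shared VPs are moved and one is picked at random.

```python
from typing import List, Optional, Tuple

D, S = 0, 1  # dedicated, shared
State = List[Tuple[Optional[int], int]]  # (owner, mode) per VP; None = scheduler


def check_invariants(state: State) -> bool:
    """True if the state obeys the dedicated/shared rules of the model."""
    cores = {owner for owner, _ in state if owner is not None}
    for core in cores:
        dedicated = sum(1 for owner, m in state if owner == core and m == D)
        if dedicated != 1:
            return False
    # VPs managed by the scheduler must be in the shared state.
    return all(m == S for owner, m in state if owner is None)


trace = {
    "State 0": [(0, D), (1, D), (2, D), (3, D)],
    "State 1": [(None, S), (None, S), (None, S), (3, D)],
    "State 2": [(3, S), (3, S), (3, S), (3, D)],
    "State 3": [(0, D), (0, S), (3, S), (3, D)],
}
for name, state in trace.items():
    print(name, check_invariants(state))
```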
Those skilled in the art will understand that the drawings are schematic diagrams of a preferred embodiment, and that the numbering of the embodiments above is for description only and does not indicate their relative merit. The above is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

1. A system-level model construction method for a multi-core shared SIMD coprocessor, comprising a system on chip (SoC) provided with n cores and n vector coprocessors, where n is a positive even number, characterized in that the n vector coprocessors are connected to the n cores through a crossbar switch, and a scheduler, connected to the n cores, the n vector coprocessors and the crossbar switch, is further provided for scheduling the vector coprocessors to communicate with the cores through the crossbar switch, wherein the scheduler schedules the vector coprocessors according to the current state of each vector coprocessor.
2. The system-level model construction method for a multi-core shared SIMD coprocessor according to claim 1, characterized in that each vector coprocessor describes its current state with 3 status registers, wherein:
the first status register records which of the n cores is currently using the vector coprocessor, or that no core is using it; a vector coprocessor that is not being used by any core is in the idle state and can be scheduled by the scheduler;
the second status register records whether the vector coprocessor is currently in the shared state or the dedicated state; a vector coprocessor in the shared state can be scheduled by the scheduler, and one in the dedicated state cannot;
the third status register records the index of the vector coprocessor among all the vector coprocessors held by the core that is using it.
3. The system-level model construction method for a multi-core shared SIMD coprocessor according to claim 2, characterized in that when a core is using several vector coprocessors, exactly one of them is in the dedicated state and all the others are in the shared state.
4. The system-level model construction method for a multi-core shared SIMD coprocessor according to claim 2, characterized in that in the initial state of the SoC every vector coprocessor is in the dedicated state, the first vector coprocessor being dedicated to the first core, the second to the second core, and so on, and the index of every vector coprocessor being 0; a vector coprocessor changes from the dedicated state to the shared state when the core using it voluntarily gives up its right to use the coprocessor, after which the coprocessor is scheduled by the scheduler.
5. The system-level model construction method for a multi-core shared SIMD coprocessor according to claim 4, characterized in that when any of the n cores needs more vector coprocessors because it is executing a program with data-level parallelism, that core must apply to the scheduler for them; the scheduler runs a load-balancing scheduling algorithm; if there are vector coprocessors in the idle state, the scheduler allocates them to the requesting core; if there are no idle vector coprocessors but there are vector coprocessors in the shared state, the scheduler redistributes vector coprocessor resources according to the load-balancing policy and the situation at that time.
CN201410669796.XA 2014-11-20 2014-11-20 System level model building method of multiple core sharing SIMD coprocessor Pending CN104391821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410669796.XA CN104391821A (en) 2014-11-20 2014-11-20 System level model building method of multiple core sharing SIMD coprocessor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410669796.XA CN104391821A (en) 2014-11-20 2014-11-20 System level model building method of multiple core sharing SIMD coprocessor

Publications (1)

Publication Number Publication Date
CN104391821A true CN104391821A (en) 2015-03-04

Family

ID=52609727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410669796.XA Pending CN104391821A (en) 2014-11-20 2014-11-20 System level model building method of multiple core sharing SIMD coprocessor

Country Status (1)

Country Link
CN (1) CN104391821A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107636638A (en) * 2015-05-21 2018-01-26 高盛有限责任公司 Universal parallel computing architecture
US11449452B2 (en) 2015-05-21 2022-09-20 Goldman Sachs & Co. LLC General-purpose parallel computing architecture
CN115993949A (en) * 2023-03-21 2023-04-21 苏州浪潮智能科技有限公司 Vector data processing method and device for multi-core processor

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030084309A1 (en) * 2001-10-22 2003-05-01 Sun Microsystems, Inc. Stream processor with cryptographic co-processor
US20050021871A1 (en) * 2003-07-25 2005-01-27 International Business Machines Corporation Self-contained processor subsystem as component for system-on-chip design
CN101620587A (en) * 2008-07-03 2010-01-06 中国人民解放军信息工程大学 Flexible reconfigurable task processing unit structure

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107636638A (en) * 2015-05-21 2018-01-26 高盛有限责任公司 Universal parallel computing architecture
CN107636638B (en) * 2015-05-21 2021-10-26 高盛有限责任公司 General parallel computing architecture
US11449452B2 (en) 2015-05-21 2022-09-20 Goldman Sachs & Co. LLC General-purpose parallel computing architecture
CN115993949A (en) * 2023-03-21 2023-04-21 苏州浪潮智能科技有限公司 Vector data processing method and device for multi-core processor

Similar Documents

Publication Publication Date Title
CN105045658B (en) A method of realizing that dynamic task scheduling is distributed using multinuclear DSP embedded
CN102147722B (en) Realize multiline procedure processor and the method for central processing unit and graphic process unit function
CN103761139B (en) General purpose computation virtualization implementation method based on dynamic library interception
CN104239144A (en) Multilevel distributed task processing system
CN104331321A (en) Cloud computing task scheduling method based on tabu search and load balancing
CN112463709A (en) Configurable heterogeneous artificial intelligence processor
CN104854563A (en) Automated profiling of resource usage
CN106933669A (en) For the apparatus and method of data processing
CN105893158A (en) Big data hybrid scheduling model on private cloud condition
Kathiravelu et al. An adaptive distributed simulator for cloud and mapreduce algorithms and architectures
CN102306139A (en) Heterogeneous multi-core digital signal processor for orthogonal frequency division multiplexing (OFDM) wireless communication system
Huo et al. An improved multi-cores parallel artificial Bee colony optimization algorithm for parameters calibration of hydrological model
CN104391821A (en) System level model building method of multiple core sharing SIMD coprocessor
Song et al. Energy efficiency optimization in big data processing platform by improving resources utilization
Iserte et al. Improving the management efficiency of GPU workloads in data centers through GPU virtualization
CN104360962B (en) Be matched with multistage nested data transmission method and the system of high-performance computer structure
CN103020008B (en) The reconfigurable micro server that computing power strengthens
CN115775199B (en) Data processing method and device, electronic equipment and computer readable storage medium
CN105653243B (en) The task distributing method that a kind of graphics processing unit Multi-task Concurrency performs
CN110083454A (en) A kind of mixing cloud service method of combination with quantum computer
CN102333088A (en) Server resource management system
CN105117281A (en) Task scheduling method based on task application signal and execution cost value of processor core
CN113723931B (en) Workflow modeling method suitable for multi-scale high-flux material calculation
CN111522637B (en) Method for scheduling storm task based on cost effectiveness
CN104699520B (en) A kind of power-economizing method based on virtual machine (vm) migration scheduling

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150304