CN101216725B

CN101216725B - Dynamic power consumption control method for multithread predication by stack depth

Info

Publication number: CN101216725B
Application number: CN2008100193441A
Authority: CN
Inventors: 戚隆宁; 黄少珉; 胡晨
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2008-01-04
Filing date: 2008-01-04
Publication date: 2011-04-27
Anticipated expiration: 2028-01-04
Also published as: CN101216725A

Abstract

A dynamic power consumption control method that uses stack depth for multithread prediction. It applies to battery-powered devices that contain components with multiple idle low-power modes and that run a multitasking embedded system. The method comprises the following steps. Step 1: use the request clustering mode (RCM) of operation requests to characterize the sequence Φ of operation requests that a single thread issues to a component. Step 2: use a two-level hash table to predict the probability distribution of the request interval for a single thread. Step 3: treat each thread as an independent request source, and group the request sources by the task the thread belongs to and by the thread's entry address. Step 4: from the probability distribution of the component's idle time, calculate the optimal timeout threshold k_opt and the optimal power mode s_opt for the case of multiple power modes with a single threshold. The method raises the effective prediction hit rate and further reduces the idle power consumption of power-manageable components.

Description

Dynamic power consumption control method using stack depth for multithread prediction

Technical field

The present invention applies to battery-powered devices that contain components with multiple idle low-power modes and that run a multitasking embedded system. It belongs to the technical field of low-power embedded systems.

Background art

Many components of an embedded system offer several operating modes with different power and performance levels, and allow power to be reduced at some cost in performance. For example, a processor offers several operating frequencies and voltages, and a hard disk offers several low-power idle modes such as idle, standby and sleep. Because switching between modes generally costs both performance and energy, it is not true that a lower-power idle mode is better for an idle period of any length. The best low-power mode to enter differs with the length of the idle period, so accurately predicting the length of the idle period helps to reduce power consumption.

Traditional dynamic power management policies simply treat the load on a component as coming from a single request source and ignore system and application information. Real systems, however, are usually multitasking and multithreaded: the load often comes from several sources with clearly different characteristics. The single-request-source model of traditional policies makes the load appear non-stationary and hard to analyse, so the policy predicts poorly. In the multitasking environment of an embedded system, a task can be executed many times; the processes that belong to the same task run the same code and behave similarly. Likewise, threads that execute from the same entry address behave similarly. These similarities can be used to improve the policy's predictions. In most systems, device operations are encapsulated in specific API functions, and application tasks issue operation requests to a device by calling those functions. The calling patterns of these API functions also show regularities that can further improve prediction.

Summary of the invention

Technical problem: the object of the invention is to provide a dynamic power consumption control method that uses stack depth for multithread prediction, in order to reduce the power consumption of embedded system components.

Technical solution: the invention predicts the request interval time from the stack depth, the function return address and the request clustering mode (Stack Based Prediction, SBP). It then treats each thread as an independent request source and groups the request sources by the task the thread belongs to and by the thread's entry address. Under the assumption that requests produced by request sources in the same group follow the same distribution, the predictions of the individual request sources for the request interval time are combined to calculate the probability distribution of the component's idle time. Finally, from that probability distribution, the optimal timeout threshold and the optimal power mode are calculated for the case of multiple power modes with a single threshold. A timer is started when request processing finishes; once the component's idle time exceeds the calculated optimal timeout threshold, the component is immediately switched into the corresponding optimal power mode.

The method is as follows:

Step 1: the request clustering mode RCM of operation requests characterizes the sequence Φ of operation requests that a single thread issues to the component. It is computed as follows:

RCM_{\Phi} = \operatorname{mod}_{2^{32}} \left( \sum_{r_i \in \Phi} RA(r_i) \right),

\Phi = \{ r_i \mid T_{RI}(r_i, r_{i-1}) < T_{BE}(s_2) \}

where:

r_i --- an operation request from the system to the component;

RA(r_i) --- the return address of the function that produced operation request r_i;

mod_x(y) --- y modulo the base x;

T_RI(r_i, r_{i-1}) --- the idle time between operation requests r_i and r_{i-1};

s_2 --- the level-2 idle mode;

T_BE(s_2) --- the break-even time of the level-2 idle mode, i.e. the shortest idle time for which entering that idle mode saves energy;

Step 2: a two-level hash table is used to predict the probability distribution of the request interval time for a single thread. The first-level hash table is the function return address hash table, built on mod_37[RA(r_i)], with hash collisions resolved by linked lists. Each node of this table holds the statistics F_RA of the idle-time distribution that follows operation requests with the same function return address, together with the second-level hash tables: the stack depth hash table and the request clustering mode hash table, built on mod_17[SD(r_i)] and mod_17[RCM] respectively. Here SD(r_i) is the stack depth at the moment operation request r_i is produced, i.e. the difference between the top and the bottom of the stack. Each node of the stack depth hash table holds the idle-time distribution F_SD that follows operation requests with the same function return address and stack depth. Each node of the request clustering mode hash table holds the idle-time distribution F_RCM that follows operation requests with the same function return address and request clustering mode. The request interval time for a single thread, i.e. the probability distribution F_predict of the idle time after a request, is predicted according to which hash tables hit: when only the function return address hash table hits, F_predict = F_RA; when only the stack depth hash table hits, F_predict = F_SD; when only the request clustering mode hash table hits, F_predict = F_RCM; when the stack depth hash table and the request clustering mode hash table both hit, F_predict = F_SD * F_RCM.

Step 3: each thread is treated as an independent request source, and the request sources are grouped by the task the thread belongs to and by the thread's entry address: threads that belong to the same task and execute from the same entry address are placed in the same group. Under the assumption that requests produced by request sources in the same group follow the same distribution, the predictions of the individual request sources for the request interval time are combined to calculate the probability P(s_i) that the component enters the level-i idle mode s_i. The calculation is as follows:

P(s_i) = \begin{cases} \prod_{j=1}^{N_T} P_j(s_i, \Delta t_j) - \prod_{j=1}^{N_T} P_j(s_{i+1}, \Delta t_j), & i < n \\ \prod_{j=1}^{N_T} P_j(s_i, \Delta t_j), & i = n \end{cases}

P_j(s_i, \Delta t_j) = 1 - \frac{F_{predict}(T_{BE}(s_i) + \Delta t_j) - F_{predict}(\Delta t_j)}{1 - F_{predict}(\Delta t_j)}

where:

N_T --- the total number of threads currently using the component, i.e. the total number of request sources;

n --- the total number of idle modes of the component;

Δt_j --- the current delay of request source j, i.e. how long it has already been idle;

P_j(s_i, Δt_j) --- the probability that request source j, having been idle for Δt_j, remains idle for longer than the break-even time of the level-i idle mode s_i;
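The two formulas of Step 3 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names `survive` and `mode_probabilities`, the representation of F_predict as a callable CDF, and the guard for an exhausted tail are all assumptions made for the example.

```python
import math

def survive(cdf, t_be, delay):
    """P_j(s_i, dt_j): probability that a source already idle for `delay`
    stays idle for at least `t_be` longer, given its idle-time CDF."""
    tail = 1.0 - cdf(delay)
    if tail <= 0.0:
        # Assumption: if the CDF says the idle period cannot last this long,
        # treat the probability of staying idle as zero.
        return 0.0
    return 1.0 - (cdf(t_be + delay) - cdf(delay)) / tail

def mode_probabilities(sources, break_even):
    """sources: list of (cdf, delay) pairs, one per request source.
    break_even[k] is T_BE(s_{k+1}). Returns [P(s_1), ..., P(s_n)]."""
    joint = [
        math.prod(survive(cdf, t_be, delay) for cdf, delay in sources)
        for t_be in break_even
    ]
    n = len(joint)
    # P(s_i) = joint_i - joint_{i+1} for i < n, and joint_n for i = n.
    return [joint[i] - joint[i + 1] if i < n - 1 else joint[i] for i in range(n)]
```

If the break-even time of the ready mode s_1 is taken as 0 (an assumption of this sketch), every P_j(s_1, Δt_j) equals 1 and the returned probabilities sum to 1, because the differences telescope.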

Step 4: from the probability distribution of the component's idle time, the optimal timeout threshold k_opt and the optimal power mode s_opt are calculated for the case of multiple power modes with a single threshold. The calculation is as follows:

E_{total}(r\Delta t, s_i) = \sum_{j=1}^{r} (a_1 T_j + n_j b_1) + \sum_{j=r+1}^{q} \left( a_i T_j + n_j ((a_1 - a_i) r\Delta t + b_i) \right)

E_{opt} = \min_{1 \leq i \leq m,\ 1 \leq r \leq q} E_{total}(r\Delta t, s_i) = E_{total}(r_{opt}\Delta t, s_{opt})

k_{opt} = r_{opt}\Delta t

where:

Δt --- the spacing between candidate timeout thresholds;

E_total(rΔt, s_i) --- the expected energy consumption when the component enters the level-i idle mode s_i after a timeout of rΔt;

q --- the total number of candidate timeout thresholds;

a_i --- the average power of the level-i idle mode;

b_i --- the equivalent transition energy of the level-i idle mode;

T_j --- in the idle-time distribution statistics, the total length of the idle times that fall in the j-th interval;

n_j --- in the idle-time distribution statistics, the total number of requests whose idle time falls in the j-th interval.

A timer is started when request processing finishes; once the component's idle time exceeds the calculated optimal timeout threshold k_opt, the component is immediately switched into the corresponding optimal power mode s_opt.
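The search in Step 4 is a small exhaustive minimization, sketched below in Python. The function names, the list-based inputs and the tie-breaking rule (smallest r, then shallowest mode) are assumptions of this example. One reading of the second sum is that an idle period of length t longer than the timeout spends rΔt in s_1 and the remainder in s_i, which costs a_1 rΔt + a_i (t - rΔt) + b_i = a_i t + (a_1 - a_i) rΔt + b_i.

```python
def e_total(r, i, dt, a, b, T, n):
    """E_total(r*dt, s_{i+1}). a, b are per-mode lists (index 0 is s_1);
    T, n are per-interval lists (index 0 is interval 1)."""
    q = len(T)
    # Idle periods in intervals 1..r end before the timeout: stay in s_1.
    stay = sum(a[0] * T[j] + n[j] * b[0] for j in range(r))
    # Idle periods in intervals r+1..q outlast the timeout: switch to s_{i+1}.
    switch = sum(
        a[i] * T[j] + n[j] * ((a[0] - a[i]) * r * dt + b[i])
        for j in range(r, q)
    )
    return stay + switch

def optimal_timeout(dt, a, b, T, n):
    """Return (k_opt, 1-based index of s_opt, E_opt) over 1 <= r <= q and all modes."""
    q, m = len(T), len(a)
    e_opt, r_opt, i_opt = min(
        (e_total(r, i, dt, a, b, T, n), r, i)
        for r in range(1, q + 1)
        for i in range(m)
    )
    return r_opt * dt, i_opt + 1, e_opt
```

With the made-up values a = [4.0, 1.0], b = [0.0, 2.0], Δt = 1.0, T = [1.0, 0.0, 10.0] and n = [2, 0, 4], the sketch selects a timeout of 1.0 and the level-2 mode, with an expected energy of 34.0.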

Beneficial effects: the dynamic power consumption control method using stack depth for multithread prediction improves the effective hit rate and the discrimination of dynamic power management policy prediction, and reduces the power consumed while components are idle. Experimental results show that the effective hit rate of the invention is 7 to 21 percentage points higher than that of traditional dynamic power management policies, and that the power reduction reaches about 60% to 90% of the maximum possible energy saving.

Description of drawings

Fig. 1 is a schematic diagram of the relation between idle-mode energy consumption and idle time. The figure shows four straight lines, e_1 = a_1 t + b_1, e_2 = a_2 t + b_2, e_3 = a_3 t + b_3 and e_4 = a_4 t + b_4, which give the relation between energy consumption e and idle time t for the level-1 to level-4 idle modes respectively. The horizontal axis is the idle time t and the vertical axis is the energy consumption E in the idle mode. The horizontal coordinates of the three intersection points of the four lines are t_2, t_3 and t_4.

Fig. 2 is a schematic diagram of the data structure for prediction based on stack information. The figure shows a hash table of function return addresses (RA hash table) with 37 index entries, each of which resolves hash collisions with a linked list. The list consists of nodes such as RA1 and RA2, linked by pointers (solid arrows in the figure). Each node is keyed by a function return address and contains an idle-time probability distribution (shown as a triangle at the upper right), a stack depth hash table (SD hash table) and a request clustering mode hash table (RCM hash table). The SD hash table has 17 index entries, each of which resolves hash collisions with a linked list of nodes such as SD1 and SD2, linked by pointers. These nodes are keyed by stack depth and contain an idle-time probability distribution. The RCM hash table has 17 index entries, each of which resolves hash collisions with a linked list of nodes such as RCM1 and RCM2, linked by pointers. These nodes are keyed by request clustering mode and contain an idle-time probability distribution.

Fig. 3 is a schematic flow chart of prediction based on stack information.

Fig. 4 is a schematic flow chart of the update procedure for prediction based on stack information.

Fig. 5 is a schematic diagram of the data structure for prediction grouped by process and thread. The figure shows a process group hash table (PG hash table) and a two-dimensional hash lookup table (P/T ID hash table). The PG hash table has 256 index entries, each of which resolves hash collisions with a linked list of nodes such as PG1 and PG2, linked by pointers. Each node is keyed by the process group identifier (PGID) and contains the probability distribution of the prediction results and a thread group hash table (TG hash table). The TG hash table has 17 index entries, each of which resolves hash collisions with a linked list of nodes such as TG1 and TG2, linked by pointers. Each node is keyed by the thread group identifier (TGID), i.e. the thread entry address, and contains an idle-time predictor (shown as a circle in the figure, e.g. SBP1 and SBP2), the probability distribution of the prediction results and a thread list. The thread list consists of nodes such as TI1 and TI2, linked by pointers (solid arrows in the figure). Each node is keyed by the thread identifier (TID) and contains the attribute information of the thread. The P/T ID hash table is a 37 × 37 matrix whose elements are indexed by process identifier (PID) and thread identifier (TID). Each element of the matrix is a lookup list that points to thread list nodes such as TI1 and TI2; these nodes are linked by pointers (dashed arrows in the figure).

Embodiment

Fig. 1 shows the relation between a component's idle energy consumption and its idle time under the different idle modes. Each straight line shows how idle energy consumption varies with idle time; the slope a_i of a line is the average power of the level-i idle mode, and the intercept b_i is the equivalent transition energy of the level-i idle mode. The lines of adjacent idle-mode levels intersect pairwise at three points; the horizontal coordinate t_i of each intersection is the break-even time T_BE(s_i) between the level-i idle mode and the level-1 idle mode.

Fig. 2 shows the data structure for single-thread prediction based on stack information. The whole structure is a tree built on hash tables. The root of the tree is a hash table of function return addresses (RA Hash Table): a hash function maps each function return address onto an index table over [0, 36], which we call the RA hash table. To resolve hash collisions (Hash Collision), each RA node is recorded in a singly linked list. An RA node consists of five parts:

Function return address: the key of the RA node, used for lookup and matching in the hash table;

Prediction distribution: shown as a triangle in Fig. 2. It records, as a histogram, the distribution of the idle time that follows requests whose function return address equals the node's RA value. Based on the relation between idle modes and idle time, the idle-time distribution can be represented more simply by counting the best idle mode for each observed idle time. This distribution is the basis for later predictions;

Stack depth hash table (SD Hash Table): supports lookup by stack depth. Like the RA hash table, its hash function is a modulo operation, except that the modulus is 17 and the input is the stack depth rather than the function return address. Hash collisions are again resolved by recording each SD node in a linked list. An SD node contains only the stack depth key and the SD-based prediction distribution;

Request clustering mode hash table (RCM Hash Table): supports lookup by request clustering mode. It is organized like the SD hash table, except that an RCM node contains the RCM key and the RCM-based prediction distribution;

Node pointer: points to the next RA node, forming the linked list. If the current node is the last record, the pointer is null.

Fig. 3 shows the prediction flow for single-thread prediction based on stack information:

[Prediction start] Request processing finishes. Obtain the RA, SD and RCM of the request and go to [Step 1];

[Step 1] Look up the RA hash table and determine whether the RA hits. On a hit go to [Step 3]; otherwise go to [Step 2];

[Step 2] Set the RA hit flag flag_RA = false. The idle time is predicted to be short, so the ready idle mode should be entered: s_predict = s_1. [Prediction end];

[Step 3] Set the RA hit flag flag_RA = true. Obtain the RA node, look up the SD hash table on that node and determine whether the SD hits. On a hit go to [Step 7]; otherwise go to [Step 4];

[Step 4] Set the SD hit flag flag_SD = false. Look up the RCM hash table on the RA node and determine whether the RCM hits. On a hit go to [Step 6]; otherwise go to [Step 5];

[Step 5] Set the RCM hit flag flag_RCM = false. Predict from the prediction distribution F_RA of the RA node: s_predict = Predict(F_RA). [Prediction end];

[Step 6] Set the RCM hit flag flag_RCM = true. Obtain the RCM node and predict from its prediction distribution F_RCM: s_predict = Predict(F_RCM). [Prediction end];

[Step 7] Set the SD hit flag flag_SD = true. Obtain the SD node. Look up the RCM hash table on the RA node and determine whether the RCM hits. On a hit go to [Step 9]; otherwise go to [Step 8];

[Step 8] Set the RCM hit flag flag_RCM = false. Predict from the prediction distribution F_SD of the SD node: s_predict = Predict(F_SD). [Prediction end];

[Step 9] Set the RCM hit flag flag_RCM = true. Obtain the RCM node and predict from its prediction distribution F_RCM together with the prediction distribution F_SD of the SD node: s_predict = CombinePredict(F_RCM, F_SD). [Prediction end];
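The branching in Steps 1 to 9 can be condensed into a short Python sketch that returns the distribution to predict from, together with the three hit flags. It is illustrative only: Python dictionaries stand in for the mod-37 and mod-17 chained hash tables, and the names `RANode` and `select_distribution` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class RANode:
    f_ra: list                                      # F_RA: weights per idle mode
    sd_table: dict = field(default_factory=dict)    # stack depth -> F_SD
    rcm_table: dict = field(default_factory=dict)   # RCM value -> F_RCM

def select_distribution(ra_table, ra, sd, rcm):
    """Return (distribution or None, hit flags) for one finished request.
    None means an RA miss, i.e. Step 2: predict the ready mode s_1."""
    flags = {"RA": False, "SD": False, "RCM": False}
    node = ra_table.get(ra)
    if node is None:
        return None, flags                          # Steps 1 and 2
    flags["RA"] = True                              # Step 3
    f_sd = node.sd_table.get(sd)
    f_rcm = node.rcm_table.get(rcm)
    flags["SD"] = f_sd is not None
    flags["RCM"] = f_rcm is not None
    if f_sd is not None and f_rcm is not None:
        # Step 9: element-wise product F_SD * F_RCM
        return [p * q for p, q in zip(f_sd, f_rcm)], flags
    if f_sd is not None:
        return f_sd, flags                          # Step 8
    if f_rcm is not None:
        return f_rcm, flags                         # Step 6
    return node.f_ra, flags                         # Step 5
```

The flags are returned rather than stored globally so that a later update pass can tell which tables hit for the previous request.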

Fig. 4 shows the update flow for single-thread prediction based on stack information:

[Update start] A new request arrives. Obtain the actual idle interval t_idle since the previous request and the best idle mode S_opt(t_idle) for that idle time (for the calculation of the best idle mode see formula (1.2)). Go to [Step 41];

[Step 41] Use the RA hit flag flag_RA to determine whether the RA of the previous request hit. On a hit go to [Step 44]; otherwise go to [Step 42];

[Step 42] Determine whether the best idle mode S_opt(t_idle) for the actual idle time is the ready mode. If it is the ready mode go to [Step 43]; otherwise [Update end];

[Step 43] Create a new RA node for the previous request and add it at the head of the linked list of the corresponding index in the RA hash table. Add a newly created SD node and RCM node to the SD hash table and the RCM hash table of this RA node respectively, and initialize the prediction distributions of the RA node, SD node and RCM node. [Update end];

[Step 44] Clear the RA hit flag: flag_RA = false. Update the prediction distribution of the RA node of the previous request (i.e. increment the count of the best idle mode S_opt(t_idle) for the actual idle time). Go to [Step 45];

[Step 45] Use the SD hit flag flag_SD to determine whether the SD of the previous request hit. On a hit go to [Step 46]; otherwise go to [Step 47];

[Step 46] Clear the SD hit flag: flag_SD = false. Update the prediction distribution of the SD node of the previous request. Go to [Step 49];

[Step 47] Determine whether the best idle mode S_opt(t_idle) for the actual idle time is the ready mode. If it is the ready mode go to [Step 49]; otherwise go to [Step 48];

[Step 48] Create a new SD node from the SD information of the previous request, add it to the SD hash table and initialize its prediction distribution. Go to [Step 49];

[Step 49] Use the RCM hit flag flag_RCM to determine whether the RCM of the previous request hit. On a hit go to [Step 410]; otherwise go to [Step 411];

[Step 410] Clear the RCM hit flag: flag_RCM = false. Update the prediction distribution of the RCM node of the previous request. [Update end];

[Step 411] Determine whether the best idle mode S_opt(t_idle) for the actual idle time is the ready mode. If it is the ready mode, [Update end]; otherwise go to [Step 412];

[Step 412] Create a new RCM node from the RCM information of the previous request, add it to the RCM hash table and initialize its prediction distribution. [Update end].

Fig. 5 shows the data structure for grouped prediction under multithreading. The main part of the structure is a search tree built on hash tables. At the top of the tree is a process group hash table (PG Hash Table), which is searched by the path of a process's executable file. A string hash function maps the executable file path to a 32-bit integer called the process group identifier (Processes Group Identifier, PGID). The hash algorithm accumulates the characters using circular shifts of 7/15 bits and also uses the string length as a factor. The low 8 bits of the PGID serve as the index into the PG hash table, so the table has 2^8 = 256 entries. Although this many PGs is enough to distinguish the applications of an embedded system, each PG node is still recorded in a singly linked list to resolve possible hash collisions. A PG node has only four fields: the process group identifier PGID, an empty prediction distribution, the attached thread group hash table (TG Hash Table) and a pointer to the next PG node. The PGID is the key of the node. The TG hash table is keyed by the thread's entry address, called the thread group identifier TGID. Its hash function is a modulo operation with modulus 17, which maps the TGID onto an index table over [0, 16]. The TG hash table also uses singly linked lists to resolve hash collisions. A TG node consists of five parts:

Thread entry address: the TGID, a 32-bit integer; the key of the TG node, used for lookup and matching in the TG hash table;

SBP predictor: shown as a circle in Fig. 5. It uses the SBP prediction technique to provide idle-mode prediction for the threads of the group. In general, threads with the same entry point behave similarly, while the execution flows of threads with different entry points differ greatly. The threads of a group therefore share the SBP predictor of their TG node, and the SBP predictors of different TG nodes are independent of one another;

Thread list: a singly linked list of thread information (Thread Information, TI) nodes that holds the thread information needed for SBP prediction. Each TI node represents a running thread, which for the MSR model is an independent request source;

Prediction statistics: the idle-time distribution is fitted with a histogram, and a sliding-window method provides distribution statistics for each idle mode;

Node pointer: points to the next TG node, forming the linked list. If the current node is the last record, the pointer is null.

A TI node in the thread list provides the following thread information:

Identification information: the identifiers of the thread and of the process it belongs to. The thread's identifiers are the thread identifier TID and the thread group identifier TGID; the process's identifiers are the process identifier PID and the process group identifier PGID;

Stack information: the initial stack pointer, the current stack pointer, the current function return address RA and the current request clustering mode RCM. The initial and current stack pointers are used to compute the stack depth;

Time information: the time t_previous at which processing of the previous request finished.

TGID and PGID let us classify the many threads, but more often the thread information has to be found efficiently, so we built a two-dimensional hash lookup table (P/T ID Hash Table) for locating TI nodes quickly. In Fig. 5 it is shown as a 37 × 37 array: modulo operations with modulus 37 hash the TID and the PID onto [0, 36] to form the row index and the column index respectively. Each cell of the P/T ID hash table points to a singly linked list of TI nodes; this list is distinct from the thread list of the TG node and is drawn with dashed arrows in Fig. 5. In general the number of concurrent processes in an embedded operating system is not large, so the length of such a list does not usually exceed 3. Because memory in an embedded system is limited, the number of TI nodes is restricted further, and lookup made faster, by indexing only threads that have used the power-manageable component (PMC). A new process or thread is added to the index only once it has used the PMC. The creation and exit of a thread that uses the PMC, and the arrival of a PMC request, can each change the nodes: 1. When a thread is created, the PG hash table and TG hash table are searched by PGID and TGID to determine whether the thread uses the PMC; a TI node is added only on a hit (and is added to the TI list of the P/T ID hash table at the same time). 2. When a thread exits, the P/T ID hash table is searched by PID and TID; the TI node is deleted only on a hit (and is deleted from the TI list of the TG node at the same time). Even if every TI node in a TG node's list has been deleted, the TG node itself is kept, ready for new threads. 3. When a PMC request arrives, the P/T ID hash table is searched by PID and TID; a TI node is added only on a miss (the PGID and TGID are generated at the same time and the node is added to the TI list of the corresponding TG node; if the PG or TG is not found, a new PG or TG is created as well).
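A minimal Python sketch of the P/T ID lookup table described above follows. The class name `PTIndex`, the dictionary-based TI nodes and the use of Python lists for the collision chains are assumptions of this example; only the 37 × 37 layout, with TID mod 37 as the row and PID mod 37 as the column, comes from the text.

```python
SIZE = 37  # modulus given in the text for both TID and PID

class PTIndex:
    """Two-dimensional hash lookup table for thread information nodes."""

    def __init__(self):
        # Each cell holds a short chain of TI nodes that hash to it.
        self.cells = [[[] for _ in range(SIZE)] for _ in range(SIZE)]

    def _bucket(self, pid, tid):
        return self.cells[tid % SIZE][pid % SIZE]

    def find(self, pid, tid):
        for node in self._bucket(pid, tid):
            if node["pid"] == pid and node["tid"] == tid:
                return node
        return None

    def add(self, pid, tid, **info):
        """Add a TI node only on a miss, as for an arriving PMC request."""
        node = self.find(pid, tid)
        if node is None:
            node = {"pid": pid, "tid": tid, **info}
            self._bucket(pid, tid).append(node)
        return node

    def remove(self, pid, tid):
        """Delete the TI node on thread exit; returns False on a miss."""
        bucket = self._bucket(pid, tid)
        for k, node in enumerate(bucket):
            if node["pid"] == pid and node["tid"] == tid:
                del bucket[k]
                return True
        return False
```

In a fuller version each TI node would also be linked into the thread list of its TG node, so that both structures stay consistent on insertion and deletion.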

First, a linear model of idle energy consumption is established for the power-manageable component (Power Manageable Component, PMC), as shown in Fig. 1. The horizontal coordinates t_2, t_3 and t_4 of the three intersection points divide the idle time into four periods. Each period has one idle mode that minimizes energy consumption, the idle modes of different periods differ, and the later the period, the lower the average power of its idle mode. The piecewise function represented by the polyline (drawn in bold) formed by these three intersection points and the line segments joining them is therefore the optimal relation E_opt(t) between energy consumption and idle time:

E_{opt}(t) = \begin{cases} a_1 t + b_1, & 0 \leq t < t_2 \\ a_i t + b_i, & t_i \leq t < t_{i+1},\ 1 < i \leq n-1 \\ a_n t + b_n, & t_n \leq t \end{cases} \qquad (1.1)

The relation S_opt(t) between the idle mode with optimal energy consumption and the idle time is obtained in the same way:

S_{opt}(t) = \begin{cases} s_1, & 0 \leq t < t_2 \\ s_i, & t_i \leq t < t_{i+1},\ 1 < i \leq n-1 \\ s_n, & t_n \leq t \end{cases} \qquad (1.2)

Thus, for a given idle time, the best idle mode and the minimum energy consumption can be calculated from S_opt(t) and E_opt(t).
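As a rough illustration of formulas (1.1) and (1.2), the Python sketch below evaluates every line a_i t + b_i and takes the cheapest one. This assumes that every idle mode appears on the lower envelope of the lines, so that the minimum reproduces the piecewise definition; the function name and the sample coefficients are made up for the example.

```python
def best_idle_mode(t, modes):
    """modes: list of (a_i, b_i) pairs ordered from s_1 to s_n.
    Returns (1-based mode index S_opt(t), energy E_opt(t))."""
    energies = [a * t + b for a, b in modes]
    # On a tie, at a break-even point, prefer the deeper mode,
    # matching the 't_i <= t' boundary in (1.1) and (1.2).
    k = min(range(len(modes)), key=lambda i: (energies[i], -i))
    return k + 1, energies[k]

# Hypothetical four-mode component: power falls and transition energy rises.
MODES = [(4.0, 0.0), (2.0, 4.0), (1.0, 10.0), (0.5, 20.0)]
```

For these sample coefficients the adjacent lines cross at t = 2, 6 and 20, so an idle time of 5 maps to the level-2 mode and an idle time of 30 maps to the level-4 mode.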

The stack depth and the function return address reflect the differences between individual requests. There are also relations between requests, and the length of the idle time is related to the characteristics of the request cluster. Taking the break-even time of the level-2 idle mode as a threshold, we define a request cluster Φ (Request Cluster) as a run of consecutive requests whose request interval times are below this threshold. In other words, the idle time separating any two request clusters is at least the break-even time of the level-2 idle mode. The characteristic of a request cluster is represented by the sum of the function return addresses of all the requests it contains, defined as the request clustering mode (Requests-Clustering Mode, RCM):

\Phi = \{ r_i \mid T_{RI}(r_i, r_{i-1}) < T_{BE}(s_2) \}

RCM_{\Phi} = \operatorname{mod}_{2^{32}} \left( \sum_{r_i \in \Phi} RA(r_i) \right) \qquad (1.3)

where:

r_i --- a PMC operation request;

T_RI(r_i, r_{i-1}) --- the idle time between requests r_i and r_{i-1};

s_2 --- the level-2 idle mode (the ready idle mode is s_1);

mod_x(y) --- y modulo the base x;

RA(r_i) --- the function return address of request r_i.
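Formula (1.3) can be illustrated with a short Python sketch that splits a request stream into clusters and sums the return addresses modulo 2^32. The input format, a list of (idle time since the previous request, return address) pairs, and the function name `rcm_values` are assumptions of this example.

```python
def rcm_values(requests, t_be_s2):
    """requests: iterable of (idle_before, return_address) pairs in arrival order.
    A gap of at least t_be_s2 starts a new request cluster.
    Returns the RCM value of each cluster, in order."""
    clusters = []
    current = None
    for idle_before, ra in requests:
        if current is None or idle_before >= t_be_s2:
            if current is not None:
                clusters.append(current)   # close the previous cluster
            current = 0
        # Accumulate return addresses modulo 2^32, as in formula (1.3).
        current = (current + ra) % (1 << 32)
    if current is not None:
        clusters.append(current)
    return clusters
```

In an online predictor the running sum would typically be kept per thread and reset whenever the observed idle time reaches T_BE(s_2), rather than computed over a stored list.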

From stack information, namely the stack depth, the function return address and the request clustering mode, the idle time of a single thread can be predicted fairly accurately; this is SBP. The SBP data structure is shown in Fig. 2, and the SBP prediction flow based on it is shown in Fig. 3. The Predict algorithm, which predicts from the prediction distribution of a single thread, looks in probabilistic terms for the idle mode s_predict that satisfies the following conditions:

P(s_1) \leq \sum_{i > 1} P(s_i) \qquad (1.4)

P(s_{predict}) = \max_{i > 1} \{ P(s_i) \}

where:

P(s_i) --- the probability of entering idle mode s_i in the prediction distribution.

If no idle mode satisfies these conditions, the prediction is to enter the ready idle mode, i.e. s_predict = s_1. The CombinePredict algorithm, which predicts jointly from the SD and RCM prediction distributions F_SD and F_RCM, first multiplies the probability values of the two distributions for the same idle time pairwise, i.e. F_SD * F_RCM, to obtain a new distribution F_predict, and then uses the Predict algorithm to predict the idle mode to enter: s_predict = Predict(F_predict). The initial data structure contains only an empty RA hash table; RA nodes are added gradually through updates and adjustments, and the prediction distribution of each node is continually updated. To make use of historical data and offline analysis, the whole data structure is exported to a file, so that after SBP starts it can initialize the whole data structure from this file and avoid the time needed to build it again.
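The Predict and CombinePredict rules can be written compactly in Python. This is a sketch under stated assumptions: a distribution is a list whose k-th entry is the weight (count or probability) of idle mode s_{k+1}, the product is left unnormalized because condition (1.4) is unaffected by a common scale factor, and ties among the deeper modes resolve to the shallowest of them.

```python
def predict(dist):
    """Return the 1-based index of the predicted idle mode.
    Condition (1.4): P(s_1) <= sum of P(s_i) for i > 1; if it holds,
    choose the deeper mode with the largest weight, otherwise s_1."""
    deeper = dist[1:]
    if deeper and dist[0] <= sum(deeper):
        best = max(range(len(deeper)), key=deeper.__getitem__)
        return best + 2  # deeper[0] corresponds to s_2
    return 1

def combine_predict(f_sd, f_rcm):
    """Multiply F_SD and F_RCM entry by entry, then apply predict."""
    combined = [p * q for p, q in zip(f_sd, f_rcm)]
    return predict(combined)
```

For example, `predict([0.6, 0.3, 0.1])` returns 1 because the ready mode outweighs the deeper modes together, while `combine_predict([0.2, 0.5, 0.3], [0.1, 0.2, 0.7])` returns 3.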

In the multitask environment of an embedded system, a task may be executed many times. The processes that belong to the same task run the same code and behave similarly, so we place them in the same process group (Processes Group, PG). Likewise, threads with the same entry address behave similarly, so we place them in the same thread group (Threads Group, TG). When a process or thread is created we can obtain all of its creation information, and we are notified when it exits; this information is sent to the power manager (PM) of the PMC. From the process creation information we use the process identifier (Process Identifier, PID) and the path of the executable file (the application name and its directory); from the thread creation information we use the thread identifier (Thread Identifier, TID), the entry address (Thread Entry), and the initial stack pointer. The process ID and thread ID are dynamic information and uniquely distinguish different processes and threads. The executable path of a process and the entry address of a thread are static information, and processes and threads can be grouped according to these two items. Thus, when a request operation is performed, the PMC obtains the TID of the thread performing the operation and the PID of the process it belongs to. From the TID and PID, the request operation is mapped to a unique classification group (PG, TG), which corresponds to a single thread, so the SBP prediction technique can be used for single-thread idle-mode prediction.
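The grouping described above can be sketched in Python as follows. The class `GroupRegistry`, its method names, and the choice to key a thread group by the pair (process group, entry address) are assumptions made for illustration; the text only states that processes are grouped by executable path and threads by entry address.

```python
class GroupRegistry:
    """Maps dynamic (PID, TID) pairs to static (PGID, TGID) groups."""

    def __init__(self):
        self._pg_ids = {}     # executable path -> PGID
        self._tg_ids = {}     # (PGID, entry address) -> TGID
        self._pid_to_pg = {}  # live PID -> PGID
        self._threads = {}    # live (PID, TID) -> (PGID, TGID)

    def on_process_created(self, pid, exe_path):
        # Processes started from the same executable share one PG.
        pgid = self._pg_ids.setdefault(exe_path, len(self._pg_ids))
        self._pid_to_pg[pid] = pgid

    def on_thread_created(self, pid, tid, entry_address):
        # Threads with the same entry address share one TG.
        pgid = self._pid_to_pg[pid]
        key = (pgid, entry_address)
        tgid = self._tg_ids.setdefault(key, len(self._tg_ids))
        self._threads[(pid, tid)] = (pgid, tgid)

    def on_thread_exited(self, pid, tid):
        self._threads.pop((pid, tid), None)

    def lookup(self, pid, tid):
        """Return (PGID, TGID) for a live thread, or None on a miss."""
        return self._threads.get((pid, tid))


registry = GroupRegistry()
registry.on_process_created(10, "/bin/player")
registry.on_process_created(11, "/bin/player")
registry.on_thread_created(10, 100, 0x4000)
registry.on_thread_created(11, 200, 0x4000)
registry.on_thread_created(10, 101, 0x5000)
```

Two runs of the same executable land in the same group, so statistics gathered for one run can be reused by the next.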

We built a hash tree as shown in Fig. 5. Each thread is treated as an independent request source and corresponds to one TI node in the hash tree. The distribution that the request interval time of a single thread may follow is predicted by statistical analysis of its request operations. First, once the TID and PID of a request are obtained and the TI node in the P/T ID hash table is hit, the SBP predictor of the TG node is located using the PGID and TGID stored in the TI node. Next, the TI node is updated with the current stack information and time information, and the current stack information is passed to the SBP predictor to predict the idle mode. The idle-time distribution is then accumulated by class, according to the prediction result and the prediction statistics of the TG node. In the prediction statistics of the TG node, the longest idle time is T_max. The period [0, T_max] (T_max is generally taken as 2/T_BE(s_n)) is divided evenly into N_bin small intervals (N_bin is generally 40), and an array of N_bin + 1 elements stores the cumulative number of idle times that fall in each interval; idle times exceeding T_max are all counted in the last element of the array. The frequency histogram held in this array approximates the idle-time distribution, and we call it the idle-time frequency distribution table (Idle-Frequency Distribution Table, IFDT). To adapt to changes in the idle-time distribution, sliding-window statistics are used. The sliding window is a first-in-first-out queue with maximum length L_SW (L_SW is generally 40) that stores the index of each idle time in the idle-time frequency distribution table. The TG node maintains an independent idle-time frequency distribution table and sliding window for each idle mode. After the table and sliding window are selected according to the predicted idle mode, the index of the idle time in the table is appended to the queue; if the queue length exceeds the maximum length L_SW, the head of the queue is dequeued. The counts in the table are incremented for the enqueued index and decremented for the dequeued index, so the table always reflects the idle-time distribution statistics within the sliding window. Finally, from the idle-time frequency distribution table and the elapsed time (the interval t - t_previous between the current time and the time of the last request), the conditional probability function (Conditional Probability Function, CPF) module calculates the probability that, under the current prediction result, the future idle time exceeds the break-even time of each idle mode. Suppose the count of the idle-time frequency distribution table in the i-th interval is n_i (1 ≤ i ≤ N_bin); the idle-time distribution F(t) can then be approximated as:

F(t) = \frac{\sum_{i = 1}^{k} n_{i}}{\sum_{i = 1}^{N_{bin} + 1} n_{i}} = \frac{\sum_{i = 1}^{k} n_{i}}{L}, \quad k = \min\left(\left\lfloor \frac{t}{m} \right\rfloor + 1, N_{bin} + 1\right) \qquad (1.5)

In the formula

m---the width of one interval of the idle-time frequency distribution, i.e. m = T_max / N_bin;

k---the index of the interval in which idle time t falls;

L---the current length of the sliding-window queue.

Because the sliding-window queue never shrinks once it has reached its maximum length, the total count L saturates at L ≡ L_SW after L_SW requests. If the interval between the current time and the last request operation (called the elapsed time) is Δt = t - t_previous, the idle-time distribution at this moment is a conditional probability, so the CDF is computed as follows:

CDF(t, \Delta t) = \begin{cases} \frac{F(t + \Delta t) - F(\Delta t)}{1 - F(\Delta t)}, & (\Delta t < T_{\max}) \\ 1, & (\Delta t \geq T_{\max},\ n_{N_{bin} + 1} = 0) \\ 0, & (\Delta t \geq T_{\max},\ n_{N_{bin} + 1} \neq 0) \end{cases} \qquad (1.6)

From the CDF, the probability P(s_i, Δt) that the idle time exceeds the break-even time of idle mode s_i is:

P(s_{i}, \Delta t) = 1 - CDF(T_{BE}(s_{i}), \Delta t) \qquad (1.7)

Let k_i and k_i' denote, respectively, the indices of the intervals that contain the elapsed time Δt and the time Δt + T_BE(s_i) reached once the break-even time has been exceeded; then from formulas (1.6) and (1.7):

P(s_{i}, \Delta t) = \begin{cases} \left(L - \sum_{i = 1}^{k_{i}'} n_{i}\right) / \left(L - \sum_{i = 1}^{k_{i}} n_{i}\right), & (k_{i} \leq N_{bin}) \\ 0, & (k_{i} = N_{bin} + 1,\ n_{N_{bin} + 1} = 0) \\ 1, & (k_{i} = N_{bin} + 1,\ n_{N_{bin} + 1} \neq 0) \end{cases} \qquad (1.8)

In this way, the probability that the idle time exceeds the break-even time of each idle mode can be obtained from the statistics in the idle-time frequency distribution table.
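The table, the sliding window, and formula (1.8) can be put together in a short Python sketch. The class name `IdleFrequencyTable` and its methods are hypothetical. Indices are zero-based, so index `k` here corresponds to interval k + 1 in the formulas. Returning 0.0 when the denominator of (1.8) is zero is an assumption of this sketch; the text does not cover that case.

```python
from collections import deque


class IdleFrequencyTable:
    """Idle-time histogram over a sliding window of recent idle periods."""

    def __init__(self, t_max, n_bin=40, window_len=40):
        self.n_bin = n_bin
        self.bin_width = t_max / n_bin   # m in formula (1.5)
        self.counts = [0] * (n_bin + 1)  # last slot: idle time >= T_max
        self.window = deque()            # FIFO of histogram indices
        self.window_len = window_len     # L_SW

    def _index(self, t):
        # Zero-based form of k = min(floor(t / m) + 1, N_bin + 1).
        return min(int(t // self.bin_width), self.n_bin)

    def record(self, idle_time):
        """Add one observed idle time; evict the oldest if the window is full."""
        idx = self._index(idle_time)
        self.window.append(idx)
        self.counts[idx] += 1
        if len(self.window) > self.window_len:
            self.counts[self.window.popleft()] -= 1

    def exceed_probability(self, t_be, elapsed):
        """P(s_i, dt) of formula (1.8) for break-even time t_be."""
        total = len(self.window)  # L
        k = self._index(elapsed)
        if k == self.n_bin:  # elapsed time already at or beyond T_max
            return 1.0 if self.counts[self.n_bin] else 0.0
        k_prime = self._index(elapsed + t_be)
        remaining = total - sum(self.counts[:k + 1])
        if remaining == 0:
            return 0.0  # assumption: no recorded idle period lasted this long
        return (total - sum(self.counts[:k_prime + 1])) / remaining


table = IdleFrequencyTable(t_max=4.0, n_bin=4, window_len=4)
for idle in (0.5, 1.5, 2.5, 3.5):
    table.record(idle)
p = table.exceed_probability(t_be=1.0, elapsed=0.5)
```

In the usage lines, three of the four recorded idle periods outlast the elapsed time of 0.5, and two of those three also outlast 1.5, which gives a probability of 2/3.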

With the creation and exit messages of processes and threads, the load can be divided into time intervals according to the influence of the threads, which enables load prediction under multithreading. When a request arrives, only the request source hit by the PID and TID performs SBP prediction (the elapsed time of this request source is reset to 0); every other request source only recalculates, from its elapsed time Δt, the probability P(s_i, Δt) that the break-even time of each idle mode is exceeded. The probability P(s_i) that the PMC enters the i-th level idle mode s_i is then calculated:

P(s_{i}) = \begin{cases} \prod_{j = 1}^{N_{T}} P_{j}(s_{i}, \Delta t_{j}) - \prod_{j = 1}^{N_{T}} P_{j}(s_{i + 1}, \Delta t_{j}), & (i < n) \\ \prod_{j = 1}^{N_{T}} P_{j}(s_{i}, \Delta t_{j}), & (i = n) \end{cases} \qquad (1.9)

In the formula

N_T---the total number of threads currently using the PMC, i.e. the total number of request sources;

n---the total number of idle modes of the PMC;

Δt_j---the current elapsed time of request source j;

P_j(s_i, Δt_j)---the probability that the idle time of request source j exceeds the break-even time of idle mode s_i, computed as in formula (1.8).

Once the probability of the PMC entering each idle mode has been obtained, the idle mode is predicted in the same way as in formula (1.4): the idle mode with the highest probability is selected as the final predicted idle mode, and when probabilities are equal the idle mode with the lower power consumption is preferred.
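Formula (1.9) and the selection rule can be sketched in Python as below. The function names are illustrative. The sketch assumes that a higher idle-mode level means lower power consumption, so a tie is resolved toward the higher level; `math.prod` requires Python 3.8 or later.

```python
from math import prod


def mode_probabilities(per_source):
    """Formula (1.9).

    per_source[j][i] is P_j(s_{i+1}, dt_j) for request source j.
    Returns [P(s_1), ..., P(s_n)] for the component as a whole.
    """
    n = len(per_source[0])
    # Probability that every source stays idle past T_BE(s_{i+1}).
    joint = [prod(src[i] for src in per_source) for i in range(n)]
    return [joint[i] - joint[i + 1] for i in range(n - 1)] + [joint[n - 1]]


def select_mode(probs):
    """Selection as in formula (1.4), preferring the deeper mode on a tie.

    Assumption: a higher level index means lower power consumption.
    """
    if len(probs) == 1 or probs[0] > sum(probs[1:]):
        return 1
    best = max(range(1, len(probs)), key=lambda i: (probs[i], i))
    return best + 1


two_sources = [[1.0, 0.8, 0.5], [1.0, 0.5, 0.2]]
probs = mode_probabilities(two_sources)
level = select_mode(probs)
```

With two active request sources, the joint probability of a long idle period drops quickly, so the sketch predicts the ready idle mode (level 1) for this data.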

After all threads that use the PMC have exited, formula (1.9) no longer applies. Prediction then falls back on the backup null request source. The null request source does not represent any thread and does not take part in prediction while any thread that uses the PMC is running. It predicts the PMC idle time when no thread is using the PMC. The prediction algorithm uses a simple exponential average (Exponential Average):

s_{predict}(n) = I^{-1}(\alpha s_{predict}(n - 1) + (1 - \alpha) I(s_{opt}(t_{n - 1}))) \qquad (1.10)

In the formula

s_predict(n)---the idle mode of the n-th prediction;

t_n---the actual idle time for the n-th prediction;

I(s_i)---the operator that maps idle mode s_i to its subscript index i;

I^-1(i)---the inverse of I, which maps subscript index i to idle mode s_i; a fractional index is rounded up to an integer index;

α---the exponential decay factor, between 0 and 1, generally taken as 0.5.

The prediction system made up of the null request source and the thread request sources can thus provide a prediction service to the system over the whole time period.
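A minimal Python sketch of the exponential average in formula (1.10) follows. It works directly on idle-mode indices, which assumes that the previous prediction enters the average through its index, and it reads "rounded up" as a ceiling; both readings are assumptions. The function name is illustrative.

```python
import math


def predict_next_index(prev_predicted_index, prev_optimal_index, alpha=0.5):
    """Exponential average over idle-mode indices, after formula (1.10).

    prev_predicted_index: index of the mode predicted last time.
    prev_optimal_index: index of the mode that would have been optimal
    for the last actual idle time.
    """
    blended = alpha * prev_predicted_index + (1 - alpha) * prev_optimal_index
    # A fractional index is rounded up to the next integer index.
    return math.ceil(blended)


next_index = predict_next_index(2, 3)
```

With α = 0.5, a previous prediction of level 2 and an optimal level of 3 blend to 2.5, which rounds up to level 3.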

Because predictions can be wrong, and in order to reduce the loss caused by mispredictions, we make the final decision with a single-threshold timeout policy based on the probability distribution of the prediction result. Because the probability distribution is approximated by a histogram and accumulated over a sliding window, this policy is called the probability-based single-threshold adaptive timeout policy (Probability-based Single-threshold Adaptive Timeout, PSAT). The timeout threshold k is an element of a finite arithmetic sequence, k = rΔt (0 ≤ r < q), where Δt is the common difference between thresholds and r is the index of threshold k. The minimum threshold is 0 and the maximum threshold is qΔt. The total energy consumption under the j-th level idle mode s_j is then:

E_{total\_idle} = \sum_{i = 1}^{r} (a_{1} T_{i} + n_{i} b_{1}) + \sum_{i = r + 1}^{q} (a_{j} T_{i} + n_{i} ((a_{1} - a_{j}) r \Delta t + b_{j})) \qquad (1.11)

In the formula

T_i---the sum of the lengths of the idle periods that fall in the i-th interval;

n_i---the number of requests whose idle time falls in the i-th interval;

If the probability distribution is fixed, i.e. T_i and n_i are constant, the total energy consumption can be expressed as a function of the timeout threshold, E_total_idle = E(r). Comparing the total energy consumption under the q timeout thresholds gives the lowest energy consumption E_opt(i) = min E_total_idle under idle mode s_i and the corresponding optimal timeout threshold k_opt(i). Comparing the lowest energy consumption across the idle modes then gives the final optimal timeout threshold k_opt and the idle mode s_opt that should be entered:

E_{opt} = E_{opt}(j) = \min_{1 \leq i \leq m} \{E_{i}(k_{opt}(i))\}

k_{opt} = k_{opt}(j)

s_{opt} = s_{j} \qquad (1.12)

In the formula

k_opt(i)---the optimal timeout threshold for entering idle mode s_i;

E_i(k)---the energy consumption of entering idle mode s_i under threshold k;

E_opt(i)---the lowest energy consumption of entering idle mode s_i.
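The search over thresholds and idle modes in formulas (1.11) and (1.12) can be sketched in Python as below. Here `a[i]` and `b[i]` stand for the average power consumption and the equivalent transition energy of the level-(i+1) idle mode, as defined in the claims. The function names are illustrative, the candidate range 0 ≤ r < q follows the description above, and breaking energy ties toward the smallest r and lowest level is an arbitrary choice of this sketch.

```python
def total_idle_energy(r, mode, a, b, T, n, dt):
    """Formula (1.11): energy for timeout r*dt and idle mode `mode` (1-based).

    T[i], n[i]: total idle length and request count of interval i+1.
    """
    a1, b1 = a[0], b[0]
    aj, bj = a[mode - 1], b[mode - 1]
    # Idle periods shorter than the timeout stay in the ready idle mode.
    short = sum(a1 * T[i] + n[i] * b1 for i in range(r))
    # Longer ones wait r*dt in the ready mode, then switch to `mode`.
    long_ = sum(aj * T[i] + n[i] * ((a1 - aj) * r * dt + bj)
                for i in range(r, len(T)))
    return short + long_


def best_threshold_and_mode(a, b, T, n, dt):
    """Formula (1.12): exhaustive search for (E_opt, k_opt, s_opt)."""
    q = len(T)
    candidates = (
        (total_idle_energy(r, mode, a, b, T, n, dt), r, mode)
        for mode in range(1, len(a) + 1)
        for r in range(q)  # 0 <= r < q
    )
    energy, r_opt, mode_opt = min(candidates)
    return energy, r_opt * dt, mode_opt


a = [1.0, 0.1]   # average power of s_1 and s_2
b = [0.0, 2.0]   # equivalent transition energy of s_1 and s_2
T = [0.5, 10.0]  # total idle length per interval
n = [1, 1]       # request count per interval
e_opt, k_opt, s_opt = best_threshold_and_mode(a, b, T, n, dt=1.0)
```

For this data the cheapest choice is to wait one threshold step before entering the level-2 idle mode, because the short idle period in the first interval does not repay the transition energy.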

Claims

1. A dynamic power consumption control method using stack depth for multithread prediction, characterized in that: step one: the request cluster mode RCM of the operation requests is used to describe the characteristics of the operation request sequence Φ issued by a single thread to the component, and is computed as follows:

RCM_{\Phi} = \mathrm{Mod}_{2^{32}}\left(\sum_{r_{i} \in \Phi} RA(r_{i})\right), \quad \Phi = \{r_{i} \mid T_{RI}(r_{i}, r_{i - 1}) < T_{BE}(s_{2})\}

In the formula:

r_i---an operation request from the system to the component;

RA(r_i)---the return address of the function that produces operation request r_i;

Mod_x(y)---y modulo x, i.e. modular arithmetic on y with modulus x;

T_RI(r_i, r_i-1)---the idle time between operation requests r_i and r_i-1;

s_2---the level-2 idle mode;

P(s_{i}) = \begin{cases} \prod_{j = 1}^{N_{T}} P_{j}(s_{i}, \Delta t_{j}) - \prod_{j = 1}^{N_{T}} P_{j}(s_{i + 1}, \Delta t_{j}), & (i < n) \\ \prod_{j = 1}^{N_{T}} P_{j}(s_{i}, \Delta t_{j}), & (i = n) \end{cases}

P_{j}(s_{i}, \Delta t_{j}) = 1 - \frac{F_{predict}(T_{BE}(s_{i}) + \Delta t_{j}) - F_{predict}(\Delta t_{j})}{1 - F_{predict}(\Delta t_{j})}

In the formula

n---the total number of idle modes of the component;

Δt_j---the current elapsed time of request source j;

E_{total}(r \Delta t, s_{i}) = \sum_{j = 1}^{r} (a_{1} T_{j} + n_{j} b_{1}) + \sum_{j = r + 1}^{q} (a_{i} T_{j} + n_{j} ((a_{1} - a_{i}) r \Delta t + b_{i}))

E_{opt} = \min_{1 \leq i \leq m,\ 1 \leq r \leq q} \{E_{total}(r \Delta t, s_{i})\} = E_{total}(r_{opt} \Delta t, s_{opt})

k_{opt} = r_{opt} \Delta t

In the formula

Δt---the spacing between candidate timeout thresholds;

q---the total number of candidate timeout thresholds;

a_i---the average power consumption of the level-i idle mode;

b_i---the equivalent transition energy consumption of the level-i idle mode;