CN100478873C - Access address generating method aimed at stream application - Google Patents

Info

Publication number
CN100478873C
Authority
CN
China
Prior art keywords
address
offset
stream
clcnt
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2007100345789A
Other languages
Chinese (zh)
Other versions
CN101021784A (en)
Inventor
穆长富
张明
陈海燕
马驰远
高军
李晋文
衣晓飞
阳柳
曾献君
李勇
倪晓强
唐遇星
张承义
杨学军
张民选
邢座程
蒋江
汤明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CNB2007100345789A
Publication of CN101021784A
Application granted
Publication of CN100478873C
Legal status: Expired - Fee Related

Abstract

The invention discloses a memory access address generation method for stream applications. The steps are as follows: (1) the stream controller sends a start signal to the address generator, the contents of the designated memory address register and the stream length are written into a memory stream control register, and the GO bit is set to start the address generator; (2) the offset address register group is initialized according to the memory access mode, and during initialization a number of initial memory access addresses equal to the number of offset address registers is generated; (3) addresses are generated for the remaining stream elements and the stream length is decremented; (4) when the stream length reaches 0, address generation for the current stream is complete, the GO bit is cleared, and the stream controller is notified that the address generator is idle. Otherwise, step (3) is repeated.

Description

Memory access address generation method for stream applications
Technical field
The present invention relates mainly to the field of microprocessor design, and in particular to a memory access address generation method for stream applications.
Background
A stream processor is a SIMD-style processor architecture designed to handle stream applications efficiently. Stream applications demand high bandwidth and are characterized by large data volumes, continuous data inflow, and little data reuse. The core processing units of a stream processor are a set of compute clusters that operate in parallel. The unit of processing is the stream, which consists of an ordered sequence of records with identical structure. Each record consists of a series of related data elements, where one data element is one word, and the data elements within a record are stored contiguously in memory. Because of this composition, addresses for a stream cannot be generated the way an ordinary microprocessor generates them. The structure that generates these addresses in a stream processor is called the address generator, and it is a critical component: it is responsible for decomposing a stream, which the upper-level architecture organizes in units of records for efficient parallel processing by the compute clusters, into a sequence of individual data elements and issuing the memory accesses. Stream processors provide several memory access modes (for example, a dedicated bit-reversed mode for the FFT). Supporting the access patterns of common applications in hardware increases programming flexibility and substantially improves the processor's efficiency on such applications.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the invention provides a memory access address generation method for stream applications that subdivides and recombines the address generation modes and partitions the address generation path sensibly, so as to reduce the delay between stations and maximize the operating frequency of the component.
To solve the above technical problem, the invention proposes a memory access address generation method for stream applications, characterized by the following steps:
(1) The stream controller sends a start signal to the address generator. At the same time, the address generator receives the value of the memory address register designated by the stream controller, the operation type, and information such as the stream length sent from the stream register file, and writes this information into an idle memory stream control register. The GO bit in the memory stream control register is then set, which starts the address generator;
(2) The offset address register group is initialized. While initialization is in progress, a number of memory access addresses equal to the number of offset address registers is generated;
(3) Addresses are generated for the remaining stream elements of the memory access stream;
(4) When the value of the STRCNT register reaches 0, address generation for the current stream is complete: the GO bit in the memory stream control register is cleared, and the stream controller is notified that the address generator is idle and can accept the next stream memory operation. Otherwise, step (3) is repeated.
The address generator is a three-station pipeline:
(1) The first station is the offset address register group OFFSET[i] (i = 0, 1, 2, ..., n-1); the offset address registers are assigned at this station;
(2) The second station registers the OFFSET value that will produce the address, that is, the OFFSET value selected by MUX_5. The select signal of MUX_5 is CLCNT, a rotating counter that indicates which OFFSET register should currently be selected to produce the final address;
(3) The third station registers the outgoing memory operation type, address, data, and valid signal.
In the first station, the offset address registers are assigned from several sources. The value assigned to OFFSET differs between modes and between processing phases of the same mode. The OFFSET value has the following main sources:
1. OFFSET[0] is initialized to 0 in stride mode and in bit-reversed mode;
2. In stride mode, when OFFSET[CLCNT] (CLCNT = 1, 2, 3) is initialized, it receives a value derived from OFFSET[CLCNT-1];
3. In stride mode, once address generation for a record is complete (that is, when RECCNT = RECLEN), the start address of the next record for the corresponding compute cluster is computed from the value of OFFSET[CLCNT] and written back to OFFSET[CLCNT];
4. In bit-reversed mode, when OFFSET[CLCNT] (CLCNT = 1, 2, 3) is initialized, it receives the value of OFFSET[CLCNT-1] after certain operations;
5. In bit-reversed mode, once address generation for a record is complete (that is, when RECCNT = RECLEN), the start address of the next record for the corresponding compute cluster is computed from the value of OFFSET[CLCNT] and written back to OFFSET[CLCNT];
6. In index mode, the record start address received from the index stream channel;
7. When the current record has not been fully processed, the address of the next element in the record is obtained by adding 1 to the current address;
8. If OFFSET is to remain unchanged in the next cycle, its own value is fed back.
Compared with the prior art, the advantages of the invention are:
1. The invention reduces hardware implementation complexity and power consumption. By analyzing the address generation process of the three memory access modes and combining functional units so that data paths are shared, a very low resource requirement is obtained at a small cost in design complexity;
2. By implementing three memory access modes, the invention lets the stream processor support stream applications well over a wider range, improves software programming flexibility, and greatly improves processing efficiency for some frequently used applications;
3. Performing memory accesses through the above three modes makes stream applications efficient and fast, exploits the parallel processing capability of the compute clusters, and improves system efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of the correspondence between the address generator of the invention and the upper-level architecture;
Fig. 2 is a flowchart of the invention;
Fig. 3 is a schematic diagram of the station partitioning of the address generator in the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 2, the memory access address generation method for stream applications of the invention comprises the following steps:
(1) The stream controller sends a start signal (comprising the memory base address, memory access mode, operation type, record length, and stream length) to the address generator. At the same time, the address generator receives the value of the memory address register designated by the stream controller, the operation type, and information such as the stream length sent from the stream register file, and writes this information into an idle memory stream control register. The GO bit in the memory stream control register is then set, which starts the address generator;
(2) The offset address register group is initialized. While initialization is in progress, a number of memory access addresses equal to the number of offset address registers is generated;
(3) Addresses are generated for the remaining stream elements of the memory access stream;
(4) When the value of the STRCNT register reaches 0, address generation for the current stream is complete: the GO bit in the memory stream control register is cleared, and the stream controller is notified that the address generator is idle and can accept the next stream memory operation. Otherwise, step (3) is repeated.
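The four steps can be sketched as a small control loop. This is only an illustrative model, not the patented circuit: it assumes STRCNT counts data words and is decremented once per generated address, it represents the registers as a Python dictionary, and the `next_address` callback is a hypothetical stand-in for the per-mode address logic.

```python
def run_stream(mar, strcnt, op, next_address):
    """Model one stream access; `next_address` stands in for the address logic."""
    # Step (1): fill an idle MSCR from the designated MAR plus the stream
    # length and operation type, then set GO to start the address generator.
    mscr = dict(mar, STRCNT=strcnt, OP=op, GO=1)
    issued = []
    # Steps (2)-(3): produce one address per beat while elements remain.
    # (Assumption: STRCNT is decremented once per generated address.)
    while mscr["STRCNT"] > 0:
        issued.append((mscr["OP"], next_address(mscr)))
        mscr["STRCNT"] -= 1
    # Step (4): STRCNT reached 0, so clear GO; the generator is idle again.
    mscr["GO"] = 0
    return issued, mscr


if __name__ == "__main__":
    # Trivial address rule for demonstration: consecutive words from BASE.
    reqs, final = run_stream({"BASE": 100}, 3, "load",
                             lambda m: m["BASE"] + 3 - m["STRCNT"])
    print(reqs, final["GO"])
```

The loop condition mirrors the "otherwise, repeat step (3)" branch: GO stays set for exactly as long as STRCNT is nonzero.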
Referring to Fig. 3, the address generator is a three-station pipeline:
(1) The first station is the offset address register group OFFSET[i] (i = 0, 1, 2, ..., n-1); the offset address registers are assigned at this station;
(2) The second station registers the OFFSET value that will produce the address, that is, the OFFSET value selected by MUX_5. The select signal of MUX_5 is CLCNT, a rotating counter that indicates which OFFSET register should currently be selected to produce the final address;
(3) The third station registers the outgoing memory operation type, address, data, and valid signal. This station comprises the operation type (load/store), data, address, and valid signal fields. The address field is computed by the preceding adder, which adds the value from the second station to the BASE field. The paths of the data and valid signal fields are not drawn in the figure; in practice they can either pass through the three pipeline stations so that they stay paired with the address, or be held in a queue and released under control signals so that they leave correctly paired with the address. They are then sent to the lower-level buffer for subsequent processing.
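A minimal sketch of one beat through the second and third stations may help. The names `Request`, `station2`, `station3`, and `beat` are illustrative only, the data field is omitted, and the pipeline registers between stations are not modeled; the sketch shows just the selection by CLCNT and the addition of BASE.

```python
from typing import NamedTuple


class Request(NamedTuple):
    op: str        # "load" or "store"
    address: int
    valid: bool


def station2(offsets, clcnt):
    # MUX_5: CLCNT selects which OFFSET register feeds the adder.
    return offsets[clcnt]


def station3(base, selected_offset, op):
    # Adder in front of the third station: address = BASE + selected OFFSET.
    return Request(op=op, address=base + selected_offset, valid=True)


def beat(base, offsets, clcnt, op):
    """Return the outgoing request and the rotated CLCNT for the next beat."""
    request = station3(base, station2(offsets, clcnt), op)
    next_clcnt = (clcnt + 1) % len(offsets)  # rotating counter wraps to 0
    return request, next_clcnt


if __name__ == "__main__":
    print(beat(0x1000, [0, 8, 16, 24], 3, "load"))
```

Keeping the mux and the adder in separate stations is what the text means by dividing the address generation path: each station carries only one of the two operations.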
In the first station, the offset address registers are assigned from several sources. The value assigned to OFFSET differs between modes and between processing phases of the same mode. The OFFSET value has the following main sources:
1. OFFSET[0] is initialized to 0 in stride mode and in bit-reversed mode;
2. In stride mode, when OFFSET[CLCNT] (CLCNT = 1, 2, 3) is initialized, it receives a value derived from OFFSET[CLCNT-1];
3. In stride mode, once address generation for a record is complete (that is, when RECCNT = RECLEN), the start address of the next record for the corresponding compute cluster is computed from the value of OFFSET[CLCNT] and written back to OFFSET[CLCNT];
4. In bit-reversed mode, when OFFSET[CLCNT] (CLCNT = 1, 2, 3) is initialized, it receives the value of OFFSET[CLCNT-1] after certain operations;
5. In bit-reversed mode, once address generation for a record is complete (that is, when RECCNT = RECLEN), the start address of the next record for the corresponding compute cluster is computed from the value of OFFSET[CLCNT] and written back to OFFSET[CLCNT];
6. In index mode, the record start address received from the index stream channel;
7. When the current record has not been fully processed, the address of the next element in the record is obtained by adding 1 to the current address;
8. If OFFSET is to remain unchanged in the next cycle, its own value is fed back.
Taken together, the eight sources above give, in Fig. 3, five feedback paths from the second station to the first station, one index stream path from the SRF, and one path that assigns 0 to OFFSET[0]. The signals on these paths are selected by multiplexers and then written into the corresponding OFFSET registers.
Fig. 1 shows the correspondence between the address generator of the invention and the upper-level architecture. The OFFSET registers in the address generator, the stream register file (SRF), and the compute clusters (CLUSTER) correspond as follows: CLUSTER[i] → SRF.BANK[i] → OFFSET[i] (i = 0, 1, 2, 3). For a load operation, the data fetched using the address generated by OFFSET[i] is ultimately delivered to CLUSTER[i]; for a store operation, the data produced by CLUSTER[i] is stored to memory at the address generated by OFFSET[i]. Therefore, in the stream processor architecture, if the number of compute clusters changes, the number of banks in the register file and the number of OFFSET registers in the address generator change with it, and this change also affects the address generation rules in the address generator.
In this embodiment, the invention uses an offset address register group OFFSET[i] (i = 0, 1, 2, ..., n-1). Each offset address register corresponds to one compute cluster CLUSTER[i] (i = 0, 1, 2, ..., n-1); that is, data accessed through offset address register OFFSET[i] is ultimately delivered to compute cluster CLUSTER[i]. For convenience, n = 4 is used as the example below; in practice n depends on the architecture and is generally a power of 2.
The invention also involves two important groups of control registers: the memory stream control registers MSCR[i] (i = 0, 1) and the memory address registers MAR[j] (j = 0, 1, ..., 15).
In this embodiment, the main function of the memory stream control register MSCR is to control the starting and stopping of the address generator and to determine how the stream currently being processed is correctly split and recombined. It contains the following fields:
BASE (32 bits): memory base address.
STRIDE (12 bits): access stride.
STRCNT (16 bits): stream length counter; tracks whether all memory access addresses of the current stream have been generated.
ALIGN (4 bits): alignment bits used in bit-reversed mode.
RECLEN (11 bits): record length register.
RECCNT (11 bits): record counter; indicates which element of the current record is being processed.
CLCNT (2 bits): compute cluster counter (its width depends on the number of compute clusters); OFFSET[CLCNT] denotes the OFFSET belonging to compute cluster number CLCNT.
MODE (2 bits): mode register.
OP (1 bit): operation register.
GO (1 bit): start register; set while the address generator is working.
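The field widths listed above can be captured in a small table for range checking. This is a sketch for illustration only: the dictionary layout and the `check_mscr` helper are hypothetical, and the source does not state how the fields are packed into physical registers.

```python
# Field widths in bits, as listed for the MSCR.
MSCR_FIELDS = {
    "BASE": 32, "STRIDE": 12, "STRCNT": 16, "ALIGN": 4, "RECLEN": 11,
    "RECCNT": 11, "CLCNT": 2, "MODE": 2, "OP": 1, "GO": 1,
}

TOTAL_BITS = sum(MSCR_FIELDS.values())


def check_mscr(values):
    """Raise ValueError if any field value does not fit its declared width."""
    for name, value in values.items():
        width = MSCR_FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return values


if __name__ == "__main__":
    print(TOTAL_BITS)
    check_mscr({"BASE": 0x1000, "STRIDE": 8, "RECLEN": 2, "STRCNT": 16})
```

One consequence of the widths is that a record can hold at most 2047 elements (11-bit RECLEN) and a stride can be at most 4095 (12-bit STRIDE).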
MAR is a user-visible register group whose values are written by stream-level instructions. It is used to fill the MSCR, so its fields match some of the MSCR fields:
BASE (32 bits): memory base address.
ALIGN (4 bits): alignment bits used in bit-reversed mode; its value is
Figure C20071003457800081
RECLEN (11 bits): record length register.
STRIDE (12 bits): access stride.
MODE (2 bits): mode register.
During the memory access of a stream, the BASE field gives the base address of the access, and the addresses of all subsequent elements of the stream are offsets from this base. The address generator contains 4 OFFSET registers. Each OFFSET register corresponds to one compute cluster and holds the offset address of the element of the record assigned to that cluster. RECCNT and CLCNT together form a 13-bit counter that controls the conversion from records to data words. The counter is incremented by 1 on every memory access: CLCNT holds the number of the compute cluster for the next access, and RECCNT holds the offset, within its record, of the element to be accessed next. The address of the next element to be accessed is therefore BASE+OFFSET[CLCNT].
RECLEN holds the record length of the stream currently being accessed. Because the data elements within a record are stored contiguously in memory, while RECCNT is less than RECLEN the corresponding OFFSET register is simply incremented by 1. When RECCNT reaches RECLEN, the access to the current record of the corresponding compute cluster is complete and access to a new record begins.
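The combined 13-bit counter can be illustrated as follows. The sketch assumes that CLCNT occupies the two low-order bits, so that consecutive accesses rotate across the clusters before RECCNT advances; the source does not state the bit order explicitly, and the wrap of RECCNT back to 0 at RECLEN is likewise an interpretation. The function names are illustrative.

```python
CLCNT_BITS = 2    # width of CLCNT for 4 compute clusters
RECCNT_BITS = 11  # width of RECCNT


def split(counter):
    """Split the 13-bit counter into (RECCNT, CLCNT); CLCNT assumed low bits."""
    clcnt = counter & ((1 << CLCNT_BITS) - 1)
    reccnt = (counter >> CLCNT_BITS) & ((1 << RECCNT_BITS) - 1)
    return reccnt, clcnt


def advance(counter, reclen):
    """Add 1 per access; restart RECCNT when it reaches RECLEN (assumption)."""
    reccnt, clcnt = split(counter + 1)
    if reccnt == reclen:
        reccnt = 0
    return (reccnt << CLCNT_BITS) | clcnt


if __name__ == "__main__":
    c = 0
    for _ in range(8):
        print(split(c))
        c = advance(c, reclen=2)
```

Under this assumed ordering, four consecutive accesses visit clusters 0 to 3 for the same element position, and only then does RECCNT move to the next element of each record.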
The memory access mode of a stream processor is the rule by which the next address to be accessed is generated from the current address. For a given compute cluster, generation of the start address of a new record depends on the memory access mode. The method of the invention provides three memory access modes, which are described in detail below together with their implementation strategy. (The address generation rules are closely tied to the number of compute clusters; for ease of description, this patent assumes 4 compute clusters throughout.)
1. Stride mode: the start address of the next record is obtained by adding the stride to the start address of the current record.
For the first four records of the stream, the addresses are:
OFFSET[0] = 0;
OFFSET[CLCNT] = OFFSET[CLCNT-1] + STRIDE (where CLCNT = 1, 2, 3).
Thereafter, for a given compute cluster, successive records follow the rule: OFFSET[CLCNT] = OFFSET[CLCNT] + STRIDE*M - RECLEN (where M = 4; its value is the number of compute clusters). STRIDE is the distance between the starting points of two consecutive records, and RECLEN is the number of contiguously stored elements in a record. RECLEN must be less than STRIDE; otherwise the data of adjacent records would overlap.
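The stride-mode rules can be checked with a short functional model. It is a sketch under stated assumptions: accesses rotate across the clusters (CLCNT changes fastest), STRCNT counts data words, and the OFFSET registers are initialized up front rather than during the first four accesses as the hardware does. The function name and argument names are illustrative.

```python
def stride_addresses(base, stride, reclen, strcnt, n_clusters=4):
    """Return the address sequence for a stride-mode stream of strcnt words."""
    # Initialization: OFFSET[0] = 0, OFFSET[c] = OFFSET[c-1] + STRIDE.
    offset = [0] * n_clusters
    for c in range(1, n_clusters):
        offset[c] = offset[c - 1] + stride

    reccnt, clcnt, out = 0, 0, []
    while strcnt > 0:
        out.append(base + offset[clcnt])      # address = BASE + OFFSET[CLCNT]
        offset[clcnt] += 1                    # next element of the record
        if reccnt + 1 == reclen:              # record finished for this cluster
            # OFFSET = OFFSET + STRIDE*M - RECLEN jumps to the next record.
            offset[clcnt] += stride * n_clusters - reclen
        clcnt = (clcnt + 1) % n_clusters      # rotate to the next cluster
        if clcnt == 0:
            reccnt = (reccnt + 1) % reclen
        strcnt -= 1
    return out


if __name__ == "__main__":
    print(stride_addresses(base=100, stride=8, reclen=2, strcnt=16))
```

With STRIDE = 8 and RECLEN = 2, cluster 0 reads records starting at offsets 0 and 32, which is the STRIDE*M spacing the update rule produces once the RECLEN increments have been subtracted.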
2. Bit-reversed mode: this memory access mode exists to support the FFT; during addressing, the bits are reversed and the result is used as the new address. In a stream processor the address sequence of this mode is generated very differently from that in an ordinary microprocessor. The following C-like description shows how the bit-reversed operation is realized in stream processing.
When the fast Fourier transform is implemented on a stream processor, the whole data stream is divided into four segments (the number of segments equals the number of compute clusters in the stream processor), and each cluster processes one segment. The start addresses of the four segments, denoted init_offset[i], are computed as follows:
init_offset[0] = 0
for (i = 1 to nClust-1):
init_offset[i] = brAdd(init_offset[i-1], stride << (log2(nClust) + align))
brAdd is bit-reversed addition: when adding, carries propagate toward the low-order bits rather than the high-order bits.
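Bit-reversed addition can be modeled by reversing both operands, adding normally, and reversing the sum. The sketch below takes an explicit `width` parameter, which is an assumption: the source does not state the operand width used by brAdd. The Python names `bit_reverse` and `br_add` are illustrative.

```python
def bit_reverse(x, width):
    """Reverse the low `width` bits of x."""
    r = 0
    for _ in range(width):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r


def br_add(a, b, width):
    """Add with carries moving toward the low-order bits (modulo 2**width)."""
    mask = (1 << width) - 1
    total = (bit_reverse(a, width) + bit_reverse(b, width)) & mask
    return bit_reverse(total, width)


if __name__ == "__main__":
    # Repeatedly adding the top bit walks through bit-reversed order.
    x, seq = 0, []
    for _ in range(8):
        seq.append(x)
        x = br_add(x, 4, width=3)
    print(seq)
```

Starting from 0 and repeatedly adding the highest bit with a 3-bit width yields 0, 4, 2, 6, 1, 5, 3, 7, the usual bit-reversed ordering used by FFT data reordering.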
Next, each cluster accesses the records of its own segment in the prescribed order. The access offset, that is, the start address of the j-th record to be accessed, denoted rec_offset_j, is computed as follows:
rec_offset_0 = 0
rec_offset_j = brAdd(rec_offset_{j-1} >> align, stride) << align
The address of the first element of the j-th record accessed by the i-th cluster is:
offset[i]_j = init_offset[i] + rec_offset_j
Note that the formulas above compute the address of the first element of the next record to be accessed in the data stream; the remaining elements of each record are still accessed sequentially.
From this, the following hardware design rules can be derived.
Address generation rule for the first four records of the stream: OFFSET[0] = 0;
OFFSET[CLCNT] = OFFSET[CLCNT-1] +* (STRIDE << (log2(nCluster) + ALIGN)) (where CLCNT = 1, 2, 3 and nCluster = 4).
Thereafter, for a given compute cluster, successive records follow the rule:
OFFSET[CLCNT] = ((OFFSET[CLCNT] >> ALIGN) +* STRIDE) << ALIGN
where +* denotes bit-reversed addition.
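The two bit-reversed update rules can be written out as functions. This is a hedged sketch: the operand `width` of the bit-reversed adder is not given in the source and is passed in as an assumption, operands wider than `width` are truncated by the model, and the grouping of the shift and +* operators follows the rec_offset formula. The function names are illustrative.

```python
def bit_reverse(x, width):
    """Reverse the low `width` bits of x."""
    r = 0
    for _ in range(width):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r


def br_add(a, b, width):
    """Bit-reversed addition (+*), modulo 2**width."""
    mask = (1 << width) - 1
    return bit_reverse((bit_reverse(a, width) + bit_reverse(b, width)) & mask, width)


def init_offsets(stride, align, n_clusters, width):
    """OFFSET[0] = 0; OFFSET[c] = OFFSET[c-1] +* (STRIDE << (log2(n) + ALIGN))."""
    shift = (n_clusters.bit_length() - 1) + align  # log2 for a power of two
    offsets = [0]
    for c in range(1, n_clusters):
        offsets.append(br_add(offsets[c - 1], stride << shift, width))
    return offsets


def next_record_offset(offset, stride, align, width):
    """OFFSET = ((OFFSET >> ALIGN) +* STRIDE) << ALIGN."""
    return br_add(offset >> align, stride, width) << align


if __name__ == "__main__":
    print(init_offsets(stride=1, align=0, n_clusters=4, width=4))
    off = 0
    for _ in range(4):
        print(off)
        off = next_record_offset(off, stride=8, align=0, width=4)
```

Shifting right by ALIGN before the bit-reversed add discards the low-order bits that the sequential element increments have touched, so the record start address is recomputed from the aligned part only.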
3. Index mode: OFFSET[CLCNT] = the offset address delivered by the index stream.
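Index mode can be sketched in the same functional style. The model assumes that the index stream supplies one record start offset per cluster in turn and that elements within a record are then accessed sequentially with accesses rotating across the clusters, as in the other modes; the source states only that OFFSET[CLCNT] is loaded from the index stream. The function name is illustrative.

```python
def index_addresses(base, index_stream, reclen, n_clusters=4):
    """Return addresses when record start offsets come from an index stream."""
    out = []
    for g in range(0, len(index_stream), n_clusters):
        # Load one record start offset per cluster from the index stream.
        offset = list(index_stream[g:g + n_clusters])
        for _ in range(reclen):
            for c in range(len(offset)):
                out.append(base + offset[c])  # address = BASE + OFFSET[c]
                offset[c] += 1                # next element in the record
    return out


if __name__ == "__main__":
    print(index_addresses(base=0, index_stream=[10, 50, 20, 7], reclen=2))
```

Only the source of the record start address differs from stride mode; the within-record increment by 1 is the same rule listed as source 7 for the first station.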

Claims (3)

1. A memory access address generation method for stream applications, characterized by the following steps:
(1) The stream controller sends a start signal to the address generator. At the same time, the address generator receives the value of the memory address register designated by the stream controller, the operation type, and the stream length information sent from the stream register file, and writes this information into an idle memory stream control register. The start register GO bit in the memory stream control register is then set, which starts the address generator;
(2) The offset address register group is initialized. While initialization is in progress, a number of memory access addresses equal to the number of offset address registers is generated;
(3) Addresses are generated for the remaining stream elements of the memory access stream;
(4) When the value of the stream length counter STRCNT reaches 0, address generation for the current stream is complete: the start register GO in the memory stream control register is cleared, and the stream controller is notified that the address generator is idle and can accept the next stream memory operation. Otherwise, step (3) is repeated.
2. The memory access address generation method for stream applications according to claim 1, characterized in that the address generator is a three-station pipeline:
(1) The first station is the offset address register group OFFSET[i], where i = 0, 1, 2, ..., n-1; the offset address registers are assigned at this station;
(2) The second station registers the OFFSET value that will produce the address, that is, the OFFSET value selected by MUX_5. The select signal of MUX_5 is CLCNT, a rotating counter that indicates which OFFSET register should currently be selected to produce the final address;
(3) The third station outputs the memory operation type, address, data, and valid signal.
3. The memory access address generation method for stream applications according to claim 2, characterized in that, in the first station, the offset address registers are assigned from several sources, the value assigned to OFFSET differs between modes and between processing phases of the same mode, and the OFFSET value has the following sources:
1. OFFSET[0] is initialized to 0 in stride mode and in bit-reversed mode;
2. In stride mode, when OFFSET[CLCNT] is initialized, where CLCNT = 1, 2, 3, it receives a value derived from OFFSET[CLCNT-1];
3. In stride mode, once address generation for a record is complete, that is, when RECCNT = RECLEN, the start address of the corresponding next record is computed from the value of OFFSET[CLCNT] and written back to OFFSET[CLCNT];
4. In bit-reversed mode, when OFFSET[CLCNT] is initialized, where CLCNT = 1, 2, 3, it receives the value of OFFSET[CLCNT-1] after certain operations;
5. In bit-reversed mode, once address generation for a record is complete, that is, when RECCNT = RECLEN, the start address of the corresponding next record is computed from the value of OFFSET[CLCNT] and written back to OFFSET[CLCNT];
6. In index mode, the record start address received from the index stream channel;
7. When the current record has not been fully processed, the address of the next element in the record is obtained by adding 1 to the current address;
8. If OFFSET is to remain unchanged in the next cycle, its own value is fed back.
CNB2007100345789A 2007-03-19 2007-03-19 Access address generating method aimed at stream application Expired - Fee Related CN100478873C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007100345789A CN100478873C (en) 2007-03-19 2007-03-19 Access address generating method aimed at stream application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2007100345789A CN100478873C (en) 2007-03-19 2007-03-19 Access address generating method aimed at stream application

Publications (2)

Publication Number Publication Date
CN101021784A CN101021784A (en) 2007-08-22
CN100478873C true CN100478873C (en) 2009-04-15

Family

ID=38709559

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007100345789A Expired - Fee Related CN100478873C (en) 2007-03-19 2007-03-19 Access address generating method aimed at stream application

Country Status (1)

Country Link
CN (1) CN100478873C (en)

Also Published As

Publication number Publication date
CN101021784A (en) 2007-08-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090415

Termination date: 20110319