CN103678204A - Processor and data processing method - Google Patents

Processor and data processing method

Info

Publication number
CN103678204A
CN103678204A (application CN201310746619.2A)
Authority
CN
China
Prior art keywords
dma module
memory access
data
accelerating region
access accelerating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310746619.2A
Other languages
Chinese (zh)
Other versions
CN103678204B (en)
Inventor
马健
张戈
刘奇
李文刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Loongson Technology Corp Ltd
Original Assignee
Loongson Technology Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Loongson Technology Corp Ltd filed Critical Loongson Technology Corp Ltd
Priority to CN201310746619.2A
Publication of CN103678204A
Application granted
Publication of CN103678204B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a processor and a data processing method. The processor transfers data between a memory-access acceleration region and the memory through a first DMA module, and between the acceleration region and the register file through a second DMA module; the acceleration region is an address-locked area of the memory that stores the data read and written by the two DMA modules. By locking a segment of addresses in the memory, using that segment as the acceleration region, adding the first DMA module to transfer data between the acceleration region and the memory, and pairing it with the second DMA module that transfers data between the acceleration region and the register file, the second DMA module can always read and write data in the acceleration region. This avoids data misses and improves the computational efficiency of the processor.

Description

Processor and data processing method
Technical field
The present invention relates to the field of computing, and in particular to a processor and a data processing method.
Background technology
During high-performance computation, a high-performance processor transfers data between the register file and the level-2 cache (L2 cache) through a direct memory access (DMA) module. Operating on the L2 cache through this DMA module is fast when the data hit in the cache. If the data miss in the L2 cache, however, the prior art must first read the data from the memory into the L2 cache along the data path shown in Fig. 1, and the DMA module then reads the required data from the L2 cache.
With this approach, the high-performance compute portion of the processor waits for the data to become ready, which greatly reduces its computational efficiency. Because high-performance computations typically operate on very large data sets, and because other processes in the system also compete for the cache, L2 misses occur frequently and the compute units "run out of data". The resulting "starvation" of the processor's compute units further lowers the computational efficiency of the processor.
Summary of the invention
The invention provides a processor and a data processing method that improve the efficiency with which the processor handles high-performance data.
A first aspect of the present invention provides a processor, comprising: a first direct memory access (DMA) module, a second direct memory access (DMA) module, a memory, and a register file;
wherein the first DMA module is connected to the memory and to a memory-access acceleration region, and the second DMA module is connected to the memory-access acceleration region and to the register file;
the first DMA module is configured to transfer data between the memory-access acceleration region and the memory;
the second DMA module is configured to transfer data between the memory-access acceleration region and the register file;
the memory-access acceleration region is an address-locked region in the memory, and it stores the data read and written by the first DMA module and the second DMA module.
With reference to the first aspect, in a first possible implementation, the first DMA module is specifically configured to write data read from a non-locked address region of the memory into the memory-access acceleration region; or
the first DMA module is specifically configured to write data read from the memory-access acceleration region into a non-locked address region of the memory;
the second DMA module is specifically configured to write data read from the register file into the memory-access acceleration region; or
the second DMA module is specifically configured to write data read from the memory-access acceleration region into the register file.
With reference to the first aspect or its first possible implementation, in a second possible implementation, the memory-access acceleration region is divided into at least two first subspaces, and the register file is divided into at least two second subspaces;
the second DMA module is specifically configured to transfer data between the at least two first subspaces and the at least two second subspaces using a ping-pong scheme.
With reference to the second possible implementation of the first aspect, in a third possible implementation, the memory has a contiguous physical address space of at least 6 Gbit, so that the first DMA module can read data from the whole contiguous physical address space.
A second aspect of the present invention provides a data processing method, applied to a processor comprising a first direct memory access (DMA) module, a second direct memory access (DMA) module, a memory, and a register file, the method comprising:
transferring, by the first DMA module, data between a memory-access acceleration region and the memory;
transferring, by the second DMA module, data between the memory-access acceleration region and the register file;
wherein the memory-access acceleration region is an address-locked region in the memory and stores the data read and written by the first DMA module and the second DMA module.
With reference to the second aspect, in a first possible implementation, the first DMA module transferring data between the memory-access acceleration region and the memory comprises:
writing, by the first DMA module, data read from a non-locked address region of the memory into the memory-access acceleration region; or
writing, by the first DMA module, data read from the memory-access acceleration region into a non-locked address region of the memory;
the second DMA module transferring data between the memory-access acceleration region and the register file comprises:
writing, by the second DMA module, data read from the register file into the memory-access acceleration region; or
writing, by the second DMA module, data read from the memory-access acceleration region into the register file.
With reference to the second aspect or its first possible implementation, in a second possible implementation, the memory-access acceleration region is divided into at least two first subspaces, and the register file is divided into at least two second subspaces;
the second DMA module transfers data between the at least two first subspaces and the at least two second subspaces using a ping-pong scheme.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the memory has a contiguous physical address space of at least 6 Gbit, so that the first DMA module can read data from the whole contiguous physical address space.
The processor and data processing method provided by these embodiments lock a segment of addresses in the memory, use that segment as a memory-access acceleration region, and add a first DMA module that transfers data between the acceleration region and the memory, in cooperation with a second DMA module that transfers data between the acceleration region and the register file. The second DMA module can therefore always read and write data in the acceleration region, which avoids data misses and improves the computational efficiency of the processor.
Brief description of the drawings
Fig. 1 is a schematic diagram of the data path of a prior-art processor;
Fig. 2 is a schematic structural diagram of a processor provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a data processing method provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of another data processing method provided by an embodiment of the present invention.
Detailed description of the embodiments
Prior-art high-performance CPUs have very high floating-point throughput, but under the programming frameworks currently used for high-performance computing, memory access becomes the computational bottleneck, so the architectural features of the processor are not fully exploited and its computing power is not brought to bear. In the traditional data path shown in Fig. 1, data are transferred between the register file and the L2 cache through a direct memory access (DMA) module; when the data hit in the cache, this operation is fast. When the data miss in the L2 cache, however, they must be read again from the memory, and the compute portion of the processor waits for them, greatly reducing computational efficiency. Because high-performance computations typically operate on very large data sets, and other processes in the system also compete for the cache, most accesses miss. The compute units "run out of data" and "starve".
To address the above problems, the following embodiments of the present invention provide a processor that improves the efficiency with which high-performance data are processed. Fig. 2 is a schematic structural diagram of a processor provided by an embodiment of the present invention. As shown in Fig. 2, the processor comprises: a register file 10, a second direct memory access (DMA) module 11, a first direct memory access (DMA) module 12, and a memory 13.
The first DMA module 12 is connected to the memory 13 and to a memory-access acceleration region 13a, and the second DMA module 11 is connected to the memory-access acceleration region 13a and to the register file 10.
The first DMA module 12 is configured to transfer data between the memory-access acceleration region 13a and the memory 13.
The second DMA module 11 is configured to transfer data between the memory-access acceleration region 13a and the register file 10.
The memory-access acceleration region 13a is an address-locked region in the memory 13 and stores the data read and written by the first DMA module 12 and the second DMA module 11.
Specifically, the memory-access acceleration region is an address-locked region in the memory that stores the data read and written by the first DMA module and the second DMA module.
A segment of memory addresses is restricted so that it cannot be evicted from the memory-access acceleration region. Because the acceleration region is transparent to software, software cannot operate on it directly; this mechanism instead guarantees that the chosen address segment always hits in the acceleration region, indirectly providing read and write access to the data held there. The function is implemented in the kernel, which exposes an interface for user-space programs to call, so that an application can reserve part of the acceleration region for its exclusive use, undisturbed by other processes.
The processor provided by this embodiment locks a segment of addresses in the memory, uses that segment as a memory-access acceleration region, and adds a first DMA module that transfers data between the acceleration region and the memory, in cooperation with a second DMA module that transfers data between the acceleration region and the register file. The second DMA module can therefore always read and write data in the acceleration region, which avoids data misses and improves the computational efficiency of the processor.
Preferably, the first DMA module 12 is specifically configured to write data read from a non-locked address region of the memory 13 into the memory-access acceleration region 13a.
Alternatively, the first DMA module 12 is specifically configured to write data read from the memory-access acceleration region 13a into a non-locked address region of the memory 13.
The second DMA module 11 is specifically configured to write data read from the register file 10 into the memory-access acceleration region 13a.
Alternatively, the second DMA module 11 is specifically configured to write data read from the memory-access acceleration region 13a into the register file 10.
Further, the memory-access acceleration region 13a is divided into at least two first subspaces, and the register file 10 is divided into at least two second subspaces.
The second DMA module 11 is specifically configured to transfer data between the at least two first subspaces and the at least two second subspaces using a ping-pong scheme.
Preferably, the memory 13 has a contiguous physical address space of at least 6 Gbit, so that the first DMA module 12 can read data from the whole contiguous physical address space.
In the prior art, the contiguous physical address space of the memory used with the Loongson 3B processor is 64 KB or 32 MB; when larger data must be moved, the physical addresses of the data may therefore be discontiguous, the discontiguous addresses must be translated, and the operating efficiency of the Loongson 3B processor drops. In this embodiment, a contiguous physical address space of 6 Gbit is allocated in the memory, so that the addresses of the data the first DMA module reads from the memory are contiguous, and part of the contiguous physical address space is reserved in the memory specifically for high-performance computing programs. High-performance computations have large data sets with good reuse, so a dedicated large contiguous physical address space is provided for them, and large pages are used when allocating memory in this region, reducing the performance loss caused by translation lookaside buffer (TLB) misses. This function is implemented in the kernel, which exposes an interface for user-space programs to call; the user applies it as appropriate when designing a program. With the 6-Gbit contiguous physical address space provided by this embodiment, no extra address-translation operations are needed, which improves the processing efficiency of the Loongson 3B processor.
Fig. 3 is a schematic flowchart of a data processing method provided by an embodiment of the present invention. The method is executed by a processor comprising a first direct memory access (DMA) module, a second direct memory access (DMA) module, a memory, and a register file. With reference to Fig. 3, the method comprises the following steps:
Step 100: the first DMA module transfers data between a memory-access acceleration region and the memory.
Step 101: the second DMA module transfers data between the memory-access acceleration region and the register file.
Specifically, the memory-access acceleration region is an address-locked region in the memory that stores the data read and written by the first DMA module and the second DMA module.
A segment of memory addresses is restricted so that it cannot be evicted from the memory-access acceleration region. Because the acceleration region is transparent to software, software cannot operate on it directly; this mechanism instead guarantees that the chosen address segment always hits in the acceleration region, indirectly providing read and write access to the data held there. The function is implemented in the kernel, which exposes an interface for user-space programs to call, so that an application can reserve part of the acceleration region for its exclusive use, undisturbed by other processes.
The data processing method provided by this embodiment locks a segment of addresses in the memory, uses that segment as a memory-access acceleration region, and adds a first DMA module that transfers data between the acceleration region and the memory, in cooperation with a second DMA module that transfers data between the acceleration region and the register file. The second DMA module can therefore always read and write data in the acceleration region, which avoids data misses and improves the computational efficiency of the processor.
Further, one feasible implementation of step 100 in Fig. 3 is:
Step 100a: the first DMA module writes the data read from a non-locked address region of the memory into the memory-access acceleration region.
Alternatively, step 100b: the first DMA module writes the data read from the memory-access acceleration region into a non-locked address region of the memory.
Further, one feasible implementation of step 101 in Fig. 3 is:
Step 101a: the second DMA module writes the data read from the register file into the memory-access acceleration region.
Alternatively, step 101b: the second DMA module writes the data read from the memory-access acceleration region into the register file.
Preferably, the memory-access acceleration region is divided into at least two first subspaces, and the register file is divided into at least two second subspaces.
The second DMA module transfers data between the at least two first subspaces and the at least two second subspaces using a ping-pong scheme.
The memory-access acceleration region, the memory, and the register file are each divided into several regions according to the actual situation to realize the ping-pong model, so that the second DMA module need not wait for one stage of data transfer to finish before starting the next. Further, multiple data channels of the second DMA module and the first DMA module can be opened, so that the two DMA modules transfer data in parallel.
Preferably, the memory in the above embodiments has a contiguous physical address space of at least 6 Gbit, so that the first DMA module can read data from the whole contiguous physical address space.
In the prior art, the contiguous physical address space of the memory used with the Loongson 3B processor is 64 KB or 32 MB; when larger data must be moved, the physical addresses of the data may therefore be discontiguous, the discontiguous addresses must be translated, and the operating efficiency of the Loongson 3B processor drops. In this embodiment, a contiguous physical address space of 6 Gbit is allocated in the memory, so that the addresses of the data the first DMA module reads from the memory are contiguous, and part of the contiguous physical address space is reserved in the memory specifically for high-performance computing programs. High-performance computations have large data sets with good reuse, so a dedicated large contiguous physical address space is provided for them, and large pages are used when allocating memory in this region, reducing the performance loss caused by translation lookaside buffer (TLB) misses. This function is implemented in the kernel, which exposes an interface for user-space programs to call; the user applies it as appropriate when designing a program. With the 6-Gbit contiguous physical address space provided by this embodiment, no extra address-translation operations are needed, which improves the processing efficiency of the Loongson 3B processor.
The principle of the ping-pong scheme is as follows: two subspaces, Buffer0 and Buffer1, are defined in the storage space. While Buffer0 holds the data being transferred, the program processes the data in Buffer1; while Buffer1 holds the data being transferred, the program processes the data in Buffer0. The two subspaces serve as source and destination addresses, and their roles are exchanged in an interrupt routine. Data can thus be transferred without interruption and processed in real time. Fig. 4 is a schematic flowchart of another data processing method provided by an embodiment of the present invention; with reference to Fig. 4, the data-transfer process in which the first and second DMA modules use the ping-pong scheme is described in detail:
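The Buffer0/Buffer1 principle can be sketched as a minimal double-buffer loop in C: one buffer is filled (standing in for a DMA transfer) while the other is processed, and the roles swap each iteration. The function names, chunk size, and the fill/process callbacks are illustrative assumptions, not part of the patent.

```c
/* Minimal ping-pong (double-buffer) sketch: overlap "transfer" (fill)
 * of one buffer with processing of the other, swapping roles each
 * iteration, as in the Buffer0/Buffer1 description above. */
#include <stddef.h>

#define CHUNK 4                       /* elements per buffer (assumed) */

/* Stand-in for a DMA fill: write ascending values into the buffer. */
static void fill(int *buf, int base) {
    for (int i = 0; i < CHUNK; i++) buf[i] = base + i;
}

/* Stand-in for computation: accumulate the chunk's values. */
static long process(const int *buf) {
    long sum = 0;
    for (int i = 0; i < CHUNK; i++) sum += buf[i];
    return sum;
}

/* Move `chunks` chunks through two buffers, alternating their roles. */
long pingpong_total(int chunks) {
    int buf[2][CHUNK];
    long total = 0;
    int cur = 0;                      /* buffer currently processed   */
    fill(buf[cur], 0);                /* prime the first buffer       */
    for (int c = 1; c <= chunks; c++) {
        int nxt = cur ^ 1;            /* the "pong" buffer            */
        if (c < chunks)
            fill(buf[nxt], c * CHUNK);/* fill overlaps processing     */
        total += process(buf[cur]);   /* consume the "ping" side      */
        cur = nxt;                    /* swap roles                   */
    }
    return total;
}
```

In real hardware, the swap happens in an interrupt routine when a DMA transfer completes; here the sequential loop only illustrates the role exchange.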
Because the memory-access acceleration region is divided into at least two first subspaces and the register file is divided into at least two second subspaces, each first subspace has two states, ready and standby, and each second subspace has three states: ready, completed, and standby.
For the register file: after a computation completes, the corresponding second subspace switches from the ready state to the completed state. When the second DMA module reads data from a second subspace and finishes transferring them to the memory-access acceleration region, that subspace switches from the completed state to the standby state. When the second DMA module writes data read from the memory-access acceleration region into a second subspace, that subspace switches from the standby state to the ready state. When the register file reuses the data, the corresponding second subspace switches from the completed state back to the ready state. When applying the ping-pong scheme, the second DMA module therefore selects second subspaces in the ready state for data transfer.
Similarly, for the memory-access acceleration region: when the second DMA module reads data from a first subspace and sends them to the register file, that subspace switches from the ready state to the standby state. When the second DMA module reads data from the register file and writes them into a first subspace, that subspace switches from the standby state to the ready state. When the first DMA module reads data from the memory and writes them into a first subspace, that subspace switches from the standby state to the ready state. When the first DMA module reads data from a first subspace and sends them to the memory, that subspace switches from the ready state to the standby state. When applying the ping-pong scheme, the second DMA module therefore selects first subspaces in the ready state for data transfer.
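The second-subspace transitions described above can be sketched as a small transition function. The enum names, event names, and the C encoding are illustrative assumptions (and the description's "not-ready" source state is read here as the ready state); the patent does not specify an implementation.

```c
/* Sketch of a second (register-file) subspace state machine, per the
 * transitions described above.  Names and encoding are illustrative. */
typedef enum { SS_STANDBY, SS_READY, SS_COMPLETE } ss_state;

typedef enum {            /* events on a second subspace                */
    EV_COMPUTE_DONE,      /* a computation using the subspace finished  */
    EV_DMA2_READ_OUT,     /* second DMA moved its data to accel region  */
    EV_DMA2_WRITE_IN,     /* second DMA wrote accel-region data into it */
    EV_REUSE              /* register file reuses the data              */
} ss_event;

/* Return the next state; unrelated events leave the state unchanged. */
ss_state ss_next(ss_state s, ss_event e) {
    switch (e) {
    case EV_COMPUTE_DONE:  return (s == SS_READY)    ? SS_COMPLETE : s;
    case EV_DMA2_READ_OUT: return (s == SS_COMPLETE) ? SS_STANDBY  : s;
    case EV_DMA2_WRITE_IN: return (s == SS_STANDBY)  ? SS_READY    : s;
    case EV_REUSE:         return (s == SS_COMPLETE) ? SS_READY    : s;
    }
    return s;
}
```

A first (acceleration-region) subspace would follow the same pattern with only the ready and standby states.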
It should be noted that the above processor and data processing method can be applied to the Loongson 3B processor. The embodiments of the present invention take the Loongson 3B processor as an example to describe the processor and the data processing method of the present invention, but this does not limit the invention.
Persons of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium; when executed, it performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or substitute equivalents for some or all of their technical features, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A processor, comprising: a first direct memory access (DMA) module, a second direct memory access (DMA) module, a memory, and a register file;
wherein the first DMA module is connected to the memory and to a memory-access acceleration region, and the second DMA module is connected to the memory-access acceleration region and to the register file;
the first DMA module is configured to transfer data between the memory-access acceleration region and the memory;
the second DMA module is configured to transfer data between the memory-access acceleration region and the register file;
the memory-access acceleration region is an address-locked region in the memory, and the memory-access acceleration region is configured to store data read and written by the first DMA module and the second DMA module.
2. The processor according to claim 1, wherein the first DMA module is specifically configured to write data read from a non-locked address region of the memory into the memory-access acceleration region; or
the first DMA module is specifically configured to write data read from the memory-access acceleration region into a non-locked address region of the memory;
the second DMA module is specifically configured to write data read from the register file into the memory-access acceleration region; or
the second DMA module is specifically configured to write data read from the memory-access acceleration region into the register file.
3. The processor according to claim 1 or 2, wherein the memory-access acceleration region is divided into at least two first subspaces, and the register file is divided into at least two second subspaces;
the second DMA module is specifically configured to transfer data between the at least two first subspaces and the at least two second subspaces using a ping-pong scheme.
4. The processor according to claim 3, wherein the memory has a contiguous physical address space of at least 6 Gbit, so that the first DMA module reads data from the whole contiguous physical address space.
5. A data processing method, applied to a processor comprising a first direct memory access (DMA) module, a second direct memory access (DMA) module, a memory, and a register file, the method comprising:
transferring, by the first DMA module, data between a memory-access acceleration region and the memory;
transferring, by the second DMA module, data between the memory-access acceleration region and the register file;
wherein the memory-access acceleration region is an address-locked region in the memory, and the memory-access acceleration region is configured to store data read and written by the first DMA module and the second DMA module.
6. The data processing method according to claim 5, wherein the first DMA module transferring data between the memory-access acceleration region and the memory comprises:
writing, by the first DMA module, data read from a non-locked address region of the memory into the memory-access acceleration region; or
writing, by the first DMA module, data read from the memory-access acceleration region into a non-locked address region of the memory;
the second DMA module transferring data between the memory-access acceleration region and the register file comprises:
writing, by the second DMA module, data read from the register file into the memory-access acceleration region; or
writing, by the second DMA module, data read from the memory-access acceleration region into the register file.
7. The data processing method according to claim 5 or 6, wherein the memory-access acceleration region is divided into at least two first subspaces, and the register file is divided into at least two second subspaces;
the second DMA module transfers data between the at least two first subspaces and the at least two second subspaces using a ping-pong scheme.
8. The data processing method according to claim 7, wherein the memory has a contiguous physical address space of at least 6 Gbit, so that the first DMA module reads data from the whole contiguous physical address space.
CN201310746619.2A 2013-12-30 2013-12-30 Processor and data processing method Active CN103678204B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310746619.2A CN103678204B (en) 2013-12-30 2013-12-30 Processor and data processing method


Publications (2)

Publication Number Publication Date
CN103678204A true CN103678204A (en) 2014-03-26
CN103678204B CN103678204B (en) 2017-02-15

Family

ID=50315822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310746619.2A Active CN103678204B (en) 2013-12-30 2013-12-30 Processor and data processing method

Country Status (1)

Country Link
CN (1) CN103678204B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020258566A1 (en) * 2019-06-28 2020-12-30 苏州浪潮智能科技有限公司 Data transmission method and ping-pong dma architecture

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040054813A1 (en) * 1997-10-14 2004-03-18 Alacritech, Inc. TCP offload network interface device
CN1797378A (en) * 2004-12-24 2006-07-05 华为技术有限公司 Method of data interchange by using mode of direct memory access
US20080028109A1 (en) * 2006-07-25 2008-01-31 Murata Kikai Kabushiki Kaisha Direct memory access control method and direct memory access controller
CN101303676A (en) * 2007-05-07 2008-11-12 联发科技股份有限公司 Electronic system with direct memory access and method thereof
CN102137257A (en) * 2011-03-01 2011-07-27 北京声迅电子有限公司 Embedded H.264 coding method based on TMS320DM642 chip
CN102222316A (en) * 2011-06-22 2011-10-19 北京航天自动控制研究所 Double-buffer ping-pong parallel-structure image processing optimization method based on DMA (direct memory access)
CN102547281A (en) * 2011-12-29 2012-07-04 沈阳聚德视频技术有限公司 Joint photographic experts group (JPEG) image compression concurrency control method based on digital signal processor (DSP)
CN103473188A (en) * 2013-09-12 2013-12-25 华为技术有限公司 Method, device and system for data interaction between digital signal processor (DSP) and external memory



Also Published As

Publication number Publication date
CN103678204B (en) 2017-02-15

Similar Documents

Publication Publication Date Title
CN108733415B (en) Method and device for supporting vector random access
US20140181427A1 (en) Compound Memory Operations in a Logic Layer of a Stacked Memory
Bae et al. FlashNeuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks
US20120054468A1 (en) Processor, apparatus, and method for memory management
CN103761988A (en) SSD (solid state disk) and data movement method
CN105095116A (en) Cache replacing method, cache controller and processor
US20140047197A1 (en) Multiport memory emulation using single-port memory devices
US9606928B2 (en) Memory system
JP2021086611A (en) Energy-efficient compute-near-memory binary neural network circuits
CN102968395B (en) Method and device for accelerating memory copy of microprocessor
US11809953B1 (en) Dynamic code loading for multiple executions on a sequential processor
CN101354906A (en) Flash memory controller for solid hard disk
CN106775462A (en) A kind of method and apparatus that memory copying is reduced during read-write
CN103377154A (en) Access-memory control device and method of memorizer, processor and north-bridge chip
CN115033188A (en) Storage hardware acceleration module system based on ZNS solid state disk
EP3716080A1 (en) System, apparatus and method for application specific address mapping
CN110543433B (en) Data migration method and device of hybrid memory
CN102053929A (en) IO cache operation method and system based on DM layer of Linux system
CN109375868B (en) Data storage method, scheduling device, system, equipment and storage medium
CN104991736A (en) Method and device for typing in data and storage equipment
CN102521161B (en) Data caching method, device and server
CN104679690A (en) DMA (direct memory access) matrix transposition transmission method based on ping-pong mechanism supporting data out-of-order feedback for GPDSP (general purpose digital signal processor)
CN103678204A (en) Processor and data processing method
US20100262763A1 (en) Data access method employed in multi-channel flash memory system and data access apparatus thereof
US10331385B2 (en) Cooperative write-back cache flushing for storage devices

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100095 Building 2, Longxin Industrial Park, Zhongguancun environmental protection technology demonstration park, Haidian District, Beijing

Patentee after: Loongson Zhongke Technology Co.,Ltd.

Address before: 100095 Building 2, Longxin Industrial Park, Zhongguancun environmental protection technology demonstration park, Haidian District, Beijing

Patentee before: LOONGSON TECHNOLOGY Corp.,Ltd.