WO2016082367A1

WO2016082367A1 - Method and device for realizing hardware table traversal based on network processor

Info

Publication number: WO2016082367A1
Application number: PCT/CN2015/073808
Authority: WO
Inventors: 黄治文; 张文军
Original assignee: 中兴通讯股份有限公司
Priority date: 2014-11-25
Filing date: 2015-03-06
Publication date: 2016-06-02
Also published as: CN105700859A

Abstract

The present invention provides a method and device for realizing hardware table traversal based on a network processor. The method comprises: receiving a CPU packet, wherein the CPU packet carries matching information about hardware table items; configuring a copy register according to the total number of hardware table items, wherein the value of the copy register is the number of copies of the CPU packet to be made and equals the total number of hardware table items; copying the CPU packet according to the value of the copy register, and assigning a copy number to each copied CPU packet, wherein the copy number is the sequence number of the copy; comparing the hardware table items in the hardware table with the copied CPU packets one by one, and determining whether the information in each hardware table item matches the matching information carried by the CPU packet; and, if it matches, storing the matched hardware table item in a pre-configured cache table and updating the value of a pre-configured counter. The method effectively reduces the average response time for traversing a high-capacity hardware table and reporting the matched hardware table items in batches in a network processor.

Description

Method and device for implementing hardware table traversal based on network processor

Technical field

The present invention relates to the field of communications technologies, and in particular, to a method and apparatus for implementing hardware table traversal based on a network processor.

Background art

A Passive Optical Network (PON) is an optical communication network whose distribution network contains no electronic devices or electrical power supplies. It comprises an optical line terminal (OLT), optical network units (ONUs), and an optical distribution network (ODN). The OLT device is located at the central control station, and multiple ONUs are installed at the user sites. The ODN between the OLT and the ONUs contains optical fiber as well as passive splitters or couplers, but does not contain any active electronics. The PON therefore needs no power supply in the distribution network, adapts well to harsh environments, and is immune to electromagnetic interference and lightning. Using PON technology for access network construction can also save the cost of equipment room construction and reduce operation and maintenance costs. With the emergence of Ethernet Passive Optical Network (EPON) technology, the EPON passive optical access network has become one of the main choices for the construction of next-generation optical access networks.

A network processor is a programmable device designed specifically for communication network tasks such as message processing, protocol analysis, and route lookup. Its hardware design gives it flexible programmability and efficient message processing, so it is widely used in OLT devices. Many applications in OLT devices require the support of a large-capacity hardware table, such as a hardware MAC address table. When the CPU needs to obtain all the entries in a large-capacity hardware table that meet certain characteristics, how to traverse the table quickly and report the matching entries in batches becomes one of the technical problems in the development of OLT device drivers.

If the CPU traverses a large-capacity hardware table by reading it directly from the chip, the CPU and the chip must exchange a large number of read messages, which is very inefficient. To traverse large-capacity hardware tables quickly, a dedicated hardware search engine can be designed to traverse the hardware table directly under the control of the CPU, but this approach places higher requirements on the chip hardware and increases the development cost.

Summary of the invention

The embodiment of the invention provides a method and a device for implementing hardware table traversal based on a network processor, which can effectively reduce the average response time for traversing a large-capacity hardware table and reporting the matching hardware entries in batches in the network processor.

In order to achieve the above objective, an embodiment of the present invention provides a method for implementing hardware table traversal based on a network processor, where the method includes:

Receiving a CPU message, where the CPU message carries matching information of hardware entries;

Configuring a copy register according to the total number of hardware entries, where the value of the copy register is the number of copies of the CPU message to be made and is equal to the total number of hardware entries;

Copying the CPU message according to the value of the copy register, and assigning a copy sequence number to each copied CPU message, where the copy sequence number is the sequence number of the copy;

Comparing the hardware entries in the hardware table with the copied CPU messages one by one, and determining whether the information of each hardware entry matches the matching information of the CPU message;

If there is a match, storing the matching hardware entry in a pre-configured cache table and updating the value of the pre-configured counter.

After the step of storing the matched hardware entry in the pre-configured cache table and updating the value of the pre-configured counter, the method further includes:

Obtaining the saved hardware entry from the cache table according to the value of the counter.

The step of configuring the copy register according to the total number of hardware entries includes:

Detecting type information of the CPU message;

Configuring the copy register based on the type information.

The step of comparing the hardware entries in the hardware table with the copied CPU messages one by one, and determining whether the information of each hardware entry matches the matching information of the CPU message, includes:

Obtaining, according to the copy sequence number of the copied CPU message, a hardware entry whose hardware entry index number is equal to the copy sequence number;

Determining whether the matching information of the CPU message matches the information of the obtained hardware entry.

After the step of storing the matched hardware entry in the pre-configured cache table and updating the value of the pre-configured counter, the method further includes:

Discarding the CPU message corresponding to the copy sequence number.

An embodiment of the present invention further provides an apparatus for implementing hardware table traversal based on a network processor, where the foregoing apparatus includes:

The receiving module is configured to receive a CPU message, and the CPU message carries matching information of the hardware entry;

The configuration module is configured to configure a copy register according to the total number of hardware entries, and the value of the copy register is the number of copies of the CPU message to be copied, and the value of the copy register is equal to the total number of hardware entries;

The copy module is set to copy the CPU message according to the value of the copy register, and assign a copy sequence number to each copied CPU message, and the copy sequence number is the sequence number when the CPU message is copied;

The judging module is configured to compare the hardware entries in the hardware table with the copied CPU packets one by one, and determine whether the information of the hardware entries matches the matching information of the CPU packets;

The update module is configured to: when the information of the hardware entry matches the matching information of the CPU message, store the matched hardware entry in a pre-configured cache table, and update the value of the pre-configured counter.

Wherein, the device further comprises:

The obtaining module is set to obtain the saved hardware entry from the cache table according to the value of the counter.

The configuration module includes:

a detecting unit, configured to detect type information of a CPU message;

The startup unit is set to configure the copy register based on the type information.

The judging module includes:

The first unit is configured to obtain, according to the copy sequence number of the copied CPU message, a hardware entry whose hardware entry index number is equal to the copy sequence number;

The second unit is configured to determine whether the matching information of the CPU message matches the information of the obtained hardware entry.

Wherein, the device further comprises:

A discarding module, configured to discard the CPU message corresponding to the copy sequence number.

The beneficial effects of the above embodiments of the present invention are as follows:

In the method for implementing hardware table traversal based on the network processor in the embodiment of the present invention, the hardware table is traversed by the microcode module and the logical multicast module of the network processor. This greatly shortens the time for traversing the hardware table, thereby effectively reducing the average response time for traversing a large-capacity hardware table and reporting the matching hardware entries in batches in the network processor.

Brief description of the drawings

FIG. 1 is a flowchart of the steps of a method for implementing hardware table traversal based on a network processor according to an embodiment of the present invention;

FIG. 2 is a flowchart of the specific steps of step 12 in FIG. 1 according to an embodiment of the present invention;

FIG. 3 is a flowchart of the specific steps of step 14 in FIG. 1 according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a line card and a network processor according to an embodiment of the present invention;

FIG. 5 shows the entry format of sys_mac_table and mac_cache_table according to an embodiment of the present invention;

FIG. 6 shows the format of the header of a CPU message according to an embodiment of the present invention;

FIG. 7 shows the format of the header of the CPU message MAC_GET_PKT according to an embodiment of the present invention;

FIG. 8 is a flowchart of the microcode program mac_iter_process in an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of an apparatus for implementing hardware table traversal based on a network processor according to an embodiment of the present invention.

Detailed description

The technical problems, the technical solutions, and the advantages of the present invention will be more clearly described in the following description.

The present invention addresses the problem in the prior art that the central processing unit (CPU) traverses the large-capacity hardware table directly, which results in a long response time. It provides a method and device for implementing hardware table traversal based on a network processor, which can effectively reduce the average response time for traversing a large-capacity hardware table and reporting the matching hardware entries in batches in the network processor.

As shown in FIG. 1 , an embodiment of the present invention provides a method for implementing hardware table traversal based on a network processor, where the method includes:

Step 11: Receive a CPU message, where the CPU message carries matching information of the hardware entries.

Step 12: Configure a copy register according to the total number of hardware entries, where the value of the copy register is the number of copies of the CPU message to be made and is equal to the total number of hardware entries.

Step 13: Copy the CPU message according to the value of the copy register, and assign a copy sequence number to each copied CPU message, where the copy sequence number is the sequence number of the copy.

Step 14: Compare the hardware entries in the hardware table with the copied CPU messages one by one, and determine whether the information of each hardware entry matches the matching information of the CPU message.

Step 15: If they match, store the matched hardware entry in a pre-configured cache table and update the value of the pre-configured counter.

Optionally, in the foregoing embodiment of the present invention, after performing step 15, the method further includes: acquiring the saved hardware entry from the cache table according to the value of the counter.

Optionally, in the foregoing embodiment of the present invention, after the step 15 is performed, the method further includes: discarding the CPU message corresponding to the copy sequence number.
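
Steps 11 to 15 can be modelled in software as the following minimal Python sketch. It is an illustration only: the function name `traverse_hardware_table`, the dictionary-based entries, and the sequential loop that stands in for the hardware copies are assumptions and are not part of the embodiment.

```python
def traverse_hardware_table(hardware_table, match_info):
    """Software model of steps 11-15 (hypothetical names, not real microcode)."""
    copy_register = len(hardware_table)      # step 12: copies == total number of entries
    cache_table = [None] * len(hardware_table)
    counter = 0

    # Step 13: each copy of the CPU message gets a copy sequence number 0..N-1.
    for copy_seq in range(copy_register):
        entry = hardware_table[copy_seq]     # step 14: entry index == copy sequence number
        if all(entry.get(k) == v for k, v in match_info.items()):
            cache_table[counter] = entry     # step 15: store the matched entry ...
            counter += 1                     # ... and update the counter
        # the copy is discarded here whether or not the entry matched

    return counter, cache_table


table = [{"port": 1, "vlan": 10}, {"port": 2, "vlan": 10}, {"port": 1, "vlan": 20}]
count, cache = traverse_hardware_table(table, {"port": 1})
matches = cache[:count]   # the CPU reads the first `count` cached entries
```

In this model an empty `match_info` selects every entry, which mirrors the case where the CPU software specifies no field value.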

In a specific embodiment of the present invention, the logical multicast module and the microcode module of the network processor are used to traverse the large-capacity hardware table quickly, and the matching entries are cached so that the CPU can read them directly. When a hardware table needs to be traversed, the CPU sends a CPU message to the network processor, and the header of the CPU message carries the matching feature of the hardware table entries. The matching feature means that one or several fields of an entry are equal to values specified by the CPU software. If the CPU software does not specify the value of any field of the hardware entry, all valid entries in the hardware table are to be obtained.

The CPU message then enters the microcode module of the network processor. The microcode module is a message processing module common to general-purpose network processors and can store multiple microcode programs; by configuration, messages from different sources can be mapped to different microcode programs. When a message enters the microcode module, the corresponding microcode program begins processing it. The main task of the microcode program corresponding to the CPU message is to configure the copy register of the CPU message. The copy register stores the number of copies of the message, which should be equal to the total number N of entries in the hardware table.

The CPU message is then sent to the logical multicast module for copying. The logical multicast module is the hardware unit that implements message replication in a general-purpose network processor. Messages whose copy register value is greater than 1 are copied in the logical multicast module, and each copy is assigned a copy sequence number, which is the sequence number of the copy and is stored in a specific register. The CPU message is copied N times in the logical multicast module, and each copy corresponds to a copy sequence number between 0 and N-1.

After the logical multicast module copies the CPU message, each copy returns to the microcode module, and the corresponding microcode program reads the entry in the hardware table whose index number equals the copy sequence number of the message, where the index number is the sequence number of the hardware entry in the hardware table. The program then determines whether the entry satisfies the matching feature in the message header. If it does, the value of the counter that counts matching entries is incremented by one, and the entry is written to the cache table that caches matching entries. If it does not, the entry is not processed. Finally, the microcode program discards the copied message regardless of whether the entry matched.

In a specific embodiment of the present invention, each copied message carries a copy sequence number back to the microcode module, and the corresponding microcode program is executed once per copy. Each execution reads and evaluates a different entry in the hardware table, whose index equals the copy sequence number (0 to N-1) of the message. After the microcode program has been executed N times, the hardware table has effectively been traversed. Because the length of a microcode program may be limited by hardware resources, it is generally impossible to read all the hardware entries sequentially within a single microcode program. The embodiment of the present invention therefore executes the same microcode program multiple times, reading only one entry per execution. After the network processor has traversed the hardware table, the CPU, having waited for a certain period of time, directly reads the value of the counter and then reads the matching entries from the cache table according to that value, thereby obtaining the matching entries in a batch. For example, if the value of the counter is M, the CPU reads the first M items of the cache table, which are all the matching entries. The waiting time of the CPU should be determined from the capacity of the hardware table and the processing speed of the network processor, and it can also be determined experimentally.

In a specific embodiment of the present invention, after the microcode program reads a matching entry, the entry is not reported to the CPU immediately; instead it is counted by the counter and stored in the cache table. This simplifies the CPU processing flow and makes it convenient for the CPU to obtain the matching entries in a batch. The initial value of the counter is 0, and the counter can be cleared by the CPU. The index at which a matching entry is stored in the cache table is determined by the current count value of the counter. Generally, adding 1 to a hardware counter and returning the count value is an indivisible atomic operation, which prevents the counter from being corrupted when many executions of the microcode program operate on it in quick succession. In addition, the loop by which the CPU message leaves the microcode module and returns to it can be configured when the network processor is initialized.
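
The role of the atomic add-and-return operation can be illustrated with a small Python sketch. The `AtomicCounter` class and the use of threads in place of concurrent microcode executions are assumptions made for illustration; a lock emulates the atomicity that the hardware counter is described as providing.

```python
import threading


class AtomicCounter:
    """Emulates a hardware counter whose add-1-and-return is atomic (assumed model)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment_and_get(self):
        with self._lock:             # the lock stands in for hardware atomicity
            self._value += 1
            return self._value

    def read(self):
        with self._lock:
            return self._value


counter = AtomicCounter()
cache_table = [None] * 64


def store_match(entry_id):
    c = counter.increment_and_get()  # each caller receives a unique count value
    cache_table[c - 1] = entry_id    # so each match is written to its own slot


threads = [threading.Thread(target=store_match, args=(i,)) for i in range(64)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every caller receives a distinct count value, no two matches are written to the same cache slot, although the order of the slots depends on scheduling.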

In a specific embodiment of the present invention, the microcode module processes messages very efficiently, so the processing time of the whole traversal is far less than the response time of the CPU directly traversing the large-capacity hardware table. The time the CPU takes to read the cache table is proportional only to the actual number of matching entries. In most cases, the actual number of matching entries is much smaller than the total number of hardware table entries. Therefore, the response time of the method of the embodiment of the present invention is much shorter than the response time of the CPU directly traversing the hardware table.

As shown in FIG. 2, in the above embodiment of the present invention, the specific steps of step 12 are:

Step 21: Detect type information of a CPU packet.

Step 22: Configure the copy register according to the type information.

As shown in FIG. 3, in the above embodiment of the present invention, the specific step flow of step 14 is:

Step 31: Obtain a hardware entry whose hardware entry index number is equal to the copy sequence number according to the copy sequence number of the copied CPU packet.

Step 32: Determine whether the matching information of the CPU packet matches the information of the obtained hardware entry.

In a specific embodiment of the present invention, each copied message carries a copy sequence number back to the microcode module, and the corresponding microcode program is executed once per copy. Each execution reads and evaluates a different entry in the hardware table, whose index number is equal to the copy sequence number of the message. The following is a specific implementation:

In an EPON, an OLT includes a main control card, an uplink card, and a line card. Consider a line card implemented by a network processor. As shown in FIG. 4, the line card forwards uplink and downlink packets and learns their MAC addresses. The current MAC address information is recorded in a hardware table sys_mac_table whose table size is N, that is, a maximum of N MAC addresses are saved. As shown in FIG. 5, a sys_mac_table entry contains a valid flag valid, a MAC address mac_addr, a port value portId, a vlan value vlanId, and a static/dynamic flag bit static_flag. The CPU can obtain the MAC addresses in sys_mac_table that meet certain characteristics, such as all MAC addresses of a given port, of a given vlan, or of a given port and vlan.
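
The entry layout described above can be pictured as a simple record. The following Python sketch uses the field names from the description; the Python types, the class name `SysMacEntry`, the table size, and the sample addresses are assumptions, since the source does not give field widths or values.

```python
from dataclasses import dataclass


@dataclass
class SysMacEntry:
    """One sys_mac_table entry; Python types here are assumptions."""
    valid: bool
    mac_addr: str
    portId: int
    vlanId: int
    static_flag: bool


N = 4  # hypothetical table size
INVALID = SysMacEntry(False, "00:00:00:00:00:00", 0, 0, False)

# The table always has N slots; unused slots hold an invalid entry.
sys_mac_table = [INVALID] * N
sys_mac_table[0] = SysMacEntry(True, "00:11:22:33:44:55", 1, 100, False)
sys_mac_table[2] = SysMacEntry(True, "00:11:22:33:44:66", 2, 100, True)

valid_count = sum(e.valid for e in sys_mac_table)
```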

To implement the technical solution of the foregoing embodiment, the line card configures, in the network processor chip, a loopback queue through which a message leaves the microcode module and then returns to it, as shown in the schematic structural diagram of the line card and network processor. A hardware counter mac_cache_counter for counting matching entries and a hardware table mac_cache_table for caching matching entries are also defined in the network processor chip. The entry format of the cache table mac_cache_table is the same as that of the MAC address table sys_mac_table, as shown in FIG. 5. The microcode module stores a plurality of microcode programs, including but not limited to ecm_process and mac_iter_process. The microcode program ecm_process processes CPU messages sent by the CPU, and the microcode program mac_iter_process processes messages that return to the microcode module through the loopback queue. The microcode module selects the microcode program according to the source of the message.

Consider the case where the line card CPU obtains all MAC addresses on port 1 of a line card. As shown in FIG. 4, the line card CPU sends a CPU message for obtaining MAC addresses to the network processor chip. As shown in FIG. 6, the CPU message header is defined as two parts, TYPE and VALUE. As shown in FIG. 7, the TYPE of the CPU message for obtaining MAC addresses is defined as MAC_GET_PKT, and the VALUE part is defined as {port_match=1, portId=port1, vlan_match=0, vlanId=0}.
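
As an illustration of the TYPE and VALUE header described above, the following Python sketch builds the header for the port query. The numeric code assigned to `MAC_GET_PKT`, the class names, and the helper `build_port_query` are hypothetical; the source only names the type and lists the VALUE fields.

```python
from dataclasses import dataclass

MAC_GET_PKT = 1  # hypothetical numeric code; the source only names the type


@dataclass(frozen=True)
class MacGetValue:
    port_match: int
    portId: int
    vlan_match: int
    vlanId: int


@dataclass(frozen=True)
class CpuHeader:
    type: int
    value: MacGetValue


def build_port_query(port_id):
    """Header asking for every MAC address learned on one port."""
    return CpuHeader(
        MAC_GET_PKT,
        MacGetValue(port_match=1, portId=port_id, vlan_match=0, vlanId=0),
    )


header = build_port_query(1)
```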

As shown in FIG. 4, the CPU message for obtaining MAC addresses first enters the microcode module of the network processor chip, where it is processed by the microcode program ecm_process. The program ecm_process first determines that the message type is MAC_GET_PKT and then jumps to the corresponding sub-flow seq_get_mac_addr. The sub-flow seq_get_mac_addr configures the copy register of the message so that the number of copies is equal to N, and sets the destination queue of the message to the loopback queue, so that the message returns to the microcode module after leaving it.
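
The type-based dispatch performed by ecm_process can be sketched as a lookup from message type to sub-flow. This is a minimal Python model: the dictionary-based message, the queue identifier `LOOPBACK_QUEUE`, the table size, and the `SUB_FLOWS` table are assumptions, and real microcode would set hardware registers instead.

```python
MAC_GET_PKT = "MAC_GET_PKT"
LOOPBACK_QUEUE = "loopback"   # hypothetical queue identifier
N = 8                         # hypothetical sys_mac_table size


def seq_get_mac_addr(packet):
    packet["copy_register"] = N            # number of copies == table size
    packet["dest_queue"] = LOOPBACK_QUEUE  # copies return to the microcode module
    return packet


SUB_FLOWS = {MAC_GET_PKT: seq_get_mac_addr}


def ecm_process(packet):
    """Dispatch a CPU message to its sub-flow by TYPE (assumed structure)."""
    handler = SUB_FLOWS.get(packet["type"])
    if handler is None:
        return packet                      # other types are outside this sketch
    return handler(packet)


pkt = ecm_process({"type": MAC_GET_PKT, "value": {"port_match": 1, "portId": 1}})
```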

As shown in FIG. 4, the copy register value of the CPU message, that is, the number of copies, is greater than 1, so the message is copied N times in the logical multicast module. Each copy carries a copy sequence number, and the sequence numbers take the values 0 to N-1 in order. The copy sequence number is saved in a specific register. The copies return to the microcode module through the loopback queue, where they are processed by the microcode program mac_iter_process.
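
The replication step can be modelled as a generator that tags each copy with its sequence number. In this Python sketch the `copy_seq` key stands in for the specific register, and the behaviour for a copy register value of 1 or less (forwarding without copying) is an assumption, since the source only describes the case where the value is greater than 1.

```python
def logical_multicast(packet):
    """Yield copies of `packet`, each tagged with its copy sequence number."""
    n = packet.get("copy_register", 1)
    if n <= 1:
        yield dict(packet)                # assumption: forwarded without copying
        return
    for seq in range(n):
        yield dict(packet, copy_seq=seq)  # `copy_seq` models the specific register


copies = list(logical_multicast({"type": "MAC_GET_PKT", "copy_register": 4}))
seqs = [c["copy_seq"] for c in copies]
```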

As shown in FIG. 8, the microcode program mac_iter_process first reads the specific register to obtain the copy sequence number k of the message, and then reads the entry K in the MAC address table sys_mac_table whose index is equal to k. If the entry K is invalid, the program configures the message to be discarded and ends. If the entry K is valid, the portId needs to match because port_match=1 in the message header, so the program determines whether the portId of the entry K is equal to the portId (=port1) of the message header. If the two portIds are not equal, the program configures the message to be discarded and ends. If the two portIds are equal, the vlanId does not need to match because vlan_match=0 in the message header. The program therefore adds 1 to the counter mac_cache_counter, obtains the returned count value C, and writes the entry K to the entry of the cache table mac_cache_table whose index is equal to C-1. Finally, the program configures the message to be discarded and ends.
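
The flow of mac_iter_process for one copied message can be sketched in Python as follows. The function signature, the dictionary-based entries, and the one-element list that stands in for mac_cache_counter are assumptions; the vlanId comparison for vlan_match=1 is the natural counterpart of the portId check but is not spelled out in the description.

```python
def mac_iter_process(copy_seq, header, sys_mac_table, mac_cache_table, counter):
    """Model of one mac_iter_process run; returns True if the entry was cached.

    `counter` is a one-element list standing in for mac_cache_counter.
    """
    entry = sys_mac_table[copy_seq]      # entry K, whose index equals k
    if not entry["valid"]:
        return False                     # discard the message
    if header["port_match"] and entry["portId"] != header["portId"]:
        return False
    if header["vlan_match"] and entry["vlanId"] != header["vlanId"]:
        return False
    counter[0] += 1                      # atomic add-and-return in hardware
    c = counter[0]
    mac_cache_table[c - 1] = entry       # write entry K at index C-1
    return True                          # the message is discarded afterwards


table = [
    {"valid": 1, "mac_addr": "00:00:00:00:00:01", "portId": 1, "vlanId": 10},
    {"valid": 0, "mac_addr": "00:00:00:00:00:02", "portId": 1, "vlanId": 10},
    {"valid": 1, "mac_addr": "00:00:00:00:00:03", "portId": 2, "vlanId": 10},
    {"valid": 1, "mac_addr": "00:00:00:00:00:04", "portId": 1, "vlanId": 20},
]
header = {"port_match": 1, "portId": 1, "vlan_match": 0, "vlanId": 0}
cache = [None] * len(table)
mac_cache_counter = [0]
for k in range(len(table)):              # one run per copied message
    mac_iter_process(k, header, table, cache, mac_cache_counter)
matched = [e["mac_addr"] for e in cache[:mac_cache_counter[0]]]
```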

The counter mac_cache_counter is used to count the number of matching entries. The CPU software should clear mac_cache_counter to zero before each MAC address retrieval. Since adding 1 to the hardware counter and returning the count value is an indivisible atomic operation, when the microcode program mac_iter_process is executed multiple times and each execution adds 1 to mac_cache_counter, the count value C returned by each operation correctly reflects the current counting result. Writes to the cache table mac_cache_table therefore do not conflict, and all M matching entries are correctly written to the first M items of mac_cache_table.

After waiting for all the copied messages to be processed, the CPU can directly read the counter mac_cache_counter to obtain the total number M of matching MAC addresses, and read the first M items of mac_cache_table. In the specific implementation, the waiting time of the CPU can be determined experimentally. The waiting time should be such that when the total number of MAC addresses reaches the maximum number of MAC addresses supported by the line card, that is, the table size N of the MAC address table sys_mac_table, all copied messages can be processed by the microcode program mac_iter_process.
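
A first estimate of the waiting time can be derived from the table size before it is refined experimentally. The following Python sketch is an assumption-laden model, not a formula from the embodiment: it supposes that the copies are processed one after another at a fixed per-copy time and applies a safety margin, and all the numbers are hypothetical.

```python
def worst_case_wait_us(table_size, per_copy_us, margin=1.5):
    """Upper-bound estimate of the CPU wait time in microseconds.

    Assumes copies are processed sequentially at `per_copy_us` each and
    applies a safety `margin`; both inputs are hypothetical.
    """
    return table_size * per_copy_us * margin


# Hypothetical figures: a table of 32768 entries at 0.5 microseconds per copy.
wait = worst_case_wait_us(table_size=32768, per_copy_us=0.5)
```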

The sequence in which the CPU software clears the counter mac_cache_counter, sends the CPU message for obtaining MAC addresses, and finally reads all the matching MAC address information should be guaranteed not to be interrupted. The corresponding code section of the CPU software should use semaphores or similar methods to ensure mutual exclusion. This prevents consecutive MAC address retrieval operations by the CPU from competing for the counter mac_cache_counter and the cache table mac_cache_table and corrupting them. Only the case of obtaining all MAC addresses on a designated port of the line card is discussed here. The same process also applies to obtaining all MAC addresses in a designated vlan and to obtaining all MAC addresses on a designated port and vlan of the line card.
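
The mutual exclusion requirement can be sketched on the CPU side as a single critical section. In this Python model a `threading.Lock` plays the role of the semaphore, and the five callables passed to `get_mac_addresses` are hypothetical driver hooks; the fake hooks below only simulate the network processor filling the cache.

```python
import threading

_mac_query_lock = threading.Lock()   # plays the role of the semaphore


def get_mac_addresses(clear_counter, send_cpu_message, wait, read_counter, read_cache):
    """Run clear -> send -> wait -> read as one uninterrupted critical section."""
    with _mac_query_lock:
        clear_counter()
        send_cpu_message()
        wait()
        m = read_counter()
        return [read_cache(i) for i in range(m)]


# Fake hooks for demonstration; the state starts with stale values on purpose.
state = {"counter": 99, "cache": ["stale"] * 4}


def fake_send():
    # Stand-in for the network processor writing two matches to the cache.
    for mac in ("aa", "bb"):
        state["cache"][state["counter"]] = mac
        state["counter"] += 1


result = get_mac_addresses(
    clear_counter=lambda: state.update(counter=0),
    send_cpu_message=fake_send,
    wait=lambda: None,
    read_counter=lambda: state["counter"],
    read_cache=lambda i: state["cache"][i],
)
```

Holding the lock across the whole sequence means a second query cannot clear the counter while the first is still reading the cache table.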

In order to achieve the above objective, as shown in FIG. 9, the embodiment of the present invention further provides an apparatus 90 for implementing hardware table traversal based on a network processor, where the apparatus 90 includes:

The receiving module 91 is configured to receive a CPU packet, where the CPU packet carries matching information of the hardware entry.

The configuration module 92 is configured to configure a copy register according to the total number of hardware entries, and the value of the copy register is the number of copies of the CPU message to be copied, and the value of the copy register is equal to the total number of hardware entries;

The copying module 93 is configured to copy the CPU message according to the value of the copy register, and assign a copy sequence number to each copied CPU message, and the copy sequence number is a sequence number when the CPU message is copied;

The determining module 94 is configured to compare the hardware entries in the hardware table with the copied CPU messages one by one, and determine whether the information of the hardware entries matches the matching information of the CPU packets;

The update module 95 is configured to: when the information of the hardware entry matches the matching information of the CPU message, store the matched hardware entry into the pre-configured cache table, and update the value of the pre-configured counter.

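Wherein, the device 90 further includes:

an obtaining module, configured to obtain the saved hardware entry from the cache table according to the value of the counter.

The configuration module 92 includes:

a detecting unit, configured to detect type information of a CPU message;

a startup unit, configured to configure the copy register based on the type information.

The determining module 94 includes:

a first unit, configured to obtain, according to the copy sequence number of the copied CPU message, a hardware entry whose hardware entry index number is equal to the copy sequence number;

a second unit, configured to determine whether the matching information of the CPU message matches the information of the obtained hardware entry.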

In a specific embodiment of the present invention, the functions of the receiving module 91, the configuration module 92, the copying module 93, the determining module 94, and the update module 95 can be implemented by a receiving module, a microcode module, and a logical multicast module of the network processor.

The above is a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements should also be considered to fall within the scope of protection of the present invention.

Industrial applicability

As described above, the method and apparatus for implementing hardware table traversal based on a network processor provided by the embodiment of the present invention have the following beneficial effects: traversing the hardware table by means of the microcode module and the logical multicast module of the network processor greatly shortens the time for traversing the hardware table, thereby effectively reducing the average response time for traversing a large-capacity hardware table and reporting the matching hardware entries in batches in the network processor.

Claims

1. A method for implementing hardware table traversal based on a network processor, comprising:

Receiving a CPU message, where the CPU message carries matching information of a hardware entry;

Configuring a copy register according to the total number of hardware entries, where the value of the copy register is the number of copies of the CPU message to be made and is equal to the total number of hardware entries;

Copying the CPU message according to the value of the copy register, and assigning a copy sequence number to each copied CPU message, where the copy sequence number is the sequence number of the copy;

Comparing the hardware entries in the hardware table with the copied CPU messages one by one, and determining whether the information of each hardware entry matches the matching information of the CPU message;

If there is a match, storing the matching hardware entry in a pre-configured cache table and updating the value of a pre-configured counter.

2. The method of claim 1, wherein after the step of, if there is a match, storing the matching hardware entry in the pre-configured cache table and updating the value of the pre-configured counter, the method further comprises:

Obtaining the saved hardware entry from the cache table according to the value of the counter.

3. The method of claim 1, wherein the step of configuring the copy register according to the total number of hardware entries comprises:

Detecting type information of the CPU message;

Configuring the copy register based on the type information.

4. The method of claim 1, wherein the step of comparing the hardware entries in the hardware table with the copied CPU messages one by one, and determining whether the information of each hardware entry matches the matching information of the CPU message, comprises:

Obtaining, according to the copy sequence number of the copied CPU message, a hardware entry whose hardware entry index number is equal to the copy sequence number;

Determining whether the matching information of the CPU message matches the information of the obtained hardware entry.

5. The method of claim 4, wherein after the step of, if there is a match, storing the matching hardware entry in the pre-configured cache table and updating the value of the pre-configured counter, the method further comprises:

Discarding the CPU message corresponding to the copy sequence number.

6. A device for implementing hardware table traversal based on a network processor, comprising:

A receiving module, configured to receive a CPU message, where the CPU message carries matching information of a hardware entry;

A configuration module, configured to configure a copy register according to the total number of hardware entries, where the value of the copy register is the number of copies of the CPU message to be made and is equal to the total number of hardware entries;

A copy module, configured to copy the CPU message according to the value of the copy register, and assign a copy sequence number to each copied CPU message, where the copy sequence number is the sequence number of the copy;

A judging module, configured to compare the hardware entries in the hardware table with the copied CPU messages one by one, and determine whether the information of each hardware entry matches the matching information of the CPU message;

An update module, configured to: when the information of the hardware entry matches the matching information of the CPU message, store the matched hardware entry in a pre-configured cache table, and update the value of a pre-configured counter.

7. The device of claim 6, wherein the device further comprises:

An obtaining module, configured to obtain the saved hardware entry from the cache table according to the value of the counter.

8. The device of claim 6, wherein the configuration module comprises:

A detecting unit, configured to detect type information of the CPU message;

A startup unit, configured to configure the copy register based on the type information.

9. The device of claim 6, wherein the judging module comprises:

A first unit, configured to obtain, according to the copy sequence number of the copied CPU message, a hardware entry whose hardware entry index number is equal to the copy sequence number;

A second unit, configured to determine whether the matching information of the CPU message matches the information of the obtained hardware entry.

10. The device of claim 9, wherein the device further comprises:

A discarding module, configured to discard the CPU message corresponding to the copy sequence number.