WO2008143622A1 - System and method for designing and implementing packet processing products


Publication number
WO2008143622A1
Authority
WO
WIPO (PCT)
Prior art keywords
packet
output
header
packet processing
input
Prior art date
Application number
PCT/US2007/012583
Other languages
French (fr)
Inventor
Vispi Cassod
Anthony Dalleggio
Amine Kandalaft
Original Assignee
Modelware, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Modelware, Inc.
Priority to PCT/US2007/012583
Publication of WO2008143622A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design

Definitions

  • The present invention relates to digital component design and implementation systems and, more particularly, to a system and method for designing and implementing packet processing products.
  • Computer-based communications are dominated by the transmission of packets of data.
  • A packet contains a payload, i.e., a portion of an overall data message, surrounded by a number of header bits or bytes that are used to ensure that the payload is transmitted and received without error.
  • The header bits or bytes can be divided into a number of fields designating commands, responses, packet characteristics, etc.
  • The fields can take on one or more values depending on the particular protocol used. Some protocols are custom-designed, while others, such as asynchronous transfer mode (ATM) or Transmission Control Protocol/Internet Protocol (TCP/IP), are standardized.
  • NPU network processing unit
  • the present invention relates to a system and method for designing and implementing packet processing products, wherein a user can create instructions for building a packet processing integrated circuit.
  • the system includes a user interface for allowing a user to define a desired packet processing algorithm by defining a plurality of discrete, packet processing blocks, each of the blocks corresponding to a portion of the desired packet processing algorithm, as well as connections between the plurality of packet processing blocks.
  • the system processes the plurality of packet processing blocks and the connections to provide a list of instructions in a hardware description language for producing an integrated circuit capable of executing the desired packet processing algorithm.
  • the list of instructions can be delivered to a customer, or the customer can be provided with an integrated circuit constructed using the list of instructions.
  • the customer can also be provided with a NETLIST generated using said list of instructions.
  • The packet processing blocks of the present invention include a Packet Processing Unit (PPU), a Packet Modification Unit (PMU), and a Decision and Forwarding Unit (DFU).
  • the PPU includes functionality for extracting a header of a packet; for pointing to a portion of the header of a predetermined width using a predetermined index of a bit location in the header; for comparing the data represented by the portion of the header with at least one predetermined value; and for declaring a match when the result of the comparison is true.
  • A variation of a PPU, called a PPUX, includes functionality for accessing an external Content-Addressable Memory (CAM) or Random-Access Memory (RAM).
  • the PMU includes functionality for extracting a packet; pointing to a portion of the packet of a predetermined width using a predetermined index of a bit location in the packet; and modifying the portion of the packet.
  • a packet can be modified in one of three ways: deletion, insertion, or overwriting a portion of the packet.
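  • The three modification modes can be illustrated with a minimal Python sketch. The function name `modify_packet`, its parameters, and the use of a byte offset (rather than the bit index described above) are assumptions made for illustration and are not taken from the appendices.

```python
def modify_packet(packet: bytes, index: int, mode: str,
                  data: bytes = b"", width: int = 0) -> bytes:
    """Apply one PMU-style modification at byte offset `index` (hypothetical API)."""
    if not 0 <= index <= len(packet):
        raise ValueError("index outside packet")
    if mode == "insert":
        # Splice new bytes in; the packet grows by len(data).
        return packet[:index] + data + packet[index:]
    if mode == "delete":
        # Remove `width` bytes; the packet shrinks.
        return packet[:index] + packet[index + width:]
    if mode == "overwrite":
        # Replace bytes in place; the packet length is unchanged.
        if index + len(data) > len(packet):
            raise ValueError("overwrite runs past the end of the packet")
        return packet[:index] + data + packet[index + len(data):]
    raise ValueError(f"unknown mode: {mode}")
```

  • Only insertion lengthens the packet, which is consistent with the PMU needing to delay incoming data by the number of inserted bytes.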
  • the DFU can perform one of drop, queue, and forwarding operations on packets coming from at least one PPU, PPUX, or PMU.
  • the PPU, PPUX, PMU, and DFU can be programmed by an external microprocessor.
  • FIG. 1 is a flowchart showing a process according to the present invention for designing a packet processing product;
  • FIG. 2A is a screen shot of a window in a graphical user interface (GUI) according to the present invention for choosing a type of packet processing block to be configured;
  • FIG. 2B is a screen shot of a window in a graphical user interface (GUI) for selecting configuration parameters for generating a Packet Processing Unit (PPU) of the present invention;
  • FIG. 2C is a screen shot of a window in a graphical user interface (GUI) for selecting configuration parameters for generating a Packet Modification Unit (PMU) of the present invention;
  • FIG. 2D is a screen shot of a window in a graphical user interface (GUI) according to the present invention for selecting configuration parameters for generating a Decision and Forwarding Unit (DFU) of the present invention;
  • FIG. 3 is a block diagram of a plurality of packet processing blocks according to the present invention for designing a packet processing product;
  • FIG. 4 is a block diagram showing, in greater detail, a Packet Parsing Unit (PPU) of the present invention;
  • FIG. 5 is a block diagram showing, in greater detail, a Packet Parsing Unit with an external interface to a CAM/RAM (PPUX) of the present invention;
  • FIG. 6 is a block diagram showing, in greater detail, a Packet Modification Unit (PMU) of the present invention;
  • FIG. 7 is a block diagram showing, in greater detail, the Decision and Forwarding Unit (DFU) of the present invention; and
  • FIG. 8 is a block diagram showing a sample packet processor design for determining the queuing precedence of a VLAN/non-VLAN frame.
  • the present invention allows a user to design packet processing products using a high-level programming language which generates a NETLIST for generating a hardware design specification of a digital circuit.
  • a NETLIST describes the connectivity of an electronic design.
  • The design process begins at step 1, wherein a set of user requirements and specifications is received, which may be in the form of a packet parsing architecture or a packet parsing and classification algorithm. Typically, these requirements are in the form of a text description of the system to be generated.
  • The description is translated by the user or provider into a textual or graphical design using packet processing blocks which include Packet Parsing Units (PPU), Packet Parsing Units with an external interface to a CAM/RAM (PPUX), Packet Modification Units (PMU), and Decision and Forwarding Units (DFU), which will be described hereinbelow with reference to FIGS. 3-7.
  • For example, in step 2, if the customer needs a firewall that accepts TCP packets and rejects UDP packets, then three PPUs and one DFU are required.
  • One of the PPUs is devoted to determining a source IP address; a second PPU is devoted to extracting a destination IP address; and a third PPU is devoted to distinguishing between TCP and UDP packets.
  • The three PPUs are connected in parallel (since the information can be extracted simultaneously from the same packet), and the "match" outputs of the PPUs (to be described with reference to FIG. 4) and a source packet are forwarded to a DFU.
  • The DFU takes each match input and the packet and makes a decision: if the packet is a TCP packet and the source and destination addresses are allowed, then the packet is passed on; otherwise, the packet is dropped. Thus, in step 2, the user can select the required number and combination of packet processing blocks to be used in the design.
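  • The firewall example can be modeled behaviorally as three parallel field checks feeding one decision. The following Python sketch is illustrative only: the names `ppu_match`, `dfu_decide`, `firewall`, and `make_ipv4_header` are hypothetical, and the header is assumed to begin at the IPv4 header (protocol at byte 9, source address at bytes 12-15, destination address at bytes 16-19, TCP protocol number 6).

```python
import ipaddress

TCP = 6  # IPv4 protocol number for TCP (UDP is 17)

def make_ipv4_header(proto: int, src: str, dst: str) -> bytes:
    """Build a minimal 20-byte IPv4 header for the example (test helper)."""
    header = bytearray(20)
    header[0] = 0x45  # version 4, header length 5 words
    header[9] = proto
    header[12:16] = ipaddress.IPv4Address(src).packed
    header[16:20] = ipaddress.IPv4Address(dst).packed
    return bytes(header)

def ppu_match(header: bytes, index: int, width: int, predicate) -> bool:
    """One PPU: point at `width` bytes at `index`, compare, declare a match."""
    return bool(predicate(header[index:index + width]))

def dfu_decide(matches) -> str:
    """One DFU: forward only when every PPU declared a match."""
    return "forward" if all(matches) else "drop"

def firewall(header: bytes, allowed_src: set, allowed_dst: set) -> str:
    # The three checks are independent, so in hardware they run in parallel.
    src_ok = ppu_match(header, 12, 4, lambda f: f in allowed_src)
    dst_ok = ppu_match(header, 16, 4, lambda f: f in allowed_dst)
    is_tcp = ppu_match(header, 9, 1, lambda f: f[0] == TCP)
    return dfu_decide([src_ok, dst_ok, is_tcp])
```

  • The allowed address sets hold packed 4-byte addresses, for example `{ipaddress.IPv4Address("10.0.0.1").packed}`.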
  • The packet processing block requirements, including their required inputs and outputs, are entered into a connection document, which can be a text-based EXCEL™ spreadsheet or a VISIO™ block diagram. Typical inputs to the connection document include entries for each PPU and DFU block, which may include an index representing the point of entry into a packet to be processed, and whether a lookup in an internal table of data in a PPU is required.
  • packet processing blocks e.g., each PPU and DFU
  • Configuring a packet processing block involves taking a "default" packet processing block file, such as a generic PPU or DFU file, and modifying portions of it and setting variables within each file.
  • Code for the packet processing blocks to be described in FIGS. 4-7 (written in pseudo-code) can be found in Appendices A-E and G-L attached hereto.
  • the pseudo-code for the PPU calls code found in the following appendices: a file for describing a generic header extraction block called a Hardware Lookup Unit (HLU) (see Appendices D and K), and a file for describing a generic Match/Lookup Unit (MLU) (see Appendices E and L). Both the HLU and MLU will be described hereinbelow as part of the description of the PPU.
  • The packet processing blocks are implemented in a hardware design language (HDL) which models digital circuits, with gates, flip-flops, counters, and other logic in a C-like software language.
  • The "pruning" process can be performed by manually copying and editing a maximally configured processing block file, or by applying a preprocessor in the form of shell scripts to cull code from and substitute variables within a maximally configured processing block file.
  • Preprocessing shell scripts can include textual or graphically-based user prompts for answering questions about specific parameters desired by the user for a particular block.
  • FIGS. 2A-2D show one possible example of a graphical user interface (GUI) which can be used to enter parameters for packet processing blocks.
  • a Main generation GUI window 13 is presented to the user, as shown in FIG. 2A.
  • One of a number of radio buttons 13 is selected by the user to indicate the type of processing block to be configured.
  • a configuration window 15 is displayed, one for each type of processing block (i.e., PPU/PPUX (see FIG. 2B); PMU (see FIG. 2C); and DFU (see FIG. 2D)).
  • Each configuration window 15 contains a field 16 for naming the processing block.
  • A series of configuration screen elements 17 is presented to the user for allowing parameters of each processing block to be specified by the user (including, e.g., data bus width, start of packet width, end of packet width, maximum header words, qualifier width, result width, result expression, external memory parameters, number of interfaces, etc.), and which may vary according to each type of processing block.
  • The user can click on either a "Generate" button 18 to cause the particular processing block code to be generated, or a "Cancel" button 19.
  • the GUI code can pass the input parameters to a preprocessor, such as a preprocessor called "veriloop2."
  • The pseudo-code for veriloop2 can be found in Appendix F. Veriloop2 first performs substitutions into appropriate variables using the parameters passed from the GUI. Veriloop2 then searches for constructs such as name-value pairs, conditional constructs, and loops having a particular syntax, and then culls the maximally configured packet processing block file to produce preprocessed header-like library files, each containing a function or class representing a particular PPU, DFU, etc.
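  • The substitute-then-cull idea can be sketched in a few lines of Python. This is not the veriloop2 code of Appendix F, which is not reproduced here: the `${NAME}` placeholder syntax, the `//IF NAME` and `//ENDIF` markers, and the function name `preprocess` are all hypothetical, and the sketch performs substitution and culling in a single pass and omits loops.

```python
import re

def preprocess(template: str, params: dict) -> str:
    """Cull disabled regions and substitute ${NAME} placeholders (hypothetical syntax)."""
    out = []
    keep = [True]  # stack of "is the enclosing region enabled?" flags
    for line in template.splitlines():
        stripped = line.strip()
        if stripped.startswith("//IF "):
            name = stripped[5:].strip()
            keep.append(keep[-1] and bool(params.get(name)))
        elif stripped == "//ENDIF":
            if len(keep) == 1:
                raise ValueError("//ENDIF without matching //IF")
            keep.pop()
        elif keep[-1]:
            # Only lines that survive culling have their variables substituted.
            out.append(re.sub(r"\$\{(\w+)\}",
                              lambda m: str(params[m.group(1)]), line))
    if len(keep) != 1:
        raise ValueError("missing //ENDIF")
    return "\n".join(out)
```

  • Starting from one maximally configured template, each parameter set yields a smaller block file, which mirrors the "pruning" described above.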
  • Pseudo code for types of preprocessor constructs can be found in Appendix G. Pseudo code for sample pre-processed files of FIG.
  • Appendices H-L there is only one PPU/MLU/HLU file for all three PPUs, which share the same number of inputs/outputs and share the same general structure.
  • the number of PPUs that need to be generated depends upon the degree of parallelism needed for a particular design. If all the operations for a number of PPUs can be performed in series, then one PPU is needed, since all that changes between instances of PPUs is the input parameters (e.g. opcode, mask, etc.). There is one generated PPU for each parallel operation.
  • There are separate DFU Appendices (i.e., Appendices B, H, and I) because each DFU can have a different number of inputs/outputs.
  • The present invention distils the implementation of maximally configured processing blocks into common sub-blocks which have unique names (e.g., PPU_1, DFU_2) or modules which have inputs and outputs that can be interconnected in such a way as to perform all of the functions necessary for implementing a desired packet processing product.
  • The common blocks described herein are preferably instantiations of packet processing blocks written in VHDL, Verilog, or SystemC, but other suitable hardware description languages can be used.
  • the software implementation of packet processing blocks is platform independent, and can be written in a platform independent language such as JAVA.
  • packet parser/classifier functionality of the present invention can run both in Windows and in different versions of the Unix operating system, as well as others.
  • the programmer/designer can invoke instances of these common modules using a C-like application programming interface (API) surrounded by other C-like code for interconnecting the sub-blocks.
  • Integration involves declaring instantiations of each processing block by name, and making connections between instantiated packet parsing blocks in a top-level main program file (the top-level main program file is similar to the file containing the main() function call in the C language). These connections are called "wires" or "signals", which are declared like variables, and associations are made between two processing block instances which have a common wire. For example, signal "x" in PPU1 ties to signal "y" in the top-level file. Signal "z" of DFU1 also ties to signal "y" in the top-level file. In this way, signal "x" of PPU1 is tied to signal "z" of DFU1, which may also be tied to one or more other signals. Certain input parameters can also be "hard-coded" within the top-level file.
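  • The wire-based association can be modeled as a mapping from each top-level wire to the block ports tied to it. The Python sketch below is a behavioral illustration, not HDL; the names `build_nets` and `are_tied` and the tuple layout `(instance, port, wire)` are assumptions.

```python
from collections import defaultdict

def build_nets(connections):
    """Map each top-level wire to the set of (instance, port) pairs tied to it."""
    nets = defaultdict(set)
    for instance, port, wire in connections:
        nets[wire].add((instance, port))
    return dict(nets)

def are_tied(nets, a, b) -> bool:
    """Two ports are connected when they share a common top-level wire."""
    return any(a in ports and b in ports for ports in nets.values())

# Signal "x" of PPU1 and signal "z" of DFU1 both tie to wire "y".
example_nets = build_nets([
    ("PPU1", "x", "y"),
    ("DFU1", "z", "y"),
    ("PPU1", "q", "w"),
])
```

  • With this model, `are_tied(example_nets, ("PPU1", "x"), ("DFU1", "z"))` holds because both ports share wire "y".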
  • At step 6, if the customer desires only the design, then at step 7, the generated packet processing block files and the top-level file can be delivered to the customer. If the customer desires to have a NETLIST, then at step 8, the generated files are run through a commercially available synthesis tool, as is known in the art.
  • Sample synthesis tools include Design Compiler from Synopsys, Precision Synthesis from Mentor Graphics, Synplify from Synplicity, or XST from Xilinx. The synthesis tool behaves like an optimizing compiler which produces a NETLIST for producing an electrical schematic for a custom integrated circuit which is implemented with a minimum number of logic gates, flip-flops, counters, etc.
  • The NETLIST generated depends on whether the customer desires to have a foundry-specific device, e.g., a Xilinx FPGA, or a generic ("virtual") NETLIST which is not specific to a particular vendor's product.
  • Customers that are EDA (electronic design automation) vendors desire a non-specific NETLIST.
  • The NETLIST could be a foundry-specific or "virtual" bitstream or binary file that is delivered to the customer.
  • the NETLIST is delivered to the customer; otherwise, at step 11, the NETLIST is run through a place and route program, which physically constructs the gates defined in the NETLIST on a silicon die and interconnects them.
  • the choice of a place and route tool depends on whether the packet parser/classifier is to be implemented as an ASIC (fixed logic) or an FPGA (programmable logic).
  • Sample place and route programs include Quartus II from Altera and ISE from Xilinx.
  • the integrated circuit is delivered to the customer.
  • With reference to FIG. 3, a block diagram of a graphical design environment using packet processing blocks according to the present invention for designing a packet processing product, indicated generally at 20, is depicted.
  • the blocks 20 can be implemented in a text-based or graphical design environment.
  • the environment 20 includes combinations of any number of Packet Parsing Units (PPUs) 22, PPUXs 24 (which are PPUs that can access CAM/RAM memory 26), Packet Modification Units (PMUs) 28, and Decision and Forwarding Units (DFUs) 30.
  • PPUs 22, PPUXs 24, PMUs 28, and DFUs 30 can be connected by a designer in a variety of ways to create parsing/classification logic for any desired packet processing algorithm.
  • the PPUs 22 operate on packet headers 21.
  • the packet itself can be passed through the environment 20 intact.
  • only the packet header 21 is passed through the environment, which requires the creation and passing of a pointer to the packet data to be output after the DFUs 30.
  • the packets are stored in memory upon arrival and retrieved from memory upon departure.
  • a copy of the header 21 and a pointer to the packet location is passed to the development environment 20.
  • the length of the copied header 21 is variable. It starts at a programmable position in the header 21 and ends at the last field that must be processed.
  • a PPU takes a header 21 and can seek, i.e., locate, any field of constant or variable length. Once the field is found in the header 21, the PPU 22 can perform a check on that field, such as whether the field is equal to or greater than a given value, or matches a particular value, and then output that value depending on the operation performed.
  • PPUXs 24 are PPUs that can perform lookups or searches using external random-access memories (RAMs) or CAMs (a CAM is defined as a RAM-like memory which can determine whether an input value is present in the memory device).
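  • The difference between a RAM and a CAM lookup can be shown with a small behavioral model: a RAM is read by address, whereas a CAM is searched by value and reports where (or whether) that value is stored. The class name `Cam` and its methods are hypothetical and do not describe the PPUX external interface.

```python
from typing import Dict, Optional

class Cam:
    """Behavioral model of a content-addressable memory (hypothetical API)."""

    def __init__(self, size: int) -> None:
        self._size = size
        self._address_of: Dict[int, int] = {}  # stored value -> address

    def write(self, address: int, value: int) -> None:
        if not 0 <= address < self._size:
            raise IndexError("address outside CAM")
        # Drop whatever value was previously stored at this address.
        self._address_of = {v: a for v, a in self._address_of.items()
                            if a != address}
        self._address_of[value] = address

    def search(self, value: int) -> Optional[int]:
        """Return the address holding `value`, or None on a miss."""
        return self._address_of.get(value)
```

  • A hit corresponds to the input value being present in the memory device; a miss returns `None`.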
  • a PMU 28 is a PPU which allows fields in the header of a packet or the packet itself to be modified by means of insertions, deletions, or substitution of bytes.
  • the PPUs 22 and PPUXs 24 only allow the fields of a packet header to be examined. Any number of PPUs 22, PPUXs 24, and PMUs 28 can be chained together in series or in parallel to implement complex expressions.
  • the DFUs 30 combine the output of one or more PPUs 22 and/or PPUXs 24 and/or PMUs 28 using a programmable condition, and then forward the header to one of a plurality of outputs.
  • the outputs can represent Boolean True and False values, and decisions as to whether to drop, forward, or queue the packet.
  • The DFUs 30 make decisions to forward, drop, or enqueue packets based on the results from the PPUs 22.
  • the output of the last DFU in the chain such as the DFU labeled "A"
  • the traffic manager 31 is a device which performs a set of actions and operations for a network to guarantee the operability of the network.
  • Traffic Management is exercised in the form of traffic control and flow control.
  • the traffic manager 31 operates on a packet stream once the classification & processing is done on a packet (i.e. once it passes from PPU/DFU blocks).
  • PPU/DFU blocks are used to figure out the priority number of a packet.
  • the traffic manager is given that priority number and the packet to do a traffic control operation to guarantee that high priority packets pass before low priority packets.
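  • The priority-ordering behavior can be sketched with a priority queue. In this Python sketch the class name `TrafficManager`, the convention that a lower number means higher priority, and first-in-first-out ordering within one priority level are all assumptions; the source states only that high priority packets pass before low priority packets.

```python
import heapq
import itertools

class TrafficManager:
    """Release packets in priority order (assumed: lower number = higher priority)."""

    def __init__(self) -> None:
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: keeps FIFO order per priority

    def enqueue(self, priority: int, packet) -> None:
        # The arrival counter is unique, so packets themselves are never compared.
        heapq.heappush(self._heap, (priority, next(self._arrival), packet))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

  • The priority number passed to `enqueue` stands in for the value computed by the PPU/DFU blocks.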
  • the PPU 22 performs basic parsing of the packet header 21 and may perform mathematical/logical operations on the parsed fields of packet header 21.
  • The PPU 22 includes a plurality of inputs and outputs 32-83. The function of each input and output 32-83, as well as the values that each input or output handles, are described with reference to Table 1 hereinbelow.
  • The input Clk 32 is supplied from external hardware, such as the clock of a microprocessor.
  • the Input Rst 34 is used to cause the PPU to go into a predefined state where most internal variables and outputs are set to an initial value. This condition is usually needed at power-up of the hardware in logic systems to stabilize the system before execution of a packet processing algorithm.
  • the system is initially Reset. A predetermined amount of time later, when it is known that all circuits have stabilized, then the circuit is put into operation by toggling Rst 34.
  • The PPU 22 includes a Hardware Lookup Unit (HLU) 84, a Delay/FIFO module 86 containing an optional Delay Line 88 or a FIFO 90, a Match and Lookup Unit (MLU) 92, a Result Generation (process) 94, a Sequence Generation (process) 96, and an Output Alignment (process) 98, interconnected as shown.
  • the sub-blocks 84-98 are implemented as modules or processes.
  • a module is similar to a class or subclass in an object-oriented language like C++, while a process is similar to a function.
  • the PPU also contains (not shown) a predetermined but limited number of internal general-purpose registers for storing and retrieving values for comparisons, lookups, etc.
  • A stream of data is continuously presented to the input DataIn 36 of the HLU 84. No data of the input stream is stored in a memory. In such circumstances, it is the job of the HLU 84 to extract information from a packet and present that information to the other blocks of the PPU 22.
  • The HLU 84 takes a snapshot of the data stream according to the location in the data stream specified by the inputs Index 56 and Width 58.
  • The inputs SOHIn 38, EOHIn 40, and InVal 42 allow for fine tuning of locating data from the output of other PPUs, PPUXs, PMUs, or external hardware.
  • SOHIn 38, EOHIn 40, and InVal 42 tell the PPU 22 how to delimit data in a packet header.
  • SOHIn 38 tells the hardware where a packet starts and EOHIn 40 tells the hardware when a packet header ends. Once the packet starts, then at every clock cycle, the data presented at DataIn 36 is either valid or invalid, as indicated by the input InVal 42.
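  • The delimiting role of the three control inputs can be illustrated by a cycle-by-cycle Python model. The function name `collect_headers` is hypothetical, and the sketch assumes that the start-of-header flag is asserted on the same cycle as the first header word and the end-of-header flag on the same cycle as the last one.

```python
def collect_headers(cycles):
    """Group valid data words into headers using per-cycle control flags.

    Each cycle is a tuple (soh, eoh, valid, data), mirroring the
    SOHIn, EOHIn, InVal, and DataIn inputs.
    """
    headers = []
    current = None  # None means "not inside a header"
    for soh, eoh, valid, data in cycles:
        if soh:
            current = []          # a new header starts on this cycle
        if current is not None and valid:
            current.append(data)  # keep only words flagged as valid
        if eoh and current is not None:
            headers.append(current)
            current = None        # the header ends on this cycle
    return headers
```

  • Words arriving outside the start/end window, or with the valid flag low, are ignored.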
  • the extracted header bits are present as an output CompDat 100 and as an input to the MLU 92.
  • CompDat 100 stands for the data that needs to be compared in the MLU 92.
  • the Delay/FIFO module 86 is used to synchronize the outputs of the PPU 22 to be presented to a subsequent block, such as a DFU.
  • The Delay/FIFO module 86 is needed because the inputs to the PPU, such as DataIn 36, along with the control input signals SOHIn 38, EOHIn 40, and InVal 42, need to be aligned in time in the Output Alignment process 98 with intermediate outputs of other sub-blocks of the PPU 22, such as the Match output 110 of the MLU 92, which may be delayed relative to the inputs due to delays in processing within the MLU 92.
  • The MLU 92 performs its decision making (e.g., a comparison of a bit within DataIn 36 with a user-specified parameter (Param1)) without full packet storage. Therefore, DataIn 36 along with the control input signals SOHIn 38, EOHIn 40, and InVal 42 are pipelined to the Result Generation process 94 and the Output Alignment process 98 by way of intermediate I/O Val_i 102, SOH_i 104, EOH_i 106, and Data_i 108. There are fixed delays (measured in clock cycles) associated with processing in the Result Generation process 94 and the MLU 92. There is a variable delay associated with the HLU 84 depending upon the value of Index 56.
  • The inputs described above must be delayed in the Output Alignment process 98 by the sum of the aforementioned individual delays. For example, if Index 56 is 8, then CompDat 100 is received at the MLU 92 eight clock cycles after DataIn 36 arrives at the PPU 22. If the MLU 92 processes CompDat 100 in three clock cycles, then the PPU 22 inputs need to be delayed by 8 + 3 = 11 clock cycles in the Output Alignment process 98.
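  • The delay arithmetic and the behavior of a delay line can be sketched as follows. The names `alignment_delay` and `DelayLine` are hypothetical, and the optional `result_cycles` term (defaulting to zero) is an assumption added to represent the fixed Result Generation delay mentioned above.

```python
from collections import deque

def alignment_delay(index: int, mlu_cycles: int, result_cycles: int = 0) -> int:
    """Total delay in clock cycles: variable HLU delay plus fixed processing delays."""
    return index + mlu_cycles + result_cycles

class DelayLine:
    """Shift register that returns each input `depth` clock cycles later."""

    def __init__(self, depth: int, fill=None) -> None:
        self._depth = depth
        self._regs = deque([fill] * depth, maxlen=depth)

    def step(self, value):
        """Advance one clock cycle: shift `value` in, return the oldest entry."""
        if self._depth == 0:
            return value
        oldest = self._regs[0]
        self._regs.append(value)  # maxlen discards the oldest entry
        return oldest
```

  • For the example above, `alignment_delay(8, 3)` gives 11, so a `DelayLine(11)` first returns its initial input on the twelfth call to `step`.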
  • the choice of the optional Delay Line 88 or the FIFO 90 depends on the size of the delay needed. A FIFO always works but requires using scarce memory in the PPU 22.
  • the MLU 92 performs the bulk of the packet parsing and classification operation to be performed on one unit of a packet processing algorithm.
  • The MLU 92 is programmable, i.e., it can compare the data/fields extracted in the HLU 84 with values stored in internal registers by means of the inputs Opcode 62, Param1 64, Param2 66, and Mask 60, and declares a match or no match, which appears on the internal output Match 110, which, in turn, appears as an output of the Result Generation process 94.
  • The inputs QualEnb 52 and QualCond 54 enable or disable the MLU 92 depending on certain conditions.
  • The operation to be performed in the MLU 92 is enabled if the result of the check of the QualEnb 52 using the QualCond 54 is true.
  • QualEnb 52 is a value stored in a qualEnb register (not shown) which is user programmable through an address map.
  • The qualifier condition QualCond 54 can be: Always True, Equal, Less Than, Less Than or Equal, Greater Than, Greater Than or Equal, etc.
  • QualEnb 52 can be programmed through the qualEnb register (not shown) to be the value 6.
  • QualCond 54 is set to Equal To (EQ).
  • The packet type is retrieved from a mode register from an external CPU. If the packet type is 6 (IPv6), then the MLU 92 is enabled; if the packet type is 4 (IPv4), then the MLU 92 is disabled, and no comparison takes place. If it is desired to have all types of IP packets, then QualCond 54 is set to Less Than or Equal (LE) or Always True.
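  • The qualifier check can be expressed as a small lookup of comparison functions. In this Python sketch the function name `mlu_enabled` and the mnemonics other than EQ and LE are hypothetical, and the operand order (packet type compared against the programmed QualEnb value) is inferred from the example above.

```python
import operator

# Condition mnemonics; only EQ and LE are named in the text, the rest are assumed.
QUAL_CONDITIONS = {
    "ALWAYS": lambda qualifier, qual_enb: True,
    "EQ": operator.eq,
    "LT": operator.lt,
    "LE": operator.le,
    "GT": operator.gt,
    "GE": operator.ge,
}

def mlu_enabled(qualifier: int, qual_enb: int, qual_cond: str) -> bool:
    """Return True when `qualifier <cond> qual_enb` holds, enabling the MLU."""
    return QUAL_CONDITIONS[qual_cond](qualifier, qual_enb)
```

  • With QualEnb programmed to 6, `mlu_enabled(6, 6, "EQ")` enables the MLU for packet type 6 only, while `"LE"` enables it for types 4 and 6 alike.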
  • The match/no-match functionality of the MLU 92 is performed on the portion of the DataIn 36 packet header pointed to by Index 56 and Width 58. Additional inputs Mask 60, Opcode 62, Param1 64, and optionally Param2 66 are needed to perform the comparison/match/no-match operation.
  • the MLU 92 performs a seek and operation function.
  • the seek function finds a data field in a packet header (not shown) based on an offset from the start of the packet header indicated by the input Index 56. If Index 56 is 0, then the first byte of the packet header is indicated. An Index 56 of six indicates the seventh byte from the beginning of the packet header.
  • The interconnections that can be made to the Index input 56 include a fixed value (e.g., 4), a value stored in an internal user-defined control register, or the result output 70 of another PPU, PMU, or DFU. If the Index input 56 is driven from another PPU, PMU, or DFU, the value placed on the Index input 56 is variable, depending on the condition(s) evaluated in the previous PPU, PMU, or DFU.
  • The operation function performs a check, an extraction, or a lookup on "Data_Field", which is the contents of the packet header pointed to by the Index input 56 of width equal to the value in bits placed on the Width input 58.
  • the Data_Field may be filtered (AND'ed) with the Mask input 60.
  • Op is one of the opcodes placed on the Opcode input 62 given the Param1 input 64, and optionally the Param2 input 66.
  • The types of operations are shown in Table 2 below:
  • A single MLU can be programmed to check if an IP address is less than 224.XX.XX.XX, by specifying the following values:
  • IP DA Points to IP DA or SA and can be adjusted automatically for VLAN tagging using a PPU.
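  • The seek, mask, and compare sequence for the IP address check can be sketched in Python. Because the table of values is not reproduced here, the specific settings are assumptions: the header is taken to begin at the IPv4 header (destination address at byte offset 16), Width is 32 bits, Mask keeps only the first octet, and the opcode mnemonics and function names are hypothetical.

```python
import operator

# Hypothetical opcode mnemonics; the source's Table 2 is not reproduced here.
OPS = {"EQ": operator.eq, "LT": operator.lt, "GT": operator.gt}

def seek(header: bytes, index: int, width_bits: int) -> int:
    """Return `width_bits` bits starting at byte offset `index` as an integer."""
    nbytes = (width_bits + 7) // 8
    chunk = header[index:index + nbytes]
    if len(chunk) < nbytes:
        raise ValueError("field extends past the end of the header")
    value = int.from_bytes(chunk, "big")
    return value >> (nbytes * 8 - width_bits)  # keep only the leading bits

def mlu_match(header: bytes, index: int, width_bits: int,
              mask: int, opcode: str, param1: int) -> bool:
    """Seek Data_Field, AND it with the mask, then apply the opcode."""
    data_field = seek(header, index, width_bits) & mask
    return OPS[opcode](data_field, param1)

IP_DA_INDEX = 16  # assumed: byte offset of the IPv4 destination address

def below_224(header: bytes) -> bool:
    """True when the destination address is less than 224.0.0.0."""
    return mlu_match(header, IP_DA_INDEX, 32, 0xFF000000, "LT", 224 << 24)
```

  • Masking to the first octet means the remaining three octets (the "XX" positions) do not affect the comparison.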
  • the inputs MapWrRd_n 76, MapAddr 78, and MapWrData 80, and the output MapRdData 81 are used as the interface between an external microprocessor and the internal registers of the PPU 22 to allow for reading of and writing to the registers.
  • the PPU 22, PPUX 24, PMU 28, and DFU 30 can contain a user defined number of internal registers for packet header manipulation either internally or via an external microprocessor.
  • the opcodes LUP and SPCL can be used to directly manipulate data in internal registers.
  • the output Match 110 of the MLU 92 is fed to the input of the Result Generation process 94 to be described hereinbelow.
  • the Match output 110 is True if the operation performed in the MLU 92 is True, or False otherwise.
  • the Result Generation process 94 takes the Match output 110, the outputs of the Delay/FIFO module 86, and optionally a tag value present on TAG 83 and produces the result output iResult 112, which is fed as an input to the Output Alignment process 98 and ultimately is the output Result 70 of the PPU 22.
  • the Result Generation process 94 also outputs iResVal 114, which indicates when iResult 112 is valid. This is needed as a handshaking device, since result generation can take more than a single clock cycle.
  • iMatch 116 is the value of Match 110 passed along from the MLU 92. Assuming the MLU 92 was enabled, iResult 112 can take on two values corresponding to the True or False evaluation of the operation performed in the MLU 92. The True/False result values can be fixed or an arithmetic or logical function of any of the PPU 22 inputs. The iResult output 112 is later passed through the Output Alignment process 98 to be described hereinbelow as Result 70, which can be used to drive a DFU input or any input of another PPU or a PMU. Result 70 can also be a complex expression that the user may want to program. This allows the Index 56, QualEnb 52, Opcode 62, or Param<1,2> 64, 66 inputs of a PPU to be driven with different values depending on the Result 70 output of other PPUs.
  • the PPU 22 generates or forwards a sequence number using the Sequence Generation process 96.
  • The sequence number can optionally come from an external process/hardware via the input SeqIn 82 and be passed along to a DFU; otherwise, sequence numbers are internally generated within a PPU 22 using the Sequence Generation process 96.
  • the sequence number which appears as an internal output iSeq 118, is passed through the Output Alignment process 98 to a DFU through the PPU output SeqOut 74. Sequence numbers are incremented sequentially for each use of a PPU and are used for internal synchronization of all the inputs of a DFU. Sequence numbers are needed because different PPUs can present their output packet header data, match data, and results at different times.
  • one PPU may index at bit 0 of an incoming packet, in which case match output may appear at an input to a DFU after three clock cycles. If another PPU indexes on a VLAN type field, then index is set to block 5 or 6, which gives its results to the same DFU after 6 + 3 clock cycles.
  • The DFU takes the matches, packet headers, and sequence numbers from each of the PPUs and arranges them in the correct sequence, as described hereinafter.
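  • The role of sequence numbers in lining up results that arrive at different times can be sketched as a small buffer keyed by sequence number. The class name `SequenceAligner` and its `push` method are hypothetical and do not reflect the DFU pseudo-code in the appendices.

```python
from collections import defaultdict

class SequenceAligner:
    """Hold PPU results until every input has reported for a sequence number."""

    def __init__(self, num_inputs: int) -> None:
        self._num_inputs = num_inputs
        self._pending = defaultdict(dict)  # sequence number -> {port: match}

    def push(self, port: int, seq: int, match: bool):
        """Record one result; return the aligned list once all ports have reported."""
        slot = self._pending[seq]
        slot[port] = match
        if len(slot) == self._num_inputs:
            del self._pending[seq]
            return [slot[p] for p in range(self._num_inputs)]
        return None
```

  • A fast PPU may report sequence numbers 1 and 2 before a slower PPU reports 1; the aligner still pairs each result with the one carrying the same sequence number.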
  • The Output Alignment process 98 aligns all outputs to the start of packet (SOP) or the end of packet (EOP). This is done in order to provide proper delineation of the output signals of one PPU to the next PPU/PPUX/PMU/DFU. For example, if PPU1 is connected to PPU2, and PPU1 operates either on an 802.3 Ethernet frame or an Ethernet type 2 frame, then PPU1 examines a byte field which is either 20 bytes or 40 bytes from the beginning of a packet header. Therefore, all outputs of PPU1 need to be aligned on SOP as a requirement for input to PPU2. As another example, some protocols use trailer insertion, e.g., inserting a checksum at the end of a packet. Therefore, outputs are aligned at EOP.
  • FIG. 5 a block diagram of a PPUX 24 is depicted.
  • a PPUX 24 has the same I/O signals and sub-blocks as the PPU 22 except for additional I/O needed to access an external CAM/RAM 220.
  • Elements illustrated in FIG. 5 which correspond to the elements described above in connection with the PPU 22 of FIG. 4 have been identified by corresponding reference numbers increased by one hundred. Unless otherwise indicated, both the PPU 22 and the PPUX 24 have the same construction and operation.
  • a PPU there is a predetermined number of internal registers/memory which can be programmed by a user.
  • a typical need for programmed memory is for performing a lookup of values by MLU 192. For example, if there is a need to compare Param1 164 to one hundred IP addresses, then internal memory is used. However, if the number of lookups and hence values to be stored in memory is on the order of thousands of bytes or more, then it may be necessary to store and retrieve these values to/from an external CAM/RAM 220.
  • a PMU Packet Modification Units
  • a PMU allows for modification, i.e., insertion, deletion, or replacement, of bytes in a packet, including both the header and payload data.
  • the PMU 28 includes a Delay/FIFO module 300 containing an optional Delay Line 302 or a FIFO 304, a Modification Unit (MU) 306, a Result Generation process 308, a Sequence Generation process 310, and an Output Alignment process 312, interconnected as shown.
  • These sub-blocks 300-312 are implemented as software modules or processes.
  • the inputs InVal 314, SOHIn 316, EOHIn 318, DataIn 320, TagIn 322, Rst 324, and Clk 326 have the same functionality as is found in the PPU 22 and the PPUX 24.
  • the delay/FIFO module 300 can be used to synchronize the inputs InVal 314, SOHIn 316, EOHIn 318, DataIn 320, and TagIn 322 with the outputs of the Result Generation Process 308 and the outputs of the Modification Unit (MU) 306 as is done in the PPU 22, but it also provides a second function: to delay incoming packet data by an amount equal to the number of bytes that may be inserted into a packet in the Modification Unit 306.
  • the choice of the optional Delay Line 302 or the FIFO 304 depends on the size of the delay needed. If only a few clock cycles' worth of delay (a few words to be inserted) are needed, then the Delay Line 302 is used; otherwise the FIFO 304 is used.
  • InVal 314, SOHIn 316, EOHIn 318, and DataIn 320 are pipelined to the Modification Unit (MU) 306 as the intermediate outputs Val_i 328, SOH_i 330, EOH_i 332, and Data_i 334.
  • MU Modification Unit
  • Val_i 328 is also directed to the Result Generation Process 308.
  • the Result Generation Process 308 has a different purpose from the one found in a PPU 22.
  • the intermediate outputs iResVal (result valid) 358 and iResult (the result) 360 are not based on a field value, but reflect the number of bytes inserted.
  • iResult 360 becomes the output Result 378 which can be used as an input to another PPU/PPUX/PMU/DFU. It can also be a complex expression that the user may want to program.
  • the Sequence Generation Process 310 with the optional Seqln input 362 has the same functionality as in the PPU 22.
  • the Modification Unit (MU) 306 inserts/modifies/removes data as specified by a user.
  • the MU 306 is specified at preprocessing time as one of an inserting type, modifying type, or removing type PMU.
  • the type of operations performed by the input signals ByteOffset 336, ByteValid 338, and ByteData 340 are shown in Table 4 below:
  • MapWrRd_n 342, MapAddr 344, and MapWrData 346, and the output MapRdData 348 provide a future programming interface for an external microprocessor to allow for the reading and writing from/to internal registers of the PMU 28 to, for example, dynamically program an MU to either insert, delete, or modify a packet at run time.
  • Val_i 350, SOH_i 352, and EOH_i 354 are passed intact, after a delay, from their corresponding inputs at the MU 306 to the Output Alignment process 312.
  • the modified packet, represented as the intermediate input/output Data_i 356, is also presented to the Output Alignment process 312.
  • the Output Alignment process 312 has the same purpose and functionality as found in the PPU or PPUX, i.e., aligning all intermediate outputs iSeq 362, iResVal 358, iResult 360, Val_i 350, SOH_i 352, EOH_i 354 and Data_i 356 on either the start of packet (SOP) or the end of packet (EOP) to become the aligned outputs SeqOut 366, OutVal 368, SOHOut 370, EOHOut 372, DataOut 374, ResVal 376, Result 378, and TagOut 380.
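The three PMU modification types (inserting, modifying, removing) can be illustrated with a short Python sketch. It is a software analogy only: the function name `modify_packet`, its arguments, and the use of a byte offset are assumptions made for illustration, since the signal-level behaviour of ByteOffset, ByteValid, and ByteData is defined in Table 4, which is not reproduced here. The second return value mirrors the statement that the PMU result reflects the number of bytes inserted.

```python
def modify_packet(packet: bytes, op: str, offset: int, data: bytes = b"", length: int = 0):
    """Hypothetical model of a PMU: returns (modified_packet, bytes_inserted)."""
    if not 0 <= offset <= len(packet):
        raise ValueError("offset outside packet")
    if op == "insert":
        # Splice new bytes in at the offset; the packet grows by len(data).
        return packet[:offset] + data + packet[offset:], len(data)
    if op == "modify":
        # Overwrite len(data) existing bytes; the packet length is unchanged.
        end = offset + len(data)
        if end > len(packet):
            raise ValueError("overwrite runs past end of packet")
        return packet[:offset] + data + packet[end:], 0
    if op == "remove":
        # Delete `length` bytes starting at the offset.
        end = offset + length
        if end > len(packet):
            raise ValueError("removal runs past end of packet")
        return packet[:offset] + packet[end:], 0
    raise ValueError("unknown operation: " + op)
```

For an insertion, a hardware PMU would also have to delay the incoming data by the number of inserted bytes, which is the role the text gives to the Delay Line or FIFO; the sketch sidesteps that by operating on a whole packet at once.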
  • the DFU 30 performs drop, queue, or forward operations based on input from 1 to N PPUs, PPUXs, PMUs, or other DFUs.
  • the DFU 30 includes a plurality of inputs and outputs 400-444. The function of each input and output 400-444, as well as the values each input or output can take on, are described with reference to Table 5 hereinbelow.
  • the DFU 30 includes sub-blocks Latch 445a-445n, Data Selection MUX 446, Result Generation process 448, and Output Alignment process 450.
  • the triangles within FIG. 7 are for blocking together intermediate outputs and do not themselves have inherent functionality. All sub-blocks are processes.
  • Latch 445a-445n latches the incoming results, data, and other output signals coming from 0 to N-1 PPUs/PPUXs/PMUs to be processed at a later time inside the DFU 30.
  • the Latch 445a-445n are necessary since each PPU/PPUX/PMU may present packet data at different times.
  • each Latch 445a-445n, namely iRInVal 459a-459n, iMIn 460a-460n, iRIn 462a-462n, and iRInSeq 464a-464n, corresponding to the latched inputs RInVal 406a-406n, MIn 402a-402n, RIn 400a-400n, and RInSeq 404a-404n, respectively, and representing together control/result signals from each PPU/PPUX/PMU, belong to groups, which are fed together to the Result Generation process MUX 448.
  • the Data Selection MUX 446 selects one of the sets of N-1 data groups and forwards the data group to the output group which includes iDValOut 466, iSOHOut 468, iEOHOut 470, and iDOut 472 as inputs to the Output Alignment Process 450.
  • the Result Generation Process 448 has a similar purpose to that found in the PPU/PPUX, namely, generating a result iROut 482 which depends on the evaluation of a programmable logical expression which may depend on the value of the inputs RIn[0 - (N-1)] 400a-400n and/or MIn[0 - (N-1)] 402a-402n.
  • the evaluation of this complex logical expression can determine an output port to which the packet is to be routed, i.e., the pass along/queue outputs A and B, or the drop port D, represented as active high enabling intermediate outputs iROutAVal 476, iROutBVal 478, and iROutDVal 480.
  • These outputs are passed along to the Output Alignment Process 450, which has the same purpose and function as the PPU 22, PPUX 24, and PMU 28.
  • the intermediate outputs 466-482 become the DFU outputs DValOut 426, SOHOut 428, EOHOut 430, DOut 432, SeqOut 416, ROutAVal 408, ROutBVal 410, ROutDVal 412, and ROut 414, respectively.
  • the output DOut 432 is routed to one of three output ports: DOutA 484, DOutB 486, or DOutD 488.
  • DOutA 484 and DOutB 486 can be used for normal output and DOutD 488 can be used for dropping a packet (not shown).
  • DOutD 488 can be used as a third routing output port.
  • the packet is either forwarded to a destination, or another chain of PPUs/PPUXs/PMUs, or sent to a queue of a traffic manager.
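The DFU's result generation, in which a programmable logical expression over the match and result inputs selects one of the ports A, B, or D, can be sketched as follows. This is a hedged illustration, not the DFU's actual logic: the function `dfu_decide`, the callable `expression`, and the returned dictionary are hypothetical, with only the output names ROutAVal, ROutBVal, and ROutDVal taken from the text.

```python
from typing import Callable, Dict, Sequence


def dfu_decide(
    matches: Sequence[bool],
    results: Sequence[int],
    expression: Callable[[Sequence[bool], Sequence[int]], str],
) -> Dict[str, bool]:
    """Evaluate a user-supplied expression and raise exactly one port-enable output."""
    port = expression(matches, results)
    if port not in ("A", "B", "D"):
        raise ValueError("expression must select port A, B, or D")
    # Model the three active-high enables; only the selected port is asserted.
    return {
        "ROutAVal": port == "A",
        "ROutBVal": port == "B",
        "ROutDVal": port == "D",
    }


def all_match_or_drop(matches, results):
    """Example expression (an assumption): forward on A only if every input matched."""
    return "A" if all(matches) else "D"
```

Calling `dfu_decide([True, True], [0, 0], all_match_or_drop)` asserts only ROutAVal, while a single failed match asserts only ROutDVal. Passing the expression in as a callable stands in for the programmability the text describes.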
  • the design environment of the present invention can be connected to a set of internal PPU/PPUX/PMU/DFU registers and programmed through a microprocessor interface. The operations that the microprocessor would perform are reads and writes to/from the registers. Table 6 below shows a sample interface for a microprocessor manufactured by Freescale, Inc. (formerly Motorola):
  • Chip Select This active low signal enables the
  • control inputs of the PPUs or DFUs can be driven with fixed values (hardwired), from programmable registers, or from the outputs of other PPUs or DFUs.
  • Table 7 shows the options for control signal connections, with some typical examples of standard packet processing:
  • Each PPU/PPUX/PMU/DFU is configurable at synthesis time using the parameters shown in Table 8:
  • the packet processing algorithm relates to extracting the precedence field of an IP packet for a VLAN/Non-VLAN frame from a packet header 500 belonging to a packet 499.
  • Pseudo code which implements the two DFUs and the three PMUs of FIG. 8 can be found in Appendices H-L.
  • a top-level file for the example of FIG. 8, expressed in pseudo code, can be found in Appendix M.
  • the precedence field is used as the QID of the queue into which the packet is to be stored in a traffic manager.
  • the packet header 500 is fed to a Dataln input 502 of a PPU 504.
  • the operation to be performed is:
  • the Result output 506 of the PPU 504 is set to point to the location or offset in the packet header 500 of the IP address in a VLAN type frame, otherwise it points to the location in the packet header 500 of the IP address in a non-VLAN frame.
  • This IP address is fed to the Index input 508, along with the header 500 to a second PPU 510.
  • the most significant byte is checked and must be less than 224, signifying that the input IP address is valid. The operation to be performed is:
  • the ToS tells the application how a datagram should be used, e.g. delay, precedence, reliability, minimum cost, throughput etc. Depending on the value of the ToS field, one can change a priority assigned to a packet which is then sent to a traffic manager which processes the packet based on the set priority.
  • the IP precedence field is extracted from the header 500 with the following operation:
  • the IP precedence field is fed to the Din[0] input 524 of a second DFU 526.
  • the DFU 526 places the packet header on the DOutA output 528 of an AND gate 530 for queueing, and the precedence field is placed on the DOutB output 532 of an AND gate 534.
  • the precedence field functions as the Queue Identifier (QID) for the packet to be queued and both inputs 536, 538 are fed to a traffic manager 540.
  • the traffic manager 540 outputs the classified packet on output 542 and the QID on output 544.
  • packet processing blocks having other types of functionality can be provided, such as:
  • the programmer/designer can use a graphical design program such as OrCAD or Microsoft Visio to draw and interconnect sub-blocks with input windows for entering interconnecting expressions and entering program inputs.
  • the present invention has several advantages over prior art packet processing products.
  • the present invention can be used to produce an inexpensive piece of digital hardware, while the prior art products are limited to programs running on a microprocessor.
  • the present invention is scalable to handle simple to complex classification tasks, and software modules can be connected and configured in a variety of ways.
  • DataWidth @EQ (DataWidth);
  • MaxHdrWords @EQ (MaxHdrWords);
  • QualifierWidth @EQ (QualifierWidth);
  • ResultWidth @EQ (ResultWidth);
  • FieldWidth @EQ (FieldWidth);
  • LookupAvaliable @EQ (LookupAvaliable);
  • LookupDepth @EQ (LookupDepth);
  • Log2LookupDepth @L2 (LookupDepth);
  • SobBits @EQ (SobBits);
  • WordNo[i] (Index [IndexWidth-1 : Log2DataWidth] +i)
  • StartIndex[i] 0; ENDLOOP signal iOutVal; signal iOutMatch; signal QEnable; signal [QualifierWidth-1:0] QualEnbCondition;
  • ResVal (iSOHOut [SobBits-1] & iValOut) ;
  • PROCESS (on rising edge of Clk) : OUTPUT_ALINGEMENT_PROCESS IF Rst is high assign reset values to iEnbOut assign reset values to OutState assign reset values to Match ELSE
  • PROCESS (on rising edge of Clk) : OUTPUT_ALINGEMENT_PROCESS IF Rst is high assign reset values to iResVal assign reset values to Match assign reset values to SelIndex ELSE
  • DataWidth @EQ (DataWidth);
  • InResultWidth @EQ (InResultWidth);
  • Log2NumPPUConnects @L2 (NumPPUConnects);
  • DFUID @EQ (DFUID);
  • SopBits @EQ (SopBits);
  • EopBits @EQ (EopBits);
  • PROCESS (on rising edge of Clk) : DATA_SELECTION_MUX_PROCESS IF Rst is high assign initial value to ROutAVal assign initial value to ROutBVal assign initial value to ROutDVal assign initial value to DValOut assign initial value to SOHOut assign initial value to EOHOut assign initial value to DOut assign initial value to StartOutputData ELSE
  • PROCESS (on rising edge of Clk) : RESULT_GENERATION_PROCESS IF Rst is high assign initial value to ROut assign initial value to SOut assign initial value to OutOfSeqErr assign initial value to SequenceCheck ELSE
  • IF Result output delay is more reclock to align with the result outputs ROutAVal reclock to align with the result outputs ROutBVal reclock to align with the result outputs ROutDVal reclock to align with the result outputs DValOut reclock to align with the result outputs SOHOut reclock to align with the result outputs EOHOut reclock to align with the result outputs DOut ENDIF IF
  • Data output delay is more reclock to align with the result outputs ROut reclock to align with the result outputs SOut ENDIF ENDIF ENDPROCESS // register map
  • DataWidth @EQ (DataWidth);
  • ResultWidth @L2 (MaxHdrWords*(DataWidth/8));
  • FieldWidth @EQ (MaxHdrWords*DataWidth);
  • FieldValidWidth @EQ (MaxHdrWords*DataWidth/8);
  • TAGBits @EQ (TAGBits);
  • SobBits @EQ (SobBits);
  • EobBits @EQ (EobBits);
  • HdrWord[j] HdrData[((i+1)*DataWidth)-1 : (i*DataWidth)];
  • HdrWordMod[j] HdrByteVal[((i+1)*(DataWidth/8))-1 :
  • HdrState[i] i; ENDLOOP enum OpCodesType {Opr_Ins, Opr_Mod, Opr_Rem}; OpCodesType OpCode; PROCESS (on rising edge of Clk) IF Rst is high assign reset values to HdrExtState assign reset values to InVal_i assign reset values to InSop_i assign reset values to InSop_l assign reset values to InEop_i assign reset values to InDat_i assign reset values to Result assign reset values to CurPktIndex
  • InVal_i InVal_r [Adjusted OFFSET based on no of bytes inserted] ,
  • InDat_i InDat_r [Adjusted OFFSET based on no of bytes inserted] ;
  • IF Result output delay is more reclock to align with the result outputs OutVal reclock to align with the result outputs SOHOut reclock to align with the result outputs EOHOut reclock to align with the result outputs DataOut ENDIF IF Data output delay is more reclock to align with the result outputs ResVal reclock to align with the result outputs Result ENDIF ENDIF ENDPROCESS //
  • PROCESS (on rising edge of Clk) : REGISTER_MAP IF Rst is high assign reset values to MapRdData assign reset values to Opcode (i.e. Opcode present) assign reset values to SPsignal_0 assign reset values to SPsignal_1 assign reset values to SPsignal_2 assign reset values to SPsignal_3 ELSE
  • This file contains generic hardware look-up unit (header extraction block) (HLU)
  • SobBits @EQ (SobBits);
  • MaxHdrWords @EQ (MaxHdrWords);
  • MaxHdrWords is not equal to MaxHdrWords increment HdrWordCntr by one ENDIF
  • LookupAvaliable @EQ (LookupAvaliable) ;
  • conditionN //else_if_generate (conditionN) - optional code needed to be generated if conditionN is true
  • PROCESS (on rising edge of Clk) : VALUE_LATCH_PROCESS IF Rst is high assign initial value to iRInVal assign initial value to iMIn
  • PROCESS (on rising edge of Clk) : DATA_SELECTION_MUX_PROCESS IF Rst is high assign initial value to ROutAVal assign initial value to ROutBVal assign initial value to ROutDVal assign initial value to DValOut assign initial value to SOHOut assign initial value to EOHOut assign initial value to DOut assign initial value to StartOutputData
  • EOHOut EOHIn [LastArrivalData] ;
  • PROCESS (on rising edge of Clk) : RESULT_GENERATION_PROCESS IF Rst is high assign initial value to ROut assign initial value to SOut assign initial value to OutOfSeqErr assign initial value to SequenceCheck
  • IF Result output delay is more reclock to align with the result outputs ROutAVal reclock to align with the result outputs ROutBVal reclock to align with the result outputs ROutDVal reclock to align with the result outputs DValOut reclock to align with the result outputs SOHOut reclock to align with the result outputs EOHOut reclock to align with the result outputs DOut ENDIF IF Data output delay is more reclock to align with the result outputs ROut reclock to align with the result outputs SOut ENDIF ENDIF ENDPROCESS
  • PROCESS (on rising edge of Clk) : DATA_SELECTION_MUX_PROCESS IF Rst is high assign initial value to ROutAVal assign initial value to ROutBVal assign initial value to ROutDVal assign initial value to DValOut assign initial value to SOHOut assign initial value to EOHOut assign initial value to DOut assign initial value to StartOutputData
  • PROCESS (on rising edge of Clk) : RESULT_GENERATION_PROCESS IF Rst is high assign initial value to ROut assign initial value to SOut assign initial value to OutOfSeqErr assign initial value to SequenceCheck
  • IF Result output delay is more reclock to align with the result outputs ROutAVal reclock to align with the result outputs ROutBVal reclock to align with the result outputs ROutDVal reclock to align with the result outputs DValOut reclock to align with the result outputs SOHOut reclock to align with the result outputs EOHOut reclock to align with the result outputs DOut ENDIF IF Data output delay is more reclock to align with the result outputs ROut reclock to align with the result outputs SOut ENDIF ENDIF ENDPROCESS
  • IndexWidth Log2DataWidth+Log2MaxHdrWords;
  • MapWrRd_n input [3:0] MapAddr, input [31:0] MapWrData; output [31:0] MapRdData; // Result values output Match; output ResVal; output [ResultWidth-1:0] Result; output [7:0] SeqOut, ),
  • WordNo[i] (Index [IndexWidth-1 :Log2DataWidth] +i) ;
  • PROCESS (on rising edge of Clk) : OUTPUT_ALINGEMENT_PROCESS IF Rst is high assign reset values to iResVal assign reset values to Match assign reset values to SelIndex ELSE
  • PROCESS (on rising edge of Clk) : SEQUENCE GENERATION IF Rst is high assign reset values to SeqOut ELSE
  • PROCESS (on rising edge of Clk) IF Rst is high assign reset values to OutVal assign reset values to OutMatch assign reset values to LklnStart assign reset values to LklnProgress assign reset values to LnkProcessState ELSE

Abstract

A system and method for allowing a user to create instructions for building a packet processing integrated circuit. The system includes a user interface for allowing a user to define a desired packet processing algorithm (4) using a plurality of discrete packet processing blocks (22, 24, 28, 30), each of the blocks corresponding to a portion of the desired packet processing algorithm (4). The system allows the user to define connections (10) between the plurality of packet processing blocks (22, 24, 28, 30). The system processes a plurality of packet processing blocks (22, 24, 28, 30) and the connections to provide a list of instructions in a hardware description language for producing an integrated circuit capable of executing the desired packet processing algorithm (19).

Description

SYSTEM AND METHOD FOR DESIGNING AND IMPLEMENTING PACKET
PROCESSING PRODUCTS
SPECIFICATION
BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION
The present invention relates to digital component design and implementation systems and, more particularly, to a system and method for designing and implementing packet processing products.
RELATED ART
Computer-based communications are dominated by the transmission of packets of data. Typically, a packet contains a payload, i.e., a portion of an overall data message, surrounded by a number of header bits or bytes that are used to ensure that the payload is transmitted and received without error. The header bits or bytes can be divided into a number of fields designating commands, responses, packet characteristics, etc. The fields can take on one or more values depending on the particular protocol used. Some protocols are custom-designed, while others, such as asynchronous transfer mode (ATM) or Transmission Control Protocol/Internet Protocol (TCP/IP), are standardized. For any type of protocol, there is a need to extract and examine the header bits or bytes to make decisions as to how to classify a type of packet, where to route the packet, and whether to drop or temporarily store (queue) the packet for future processing. The header must be parsed, bits or bytes examined or processed, and then routing decisions must be made.
Various hardware and software products have, in the past, been developed for designing and implementing products for processing and classifying data packets. In one approach, parsing, decision, and routing functions are implemented in software modules executed by the host processor and memory of the receiving computer. Processing large amounts of data in real time this way is often slow, since doing so puts a strain on processor resources. A second approach is to use a specialized microprocessor and associated hardware, called a network processing unit (NPU). The NPU provides a programmable interface for programming nearly any type of protocol functionality. However, the ability to program nearly every aspect of a transmission packet protocol burdens an NPU with a large amount of functionality, rendering an NPU both expensive and slow (low data rates). Also, programming an NPU may take a developer several hours to days, which can be cost prohibitive. Another approach is to design a customized application specific integrated circuit (ASIC). This approach often wastes large numbers of gates to achieve only limited functionality, and is thus not cost effective. As such, there is a lack of an adequate system or methodology for designing and implementing packet parsing and classification products, wherein such products can be designed and implemented.
Accordingly, what would be desirable, but has not yet been provided, is a system and method for designing and implementing packet processing products which addresses the foregoing limitations.
SUMMARY OF THE INVENTION
The present invention relates to a system and method for designing and implementing packet processing products, wherein a user can create instructions for building a packet processing integrated circuit. The system includes a user interface for allowing a user to define a desired packet processing algorithm by defining a plurality of discrete, packet processing blocks, each of the blocks corresponding to a portion of the desired packet processing algorithm, as well as connections between the plurality of packet processing blocks. The system processes the plurality of packet processing blocks and the connections to provide a list of instructions in a hardware description language for producing an integrated circuit capable of executing the desired packet processing algorithm. The list of instructions can be delivered to a customer, or the customer can be provided with an integrated circuit constructed using the list of instructions. The customer can also be provided with a NETLIST generated using said list of instructions.
The packet processing blocks of the present invention include a Packet Processing Unit (PPU), a Packet Modification Unit (PMU), and a Decision and Forwarding Unit (DFU). The PPU includes functionality for extracting a header of a packet; for pointing to a portion of the header of a predetermined width using a predetermined index of a bit location in the header; for comparing the data represented by the portion of the header with at least one predetermined value; and for declaring a match when the result of the comparison is true. A variation of a PPU, called a PPUX, includes functionality for accessing an external Content-Addressable Memory (CAM) or Random-Access Memory (RAM). The PMU includes functionality for extracting a packet; pointing to a portion of the packet of a predetermined width using a predetermined index of a bit location in the packet; and modifying the portion of the packet. A packet can be modified in one of three ways: deletion, insertion, or overwriting a portion of the packet. The DFU can perform one of drop, queue, and forwarding operations on packets coming from at least one PPU, PPUX, or PMU. The PPU, PPUX, PMU, and DFU can be programmed by an external microprocessor. Further features and advantages of the invention will appear more clearly on a reading of the detailed description of an exemplary embodiment of the invention, which is given below by way of example only with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete understanding of the present invention, reference is made to the following detailed description of an exemplary embodiment considered in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart showing a process according to the present invention for designing a packet processing product;
FIG. 2A is a screen shot of a window in a graphical user interface (GUI) according to the present invention for choosing a type of packet processing block to be configured;
FIG. 2B is a screen shot of a window in a graphical user interface (GUI) for selecting configuration parameters for generating a Packet Processing Unit (PPU) of the present invention;
FIG. 2C is a screen shot of a window in a graphical user interface (GUI) for selecting configuration parameters for generating a Packet Modification Unit (PMU) of the present invention;
FIG. 2D is a screen shot of a window in a graphical user interface (GUI) according to the present invention for selecting configuration parameters for generating a Decision and Forwarding Unit (DFU) of the present invention;
FIG. 3 is a block diagram of a plurality of packet processing blocks according to the present invention for designing a packet processing product;
FIG. 4 is a block diagram showing, in greater detail, a Packet Parsing Unit (PPU) of the present invention;
FIG. 5 is a block diagram showing, in greater detail, a Packet Parsing Unit with an external interface to a CAM/RAM (PPUX) of the present invention;
FIG. 6 is a block diagram showing, in greater detail, a Packet Modification Unit (PMU) of the present invention;
FIG. 7 is a block diagram showing, in greater detail, the Decision and Forwarding Unit (DFU) of the present invention; and
FIG. 8 is a block diagram showing a sample packet processor design for determining the queuing precedence of a VLAN/non-VLAN frame.
DETAILED DESCRIPTION OF THE INVENTION
Referring to FIG. 1, a process according to the present invention for designing packet processing products is shown. The present invention allows a user to design packet processing products using a high-level programming language which generates a NETLIST for generating a hardware design specification of a digital circuit. A NETLIST describes the connectivity of an electronic design. The design process begins at step 1, wherein a set of user requirements and specifications are received, which may be in the form of a packet parsing architecture or a packet parsing and classification algorithm. Typically, these requirements are in the form of a text description of the system to be generated. At step 2, the description is translated by the user or provider into a textual or graphical design using packet processing blocks which include Packet Parsing Units (PPU), Packet Parsing Units with an external interface to a CAM/RAM (PPUX), Packet Modification Units (PMU), and Decision and Forwarding Units (DFU), which will be described hereinbelow with reference to FIGS. 3-7.
As an example of step 2, if the customer needs a firewall that accepts TCP packets and rejects UDP packets, then three PPUs and one DFU are required. One of the PPUs is devoted to extracting a source IP address; a second PPU is devoted to extracting a destination IP address; and a third PPU is devoted to distinguishing between TCP and UDP packets. The three PPUs are connected in parallel (since the information can be extracted simultaneously from the same packet), and the "match" outputs of the PPUs (to be described with reference to FIG. 4) and a source packet are forwarded to a DFU. Once the source and destination addresses and the type of packet are extracted from the packet, the DFU takes each match input and the packet and makes a decision: if the packet is a TCP packet and the source and destination addresses are allowed, then the packet is passed on; otherwise the packet is dropped. Thus, in step 2, the user can select the required number and combination of packet processing blocks to be used in the design. At step 3, the packet processing block requirements, including their required inputs and outputs, are entered into a connection document, which can be a text based EXCEL™ spreadsheet or a VISIO™ block diagram. Typical inputs to the connection document include entries for each PPU and DFU block, which may include an index representing the point of entry into a packet to be processed, and whether a lookup in an internal table of data in a PPU is required.
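The firewall example can be mimicked in software to make the division of labour between the three PPUs and the DFU concrete. The Python sketch below is an analogy under stated assumptions, not generated hardware: the helper names (`ppu_match`, `firewall_dfu`, `ipv4_header`) are hypothetical, the header passed in is assumed to start at the IPv4 header (no Ethernet framing, no IP options), and a byte offset is used where the text speaks of a bit index. The field positions and protocol numbers are those of standard IPv4 (protocol at byte 9, source address at bytes 12-15, destination address at bytes 16-19, TCP = 6, UDP = 17).

```python
TCP, UDP = 6, 17  # IPv4 protocol numbers


def ppu_match(header: bytes, offset: int, width: int, allowed: set) -> bool:
    """One PPU: extract `width` bytes at `offset` and compare against allowed values."""
    return header[offset:offset + width] in allowed


def firewall_dfu(header: bytes, allowed_src: set, allowed_dst: set) -> str:
    """DFU: combine the three PPU match outputs into a forward/drop decision."""
    src_ok = ppu_match(header, 12, 4, allowed_src)        # PPU 1: source address
    dst_ok = ppu_match(header, 16, 4, allowed_dst)        # PPU 2: destination address
    is_tcp = ppu_match(header, 9, 1, {bytes([TCP])})      # PPU 3: protocol field
    return "forward" if (src_ok and dst_ok and is_tcp) else "drop"


def ipv4_header(proto: int, src: tuple, dst: tuple) -> bytes:
    """Build a minimal 20-byte IPv4 header for the example (other fields left zero)."""
    hdr = bytearray(20)
    hdr[0] = 0x45                 # version 4, header length 5 words
    hdr[9] = proto
    hdr[12:16] = bytes(src)
    hdr[16:20] = bytes(dst)
    return bytes(hdr)


allowed_src = {bytes((10, 0, 0, 1))}
allowed_dst = {bytes((10, 0, 0, 2))}
decision = firewall_dfu(ipv4_header(TCP, (10, 0, 0, 1), (10, 0, 0, 2)), allowed_src, allowed_dst)
```

The three `ppu_match` calls are independent of one another, which is why the text can place the corresponding PPUs in parallel; only the DFU needs all three outputs at once.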
Once the connection document has been completed, then, at step 4, packet processing blocks, e.g., each PPU and DFU, can be configured. Configuring a packet processing block involves taking a "default" packet processing block file, such as a generic PPU or DFU file, modifying portions of it, and setting variables within each file. Code for the packet processing blocks to be described in FIGS. 4-7 (written in pseudo-code) can be found in Appendices A-E and G-L attached hereto. In particular, the pseudo-code for the PPU calls code found in the following appendices: a file for describing a generic header extraction block called a Hardware Lookup Unit (HLU) (see Appendices D and K), and a file for describing a generic Match/Lookup Unit (MLU) (see Appendices E and L). Both the HLU and MLU will be described hereinbelow as part of the description of the PPU. The packet processing blocks are implemented in a hardware design language (HDL) which models digital circuits, with gates, flip flops, counters, and other logic in a C-like software language. In some implementations, the "pruning" process can be performed by manually copying and editing a maximally configured processing block file, or by applying a preprocessor in the form of shell scripts to cull code from and substitute variables within a maximally configured processing block file. Preprocessing shell scripts, as is known in the art, can include textual or graphically-based user prompts for answering questions about specific parameters desired by the user for a particular block.
FIGS. 2A-2D show one possible example of a graphical user interface (GUI) which can be used to enter parameters for packet processing blocks. A Main generation GUI window 13 is presented to the user, as shown in FIG. 2A. One of a number of radio buttons 13 is selected by the user to indicate the type of processing block to be configured. Depending on the processing block chosen, a configuration window 15 is displayed, one for each type of processing block (i.e., PPU/PPUX (see FIG. 2B); PMU (see FIG. 2C); and DFU (see FIG. 2D)). Each configuration window 15 contains a field 16 for naming the processing block. A series of configuration screen elements 17 are presented to the user for allowing parameters of each processing block to be specified by the user (including, e.g., data bus width, start of packet width, end of packet width, maximum header words, qualifier width, result width, result expression, external memory parameters, number of interfaces, etc.), and which may vary according to each type of processing block. Finally, the user can click on either a "Generate" button 18 to cause the particular processing block code to be generated, or a "Cancel" button 19.
The GUI code can pass the input parameters to a preprocessor, such as a preprocessor called "veriloop2." The pseudo-code for veriloop2 can be found in Appendix F. Veriloop2 first performs substitutions into appropriate variables using the parameters passed from the GUI. Veriloop2 then searches for constructs such as name-value pairs, conditional constructs, and loops having a particular syntax, and then culls the maximally configured packet processing block file to produce preprocessed header-like library files, each containing a function or class representing a particular PPU, DFU, etc. Pseudo code for types of preprocessor constructs can be found in Appendix G. Pseudo code for sample pre-processed files of FIG. 8 can be found in Appendices H-L. Note that there is only one PPU/MLU/HLU file for all three PPUs, which share the same number of inputs/outputs and share the same general structure. The number of PPUs that need to be generated depends upon the degree of parallelism needed for a particular design. If all the operations for a number of PPUs can be performed in series, then one PPU is needed, since all that changes between instances of PPUs is the input parameters (e.g., opcode, mask, etc.). There is one generated PPU for each parallel operation. There are separate DFU Appendices (i.e., Appendices B, H, and I) because each DFU can have a different number of inputs/outputs. The present invention distils the implementation of maximally configured processing blocks into common sub-blocks which have unique names (e.g., PPU_1, DFU_2) or modules which have inputs and outputs that can be interconnected in such a way as to perform all of the functions necessary for implementing a desired packet processing product. The common blocks described herein are preferably instantiations of packet processing blocks written in VHDL, Verilog, or SystemC, but other suitable hardware description languages can be used.
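The two passes attributed to veriloop2 (substitute parameters, then cull unused code) can be pictured with a short Python sketch. The directive syntax used here (`@NAME@` placeholders and `//#if NAME` ... `//#endif` guards), the `MemAddr` port, and the parameter names are hypothetical; they are not the constructs of Appendix G, and loop constructs are omitted.

```python
import re


def preprocess(template: str, params: dict) -> str:
    """Substitute parameters, then cull disabled conditional regions.

    Hypothetical syntax (not the patent's): @NAME@ marks a value to
    substitute; '//#if NAME' ... '//#endif' keeps the enclosed lines only
    when params[NAME] is truthy. Nested conditionals are not supported.
    """
    # Pass 1: substitute parameter values into the template text.
    text = re.sub(r"@(\w+)@", lambda m: str(params[m.group(1)]), template)

    # Pass 2: drop the directive lines and any lines in a disabled region.
    kept, keep = [], True
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("//#if "):
            keep = bool(params.get(stripped.split()[1]))
        elif stripped == "//#endif":
            keep = True
        elif keep:
            kept.append(line)
    return "\n".join(kept)


# A toy "maximally configured" block with one optional port.
MAXIMAL_BLOCK = """module @NAME@;
input [@DW@-1:0] DataIn;
//#if HAS_EXT_MEM
output [@AW@-1:0] MemAddr;
//#endif
endmodule"""

pruned = preprocess(
    MAXIMAL_BLOCK,
    {"NAME": "PPU_1", "DW": 32, "AW": 16, "HAS_EXT_MEM": False},
)
```

With `HAS_EXT_MEM` set to `False`, the optional port is removed and only the named module with its 32-bit input remains.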
The software implementation of packet processing blocks is platform independent, and can be written in a platform independent language such as JAVA. As such, packet parser/classifier functionality of the present invention can run both in Windows and in different versions of the Unix operating system, as well as others. In a GUI, the programmer/designer can invoke instances of these common modules using a C-like application programming interface (API) surrounded by other C-like code for interconnecting the sub-blocks.
At step 5, integration is performed. Integration involves declaring instantiations of each processing block by name, and making connections between instantiated packet parsing blocks in a top-level main program file (the top-level main program file is similar to the file containing the main() function call in the C language). These connections are called "wires" or "signals", which are declared like variables, and associations are made between two processing block instances which have a common wire. For example, signal "x" in PPU1 ties to signal "y" in the top-level file. Signal "z" of DFU1 also ties to signal "y" in the top-level file. In this way, signal "x" of PPU1 is tied to signal "z" of DFU1, which may also be tied to one or more other signals. Certain input parameters can also be "hard-coded" within the top-level file.
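The tie between two instance ports through a shared top-level wire can be modeled in a few lines of Python. This is only an illustration of the association described above; the triple format and the function name are assumptions, not the format of the top-level file in Appendix M.

```python
from collections import defaultdict


def tied_ports(connections):
    """Group instance ports by the top-level wire they attach to.

    connections: iterable of (instance, port, wire) triples, a
    hypothetical stand-in for the port maps of a top-level HDL file.
    """
    nets = defaultdict(list)
    for instance, port, wire in connections:
        nets[wire].append((instance, port))
    return dict(nets)


# The example from the text: PPU1.x and DFU1.z both attach to wire "y".
nets = tied_ports([("PPU1", "x", "y"), ("DFU1", "z", "y")])
```

Every port listed under the same wire is electrically the same signal, which is how "x" of PPU1 ends up tied to "z" of DFU1.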
At this point, all source HDL code has been generated which together can constitute a fully designed product. At step 6, if the customer desires only the design, then at step 7, the generated packet processing block files and the top level file can be delivered to the customer. If the customer desires to have a NETLIST, then at step 8, the generated files are run through a commercially-available synthesis tool, as is known in the art. Sample synthesis tools include Design Compiler from Synopsys, Precision Synthesis from Mentor Graphics, Synplify from Synplicity, or XST from Xilinx. The synthesis tool behaves like an optimizing compiler which produces a NETLIST for producing an electrical schematic for a custom integrated circuit which is implemented with a minimum number of logic gates, flip-flops, counters, etc. The type of NETLIST generated depends on whether the customer desires to have a foundry-specific device, e.g., a Xilinx FPGA, or a generic ("virtual") NETLIST which is not specific to a particular vendor's product. Customers that are EDA (electronic design automation) vendors desire a non-specific NETLIST. The NETLIST could be a foundry-specific or "virtual" bitstream or binary file that is delivered to the customer.
At step 9, if the customer does not desire to have a digital integrated circuit delivered to them, then at step 10, the NETLIST is delivered to the customer; otherwise, at step 11, the NETLIST is run through a place and route program, which physically constructs the gates defined in the NETLIST on a silicon die and interconnects them. The choice of a place and route tool depends on whether the packet parser/classifier is to be implemented as an ASIC (fixed logic) or an FPGA (programmable logic). Sample place and route programs include Quartus II from Altera and ISE from Xilinx. At step 12, the integrated circuit is delivered to the customer.
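The branching of steps 6 through 12 can be summarized in a small Python sketch. The deliverable labels ("design", "netlist", "ic") are illustrative names chosen here; only the step numbers come from the description.

```python
def delivery_steps(wants: str) -> list:
    """Return the step numbers followed for a requested deliverable.

    wants: 'design' (source HDL only), 'netlist', or 'ic'.
    These labels are illustrative and do not appear in the source text.
    """
    if wants not in ("design", "netlist", "ic"):
        raise ValueError(f"unknown deliverable: {wants}")
    steps = [6]                      # step 6: design only?
    if wants == "design":
        return steps + [7]           # step 7: deliver generated HDL files
    steps += [8, 9]                  # step 8: synthesis; step 9: IC wanted?
    if wants == "netlist":
        return steps + [10]          # step 10: deliver the NETLIST
    return steps + [11, 12]          # place and route, then deliver the IC
```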
With reference to FIG. 3, a block diagram of a graphical design environment using packet processing blocks according to the present invention for designing a packet processing product, indicated generally at 20, is depicted. The blocks 20 can be implemented in a text-based or graphical design environment. The environment 20 includes combinations of any number of Packet Parsing Units (PPUs) 22, PPUXs 24 (which are PPUs that can access CAM/RAM memory 26), Packet Modification Units (PMUs) 28, and Decision and Forwarding Units (DFUs) 30. The PPUs 22, PPUXs 24, PMUs 28, and DFUs 30 can be connected by a designer in a variety of ways to create parsing/classification logic for any desired packet processing algorithm. The PPUs 22 operate on packet headers 21. The packet itself can be passed through the environment 20 intact. Alternatively, only the packet header 21 is passed through the environment, which requires the creation and passing of a pointer to the packet data to be output after the DFUs 30. The packets are stored in memory upon arrival and retrieved from memory upon departure. A copy of the header 21 and a pointer to the packet location is passed to the development environment 20. The length of the copied header 21 is variable. It starts at a programmable position in the header 21 and ends at the last field that must be processed. A PPU takes a header 21 and can seek, i.e., locate, any field of constant or variable length. Once the field is found in the header 21, the PPU 22 can perform a check on that field, such as whether the field is equal to or greater than a given value, or matches a particular value, and then output that value depending on the operation performed.
PPUXs 24 are PPUs that can perform lookups or searches using external random-access memories (RAMs) or CAMs (a CAM is defined as a RAM-like memory which can determine whether an input value is present in the memory device). A PMU 28 is a PPU which allows fields in the header of a packet or the packet itself to be modified by means of insertions, deletions, or substitution of bytes. In contrast, the PPUs 22 and PPUXs 24 only allow the fields of a packet header to be examined. Any number of PPUs 22, PPUXs 24, and PMUs 28 can be chained together in series or in parallel to implement complex expressions. The DFUs 30 combine the output of one or more PPUs 22 and/or PPUXs 24 and/or PMUs 28 using a programmable condition, and then forward the header to one of a plurality of outputs. The outputs can represent Boolean True and False values, and decisions as to whether to drop, forward, or queue the packet. The DFUs 30 make decisions to forward, drop, or enqueue packets based on the results from the PPUs 22. For example, the output of the last DFU in the chain, such as the DFU labeled "A", can be a queue ID, i.e., the ID of a queue implemented in an external traffic manager 31.
The traffic manager 31 is a device which performs a set of actions and operations for a network to guarantee the operability of the network. Traffic Management (TM) is exercised in the form of traffic control and flow control. In the context of the present invention, the traffic manager 31 operates on a packet stream once the classification & processing is done on a packet (i.e. once it passes from PPU/DFU blocks). For example, PPU/DFU blocks are used to figure out the priority number of a packet. The traffic manager is given that priority number and the packet to do a traffic control operation to guarantee that high priority packets pass before low priority packets.
With reference to FIG. 4, a block diagram of the PPU 22 is depicted. The PPU 22 performs basic parsing of the packet header 21 and may perform mathematical/logical operations on the parsed fields of packet header 21. The PPU 22 includes a plurality of inputs and outputs 32-83. The function of each input and output 32-83, as well as the values that each input or output handle, are described with reference to Table 1 hereinbelow.
TABLE 1
The terms in brackets in FIG. 4 accompanying a specific input or output represent the bit width of the input or output, in standard HDL syntax. For example, if the input DataIn 36 is to be 32 bits wide, then the variable DW is set to 32 such that DataIn 36 is expressed in an HDL file as "DataIn[DW-1:0] = DataIn[32-1:0] = DataIn[31:0]", where "31" represents the last bit and "0" represents the first bit.
The input Clk 32 is supplied from external hardware, such as the clock of a microprocessor. The input Rst 34 is used to cause the PPU to go into a predefined state where most internal variables and outputs are set to an initial value. This condition is usually needed at power-up of the hardware in logic systems to stabilize the system before execution of a packet processing algorithm. The system is initially Reset. A predetermined amount of time later, when it is known that all circuits have stabilized, then the circuit is put into operation by toggling Rst 34.
The PPU 22 includes a Hardware Lookup Unit (HLU) 84, a Delay/FIFO module 86 containing an optional Delay Line 88 or a FIFO 90, a Match and Lookup Unit (MLU) 92, a Result Generation (process) 94, a Sequence Generation (process) 96, and an Output Alignment (process) 98, interconnected as shown. The sub-blocks 84-98 are implemented as modules or processes. A module is similar to a class or subclass in an object-oriented language like C++, while a process is similar to a function. The PPU also contains (not shown) a predetermined but limited number of internal general-purpose registers for storing and retrieving values for comparisons, lookups, etc.
A stream of data is continuously presented to the input DataIn 36 of the HLU 84. No data of the input stream is stored in a memory. In such circumstances, it is the job of the HLU 84 to extract information from a packet and present that information to the other blocks of the PPU 22. The HLU 84 takes a snapshot of the data stream according to the location in the data stream specified by the inputs Index 56 and Width 58. The inputs SOHIn 38, EOHIn 40, and InVal 42 allow for fine tuning of locating data from the output of other PPUs, PPUXs, PMUs, or external hardware. SOHIn 38, EOHIn 40, and InVal 42 tell the PPU 22 how to delimit data in a packet header. SOHIn 38 tells the hardware where a packet header starts and EOHIn 40 tells the hardware when a packet header ends. Once the packet starts, then at every clock cycle, the data presented at DataIn 36 is either valid or invalid, as indicated by the input InVal 42. The extracted header bits are presented as an output CompDat 100 and as an input to the MLU 92. CompDat 100 stands for the data that needs to be compared in the MLU 92.
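A software model can make the HLU's streaming behavior concrete. The Python sketch below is an illustration only, under two simplifying assumptions that are not taken from the text: one header byte arrives per valid clock cycle, and the width is expressed in whole bytes. Only the extracted field is held; the stream itself is never stored.

```python
def hlu_extract(cycles, index, width):
    """Snapshot `width` bytes starting `index` bytes after start of header.

    cycles: iterable of (data_byte, soh, eoh, valid) tuples, one per clock,
    standing in for DataIn, SOHIn, EOHIn, and InVal.
    Returns the extracted bytes (the CompDat value), or None if the header
    ends before the field is complete.
    """
    offset = None            # byte position within the current header
    captured = bytearray()   # only the requested field is ever held
    for data, soh, eoh, valid in cycles:
        if not valid:
            continue         # InVal low: this clock's data is ignored
        if soh:
            offset = 0       # SOHIn marks the first header byte
            captured.clear()
        if offset is None:
            continue         # no header in progress
        if index <= offset < index + width:
            captured.append(data)
            if len(captured) == width:
                return bytes(captured)
        offset += 1
        if eoh:
            offset = None    # EOHIn marks the last header byte
    return None


# A 16-byte header (0x10..0x1F) preceded by one invalid clock cycle.
stream = [(0xEE, False, False, False)] + [
    (byte, pos == 0, pos == 15, True)
    for pos, byte in enumerate(range(0x10, 0x20))
]
field = hlu_extract(stream, index=12, width=2)
```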
The Delay/FIFO module 86 is used to synchronize the outputs of the PPU 22 to be presented to a subsequent block, such as a DFU. The Delay/FIFO module 86 is needed because the inputs to the PPU, such as DataIn 36, along with the control input signals SOHIn 38, EOHIn 40, and InVal 42, need to be aligned in time in the Output Alignment process 98 with intermediate outputs of other sub-blocks of the PPU 22, such as the Match output 110 of the MLU 92, which may be delayed relative to the inputs due to delays in processing within the MLU 92. The MLU 92 performs its decision making (e.g., a comparison of a bit within DataIn 36 with a user specified parameter (Param1)) without full packet storage. Therefore, DataIn 36 along with the control input signals SOHIn 38, EOHIn 40, and InVal 42 are pipelined to the Result Generation process 94 and the Output Alignment process 98 by way of intermediate I/O Val_i 102, SOH_i 104, EOH_i 106, and Data_i 108. There are fixed delays (measured in clock cycles) associated with processing in the Result Generation process 94 and the MLU 92. There is a variable delay associated with the HLU 84 depending upon the value of Index 56. The inputs described above must be delayed in the Output Alignment process 98 by the sum of the aforementioned individual delays. For example, if Index 56 is 8, then CompDat 100 is received at the MLU 92 eight clock cycles after DataIn 36 arrives at the PPU 22. If the MLU 92 processes CompDat 100 in three clock cycles, then the PPU 22 inputs need to be delayed by 8 + 3 = 11 clock cycles in the Output Alignment process 98. The choice of the optional Delay Line 88 or the FIFO 90 depends on the size of the delay needed. A FIFO always works but requires using scarce memory in the PPU 22. Thus, if only a few clock cycles' worth of delay (up to about 16 clock cycles) are needed, then the Delay Line 88 is used; otherwise the FIFO 90 is used.
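The delay budget and the choice between the two delay elements reduce to simple arithmetic, sketched here in Python. The function names are illustrative, and the 16-cycle limit is the approximate figure given in the text, treated here as a hard threshold for simplicity.

```python
def alignment_delay(hlu_delay: int, mlu_delay: int, result_delay: int = 0) -> int:
    """Total clock cycles by which the PPU inputs must be delayed.

    hlu_delay is variable (it follows Index); mlu_delay and result_delay
    are the fixed processing delays of the MLU and Result Generation.
    """
    return hlu_delay + mlu_delay + result_delay


def delay_element(cycles: int, delay_line_limit: int = 16) -> str:
    """Pick the cheaper element: a delay line for short delays, else a FIFO.

    The limit of 16 cycles is an approximation taken from the text.
    """
    return "delay line" if cycles <= delay_line_limit else "FIFO"


# The example from the text: Index = 8 and a three-cycle MLU.
total = alignment_delay(hlu_delay=8, mlu_delay=3)
choice = delay_element(total)
```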
The MLU 92 performs the bulk of the packet parsing and classification operation to be performed on one unit of a packet processing algorithm. The MLU 92 is programmable, i.e., it can compare the data/fields extracted in the HLU 84 with values stored in internal registers by means of the inputs Opcode 62, Param1 64, Param2 66, and Mask 60, and declares a match or no match, which appears on the internal output Match 110, which, in turn, appears as an output of the Result Generation process 94. The inputs QualEnb 52 and QualCond 54 enable or disable the MLU 92 depending on certain conditions. The operation to be performed in the MLU 92 is enabled if the result of the check of the QualEnb 52 using the QualCond 54 is true. QualEnb 52 is a value stored in a qualEnb register (not shown) which is user programmable through an address map. The QualCond 54 can be: Always True, Equal, Less Than, Less Than or Equal, Greater Than, Greater Than or Equal, etc.
For example, if the user desires only to allow IPV6 packets, then QualEnb 52 can be programmed through the qualEnb register (not shown) to be the value 6. QualCond 54 is set to Equal To (EQ). The packet type is retrieved from a mode register from an external CPU. If the packet type is 6 (IPV6), then the MLU 92 is enabled; if the packet type is 4 (IPV4), then the MLU 92 is disabled, and no comparison takes place. If it is desired to have all types of IP packets, then QualCond 54 is set to Less Than or Equal (LE) or Always True.
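The qualifier check can be expressed as a lookup from condition name to comparison, as in the Python sketch below. EQ and LE are the abbreviations given in the text; the other mnemonics, and the operand order (packet type on the left, QualEnb on the right), are assumptions made for illustration.

```python
import operator

# Condition mnemonics; only EQ and LE are named in the text.
QUAL_CONDS = {
    "ALWAYS": lambda value, reference: True,
    "EQ": operator.eq,
    "LT": operator.lt,
    "LE": operator.le,
    "GT": operator.gt,
    "GE": operator.ge,
}


def mlu_enabled(packet_type: int, qual_enb: int, qual_cond: str) -> bool:
    """Return True when the MLU operation should run for this packet.

    Assumes the check is `packet_type <cond> qual_enb`.
    """
    return QUAL_CONDS[qual_cond](packet_type, qual_enb)


# QualEnb = 6 with EQ admits only IPv6; LE admits both IPv4 and IPv6.
ipv6_only = [mlu_enabled(t, 6, "EQ") for t in (4, 6)]
any_ip = [mlu_enabled(t, 6, "LE") for t in (4, 6)]
```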
The match/no-match functionality of the MLU 92 is performed on the portion of the DataIn 36 packet header pointed to by Index 56 and Width 58. Additional inputs Mask 60, Opcode 62, Param1 64, and optionally Param2 66 are needed to perform the comparison/match/no-match operation. The MLU 92 performs a seek function and an operation function.
The seek function finds a data field in a packet header (not shown) based on an offset from the start of the packet header indicated by the input Index 56. If Index 56 is 0, then the first byte of the packet header is indicated. An Index 56 of six indicates the seventh byte from the beginning of the packet header. The interconnections that can be made to the Index input 56 include a fixed value (e.g., 4), a value stored in an internal user defined control register, or the Result output 70 of another PPU, PMU, or DFU. If the Index input 56 is driven from another PPU, PMU, or DFU, the value placed on the Index input 56 is variable, depending on the condition(s) evaluated in the previous PPU, PMU, or DFU.
The operation function performs a check, an extraction, or a lookup on "Data_Field", which is the contents of the packet header pointed to by the Index input 56, of width equal to the value in bits placed on the Width input 58. The general expression of the operation is
Op(Data_Field AND Mask, Param1, Param2)
The Data_Field may be filtered (AND'ed) with the Mask input 60. "Op" is one of the opcodes placed on the Opcode input 62, given the Param1 input 64 and, optionally, the Param2 input 66. The types of operations are shown in Table 2 below:
TABLE 2
For example, a single MLU can be programmed to check if an IP address is less than 224.XX.XX.XX, by specifying the following values:
Opcode = LT
Param1 = 224
Index = Points to IP DA or SA and can be adjusted automatically for VLAN tagging using a PPU.
As another example, to point to the beginning of an Ethernet frame payload for both untagged and VLAN tagged frames:
Index = 14 (Type/Length)
Opcode: EQ
Param1: 0x8100
QualCond = True
Match (True): Index = 20
Match (False): Index = 16
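The seek-then-operate behavior in the two examples above can be modeled as follows. This Python sketch is a minimal illustration: it assumes a byte-granular width and big-endian fields, implements only three opcodes (EQ, LT, GE), and uses synthetic headers built so that the tested field sits at the Index values quoted in the examples. The helper `select_result` is a hypothetical name for the choice of a Result value on match or no match.

```python
import operator

# A subset of opcodes; the full set is defined in Table 2.
OPS = {"EQ": operator.eq, "LT": operator.lt, "GE": operator.ge}


def mlu_match(header, index, width, opcode, param1, mask=None):
    """Evaluate Op(Data_Field AND Mask, Param1) on a header field.

    Assumes `width` counts bytes and the field is big-endian; both are
    simplifications for illustration.
    """
    field = int.from_bytes(header[index:index + width], "big")
    if mask is None:
        mask = (1 << (8 * width)) - 1   # default: no filtering
    return OPS[opcode](field & mask, param1)


def select_result(match, when_true, when_false):
    """Pick the Result value that follows from the match outcome."""
    return when_true if match else when_false


# First example: is the leading byte of an IP address below 224?
unicast = mlu_match(bytes([192, 168, 0, 1]), 0, 1, "LT", 224)
multicast = mlu_match(bytes([239, 1, 2, 3]), 0, 1, "LT", 224)

# Second example: synthetic headers with the tested field at Index 14.
tagged = bytes(14) + b"\x81\x00" + bytes(4)
untagged = bytes(14) + b"\x08\x00" + bytes(4)
next_tagged = select_result(mlu_match(tagged, 14, 2, "EQ", 0x8100), 20, 16)
next_untagged = select_result(mlu_match(untagged, 14, 2, "EQ", 0x8100), 20, 16)
```

The selected value would then drive the Index input of a downstream block.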
The inputs MapWrRd_n 76, MapAddr 78, and MapWrData 80, and the output MapRdData 81 are used as the interface between an external microprocessor and the internal registers of the PPU 22 to allow for reading of and writing to the registers. The PPU 22, PPUX 24, PMU 28, and DFU 30 can contain a user defined number of internal registers for packet header manipulation either internally or via an external microprocessor. The opcodes LUP and SPCL can be used to directly manipulate data in internal registers.
The output Match 110 of the MLU 92 is fed to the input of the Result Generation process 94 to be described hereinbelow. The Match output 110 is True if the operation performed in the MLU 92 is True, or False otherwise. The Result Generation process 94 takes the Match output 110, the outputs of the Delay/FIFO module 86, and optionally a tag value present on TAG 83 and produces the result output iResult 112, which is fed as an input to the Output Alignment process 98 and ultimately is the output Result 70 of the PPU 22. The Result Generation process 94 also outputs iResVal 114, which indicates when iResult 112 is valid. This is needed as a handshaking device, since result generation can take more than a single clock cycle. iMatch 116 is the value of Match 110 passed along from the MLU 92. Assuming the MLU 92 was enabled, iResult 112 can take on two values corresponding to the True or False evaluation of the operation performed in the MLU 92. The True/False result values can be fixed or an arithmetic or logical function of any of the PPU 22 inputs. The iResult output 112 is later passed through the Output Alignment process 98 to be described hereinbelow as Result 70, which can be used to drive a DFU input or any input of another PPU or a PMU. Result 70 can also be a complex expression that the user may want to program. This allows the Index 56, QualEnb 52, Opcode 62, or Param<1,2> 64, 66 inputs of a PPU to be driven with different values depending on the Result 70 output of other PPUs.
The PPU 22 generates or forwards a sequence number using the Sequence Generation process 96. The sequence number can optionally come from an external process/hardware via the input SeqIn 82 and be passed along to a DFU; otherwise sequence numbers are internally generated within a PPU 22 using the Sequence Generation process 96. The sequence number, which appears as an internal output iSeq 118, is passed through the Output Alignment process 98 to a DFU through the PPU output SeqOut 74. Sequence numbers are incremented sequentially for each use of a PPU and are used for internal synchronization of all the inputs of a DFU. Sequence numbers are needed because different PPUs can present their output packet header data, match data, and results at different times. For example, one PPU may index at bit 0 of an incoming packet, in which case the match output may appear at an input to a DFU after three clock cycles. If another PPU indexes on a VLAN type field, then the index is set to block 5 or 6, which gives its results to the same DFU after 6 + 3 clock cycles. The DFU takes the matches, packet headers, and sequence numbers from each of the PPUs and arranges them in the correct sequence, as described hereinafter.
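One way to picture how sequence numbers let a DFU pair up results that arrive at different times is the Python sketch below. The class and its behavior (hold each input until every port has reported the same sequence number) are an assumed model for illustration, not the DFU's documented mechanism.

```python
from collections import defaultdict


class SequenceAligner:
    """Collect per-port results and release them once a sequence is complete."""

    def __init__(self, num_inputs: int):
        self.num_inputs = num_inputs
        self.pending = defaultdict(dict)   # seq -> {port: result}

    def push(self, port: int, seq: int, result):
        """Latch one result; return all results for `seq` when complete.

        Returns a list ordered by port number, or None while any port
        has not yet reported this sequence number.
        """
        slot = self.pending[seq]
        slot[port] = result
        if len(slot) < self.num_inputs:
            return None
        del self.pending[seq]
        return [slot[p] for p in range(self.num_inputs)]


# Port 0 answers quickly; port 1 lags behind by a full packet.
aligner = SequenceAligner(num_inputs=2)
events = [
    aligner.push(0, 7, True),    # early result for packet 7
    aligner.push(0, 8, False),   # port 0 already on packet 8
    aligner.push(1, 7, False),   # packet 7 is now complete
    aligner.push(1, 8, True),    # packet 8 is now complete
]
```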
The Output Alignment process 98 aligns all outputs to the start of packet (SOP) or the end of packet (EOP). This is done in order to provide proper delineation of the output signals of one PPU to the next PPU/PPUX/PMU/DFU. For example, if PPU1 is connected to PPU2, and PPU1 operates either on an 802.3 Ethernet frame or an Ethernet type 2 frame, then PPU1 examines a byte field which is either 20 bytes or 40 bytes from the beginning of a packet header. Therefore, all outputs of PPU1 need to be aligned on SOP as a requirement for input to PPU2. As another example, some protocols use trailer insertion, e.g., inserting a checksum at the end of a packet. Therefore, outputs are aligned at EOP.
With reference to FIG. 5, a block diagram of a PPUX 24 is depicted. A PPUX 24 has the same I/O signals and sub-blocks as the PPU 22 except for additional I/O needed to access an external CAM/RAM 220. Elements illustrated in FIG. 5 which correspond to the elements described above in connection with the PPU 22 of FIG. 4 have been identified by corresponding reference numbers increased by one hundred. Unless otherwise indicated, both the PPU 22 and the PPUX 24 have the same construction and operation.
In a PPU, as mentioned earlier, there is a predetermined number of internal registers/memory which can be programmed by a user. A typical need for programmed memory is for performing a lookup of values by the MLU 192. For example, if there is a need to compare Param1 164 to one hundred IP addresses, then internal memory is used. However, if the number of lookups and hence values to be stored in memory is on the order of thousands of bytes or more, then it may be necessary to store and retrieve these values to/from an external CAM/RAM 220.
TABLE 3
With reference to FIG. 6, a block diagram of a Packet Modification Unit (PMU) 28 is depicted. A PMU allows for modification, i.e., insertion, deletion, or replacement, of bytes in a packet, including both the header and payload data. The PMU 28 includes a Delay/FIFO module 300 containing an optional Delay Line 302 or a FIFO 304, a Modification Unit (MU) 306, a Result Generation process 308, a Sequence Generation process 310, and an Output Alignment process 312, interconnected as shown. These sub-blocks 300-312 are implemented as software modules or processes.
The inputs InVal 314, SOHIn 316, EOHIn 318, DataIn 320, TagIn 322, Rst 324, and Clk 326 have the same functionality as is found in the PPU 22 and the PPUX 24. The Delay/FIFO module 300 can be used to synchronize the inputs InVal 314, SOHIn 316, EOHIn 318, DataIn 320, and TagIn 322 with the outputs of the Result Generation process 308 and the outputs of the Modification Unit (MU) 306, as is done in the PPU 22, but it also provides a second function: to delay incoming packet data by an amount equal to the number of bytes that may be inserted into a packet in the Modification Unit 306. This delay is not needed for removing or overwriting data in a packet. As with the PPU 22, the choice of the optional Delay Line 302 or the FIFO 304 depends on the size of the delay needed. If only a few clock cycles' worth of delay (a few words to be inserted) are needed, then the Delay Line 302 is used; otherwise the FIFO 304 is used. As with the PPU 22, InVal 314, SOHIn 316, EOHIn 318, and DataIn 320 are pipelined to the Modification Unit (MU) 306 as the intermediate outputs Val_i 328, SOH_i 330, EOH_i 332, and Data_i 334.
Val_i 328 is also directed to the Result Generation process 308. The Result Generation process 308 has a different purpose from the one found in a PPU 22. The intermediate outputs iResVal (result valid) 358 and iResult (the result) 360 are not based on a field value, but reflect the number of bytes inserted. As in a PPU 22, iResult 360 becomes the output Result 378, which can be used as an input to another PPU/PPUX/PMU/DFU. It can also be a complex expression that the user may want to program. The Sequence Generation process 310 with the optional SeqIn input 362 has the same functionality as in the PPU 22.
The Modification Unit (MU) 306 inserts/modifies/removes data as specified by a user. The MU 306 is specified at preprocessing time as one of an inserting type, modifying type, or removing type PMU. The type of operations performed by the input signals ByteOffset 336, ByteValid 338, and ByteData 340 are shown in Table 4 below:
TABLE 4
The inputs MapWrRd_n 342, MapAddr 344, and MapWrData 346, and the output MapRdData 348 provide a future programming interface for an external microprocessor to allow for the reading and writing from/to internal registers of the PMU 28 to, for example, dynamically program an MU to either insert, delete, or modify a packet at run time. Val_i 350, SOH_i 352, and EOH_i 354 are passed intact, after a delay, from their corresponding inputs to the MU 306 to the Output Alignment process 312. The modified packet, represented as the intermediate input/output Data_i 356, is also presented to the Output Alignment process 312. The Output Alignment process 312 has the same purpose and functionality as found in the PPU or PPUX, i.e., aligning all intermediate outputs iSeq 362, iResVal 358, iResult 360, Val_i 350, SOH_i 352, EOH_i 354, and Data_i 356 on either the start of packet (SOP) or the end of packet (EOP) to become the aligned outputs SeqOut 366, OutVal 368, SOHOut 370, EOHOut 372, DataOut 374, ResVal 376, Result 378, and TagOut 380.
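The three kinds of modification, and the rule that the result reflects the number of bytes inserted, can be illustrated on a whole packet held in memory. The Python sketch below is a software analogy only: the real MU works on a stream, and the exact roles of ByteOffset, ByteValid, and ByteData are defined in Table 4, which this sketch does not attempt to reproduce. The function name and mode labels are hypothetical.

```python
def pmu_modify(packet: bytes, mode: str, offset: int,
               data: bytes = b"", count: int = 0):
    """Return (modified_packet, bytes_inserted) for one modification.

    mode: 'insert' adds `data` at `offset`; 'replace' overwrites bytes at
    `offset` with `data`; 'remove' deletes `count` bytes at `offset`.
    Only insertion lengthens the packet, so only it reports a nonzero
    inserted-byte count.
    """
    if mode == "insert":
        return packet[:offset] + data + packet[offset:], len(data)
    if mode == "replace":
        return packet[:offset] + data + packet[offset + len(data):], 0
    if mode == "remove":
        return packet[:offset] + packet[offset + count:], 0
    raise ValueError(f"unknown mode: {mode}")


inserted = pmu_modify(b"ABCDEF", "insert", 2, data=b"xy")
replaced = pmu_modify(b"ABCDEF", "replace", 2, data=b"xy")
removed = pmu_modify(b"ABCDEF", "remove", 2, count=2)
```

The inserted-byte count is also the amount by which the incoming data would have to be delayed, which is why the Delay/FIFO module is sized from it.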
With reference to FIG. 7, a block diagram of a Decision and Forwarding Unit (DFU) 30 is depicted. The DFU 30 performs drop, queue, or forward operations based on input from 1 to N PPUs, PPUXs, PMUs, or other DFUs. The DFU 30 includes a plurality of inputs and outputs 400-444. The function of each input and output 400-444, as well as the values each input or output can take on, are described with reference to Table 5 hereinbelow.
TABLE 5
Referring again to FIG. 7, the DFU 30 includes sub-blocks Latch 445a-445n, Data Selection MUX 446, Result Generation process 448, and Output Alignment process 450. The triangles within FIG. 7 are for blocking together intermediate outputs and do not themselves have inherent functionality. All sub-blocks are processes. Latch 445a-445n latches the incoming results, data, and other output signals coming from 0 to N-1 PPUs/PPUXs/PMUs to be processed at a later time inside the DFU 30. The Latches 445a-445n are necessary since each PPU/PPUX/PMU may present packet data at different times. Four signals from each Latch 445a-445n, namely iDValIn 452a-452n, iSOH 454a-454n, iEOH 456a-456n, and iData 458a-458n, corresponding to the latched inputs DValIn 418a-418n, SOH 420a-420n, EOH 422a-422n, and Data 424a-424n, respectively, and representing together data signals from each PPU/PPUX/PMU, belong to groups, which are fed together to the Data Selection MUX 446. Likewise, four signals from each Latch 445a-445n, namely iRInVal 459a-459n, iMIn 460a-460n, iRIn 462a-462n, and iRInSeq 464a-464n, corresponding to the latched inputs RInVal 406a-406n, MIn 402a-402n, RIn 400a-400n, and RInSeq 404a-404n, respectively, and representing together control/result signals from each PPU/PPUX/PMU, belong to groups, which are fed together to the Result Generation process 448. The Data Selection MUX 446 selects one of the data groups 0 to N-1 and forwards the data group to the output group, which includes iDValOut 466, iSOHOut 468, iEOHOut 470, and iDOut 472, as inputs to the Output Alignment process 450. The Result Generation process 448 has a similar purpose to that found in the PPU/PPUX, namely, generating a result iROut 482 which depends on the evaluation of a programmable logical expression which may depend on the value of the inputs RIn[0 - (N-1)] 400a-400n and/or MIn[0 - (N-1)] 402a-402n.
In addition, the evaluation of this complex logical expression can determine an output port to which the packet is to be routed, i.e., the pass along/queue outputs A and B, or the drop port D, represented as active high enabling intermediate outputs iROutAVal 476, iROutBVal 478, and iROutDVal 480. These outputs are passed along to the Output Alignment process 450, which has the same purpose and function as in the PPU 22, PPUX 24, and PMU 28. The intermediate outputs 466-482 become the DFU outputs DValOut 426, SOHOut 428, EOHOut 430, DOut 432, SeqOut 416, ROutAVal 408, ROutBVal 410, ROutDVal 412, and ROut 414, respectively.
With the addition of a group of external AND gates and control outputs ROutAVal 408, ROutBVal 410, and ROutDVal 412, the output DOut 432 is routed to one of three output ports: DOutA 484, DOutB 486, or DOutD 488. Typically, DOutA 484 and DOutB 486 can be used for normal output and DOutD 488 can be used for dropping a packet (not shown). Alternatively, DOutD 488 can be used as a third routing output port. For the normal ports DOutA 484 and DOutB 486, the packet is either forwarded to a destination, or another chain of PPUs/PPUXs/PMUs, or sent to a queue of a traffic manager. As an example of the operation of the Data Selection MUX 446 and Result Generation process 448, if the DFU 30 has two PPU inputs DIn[0] and DIn[1], and two match inputs MIn[0] and MIn[1], then the following conditions exist:
Output packet to Port DOutA if MIn[0] is True and MIn[1] is True;
Output packet to Port DOutB if MIn[0] is True and MIn[1] is False; and
Output packet to Port DOutD if MIn[0] is False and MIn[1] is False.
The design environment of the present invention can be connected to a set of internal PPU/PPUX/PMU/DFU registers and programmed through a microprocessor interface. The operations that the microprocessor would perform are reads and writes to/from the registers. Table 6 below shows a sample interface for a microprocessor manufactured by Freescale, Inc. (formerly Motorola):
UP_CLK In Clock: This is the clock for the μP interface.
UP_CS In Chip Select: This active low signal enables the core to respond to microprocessor cycles.
UP_RWn In Read/Write: Read (high) / Write (low) signal.
UP_READY Out Ready: Active low signal asserted by the core to indicate the successful transfer of read or write data.
UP_A[15:0] In Address Bus: 16-bit address driven by the microprocessor to address the core registers.
UP_D[15:0] In/Out Data Bus: Bi-directional 16-bit data.
UP_IRQ Out Interrupt Request: Active low signal asserted by the core to indicate that an event was detected.
TABLE 6
The possible types of interconnections between DFUs and PPUs are numerous. Depending on the application, the control inputs of the PPUs or DFUs can be driven with fixed values (hardwired), from programmable registers, or from the outputs of other PPUs or DFUs. Table 7 shows the options for control signal connections, with some typical examples of standard packet processing:
TABLE 7
Each PPU/PPUX/PMU/DFU is configurable at synthesis time using the parameters shown in Table 8:
TABLE 8
With reference to FIG. 8, a block diagram is depicted showing a sample packet processing algorithm design using the present invention. In this example, the packet processing algorithm relates to extracting the precedence field of an IP packet for a VLAN/Non-VLAN frame from a packet header 500 belonging to a packet 499. Pseudo code which implements the two DFUs and the three PPUs of FIG. 8 can be found in Appendices H-L. A top-level file for the example of FIG. 8, expressed in pseudo code, can be found in Appendix M. The precedence field is used as the QID of the queue into which the packet is to be stored in a traffic manager. The packet header 500 is fed to a DataIn input 502 of a PPU 504. The PPU 504 determines first whether the inputted packet header 500 belongs to a virtual LAN (VLAN) frame or a non-VLAN frame by pointing to byte 12 of the header (Index = 12) with a field width of 2 bytes. The operation to be performed is:
EQ(Data_Field(byte 12, width 2) AND Mask = 0xFFFF, Param1 = 0x8100, Param2 = 0)
If packet header 500 points to a VLAN frame, then the Result output 506 of the PPU 504 is set to point to the location or offset in the packet header 500 of the IP address in a VLAN type frame, otherwise it points to the location in the packet header 500 of the IP address in a non-VLAN frame. This IP address is fed to the Index input 508, along with the header 500 to a second PPU 510. In the PPU 510, the most significant byte is checked and must be less than 224, signifying that the input IP address is valid. The operation to be performed is:
GE(Data_Field(byte = MSB of IP address, width = 1) AND Mask = 0xFF, Param1 = 224, Param2 = 0)
The packet header 500 is then passed to the Din[0] input 512 of a DFU 514. If the DA field of the IP address is >= 224.0.0.0, then the packet is to be dropped by placing the header on the DOutD output 516 of an AND gate 518 connected to the DFU 514. Otherwise, the packet 499 is forwarded to a third PPU 520 with the Index input 522 of the PPU 520 pointing to the "type of service" field (ToS) in the header 500 based on whether the packet 499 belongs to a VLAN or non-VLAN frame. The ToS tells the application how a datagram should be used, e.g. delay, precedence, reliability, minimum cost, throughput etc. Depending on the value of the ToS field, one can change a priority assigned to a packet which is then sent to a traffic manager which processes the packet based on the set priority.
In the PPU 520, the IP precedence field is extracted from the header 500 with the following operation:
EXTR(Data_Field(byte = ToS field location, width = 1) AND Mask = 0xFF, Param1 = 2 (start), Param2 = 3 (len))
The IP precedence field is fed to the Din[0] input 524 of a second DFU 526. The DFU 526 places the packet header on the DOutA output 528 of an AND gate 530 for queueing, and the precedence field is placed on the DOutB output 532 of an AND gate 534. The precedence field functions as the Queue Identifier (QID) for the packet to be queued and both inputs 536, 538 are fed to a traffic manager 540. The traffic manager 540 outputs the classified packet on output 542 and the QID on output 544.
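The flow of FIG. 8 can be illustrated in software. The following is only a sketch, not the pseudo code of the appendices. It assumes an Ethernet frame carrying IPv4, with the IP header starting at byte 14 (byte 18 when a single VLAN tag is present), the ToS byte at IP offset 1, the destination address at IP offset 16, and the precedence taken as the three most significant bits of the ToS byte. The names data_field and classify are hypothetical and do not appear in the appendices.

```python
VLAN_TPID = 0x8100  # value compared against bytes 12-13 (Param1 of the EQ operation)


def data_field(header: bytes, index: int, width: int, mask: int) -> int:
    """Return `width` bytes starting at byte `index`, ANDed with `mask`."""
    return int.from_bytes(header[index:index + width], "big") & mask


def classify(header: bytes):
    """Return ("drop", None) or ("queue", qid) for one packet header."""
    # PPU 504: EQ(Data_Field(byte 12, width 2) AND 0xFFFF, 0x8100)
    is_vlan = data_field(header, 12, 2, 0xFFFF) == VLAN_TPID
    # Assumed offsets: the IPv4 header starts at byte 18 when one VLAN tag
    # is present and at byte 14 otherwise.
    ip_start = 18 if is_vlan else 14
    # PPU 510: GE(MSB of destination address AND 0xFF, 224) selects the drop path.
    if data_field(header, ip_start + 16, 1, 0xFF) >= 224:
        return ("drop", None)
    # PPU 520: extract the precedence bits (assumed top three bits) of the ToS byte.
    tos = data_field(header, ip_start + 1, 1, 0xFF)
    return ("queue", tos >> 5)
```

In this sketch the returned precedence value plays the role of the QID handed to the traffic manager 540.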
The present invention is subject to numerous variations and modifications. For example, packet processing blocks having other types of functionality can be provided, such as:
• checksum or CRC generation and/or checking
• packet content modification/editing
• packet header removal
• packet header or trailer addition (e.g., for downstream processing)
• per flow rate control
As an alternative to a textual programming interface for implementing a given packet parser/classifier, the programmer/designer can use a graphical design program such as OrCAD or Microsoft Visio to draw and interconnect sub-blocks with input windows for entering interconnecting expressions and entering program inputs.
The present invention has several advantages over prior art packet processing products. The present invention can be used to produce an inexpensive piece of digital hardware, while the prior art products are limited to programs running on a microprocessor. The present invention is scalable to handle simple to complex classification tasks, and software modules can be connected and configured in a variety of ways.
It will be understood that the embodiment described herein is merely exemplary and that a person skilled in the art may make many variations and modifications without departing from the spirit and scope of the invention. All such variations and modifications are intended to be included within the scope of the present invention as defined in the appended claims.
APPENDIX A
//
// This file contains generic Packet Processing Unit (PPU)
//
Object PPU
PARAMETERS (
    // configuration constant
    DataWidth       = @EQ(DataWidth);
    Log2DataWidth   = @L2(DataWidth);
    MaxHdrWords     = @EQ(MaxHdrWords);
    Log2MaxHdrWords = @L2(MaxHdrWords);
    QualifierWidth  = @EQ(QualifierWidth);
    ResultWidth     = @EQ(ResultWidth);
    FieldWidth      = @EQ(FieldWidth);
    Log2FieldWidth  = @L2(FieldWidth);
    LookupAvaliable = @EQ(LookupAvaliable);
    LookupDepth     = @EQ(LookupDepth);
    Log2LookupDepth = @L2(LookupDepth);
    TAGBits         = @EQ(TAGBits);
    SobBits         = @EQ(SobBits);
    EobBits         = @EQ(EobBits);
    UseFIFOStorage  = @EQ(UseFIFOStorage);
    PPUID           = @EQ(PPUID);
    IndexWidth      = Log2DataWidth+Log2MaxHdrWords;
)
INTERFACES (
    // Result output expression is @Result_Expression
    // clock & reset
    input Rst;
    input Clk;
    // input packet header
    input                  InVal;
    input [SobBits-1:0]    SOHIn;
    input [EobBits-1:0]    EOHIn;
    input [DataWidth-1:0]  DataIn;
    // output packet header
    output                 OutVal;
    output [SobBits-1:0]   SOHOut;
    output [EobBits-1:0]   EOHOut;
    output [DataWidth-1:0] DataOut;
    // control signal
    input [QualifierWidth-1:0] QualEnb;
    input [2:0]                QualCond;
    input [IndexWidth-1:0]     Index;
    input [Log2FieldWidth:0]   Width;
    input [2:0]                Opcode;
    input [FieldWidth-1:0]     Param1;
    input [FieldWidth-1:0]     Param2;
    input [FieldWidth-1:0]     Mask;
    //if_generate (TAGBits != 0)
    input [TAGBits-1:0]        TAG;
    //end_generate
    // Micro-processor bus
    input         MapWrRd_n;
    input [3:0]   MapAddr;
    input [31:0]  MapWrData;
    output [31:0] MapRdData;
    // Result values
    output                   Match;
    output                   ResVal;
    output [ResultWidth-1:0] Result;
    //if_generate (GenSeq==0)
    input [7:0]  SeqIn;
    //end_generate
    output [7:0] SeqOut;
);
{
    //*****************************
    // internal signal declaration
    //*****************************
    signal                      ExtOutVal;
    signal [SobBits-1:0]        ExtOutSob;
    signal [EobBits-1:0]        ExtOutEob;
    signal [DataWidth-1:0]      ExtOutDat;
    signal [FieldWidth-1:0]     ExtHdrValue;
    signal [IndexWidth-1:0]     AdjustedIndex = Index + Width;
    signal                      ExtHdrValid;
    signal                      ExtHdrValid [NoOfLookups-1:0];
    signal [DataWidth-1:0]      ExtHdrValue [NoOfLookups-1:0];
    signal [Log2MaxHdrWords:0]  WordNo [NoOfLookups-1:0];
    signal [Log2DataWidth-1:0]  StartIndex [NoOfLookups-1:0];
    signal [Log2DataWidth-1:0]  DataShift;
    constant FieldWidthMultiple = (FieldWidth+DataWidth-1)/DataWidth;
    FOR i from 1 to (NoOfLookups-1) increment 1
        WordNo[i]     = (Index[IndexWidth-1:Log2DataWidth]+i);
        StartIndex[i] = 0;
    ENDLOOP
    signal                      iOutVal;
    signal                      iOutMatch;
    signal                      QEnable;
    signal [QualifierWidth-1:0] QualEnbCondition;
    signal                      LkWrEnb;
    signal [Log2LookupDepth-1:0] LkWrAddr;
    signal [FieldWidth-1:0]     LkWrData;
    signal [FieldWidth-1:0]     LkWrRBData;
    // Special purpose registers for the MLU Blocks for custom instruction
    signal [31:0] SPReg_0;
    signal [31:0] SPReg_1;
    signal [31:0] SPReg_2;
    signal [31:0] SPReg_3;
    signal Match;
    constant Idle = 0;
    constant Xfr  = 1;
    // TEMPLATE CLASS instantiation for HLU
    HLU (
        @EQ(NoOfLookups), SobBits, EobBits, DataWidth, Log2DataWidth,
        MaxHdrWords, Log2MaxHdrWords, DataWidth )
    HeaderExtract_NameInstance (
        // reset and clock
        Rst, Clk,
        // decoding rules
        WordNo, StartIndex,
        // input data
        InVal, SOHIn, EOHIn, DataIn,
        // Decoded Header Values
        ExtHdrValid, ExtHdrValue,
        // output data
        ExtOutVal, ExtOutSob, ExtOutEob, ExtOutDat );
    // data byte alignment
    PROCESS (on change to ExtHdrValue)
        //if_generate (Custom_Shift == 0)
        Adjust ExtHdrValue with DataWidth, FieldWidth & AdjustedIndex[Log2DataWidth-1:0] to have \
            the correct byte alignment at byte zero
        //else_generate
        Adjust ExtHdrValue with custom shift to have the correct byte alignment defined by the user
        //end_generate
    ENDPROCESS
    // extract qualifier enable
    PROCESS (on rising edge of Clk)
        IF Rst is high
            assign initial value to QEnable
        ELSE
            set QEnable to disabled state Zero to begin
            // get the case statement for enable
            CASE (QualCond)
                3'd0 : set QEnable to enable state One unconditionally
                3'd1 : set QEnable to enable state One if QualEnbCondition Equal to QualEnb
                3'd2 : set QEnable to enable state One if QualEnbCondition Greater than QualEnb
                3'd3 : set QEnable to enable state One if QualEnbCondition Greater than or Equal to QualEnb
                3'd4 : set QEnable to enable state One if QualEnbCondition Less than QualEnb
                3'd5 : set QEnable to enable state One if QualEnbCondition not Equal to QualEnb
                default : QEnable = 0;
            ENDCASE
        ENDIF
    ENDPROCESS
    // MLU Template class instance
    MLU (
        FieldWidth, Log2FieldWidth, LookupAvaliable, LookupDepth, Log2LookupDepth )
    MLU_NameInstance (
        // reset and clock
        Rst, Clk,
        // control signals
        QEnable, Opcode, Width, Param1, Param2, LkWrEnb, LkWrAddr, LkWrData, LkWrRBData,
        SPReg_0, SPReg_1, SPReg_2, SPReg_3,
        // input check values
        ExtHdrValid, ExtHdrValue, Mask,
        // results
        iOutVal, iOutMatch );
    //
    // I/O out signal
    //if_generate (LookupAvaliable==1 || UseFIFOStorage == 1)
    IF LookupAvaliable is available or UseFIFOStorage option is enabled
        // internal signals
        signal                 iEnbOut;
        signal                 iValOut;
        signal [SobBits-1:0]   iSOHOut;
        signal [EobBits-1:0]   iEOHOut;
        signal [DataWidth-1:0] iDataOut;
        signal                 OutState;
        // assign the fifo output to be the output
        OutVal  = iValOut;
        SOHOut  = iSOHOut;
        EOHOut  = iEOHOut;
        DataOut = iDataOut;
        Result  = @Result_Expression;
        ResVal  = (iSOHOut[SobBits-1] & iValOut);
        // Packet FIFO template class
        PacketFifo (
            Log2MaxHdrWords, (DataWidth+SobBits+EobBits-1) )
        PktSCFifo_NameInstance (
            // write I/F
            (Rst), (Clk),
            (ExtOutVal), (ExtOutEob[EobBits-1]),
            ({ExtOutSob, ExtOutEob[EobBits-2:0], ExtOutDat}),
            // read I/F
            (Rst), (Clk),
            (iEnbOut), (iValOut), (iEOHOut[EobBits-1]),
            ({iSOHOut, iEOHOut[EobBits-2:0], iDataOut}),
            // flags
            (open), (open), (open) );
        // transfer from the fifo out to the user
        // after the result is declared
        PROCESS (on rising edge of Clk) : OUTPUT_ALINGEMENT_PROCESS
            IF Rst is high
                assign reset values to iEnbOut
                assign reset values to OutState
                assign reset values to Match
            ELSE
                // transfer the data when lookup is done
                CASE (OutState)
                    Idle :
                        turn off iEnbOut to the fifo
                        // check if valid received from the lookup
                        IF iOutVal from MLU is received high
                            give iEnbOut to the fifo
                            iEnbOut = 1;
                            declare Match based on iOutMatch value
                            set OutState to goto next stage Xfr
                        ENDIF
                    Xfr :
                        keep reading the fifo
                        IF valid EOP received from the FIFO, go back to Idle stage
                ENDCASE
            ENDIF
        ENDPROCESS
    ELSE // No Lookup Available
    //end_generate
        // reclock signals
        signal [Log2MaxHdrWords+2:0] SelIndex;
        signal                 ExtOutVal_r [MaxHdrWords+2:0];
        signal [SobBits-1:0]   ExtOutSob_r [MaxHdrWords+2:0];
        signal [EobBits-1:0]   ExtOutEob_r [MaxHdrWords+2:0];
        signal [DataWidth-1:0] ExtOutDat_r [MaxHdrWords+2:0];
        signal                 iResVal;
        OutVal  = ExtOutVal_r[SelIndex];
        SOHOut  = ExtOutSob_r[SelIndex];
        EOHOut  = ExtOutEob_r[SelIndex];
        DataOut = ExtOutDat_r[SelIndex];
        Result  = @Result_Expression;
        ResVal  = iResVal;
        // on reclock will do for no lookup
        PROCESS (on rising edge of Clk) : OUTPUT_ALINGEMENT_PROCESS
            IF Rst is high
                assign reset values to iResVal
                assign reset values to Match
                assign reset values to SelIndex
            ELSE
                Reclock the signals ExtOutVal_r, ExtOutSob_r, ExtOutEob_r & ExtOutDat_r \
                    to generate a MaxHdrWords+3 word pipeline for future usage
                // default
                set iResVal to Zero
                IF valid SOHIn is received
                    // word lookup + 1 match clock
                    Calculate SelIndex based on WordNo of the last lookup
                ENDIF
                // gen expression
                IF iOutVal is set to high
                    declare iResVal to be valid
                    set Match as iOutMatch
                ENDIF
            ENDIF
        ENDPROCESS
    //if_generate (LookupAvaliable==1 || UseFIFOStorage == 1)
    ENDIF
    //end_generate
    //
    // sequence out process
    PROCESS (on rising edge of Clk) : SEQUENCE_GENERATION
        IF Rst is high
            assign reset values to SeqOut
        ELSE
            //if_generate (GenSeq==0)
            IF valid SOHIn is received
                set SeqOut equal to SeqIn
            ENDIF
            //else_generate
            // increment the sequence no if valid InSOH is detected
            IF valid SOHIn is received
                increment SeqOut by one
            ENDIF
            //end_generate
        ENDIF
    ENDPROCESS
    //
    // Register map
    //
    // register map process
    PROCESS (on rising edge of Clk) : REGISTER_MAP
        IF Rst is high
            assign reset values to MapRdData, QualEnbCondition, LkWrEnb, LkWrAddr, \
                LkWrData, SPReg_0, SPReg_1, SPReg_2, SPReg_3
        ELSE
            // case address decoding
            IF user requested a write operation
                CASE (MapAddr)
                    // PPU ID NO Write
                    // Enable Condition
                    1: latch QualEnbCondition from MapWrData
                    // map address
                    2: latch LkWrAddr from MapWrData
                    // map data
                    3: issue write to user lookup table using LkWrEnb
                       latch LkWrData from MapWrData
                    4: latch SPReg_0 from MapWrData
                    5: latch SPReg_1 from MapWrData
                    6: latch SPReg_2 from MapWrData
                    7: latch SPReg_3 from MapWrData
                ENDCASE
            // if reading
            ELSE
                CASE (MapAddr)
                    // PPU ID read only
                    0: set MapRdData to PPUID
                    // Enable Condition
                    1: set MapRdData to QualEnbCondition
                    // map address
                    // address needs to be set
                    // for reading data
                    // map data
                    3: set MapRdData to LkWrRBData
                    4: set MapRdData to SPReg_0
                    5: set MapRdData to SPReg_1
                    6: set MapRdData to SPReg_2
                    7: set MapRdData to SPReg_3
                ENDCASE
            ENDIF
        ENDIF
    ENDPROCESS
}
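The qualifier enable CASE statement of Appendix A can be mirrored in a few lines of software. The following is a minimal sketch that assumes unsigned integer operands; the names QUAL_CONDITIONS and qualifier_enable are hypothetical and are used only for illustration.

```python
import operator

# QualCond encodings as listed in the CASE statement of Appendix A.
QUAL_CONDITIONS = {
    0: lambda cond, enb: True,  # unconditional
    1: operator.eq,             # QualEnbCondition == QualEnb
    2: operator.gt,             # QualEnbCondition >  QualEnb
    3: operator.ge,             # QualEnbCondition >= QualEnb
    4: operator.lt,             # QualEnbCondition <  QualEnb
    5: operator.ne,             # QualEnbCondition != QualEnb
}


def qualifier_enable(qual_cond: int, qual_enb_condition: int, qual_enb: int) -> bool:
    """Return the value QEnable would take for the given inputs."""
    test = QUAL_CONDITIONS.get(qual_cond)
    # Any other encoding falls into the default branch (QEnable = 0).
    return bool(test(qual_enb_condition, qual_enb)) if test else False
```

For example, qualifier_enable(3, 5, 5) evaluates the "greater than or equal" condition on two equal operands and therefore enables the match unit.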
APPENDIX B
//
// This file contains decision forwarding unit (DFU)
//
Object DFU
PARAMETERS (
    // configuration constant
    DataWidth          = @EQ(DataWidth);
    InResultWidth      = @EQ(InResultWidth);
    OutResultWidth     = @EQ(OutResultWidth);
    NumPPUConnects     = @EQ(NumPPUConnects);
    Log2NumPPUConnects = @L2(NumPPUConnects);
    DFUID              = @EQ(DFUID);
    SopBits            = @EQ(SopBits);
    EopBits            = @EQ(EopBits);
)
INTERFACES (
    // clock & reset
    input Rst;
    input Clk;
    // Micro-processor bus
    input         MapWrRd_n;
    input [2:0]   MapAddr;
    input [31:0]  MapWrData;
    output [31:0] MapRdData;
    // input buses
    input [NumPPUConnects-1:0] RInVal;
    input [NumPPUConnects-1:0] MIn;
    //for_generate @i (NumPPUConnects)
    input [InResultWidth-1:0]  RIn_@EQ(i);
    input [7:0]                RInSeq_@EQ(i);
    //end_generate
    // in packet header
    input [NumPPUConnects-1:0] DValIn;
    //for_generate @i (NumPPUConnects)
    input [SopBits-1:0]        SOHIn_@EQ(i);
    input [EopBits-1:0]        EOHIn_@EQ(i);
    input [DataWidth-1:0]      DIn_@EQ(i);
    //end_generate
    // out result
    output                      ROutAVal;
    output                      ROutBVal;
    output                      ROutDVal;
    output [OutResultWidth-1:0] ROut;
    output [7:0]                SOut;
    // out packet header common
    output                      DValOut;
    output [SopBits-1:0]        SOHOut;
    output [EopBits-1:0]        EOHOut;
    output [DataWidth-1:0]      DOut;
    output                      OutOfSeqErr;
);
{
    // internal signals
    signal                        StartOutputData;
    signal                        SequenceCheck;
    signal [NumPPUConnects-1:0]   iRInVal;
    signal [NumPPUConnects-1:0]   iMIn;
    signal [InResultWidth-1:0]    iRIn [NumPPUConnects-1:0];
    signal [7:0]                  iRInSeq [NumPPUConnects-1:0];
    signal [InResultWidth-1:0]    RIn [NumPPUConnects-1:0];
    signal [7:0]                  RInSeq [NumPPUConnects-1:0];
    signal [SopBits-1:0]          SOHIn [NumPPUConnects-1:0];
    signal [EopBits-1:0]          EOHIn [NumPPUConnects-1:0];
    signal [DataWidth-1:0]        DIn [NumPPUConnects-1:0];
    //for_generate @i (NumPPUConnects)
    RIn[@EQ(i)]    = RIn_@EQ(i);
    RInSeq[@EQ(i)] = RInSeq_@EQ(i);
    SOHIn[@EQ(i)]  = SOHIn_@EQ(i);
    EOHIn[@EQ(i)]  = EOHIn_@EQ(i);
    DIn[@EQ(i)]    = DIn_@EQ(i);
    //end_generate
    signal [NumPPUConnects-1:0]   Match = MIn OR iMIn;
    signal [InResultWidth-1:0]    Result [NumPPUConnects-1:0];
    signal [7:0]                  Sequence [NumPPUConnects-1:0];
    signal [Log2NumPPUConnects:0] LastArrivalData;
    signal [Log2NumPPUConnects:0] LastArrivalResult;
    FOR i from 1 to (NumPPUConnects-1) increment 1
        Result[i]   = (RInVal[i]) ? RIn[i] : iRIn[i];
        Sequence[i] = (RInVal[i]) ? RInSeq[i] : iRInSeq[i];
    ENDLOOP
    PROCESS (on rising edge of Clk) : VALUE_LATCH_PROCESS
        IF Rst is high
            assign initial value to iRInVal
            assign initial value to iMIn
        ELSE
            // latch value for I/F
            FOR i from 1 to (NumPPUConnects-1) increment 1
                IF (RInVal[i] == 1)
                    iRInVal[i] <= RInVal[i];
                    iMIn[i]    <= MIn[i];
                    iRIn[i]    <= RIn[i];
                    iRInSeq[i] <= RInSeq[i];
                ENDIF
            ENDLOOP
        ENDIF
    ENDPROCESS
    // latch all the signals to start processing
    // process if all I/F have given valid data
    PROCESS (on rising edge of Clk) : DATA_SELECTION_MUX_PROCESS
        IF Rst is high
            assign initial value to ROutAVal
            assign initial value to ROutBVal
            assign initial value to ROutDVal
            assign initial value to DValOut
            assign initial value to SOHOut
            assign initial value to EOHOut
            assign initial value to DOut
            assign initial value to StartOutputData
        ELSE
            //default
            assign default value to ROutAVal
            assign default value to ROutBVal
            assign default value to ROutDVal
            assign default value to DValOut
            assign default value to SOHOut
            assign default value to EOHOut
            // if all I/F have given valids
            IF (RInVal OR iRInVal) expression is all ones
                IF user defined @MATCH_Expression_A is true
                    ROutAVal <= 1;
                ELSE IF user defined @MATCH_Expression_B is true
                    ROutBVal <= 1;
                ELSE
                    ROutDVal <= 1;
                ENDIF
                Clear all latched values on iRInVal, iMIn
                // start the transfer of data
                set StartOutputData & SequenceCheck to One
                Latch the port which gave the data last
            ENDIF
            // start outputting data
            IF (RInVal are received by each connected PPU) OR (StartOutputData is set)
                IF (RInVal OR iRInVal) expression is all ones
                    set LastArrivalData to the channel which gave last RInVal
                ENDIF
                // output data
                DValOut = DValIn[LastArrivalData];
                SOHOut  = SOHIn[LastArrivalData];
                EOHOut  = EOHIn[LastArrivalData];
                DOut    = DIn[LastArrivalData];
            ENDIF
            // clear the StartOutputData on EOP
            IF valid EOHOut is received
                reset StartOutputData to zero
            ENDIF
        ENDIF
    ENDPROCESS
    // latch all the signals to start processing
    // process if all I/F have given valid data
    PROCESS (on rising edge of Clk) : RESULT_GENERATION_PROCESS
        IF Rst is high
            assign initial value to ROut
            assign initial value to SOut
            assign initial value to OutOfSeqErr
            assign initial value to SequenceCheck
        ELSE
            //default
            assign default value to OutOfSeqErr
            assign default value to SequenceCheck
            // start outputting data
            IF (RInVal are received by each connected PPU) OR (StartOutputData is set)
                IF (RInVal OR iRInVal) expression is all ones
                    set LastArrivalResult to the channel which gave last RInVal
                ENDIF
                IF (@MATCH_Expression_A)
                    ROut = @Result_Expression_A;
                ELSE IF (@MATCH_Expression_B)
                    ROut = @Result_Expression_B;
                ELSE
                    ROut = @Result_Expression_D;
                ENDIF
                SOut = RInSeq[LastArrivalResult];
            ENDIF
            //OutOfSeqErr
            IF ( (SequenceCheck ACTIVE) AND
                 (ANY OF THE RECEIVED SEQUENCE NUMBERS DO NOT MATCH) )
                declare OutOfSeqErr to be active
            ENDIF
        ENDIF
    ENDPROCESS
    // out alignment process
    PROCESS (on rising edge of Clk) : OUTPUT_ALINGMENT_PROCESS_PROCESS
        IF Rst is low
            IF Result output delay is more
                reclock to align with the result outputs ROutAVal
                reclock to align with the result outputs ROutBVal
                reclock to align with the result outputs ROutDVal
                reclock to align with the result outputs DValOut
                reclock to align with the result outputs SOHOut
                reclock to align with the result outputs EOHOut
                reclock to align with the result outputs DOut
            ENDIF
            IF Data output delay is more
                reclock to align with the result outputs ROut
                reclock to align with the result outputs SOut
            ENDIF
        ENDIF
    ENDPROCESS
    // register map
    PROCESS (on rising edge of Clk) : REGISTER_MAP_PROCESS
        IF Rst is high
            assign initial value to MapRdData
        ELSE
            // case address decoding
            IF user requested a read (MapWrRd_n == 0)
                CASE (MapAddr)
                    // DFU ID read only
                    0: MapRdData = DFUID;
                ENDCASE
            ENDIF
        ENDIF
    ENDPROCESS
}
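The decision logic of the DFU (latch each PPU result until every connected PPU has reported, then select output A, B or D, and flag mismatched sequence numbers) can be modeled as follows. This is a behavioral sketch only; it ignores clocking and the packet data path, and the class name DecisionForwarder and its match_a/match_b callables are hypothetical stand-ins for the user-defined @MATCH_Expression_A and @MATCH_Expression_B.

```python
from typing import Callable, List, Optional, Tuple


class DecisionForwarder:
    """Behavioral model of the DFU result selection (not cycle accurate)."""

    def __init__(self, num_ppus: int,
                 match_a: Callable[[List[bool]], bool],
                 match_b: Callable[[List[bool]], bool]) -> None:
        self.num_ppus = num_ppus
        self.match_a = match_a  # stands in for @MATCH_Expression_A
        self.match_b = match_b  # stands in for @MATCH_Expression_B
        self._clear()

    def _clear(self) -> None:
        # Corresponds to clearing the latched iRInVal / iMIn values.
        self.valid = [False] * self.num_ppus
        self.match = [False] * self.num_ppus
        self.seq = [0] * self.num_ppus

    def push(self, port: int, match: bool, seq: int) -> Optional[Tuple[str, bool]]:
        """Latch one PPU result; return (decision, out_of_seq) once all ports are valid."""
        self.valid[port], self.match[port], self.seq[port] = True, match, seq
        if not all(self.valid):
            return None
        if self.match_a(self.match):
            decision = "A"
        elif self.match_b(self.match):
            decision = "B"
        else:
            decision = "D"
        out_of_seq = len(set(self.seq)) > 1  # any sequence number differs
        self._clear()
        return decision, out_of_seq
```

With match_a=all and match_b=any, for instance, output A is selected only when every PPU reports a match, output B when at least one does, and output D (drop) otherwise.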
APPENDIX C
//
// This file contains generic packet modification unit block (PMU)
//
Object PMU
PARAMETERS (
    // configuration parameter
    DataWidth       = @EQ(DataWidth);
    Log2DataWidth   = @L2(DataWidth);
    MaxHdrWords     = @EQ(MaxHdrWords);
    Log2MaxHdrWords = @L2(MaxHdrWords);
    ResultWidth     = @L2(MaxHdrWords*(DataWidth/8));
    FieldWidth      = @EQ(MaxHdrWords*DataWidth);
    FieldValidWidth = @EQ(MaxHdrWords*DataWidth/8);
    TAGBits         = @EQ(TAGBits);
    SobBits         = @EQ(SobBits);
    EobBits         = @EQ(EobBits);
    MaxPktSize      = 16;
    PIUID           = @EQ(PIUID);
    IndexWidth      = Log2DataWidth+Log2MaxHdrWords;
)
INTERFACES (
    // clock & reset
    input Rst;
    input Clk;
    // input packet header
    input                  InVal;
    input [SobBits-1:0]    SOHIn;
    input [EobBits-1:0]    EOHIn;
    input [DataWidth-1:0]  DataIn;
    // output packet header
    output                 OutVal;
    output [SobBits-1:0]   SOHOut;
    output [EobBits-1:0]   EOHOut;
    output [DataWidth-1:0] DataOut;
    input [MaxPktSize-1:0]      HdrByteOffset;
    input [FieldValidWidth-1:0] HdrByteVal;
    input [FieldWidth-1:0]      HdrData;
    //if_generate (TAGBits != 0)
    input [TAGBits-1:0]    TAGIn;
    output [TAGBits-1:0]   TAGOut;
    //end_generate
    // Micro-processor bus
    input         MapWrRd_n;
    input [3:0]   MapAddr;
    input [31:0]  MapWrData;
    output [31:0] MapRdData;
    // Result values
    output                   ResVal;
    output [ResultWidth-1:0] Result;
    //if_generate (GenSeq==0)
    input [7:0]  SeqIn;
    //end_generate
    output [7:0] SeqOut;
);
{
    // internal signals
    signal [DataWidth-1:0]     HdrWord [MaxHdrWords-1:0];
    signal [(DataWidth/8)-1:0] HdrWordMod [MaxHdrWords-1:0];
    // full mod i.e. Zero == No bytes valid
    FOR i from 0 to (MaxHdrWords-1) increment by One
        j = (MaxHdrWords-i-1);
        HdrWord[j]    = HdrData[((i+1)*DataWidth)-1 : (i*DataWidth)];
        HdrWordMod[j] = HdrByteVal[((i+1)*(DataWidth/8))-1 : (i*(DataWidth/8))];
    ENDLOOP
    //////////////////////////////////////////////////////////
    // internal signals
    //////////////////////////////////////////////////////////
    signal [(DataWidth/8)-1:0] InVal_i;
    signal [SobBits-1:0]       InSop_i;
    signal                     InSop_l;
    signal [EobBits-1:0]       InEop_i;
    signal [DataWidth-1:0]     InDat_i;
    signal                     InVal_r [MaxHdrWords-1:0];
    signal [SobBits-1:0]       InSop_r [MaxHdrWords-1:0];
    signal [EobBits-1:0]       InEop_r [MaxHdrWords-1:0];
    signal [DataWidth-1:0]     InDat_r [MaxHdrWords-1:0];
    signal [Log2MaxHdrWords:0] HdrExtState;
    signal [MaxHdrWords:0]     HdrState;
    constant XfrData = MaxHdrWords;
    signal [EobBits-1:0]       InEop_r_MaxHdrWords = InEop_r[MaxHdrWords-1];
    signal [@L2(DataWidth/8):0] BytesInWord = 1<<@L2(DataWidth/8);
    signal [MaxPktSize-1:0]    CurPktIndex;
    // Special purpose registers for the MLU Blocks for custom instruction
    signal [31:0] SPsignal_0;
    signal [31:0] SPsignal_1;
    signal [31:0] SPsignal_2;
    signal [31:0] SPsignal_3;
    FOR i from 0 to (MaxHdrWords-1) increment by One
        HdrState[i] = i;
    ENDLOOP
    enum OpCodesType { Opr_Ins, Opr_Mod, Opr_Rem };
    OpCodesType OpCode;
    PROCESS (on rising edge of Clk)
        IF Rst is high
            assign reset values to HdrExtState
            assign reset values to InVal_i
            assign reset values to InSop_i
            assign reset values to InSop_l
            assign reset values to InEop_i
            assign reset values to InDat_i
            assign reset values to Result
            assign reset values to CurPktIndex
            //if_generate (TAGBits != 0)
            assign reset values to TAGOut
            //end_generate
        ELSE
            // counter for word positions
            Reset, Increment & Clear CurPktIndex based on InSop, InEop & InVal
            IF InVal is active
                CASE (Opcode)
                    Opr_Ins : // Insertion
                        IF CurPktIndex < HdrByteOffset
                            InVal_i = InVal;
                            InDat_i = InDat;
                        ELSE IF CurPktIndex == HdrByteOffset OR in range of the bytes to be inserted
                            InVal_i = HdrByteVal;
                            InDat_i = HdrData;
                        ELSE
                            InVal_i = InVal_r[Adjusted OFFSET based on no of bytes inserted];
                            InDat_i = InDat_r[Adjusted OFFSET based on no of bytes inserted];
                        ENDIF
                    Opr_Mod : // Modification
                        IF CurPktIndex < HdrByteOffset
                            InVal_i = InVal;
                            InDat_i = InDat;
                        ELSE IF CurPktIndex == HdrByteOffset OR in range of the bytes to be modified
                            InVal_i = HdrByteVal;
                            InDat_i = HdrData;
                        ELSE
                            InVal_i = InVal;
                            InDat_i = InDat;
                        ENDIF
                    Opr_Rem : // Removal
                        IF CurPktIndex < HdrByteOffset
                            InVal_i = InVal;
                            InDat_i = InDat;
                        ELSE IF CurPktIndex == HdrByteOffset OR in range of the bytes to be deleted
                            InVal_i = 0;
                        ELSE
                            InVal_i = InVal;
                            InDat_i = InDat;
                        ENDIF
                ENDCASE
            ENDIF
        ENDIF // clock
    ENDPROCESS // always
    // Pipe-Line Stage
    PROCESS (on rising edge of Clk)
        create pipeline reclock for signals InVal_r
        create pipeline reclock for signals InSop_r
        create pipeline reclock for signals InEop_r
        create pipeline reclock for signals InDat_r
    ENDPROCESS
    OutVal  = InVal_i;
    SOHOut  = InSop_i;
    EOHOut  = InEop_i;
    DataOut = InDat_i;
    ResVal  = OutVal AND SOHOut;
    // out alignment process
    PROCESS (on rising edge of Clk) : OUTPUT_ALINGMENT_PROCESS_PROCESS
        IF Rst is low
            IF Result output delay is more
                reclock to align with the result outputs OutVal
                reclock to align with the result outputs SOHOut
                reclock to align with the result outputs EOHOut
                reclock to align with the result outputs DataOut
            ENDIF
            IF Data output delay is more
                reclock to align with the result outputs ResVal
                reclock to align with the result outputs Result
            ENDIF
        ENDIF
    ENDPROCESS
    //***********
    // sequence out process
    PROCESS (on rising edge of Clk) : SEQUENCE_GENERATION
        IF Rst is high
            assign reset values to SeqOut
        ELSE
            //if_generate (GenSeq==0)
            IF valid SOHIn is received
                set SeqOut equal to SeqIn
            ENDIF
            //else_generate
            // increment the sequence no if valid InSOH is detected
            IF valid SOHIn is received
                increment SeqOut by one
            ENDIF
            //end_generate
        ENDIF
    ENDPROCESS
    //***************
    // Register map
    //***************
    // register map process
    PROCESS (on rising edge of Clk) : REGISTER_MAP
        IF Rst is high
            assign reset values to MapRdData
            assign reset values to Opcode (i.e. Opcode present)
            assign reset values to SPsignal_0
            assign reset values to SPsignal_1
            assign reset values to SPsignal_2
            assign reset values to SPsignal_3
        ELSE
            // case address decoding
            IF map write is requested (MapWrRd_n == 1)
                CASE (MapAddr)
                    // PIU ID NO Write
                    1: OpCode = MapWrData;
                    4: SPsignal_0 = MapWrData;
                    5: SPsignal_1 = MapWrData;
                    6: SPsignal_2 = MapWrData;
                    7: SPsignal_3 = MapWrData;
                ENDCASE
            // if reading
            ELSE
                CASE (MapAddr)
                    // PIU ID read only
                    0: MapRdData = PIUID;
                    4: MapRdData = SPsignal_0;
                    5: MapRdData = SPsignal_1;
                    6: MapRdData = SPsignal_2;
                    7: MapRdData = SPsignal_3;
                ENDCASE
            ENDIF
        ENDIF
    ENDPROCESS
}
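The three PMU operations (insert, modify, remove) act on the byte stream at HdrByteOffset. Their net effect on a packet can be sketched at byte level as shown below. This is an assumption-laden illustration rather than the word-pipelined implementation of Appendix C: the function name modify_header, the string opcodes, and the explicit length argument for removal are all hypothetical.

```python
def modify_header(packet: bytes, opcode: str, offset: int,
                  data: bytes = b"", length: int = 0) -> bytes:
    """Apply one PMU-style operation at byte `offset` and return the new packet."""
    if opcode == "ins":   # Opr_Ins: new bytes are placed before the original tail
        return packet[:offset] + data + packet[offset:]
    if opcode == "mod":   # Opr_Mod: bytes are overwritten in place
        return packet[:offset] + data + packet[offset + len(data):]
    if opcode == "rem":   # Opr_Rem: `length` bytes are dropped
        return packet[:offset] + packet[offset + length:]
    raise ValueError(f"unknown opcode: {opcode}")
```

Insertion lengthens the packet, which is why the pseudo code reads later words from the reclocked pipeline at an adjusted offset, whereas modification leaves the packet length unchanged.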
APPENDIX D
//
// This file contains generic hardware look-up unit (header extraction block) (HLU)
//
Object HLU
PARAMETERS (
    NoOfLookups     = @EQ(NoOfLookups);
    SobBits         = @EQ(SobBits);
    EobBits         = @EQ(EobBits);
    DataBits        = @EQ(DataBits);
    Log2DataBits    = @L2(DataBits);
    MaxHdrWords     = @EQ(MaxHdrWords);
    Log2MaxHdrWords = @L2(MaxHdrWords);
    PartSize        = @EQ(PartSize);
)
INTERFACES (
    // reset and clock
    input Rst;
    input Clk;
    // decoding rules
    input [Log2MaxHdrWords:0] WordNo [NoOfLookups-1:0];
    input [Log2DataBits-1:0]  StartIndex [NoOfLookups-1:0];
    // input data
    input                  InVal;
    input [SobBits-1:0]    InSob; // Top bit is SOP
    input [EobBits-1:0]    InEob; // Top bit is EOP
    input [DataBits-1:0]   InDat;
    // Decoded Header Values
    output                 HdrValid [NoOfLookups-1:0];
    output [PartSize-1:0]  HdrValue [NoOfLookups-1:0];
    // output data
    output                 OutVal;
    output [SobBits-1:0]   OutSob;
    output [EobBits-1:0]   OutEob;
    output [DataBits-1:0]  OutDat;
);
{
    // configuration parameter
    //////////////////////////////////////////////////////////
    // internal signals
    //////////////////////////////////////////////////////////
    signal [Log2MaxHdrWords:0] HdrWordCntr;
    PROCESS (on rising edge of Clk)
        IF Rst is high
            assign initial value to HdrValid
            assign initial value to HdrValue
            assign initial value to HdrWordCntr
        // reset
        ELSE
            // get header word counter ticking
            IF InVal is high
                // EOP seen
                IF InEob is high
                    set HdrWordCntr to zero
                ELSE IF HdrWordCntr is not equal to MaxHdrWords
                    increment HdrWordCntr by one
                ENDIF
                // header extraction loop
                for lp = 0 to (NoOfLookups-1) increment lp by one
                    // IF header word is reached and data is valid
                    IF current word is valid header word
                        // issue valid word indication command
                        declare HdrValid[lp] valid
                        // adjust the data to start index
                        assign HdrValue[lp] to (InDat LEFTSHIFT StartIndex[lp])
                    ENDIF
                ENDLOOP
            ENDIF
        ENDIF // clock
    ENDPROCESS // always
    // output assignment
    OutVal = InVal;
    OutSob = InSob;
    OutEob = InEob;
    OutDat = InDat;
}
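The header extraction performed by the HLU, which counts header words and captures the word whose number matches a lookup's WordNo, shifted left by that lookup's StartIndex, can be sketched as follows. The sketch assumes a single header presented as a list of words, with word numbering starting at zero and the shifted value truncated to the data width; the function name extract_header_words is hypothetical.

```python
from typing import Dict, Sequence


def extract_header_words(words: Sequence[int], word_nos: Sequence[int],
                         start_indexes: Sequence[int], data_bits: int) -> Dict[int, int]:
    """Return {lookup index: extracted value} for one header."""
    mask = (1 << data_bits) - 1
    values: Dict[int, int] = {}
    for counter, word in enumerate(words):           # plays the role of HdrWordCntr
        for lp, (word_no, start) in enumerate(zip(word_nos, start_indexes)):
            if counter == word_no:                   # header word reached
                values[lp] = (word << start) & mask  # InDat LEFTSHIFT StartIndex
    return values
```

Each key of the returned dictionary corresponds to a lookup for which HdrValid would be asserted in the pseudo code.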
APPENDIX E
//
// This file contains generic Match/Lookup Unit (MLU)
//
Object MLU
PARAMETERS (
    FieldWidth           = @EQ(FieldWidth);
    Log2FieldWidth       = @L2(FieldWidth);
    LookupAvaliable      = @EQ(LookupAvaliable);
    LookupDepth          = @EQ(LookupDepth);
    Log2LookupDepth      = @L2(LookupDepth);
    NoParallelLookup     = @EQ(NoParallelLookup);
    Log2NoParallelLookup = @L2(NoParallelLookup);
)
INTERFACES (
    // reset and clock
    input Rst;
    input Clk;
    // control signals
    input                        QEnable;
    input [2:0]                  Opcode;
    input [Log2FieldWidth:0]     Width;
    input [FieldWidth-1:0]       Param1;
    input [FieldWidth-1:0]       Param2;
    input                        LkWrEnb;
    input [Log2LookupDepth-1:0]  LkWrAddr;
    input [FieldWidth-1:0]       LkWrData;
    output [FieldWidth-1:0]      LkWrRBData;
    input [31:0]                 SPReg_0;
    input [31:0]                 SPReg_1;
    input [31:0]                 SPReg_2;
    input [31:0]                 SPReg_3;
    // input check values
    input                        InVal;
    input [FieldWidth-1:0]       InData;
    input [FieldWidth-1:0]       InMask;
    // results
    output                       OutVal;
    output                       OutMatch;
);
{
    // internal block generation
    signal [Log2LookupDepth-1:0] LkRdAddr;
    //if_generate (NoParallelLookup < 2)
    signal [FieldWidth-1:0] LkRdData;
    //else_generate
    signal [FieldWidth-1:0] LkRdData [NoParallelLookup-1:0];
    //end_generate
    signal [FieldWidth-1:0] iLkWrRBData;
    signal LkInStart;
    signal LkInProgress;
    signal LnkProcessState;
    constant Idle  = 1'b0;
    constant Check = 1'b1;
    //if_generate (LookupAvaliable == 1)
    //if_generate (NoParallelLookup < 2)
    // TEMPLATE CLASS instantiation for SramR1RW1
    SramR1RW1 (
        Log2LookupDepth, FieldWidth )
    SramR1W1_NameInstance (
        // Write I/F with Readback option
        Clk, LkWrEnb, LkWrAddr, LkWrData, LkWrRBData,
        // Read only I/F
        Clk, 1'b1, LkRdAddr, LkRdData );
    //else_generate
    // if no of parallel I/F more than 1
    signal [NoParallelLookup-1:0] LkWrEnb_@i = LkWrEnb &
        (LkWrAddr[Log2LookupDepth-1 : Log2LookupDepth-Log2NoParallelLookup] == @EQ(i));
    signal [FieldWidth-1:0] LkWrRBData [NoParallelLookup-1:0];
    // enable and address upper bits are for the RAM
    FOR i from 1 to (NoParallelLookup-1) increment 1
        LkWrEnb[i] = LkWrEnb &
            (LkWrAddr[Log2LookupDepth-1 : Log2LookupDepth-Log2NoParallelLookup] == i);
    ENDLOOP
    //for_generate @i (NoParallelLookup)
    SramR1RW1 (
        (Log2LookupDepth-Log2NoParallelLookup), FieldWidth )
    SramR1W1_NameInstance_@i (
        // Write I/F with Readback option
        Clk, LkWrEnb[@EQ(i)], LkWrAddr, LkWrData, LkWrRBData,
        // Read only I/F
        Clk, 1'b1, LkRdAddr, LkRdData[@EQ(i)] );
    //end_generate
    //end_generate
    //end_generate
    // adjust the comparison to the width user requested
    // The actual process
    PROCESS (on rising edge of Clk)
        IF Rst is high
            assign reset values to OutVal
            assign reset values to OutMatch
            assign reset values to LkInStart
            assign reset values to LkInProgress
            assign reset values to LnkProcessState
        ELSE
            //default
            Clear OutVal & LkInStart
            // if the Match unit is enabled then try to do something
            IF QEnable is active high
                CASE (Opcode)
                    3'd0 : // EQ: Equal to Param1
                        IF InVal observed
                            OutVal = 1;
                            OutMatch = ((InData & InMask) == (Param1 & InMask));
                        ENDIF
                    3'd1 : // LT: Less Than Param1
                        IF InVal observed
                            OutVal = 1;
                            OutMatch = ((InData & InMask) < (Param1 & InMask));
                        ENDIF
                    3'd2 : // LE: Less Than or Equal to Param1
                        IF InVal observed
                            OutVal = 1;
                            OutMatch = ((InData & InMask) <= (Param1 & InMask));
                        ENDIF
                    3'd3 : // GT: Greater Than Param1
                        IF InVal observed
                            OutVal = 1;
                            OutMatch = ((InData & InMask) > (Param1 & InMask));
                        ENDIF
                    3'd4 : // GE: Greater Than or Equal to Param1
                        IF InVal observed
                            OutVal = 1;
                            OutMatch = ((InData & InMask) >= (Param1 & InMask));
                        ENDIF
                    3'd5 : // RNG: Check if within range <Param1, Param2>
                        IF InVal observed
                            OutVal = 1;
                            OutMatch = (((InData & InMask) >= (Param1 & InMask)) &&
                                        ((InData & InMask) <= (Param2 & InMask)));
                        ENDIF
                    3'd6 : // LUP: Look up
                        IF InVal observed
                            IF LookupAvaliable is not active
                                OutVal = 1;
                                OutMatch = 1;
                            // start lookup if the module is enabled for look up
                            ELSE
                                LkInStart = 1;
                            ENDIF
                        ENDIF
                    3'd7 : // EXTR: Extract
                        IF InVal observed
                            OutVal = 1;
                            OutMatch = @EXTR_Expression;
                        ENDIF
                    default :
                        IF InVal observed
                            OutVal = 1;
                            OutMatch = 1;
                        ENDIF
                ENDCASE
            // else just return okay
            ELSE IF InVal observed
                OutVal = 1;
                OutMatch = 0;
            ENDIF
//if_generate (LookupAvaliable == 1)
// process a lookup query
case (LnkProcessState)
  Idle:
    // clear Look up in progress
    begin
    LkInProgress = 0;
    // start processing
    if (LkInStart == 1)
      LkRdAddr = 0;
      LkInProgress = 1;
      LnkProcessState = Check;
    ENDIF
  Check:
    // declare match and exit the search
    //if_generate (NoParallelLookup < 2)
    IF ((InData & InMask) == (LkRdData & InMask))
    //else_generate
    IF (
    //for_generate @i (NoParallelLookup)
    @IS(i==0)?():(||) ((InData & InMask) == (LkRdData[@EQ(i)] & InMask))
    //end_generate
    )
    //end_generate
      OutVal = 1;
      OutMatch = 1;
      LkInProgress = 0;
      LnkProcessState = Idle;
    // if lookup limit reached
    ELSE IF address limit reached without match
      OutVal = 1;
      OutMatch = 0;
      LkInProgress = 0;
      LnkProcessState = Idle;
    // lookup address increment
    ELSE
      increment LkRdAddr by appropriate value based on NoParallelLookup
    ENDIF
ENDCASE
//end_generate
ENDIF
ENDPROCESS
}
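The opcode table above can be summarised by a small software model. The following is a minimal Python sketch, not the generated hardware: the function name `mlu_match` and its argument names are hypothetical, and the LUP and EXTR opcodes are not modelled (they fall through to the default "match" result).

```python
def mlu_match(opcode, in_data, in_mask, param1, param2=0):
    """Return the OutMatch value for one valid input word.

    Models opcodes 0-5 of the Match/Lookup Unit pseudocode. Every operand
    is masked with in_mask before comparison, as in the appendix.
    """
    data = in_data & in_mask
    p1 = param1 & in_mask
    p2 = param2 & in_mask
    if opcode == 0:      # EQ
        return data == p1
    if opcode == 1:      # LT
        return data < p1
    if opcode == 2:      # LE
        return data <= p1
    if opcode == 3:      # GT
        return data > p1
    if opcode == 4:      # GE
        return data >= p1
    if opcode == 5:      # RNG: Param1 <= data <= Param2
        return p1 <= data <= p2
    # LUP (6), EXTR (7) and the default branch are not modelled in this sketch.
    return True
```

For example, with a mask of 0xFF00 only the upper byte of a 16-bit field takes part in the comparison, so 0x1234 equals 0x12FF under opcode 0.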
APPENDIX F VeriLoop2
// global variables
declare line_count = 0, input_filename, output_filename;
declare map ParamNum;
declare map ParamStr;
// *****************************************************
void error()
{
  Print appropriate error message;
  exit;
}
int Log2(int n)
{
  int l;
  for (l = 1, n--; n > 1; n = n >> 1, l++);
  return l;
}
int Expression(string e)
{
  declare tmpstr;
  tmpstr = e;
  for each variable s in string
  {
    FIND the corresponding value of s from ParamNum, ParamStr maps;
    REPLACE s by value of s in tmpstr;
  }
  // unix or cygwin environment is required for bash script
  CALL bash expression evaluator and pass it tmpstr;
  return result from bash expression evaluator;
}
string VeriLoop2(FILE in_file)
{
  declare buffsize = 1024, buffer[buffsize], word[buffsize], frame, Match;
  declare loop_count;
  while END of input file is not reached
  {
    GET LINE from file into buffer;
    GET first word from buffer into word;
    if (word == "@L2(")
    {
      GET next word from buffer;
      Append Log2(word) to frame;
      SKIP ")" from input file;
    }
    if (word == "@EQ(")
    {
      GET next word from buffer;
      Append Expression(word) to frame;
      SKIP ")" from input file;
    }
    if (word == "//if_generate")
    {
      get Expression following "//if_generate" into buffer;
      if (Expression(buffer, i) == 1)
      {
        do {
          GET LINE from file into buffer;
          frame = frame + buffer;
        } while (buffer != "//end_generate")
      }
      // Don't store or generate this piece of code
      else
      {
        do {
          GET LINE from file into buffer;
          if (word == "//elseif_generate")
          {
            get Expression following "//if_generate" into buffer;
            if (Expression(buffer, i) == 1)
            {
              do {
                GET LINE from file into buffer;
                frame = frame + buffer;
              } while (buffer != "//end_generate")
              break;
            }
          }
          // do not add to frame
        } while (buffer != "//end_generate")
      }
    }
    else if (word == "//let_generate")
    //let_generate @VariableName1 (expression)
    //let_generate @VariableName2 "string"
    {
      // create new variable
      GET name value pair from the following two variables;
      Populate ParamNum, ParamStr using the name value pair;
    }
    else if (word == "//for_generate")
    {
      // do (expression) number of iterations of the code,
      // replacing every occurrence of @index_char with iteration index
      // [0; (expression)-1]
      GET loop_count from the next word following "//for_generate";
      GET buffer from lines between "//for_generate" and corresponding "//end_generate";
      for (i = 0; i < loop_count; i++)
      {
        REPLACE reference to loop variable in buffer with i;
        APPEND buffer to frame;
      }
    }
    else
    {
      // pass through the line to buffer
      APPEND buffer to frame;
    }
  } // while
  return frame;
} // veriloop2
int main()
{
  declare FILE out_file, in_file, ini_file;
  declare buffer, key, i, k;
  Get input filename from global buffer into input_filename;
  // open input file
  in_file = file_open(input_filename);
  Get output filename from global buffer into output_filename;
  // open output file
  out_file = file_open(output_filename);
  // read in parameters from input line
  for remaining parameter on the command line
  {
    if string contains '='
    {
      GET name value pair <PARAM>=<VALUE>;
      Populate ParamNum, ParamStr using the name value pair;
    }
    else
    {
      error();
    }
  }
  // process source file
  buf = VeriLoop2(in_file);
  // output processed string to file
  output buf to file out_file;
  // close files
  file_close(in_file);
  file_close(out_file);
}
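The Log2 helper in Appendix F computes a rounded-up base-2 logarithm with a floor of one bit, which is what the width parameters in the other appendices need (for instance DataWidth = 32 pairs with Log2DataWidth = 5, and NumPPUConnects = 1 pairs with Log2NumPPUConnects = 1). A minimal Python transcription of that loop is sketched below; the name `log2_ceil` is hypothetical.

```python
def log2_ceil(n):
    """Transcription of the Appendix F Log2 loop.

    Returns the number of bits needed to index n items, never less than 1.
    """
    l = 1
    n -= 1
    while n > 1:
        n >>= 1
        l += 1
    return l
```

Under this definition 8 items need 3 bits and 5 items also need 3 bits, while 1 or 2 items both report 1.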
APPENDIX G Supported Constructs
1. Value replace construct specified by:
@EQ(RValue)
which is replaced by the value of the RValue
2. For generate construct specified by:
//for_generate @var (loopvalue)
code needed to be generated loopvalue times
//end_generate
3. If generate construct specified by:
//if_generate (condition1)
code needed to be generated if condition1 is true
//else_if_generate (conditionN) - optional
code needed to be generated if conditionN is true
//else_generate - optional
code needed to be generated if none of the above conditions match
//end_generate
4. Is generate construct specified by:
@IS(Condition)?(ValueIFTrue):(ValueIFFalse)
which is replaced by ValueIFTrue if Condition is true, else replaced by ValueIFFalse
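To make the first two constructs concrete, the sketch below expands a for-generate block and substitutes @EQ references in Python. It is an illustration only and not the VeriLoop2 implementation: the function names `expand` and `substitute` are hypothetical, nesting is not supported, @EQ is limited to a plain variable name rather than a full expression, and the if-generate and @IS constructs are left out.

```python
import re


def substitute(line, scope):
    """Replace each @EQ(name) with the value of name in scope."""
    return re.sub(r"@EQ\s*\((\w+)\)", lambda m: str(scope[m.group(1)]), line)


def expand(lines, params):
    """Expand //for_generate ... //end_generate blocks (no nesting)."""
    out = []
    i = 0
    while i < len(lines):
        m = re.match(r"\s*//for_generate\s+@(\w+)\s*\((\w+)\)", lines[i])
        if not m:
            out.append(substitute(lines[i], params))
            i += 1
            continue
        var, count = m.group(1), m.group(2)
        # The loop count is either a parameter name or a literal number.
        n = int(params.get(count, count))
        j = i + 1
        body = []
        while lines[j].strip() != "//end_generate":
            body.append(lines[j])
            j += 1
        for k in range(n):
            scope = dict(params, **{var: k})
            out.extend(substitute(b, scope) for b in body)
        i = j + 1
    return out
```

With the parameter NoParallelLookup set to 2, a body line such as `cmp[@EQ(i)] = LkRdData[@EQ(i)];` is emitted twice, once with index 0 and once with index 1.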
APPENDIX H
// --
//
// This file contains decision forwarding unit (DFU_1)
//
//
Object DFU
PARAMETERS (
  // configuration constant
  DataWidth = 32;
  InResultWidth = 8;
  OutResultWidth = 8;
  NumPPUConnects = 1;
  Log2NumPPUConnects = 1;
  DFUID = 0;
  SopBits = 1;
  EopBits = 4;
)
INTERFACES (
  // clock & reset
  input Rst;
  input Clk;
  // Micro-processor bus
  input MapWrRd_n;
  input [2:0] MapAddr;
  input [31:0] MapWrData;
  output [31:0] MapRdData;
  // input buses
  input [NumPPUConnects-1:0] RInVal;
  input [NumPPUConnects-1:0] MIn;
  input [InResultWidth-1:0] RIn_0;
  input [7:0] RInSeq_0;
  // in packet header
  input [NumPPUConnects-1:0] DValIn;
  input [SopBits-1:0] SOHIn_0;
  input [EopBits-1:0] EOHIn_0;
  input [DataWidth-1:0] DIn_0;
  // out result
  output ROutAVal;
  output ROutBVal;
  output ROutDVal;
  output [OutResultWidth-1:0] ROut;
  output [7:0] SOut;
  // out packet header common
  output DValOut;
  output [SopBits-1:0] SOHOut;
  output [EopBits-1:0] EOHOut;
  output [DataWidth-1:0] DOut;
  output OutOfSeqErr;
);
{
// internal signals
signal StartOutputData;
signal SequenceCheck;
signal [NumPPUConnects-1:0] iRInVal;
signal [NumPPUConnects-1:0] iMIn;
signal [InResultWidth-1:0] iRIn [NumPPUConnects-1:0];
signal [7:0] iRInSeq [NumPPUConnects-1:0];
signal [InResultWidth-1:0] RIn [NumPPUConnects-1:0];
signal [7:0] RInSeq [NumPPUConnects-1:0];
signal [SopBits-1:0] SOHIn [NumPPUConnects-1:0];
signal [EopBits-1:0] EOHIn [NumPPUConnects-1:0];
signal [DataWidth-1:0] DIn [NumPPUConnects-1:0];
RIn[0] = RIn_0;
RInSeq[0] = RInSeq_0;
SOHIn[0] = SOHIn_0;
EOHIn[0] = EOHIn_0;
DIn[0] = DIn_0;
signal [NumPPUConnects-1:0] Match = MIn OR iMIn;
signal [InResultWidth-1:0] Result [NumPPUConnects-1:0];
signal [7:0] Sequence [NumPPUConnects-1:0];
signal [Log2NumPPUConnects:0] LastArrivalData;
signal [Log2NumPPUConnects:0] LastArrivalResult;
FOR i from 1 to (NumPPUConnects-1) increment 1
  Result[i] = (RInVal[i]) ? RIn[i] : iRIn[i];
  Sequence[i] = (RInVal[i]) ? RInSeq[i] : iRInSeq[i];
ENDLOOP
PROCESS (on rising edge of Clk) : VALUE_LATCH_PROCESS
IF Rst is high
  assign initial value to iRInVal
  assign initial value to iMIn
ELSE
  // latch value for I/F
  FOR i from 1 to (NumPPUConnects-1) increment 1
    IF (RInVal[i] == 1)
      iRInVal[i] <= RInVal[i];
      iMIn[i] <= MIn[i];
      iRIn[i] <= RIn[i];
      iRInSeq[i] <= RInSeq[i];
    ENDIF
  ENDLOOP
ENDIF
ENDPROCESS
// latch all the signals to start processing
// process if all I/F have given valid data
PROCESS (on rising edge of Clk) : DATA_SELECTION_MUX_PROCESS
IF Rst is high
  assign initial value to ROutAVal
  assign initial value to ROutBVal
  assign initial value to ROutDVal
  assign initial value to DValOut
  assign initial value to SOHOut
  assign initial value to EOHOut
  assign initial value to DOut
  assign initial value to StartOutputData
ELSE
  // default
  assign default value to ROutAVal
  assign default value to ROutBVal
  assign default value to ROutDVal
  assign default value to DValOut
  assign default value to SOHOut
  assign default value to EOHOut
  // if all I/F have given valids
  IF (RInVal OR iRInVal) expression is all ones
    IF user defined Match[0] is true
      ROutAVal <= 1;
    ELSE IF user defined Match[0] is true
      ROutBVal <= 1;
    ELSE
      ROutDVal <= 1;
    ENDIF
    Clear all latched values on iRInVal, iMIn
    // start the transfer of data
    set StartOutputData & SequenceCheck to One
    Latch the port which gave the data last
  ENDIF
  // start outputting data
  IF (RInVal are received by each connected PPU) OR (StartOutputData is set)
    IF (RInVal OR iRInVal) expression is all ones
      set LastArrivalData to the channel which gave last RInVal
    ENDIF
    // output data
    DValOut = DValIn[LastArrivalData];
    SOHOut = SOHIn[LastArrivalData];
    EOHOut = EOHIn[LastArrivalData];
    DOut = DIn[LastArrivalData];
  ENDIF
  // clear the StartOutputData on EOP
  IF valid EOHOut is received
    reset StartOutputData to zero
  ENDIF
ENDIF
ENDPROCESS
// latch all the signals to start processing
// process if all I/F have given valid data
PROCESS (on rising edge of Clk) : RESULT_GENERATION_PROCESS
IF Rst is high
  assign initial value to ROut
  assign initial value to SOut
  assign initial value to OutOfSeqErr
  assign initial value to SequenceCheck
ELSE
  // default
  assign default value to OutOfSeqErr
  assign default value to SequenceCheck
  // start outputting data
  IF (RInVal are received by each connected PPU) OR (StartOutputData is set)
    IF (RInVal OR iRInVal) expression is all ones
      set LastArrivalResult to the channel which gave last RInVal
    ENDIF
    IF (Match[0])
      ROut = Result[0];
    ELSE IF (Match[0])
      ROut = Result[0];
    ELSE
      ROut = 0;
    ENDIF
    SOut = RInSeq[LastArrivalResult];
  ENDIF
  // OutOfSeqErr
  IF ((SequenceCheck ACTIVE) AND (ANY OF THE RECEIVED SEQUENCE NUMBERS DO NOT MATCH))
    declare OutOfSeqErr to be active
  ENDIF
ENDIF
ENDPROCESS
// out alignment process
PROCESS (on rising edge of Clk) : OUTPUT_ALINGMENT_PROCESS
IF Rst is low
  IF Result output delay is more
    reclock to align with the result outputs ROutAVal
    reclock to align with the result outputs ROutBVal
    reclock to align with the result outputs ROutDVal
    reclock to align with the result outputs DValOut
    reclock to align with the result outputs SOHOut
    reclock to align with the result outputs EOHOut
    reclock to align with the result outputs DOut
  ENDIF
  IF Data output delay is more
    reclock to align with the result outputs ROut
    reclock to align with the result outputs SOut
  ENDIF
ENDIF
ENDPROCESS
// register map
PROCESS (on rising edge of Clk) : REGISTER_MAP_PROCESS
IF Rst is high
  assign initial value to MapRdData
ELSE
  // case address decoding
  IF user requested a read (MapWrRd_n == 0)
    CASE (MapAddr)
      // PPU ID read only
      0: MapRdData = DFUID;
    ENDCASE
  ENDIF
ENDIF
ENDPROCESS
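The out-of-sequence check in the result generation process compares the sequence numbers delivered by the connected PPUs. A minimal Python sketch of that condition follows; the function name `out_of_seq_err` is hypothetical, and it assumes "do not match" means the received sequence numbers are not all equal to each other.

```python
def out_of_seq_err(sequence_check, sequences):
    """Flag an error when checking is active and the sequence numbers differ."""
    return bool(sequence_check) and len(set(sequences)) > 1
```

With four PPUs reporting 7, 8, 7, 7 the flag is raised; with all four reporting 7, or with checking inactive, it is not.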
APPENDIX I
//
//
// This file contains decision forwarding unit (DFU_4)
//
//
Object DFU
PARAMETERS (
  // configuration constant
  DataWidth = 32;
  InResultWidth = 8;
  OutResultWidth = 8;
  NumPPUConnects = 4;
  Log2NumPPUConnects = 2;
  DFUID = 0;
  SopBits = 1;
  EopBits = 4;
)
INTERFACES (
  // clock & reset
  input Rst;
  input Clk;
  // Micro-processor bus
  input MapWrRd_n;
  input [2:0] MapAddr;
  input [31:0] MapWrData;
  output [31:0] MapRdData;
  // input buses
  input [NumPPUConnects-1:0] RInVal;
  input [NumPPUConnects-1:0] MIn;
  input [InResultWidth-1:0] RIn_0;
  input [7:0] RInSeq_0;
  input [InResultWidth-1:0] RIn_1;
  input [7:0] RInSeq_1;
  input [InResultWidth-1:0] RIn_2;
  input [7:0] RInSeq_2;
  input [InResultWidth-1:0] RIn_3;
  input [7:0] RInSeq_3;
  // in packet header
  input [NumPPUConnects-1:0] DValIn;
  input [SopBits-1:0] SOHIn_0;
  input [EopBits-1:0] EOHIn_0;
  input [DataWidth-1:0] DIn_0;
  input [SopBits-1:0] SOHIn_1;
  input [EopBits-1:0] EOHIn_1;
  input [DataWidth-1:0] DIn_1;
  input [SopBits-1:0] SOHIn_2;
  input [EopBits-1:0] EOHIn_2;
  input [DataWidth-1:0] DIn_2;
  input [SopBits-1:0] SOHIn_3;
  input [EopBits-1:0] EOHIn_3;
  input [DataWidth-1:0] DIn_3;
  // out result
  output ROutAVal;
  output ROutBVal;
  output ROutDVal;
  output [OutResultWidth-1:0] ROut;
  output [7:0] SOut;
  // out packet header common
  output DValOut;
  output [SopBits-1:0] SOHOut;
  output [EopBits-1:0] EOHOut;
  output [DataWidth-1:0] DOut;
  output OutOfSeqErr
);
{
// internal signals
signal StartOutputData;
signal SequenceCheck;
signal [NumPPUConnects-1:0] iRInVal;
signal [NumPPUConnects-1:0] iMIn;
signal [InResultWidth-1:0] iRIn [NumPPUConnects-1:0];
signal [7:0] iRInSeq [NumPPUConnects-1:0];
signal [InResultWidth-1:0] RIn [NumPPUConnects-1:0];
signal [7:0] RInSeq [NumPPUConnects-1:0];
signal [SopBits-1:0] SOHIn [NumPPUConnects-1:0];
signal [EopBits-1:0] EOHIn [NumPPUConnects-1:0];
signal [DataWidth-1:0] DIn [NumPPUConnects-1:0];
RIn[0] = RIn_0; RInSeq[0] = RInSeq_0; SOHIn[0] = SOHIn_0; EOHIn[0] = EOHIn_0; DIn[0] = DIn_0;
RIn[1] = RIn_1; RInSeq[1] = RInSeq_1; SOHIn[1] = SOHIn_1; EOHIn[1] = EOHIn_1; DIn[1] = DIn_1;
RIn[2] = RIn_2; RInSeq[2] = RInSeq_2; SOHIn[2] = SOHIn_2; EOHIn[2] = EOHIn_2; DIn[2] = DIn_2;
RIn[3] = RIn_3; RInSeq[3] = RInSeq_3; SOHIn[3] = SOHIn_3; EOHIn[3] = EOHIn_3; DIn[3] = DIn_3;
signal [NumPPUConnects-1:0] Match = MIn OR iMIn;
signal [InResultWidth-1:0] Result [NumPPUConnects-1:0];
signal [7:0] Sequence [NumPPUConnects-1:0];
signal [Log2NumPPUConnects:0] LastArrivalData;
signal [Log2NumPPUConnects:0] LastArrivalResult;
FOR i from 1 to (NumPPUConnects-1) increment 1
  Result[i] = (RInVal[i]) ? RIn[i] : iRIn[i];
  Sequence[i] = (RInVal[i]) ? RInSeq[i] : iRInSeq[i];
ENDLOOP
PROCESS (on rising edge of Clk) : VALUE_LATCH_PROCESS
IF Rst is high
  assign initial value to iRInVal
  assign initial value to iMIn
ELSE
  // latch value for I/F
  FOR i from 1 to (NumPPUConnects-1) increment 1
    IF (RInVal[i] == 1)
      iRInVal[i] <= RInVal[i];
      iMIn[i] <= MIn[i];
      iRIn[i] <= RIn[i];
      iRInSeq[i] <= RInSeq[i];
    ENDIF
  ENDLOOP
ENDIF
ENDPROCESS
// latch all the signals to start processing
// process if all I/F have given valid data
PROCESS (on rising edge of Clk) : DATA_SELECTION_MUX_PROCESS
IF Rst is high
  assign initial value to ROutAVal
  assign initial value to ROutBVal
  assign initial value to ROutDVal
  assign initial value to DValOut
  assign initial value to SOHOut
  assign initial value to EOHOut
  assign initial value to DOut
  assign initial value to StartOutputData
ELSE
  // default
  assign default value to ROutAVal
  assign default value to ROutBVal
  assign default value to ROutDVal
  assign default value to DValOut
  assign default value to SOHOut
  assign default value to EOHOut
  // if all I/F have given valids
  IF (RInVal OR iRInVal) expression is all ones
    IF user defined &Match is true
      ROutAVal <= 1;
    ELSE IF user defined |Match is true
      ROutBVal <= 1;
    ELSE
      ROutDVal <= 1;
    ENDIF
    Clear all latched values on iRInVal, iMIn
    // start the transfer of data
    set StartOutputData & SequenceCheck to One
    Latch the port which gave the data last
  ENDIF
  // start outputting data
  IF (RInVal are received by each connected PPU) OR (StartOutputData is set)
    IF (RInVal OR iRInVal) expression is all ones
      set LastArrivalData to the channel which gave last RInVal
    ENDIF
    // output data
    DValOut = DValIn[LastArrivalData];
    SOHOut = SOHIn[LastArrivalData];
    EOHOut = EOHIn[LastArrivalData];
    DOut = DIn[LastArrivalData];
  ENDIF
  // clear the StartOutputData on EOP
  IF valid EOHOut is received
    reset StartOutputData to zero
  ENDIF
ENDIF
ENDPROCESS
// latch all the signals to start processing
// process if all I/F have given valid data
PROCESS (on rising edge of Clk) : RESULT_GENERATION_PROCESS
IF Rst is high
  assign initial value to ROut
  assign initial value to SOut
  assign initial value to OutOfSeqErr
  assign initial value to SequenceCheck
ELSE
  // default
  assign default value to OutOfSeqErr
  assign default value to SequenceCheck
  // start outputting data
  IF (RInVal are received by each connected PPU) OR (StartOutputData is set)
    IF (RInVal OR iRInVal) expression is all ones
      set LastArrivalResult to the channel which gave last RInVal
    ENDIF
    IF (&Match)
      ROut = Result[0];
    ELSE IF (|Match)
      ROut = Result[1];
    ELSE
      ROut = 0;
    ENDIF
    SOut = RInSeq[LastArrivalResult];
  ENDIF
  // OutOfSeqErr
  IF ((SequenceCheck ACTIVE) AND (ANY OF THE RECEIVED SEQUENCE NUMBERS DO NOT MATCH))
    declare OutOfSeqErr to be active
  ENDIF
ENDIF
ENDPROCESS
// out alignment process
PROCESS (on rising edge of Clk) : OUTPUT_ALINGMENT_PROCESS
IF Rst is low
  IF Result output delay is more
    reclock to align with the result outputs ROutAVal
    reclock to align with the result outputs ROutBVal
    reclock to align with the result outputs ROutDVal
    reclock to align with the result outputs DValOut
    reclock to align with the result outputs SOHOut
    reclock to align with the result outputs EOHOut
    reclock to align with the result outputs DOut
  ENDIF
  IF Data output delay is more
    reclock to align with the result outputs ROut
    reclock to align with the result outputs SOut
  ENDIF
ENDIF
ENDPROCESS
// register map
PROCESS (on rising edge of Clk) : REGISTER_MAP_PROCESS
IF Rst is high
  assign initial value to MapRdData
ELSE
  // case address decoding
  IF user requested a read (MapWrRd_n == 0)
    CASE (MapAddr)
      // PPU ID read only
      0: MapRdData = DFUID;
    ENDCASE
  ENDIF
ENDIF
ENDPROCESS
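The decision step of the four-input DFU can be modelled in a few lines of Python. This is a sketch under assumptions: the garbled match conditions in the appendix are read as the reduction operators &Match (all PPUs matched) and |Match (at least one PPU matched), and the function name `dfu_decide` and its return convention are hypothetical.

```python
def dfu_decide(match_bits, results):
    """Pick the output valid line (A, B or D) and the ROut value.

    match_bits: one boolean per connected PPU (the Match vector).
    results:    one result value per connected PPU (the Result array).
    """
    if all(match_bits):          # &Match: every PPU declared a match
        return ("A", results[0])
    if any(match_bits):          # |Match: at least one PPU declared a match
        return ("B", results[1])
    return ("D", 0)              # no match: default output, ROut = 0
```

For instance, a Match vector of (1, 0, 1, 1) selects output B with Result[1], whereas (1, 1, 1, 1) selects output A with Result[0].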
APPENDIX J
//
// This file contains Packet Processing Unit(s) (PPU) //
Object PPU
PARAMETERS (
  // configuration constant
  DataWidth = 32;
  Log2DataWidth = 5;
  MaxHdrWords = 8;
  Log2MaxHdrWords = 3;
  QualifierWidth = 2;
  ResultWidth = 8;
  FieldWidth = 16;
  Log2FieldWidth = 4;
  LookupAvaliable = 0;
  LookupDepth = 8;
  Log2LookupDepth = 3;
  TAGBits = 0;
  SobBits = 1;
  EobBits = 4;
  UseFIFOStorage = 0;
  PPUID = 0;
  IndexWidth = Log2DataWidth+Log2MaxHdrWords;
)
INTERFACES (
  // Result output expression is ExtHdrVal
  // clock & reset
  input Rst;
  input Clk;
  // input packet header
  input InVal;
  input [SobBits-1:0] SOHIn;
  input [EobBits-1:0] EOHIn;
  input [DataWidth-1:0] DataIn;
  // output packet header
  output OutVal;
  output [SobBits-1:0] SOHOut;
  output [EobBits-1:0] EOHOut;
  output [DataWidth-1:0] DataOut;
  // control signal
  input [QualifierWidth-1:0] QualEnb;
  input [2:0] QualCond;
  input [IndexWidth-1:0] Index;
  input [Log2FieldWidth:0] Width;
  input [2:0] Opcode;
  input [FieldWidth-1:0] Param1;
  input [FieldWidth-1:0] Param2;
  input [FieldWidth-1:0] Mask;
  // Micro-processor bus
  input MapWrRd_n;
  input [3:0] MapAddr;
  input [31:0] MapWrData;
  output [31:0] MapRdData;
  // Result values
  output Match;
  output ResVal;
  output [ResultWidth-1:0] Result;
  output [7:0] SeqOut;
);
{
// ******************************************************************
// internal signal declaration
// ******************************************************************
signal ExtOutVal;
signal [SobBits-1:0] ExtOutSob;
signal [EobBits-1:0] ExtOutEob;
signal [DataWidth-1:0] ExtOutDat;
signal [FieldWidth-1:0] ExtHdrValue;
signal [IndexWidth-1:0] AdjustedIndex = Index + Width;
signal ExtHdrValid;
signal ExtHdrValid [NoOfLookups-1:0];
signal [DataWidth-1:0] ExtHdrValue [NoOfLookups-1:0];
signal [Log2MaxHdrWords:0] WordNo [NoOfLookups-1:0];
signal [Log2DataWidth-1:0] StartIndex [NoOfLookups-1:0];
signal [Log2DataWidth-1:0] DataShift;
constant FieldWidthMultiple = (FieldWidth+DataWidth-1)/DataWidth;
FOR i from 1 to (NoOfLookups-1) increment 1
  WordNo[i] = (Index[IndexWidth-1:Log2DataWidth] + i);
  StartIndex[i] = 0;
ENDLOOP
signal iOutVal;
signal iOutMatch;
signal QEnable;
signal [QualifierWidth-1:0] QualEnbCondition;
signal LkWrEnb;
signal [Log2LookupDepth-1:0] LkWrAddr;
signal [FieldWidth-1:0] LkWrData;
signal [FieldWidth-1:0] LkWrRBData;
// Special purpose registers for the MLU Blocks for custom instruction
signal [31:0] SPReg_0;
signal [31:0] SPReg_1;
signal [31:0] SPReg_2;
signal [31:0] SPReg_3;
signal Match;
constant Idle = 0;
constant Xfr = 1;
// TEMPLATE CLASS instantiation for HLU
HLU (
  0,
  SobBits, EobBits, DataWidth, Log2DataWidth,
  MaxHdrWords, Log2MaxHdrWords, DataWidth )
HeaderExtract_NameInstance (
  // reset and clock
  Rst, Clk,
  // decoding rules
  WordNo, StartIndex,
  // input data
  InVal, SOHIn, EOHIn, DataIn,
  // Decoded Header Values
  ExtHdrValid, ExtHdrValue,
  // output data
  ExtOutVal, ExtOutSob, ExtOutEob, ExtOutDat );
// data byte alignment
PROCESS (on change to ExtHdrValue)
  Adjust ExtHdrValue with DataWidth, FieldWidth, AdjustedIndex[Log2DataWidth-1:0] to have the correct byte alignment at byte zero
ENDPROCESS
// extract qualifier enable
PROCESS (on rising edge of Clk)
IF Rst is high
  assign initial value to QEnable
ELSE
  set QEnable to disabled state Zero to begin
  // get the case statement for enable
  CASE (QualCond)
    3'd0 : set QEnable to enable state One unconditionally
    3'd1 : set QEnable to enable state One if QualEnbCondition Equal to QualEnb
    3'd2 : set QEnable to enable state One if QualEnbCondition Greater than QualEnb
    3'd3 : set QEnable to enable state One if QualEnbCondition Greater than or Equal to QualEnb
    3'd4 : set QEnable to enable state One if QualEnbCondition Less than QualEnb
    3'd5 : set QEnable to enable state One if QualEnbCondition not Equal to QualEnb
    default : QEnable = 0;
  ENDCASE
ENDIF
ENDPROCESS
// MLU Template class instance
MLU (
  FieldWidth, Log2FieldWidth, LookupAvaliable, LookupDepth,
  Log2LookupDepth )
MLU_NameInstance (
  // reset and clock
  Rst, Clk,
  // control signals
  QEnable, Opcode, Width, Param1, Param2, LkWrEnb, LkWrAddr, LkWrData, LkWrRBData,
  SPReg_0, SPReg_1, SPReg_2, SPReg_3,
  // input check values
  ExtHdrValid, ExtHdrValue, Mask,
  // results
  iOutVal, iOutMatch );
//
// I/O out signal
// reclock signals
signal [Log2MaxHdrWords+2:0] SelIndex;
signal ExtOutVal_r [MaxHdrWords+2:0];
signal [SobBits-1:0] ExtOutSob_r [MaxHdrWords+2:0];
signal [EobBits-1:0] ExtOutEob_r [MaxHdrWords+2:0];
signal [DataWidth-1:0] ExtOutDat_r [MaxHdrWords+2:0];
signal iResVal;
OutVal = ExtOutVal_r[SelIndex];
SOHOut = ExtOutSob_r[SelIndex];
EOHOut = ExtOutEob_r[SelIndex];
DataOut = ExtOutDat_r[SelIndex];
Result = ExtHdrValue;
ResVal = iResVal;
// on reclock will do for no lookup
PROCESS (on rising edge of Clk) : OUTPUT_ALINGEMENT_PROCESS
IF Rst is high
  assign reset values to iResVal
  assign reset values to Match
  assign reset values to SelIndex
ELSE
  Reclock the signals ExtOutVal_r, ExtOutSob_r, ExtOutEob_r, ExtOutDat_r to generate a MaxHdrWords+3 word pipeline for future usage
  // default
  set iResVal to Zero
  IF valid SOHIn is received
    // word lookup + 1 match clock
    Calculate SelIndex based on WordNo of the last lookup
  ENDIF
  // gen expression
  IF iOutVal is set to high
    declare iResVal to be valid
    set Match as iOutMatch
  ENDIF
ENDIF
ENDPROCESS
// ***********
// sequence out process
PROCESS (on rising edge of Clk) : SEQUENCE_GENERATION
IF Rst is high
  assign reset values to SeqOut
ELSE
  // increment the sequence no if valid InSOH is detected
  IF valid SOHIn is received
    increment SeqOut by one
  ENDIF
ENDIF
ENDPROCESS
// ***************************************************************
// Register map
// ***************************************************************
// register map process
PROCESS (on rising edge of Clk) : REGISTER_MAP
IF Rst is high
  assign reset values to MapRdData, QualEnbCondition, LkWrEnb, LkWrAddr, LkWrData, SPReg_0, SPReg_1, SPReg_2, SPReg_3
ELSE
  // case address decoding
  IF user requested a write operation
    CASE (MapAddr)
      // PPU ID NO Write
      // Enable Condition
      1: latch QualEnbCondition from MapWrData
      // map address
      2: latch LkWrAddr from MapWrData
      // map data
      3: issue write to user look table using LkWrEnb
         latch LkWrData from MapWrData
      4: latch SPReg_0 from MapWrData
      5: latch SPReg_1 from MapWrData
      6: latch SPReg_2 from MapWrData
      7: latch SPReg_3 from MapWrData
    ENDCASE
  // if reading
  ELSE
    CASE (MapAddr)
      // PPU ID read only
      0: set MapRdData to PPUID
      // Enable Condition
      1: set MapRdData to QualEnbCondition
      // map address
      // address needs to be set
      // for reading data
      // map data
      3: set MapRdData to LkWrRBData
      4: set MapRdData to SPReg_0
      5: set MapRdData to SPReg_1
      6: set MapRdData to SPReg_2
      7: set MapRdData to SPReg_3
    ENDCASE
  ENDIF
ENDIF
ENDPROCESS
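The qualifier-enable case statement of the PPU maps each QualCond code to one comparison between the programmed QualEnbCondition register and the QualEnb input. A compact Python sketch of that mapping is given below; the function name `q_enable` and the lookup table `_QUAL_OPS` are hypothetical, and the clocked behaviour and reset are not modelled.

```python
import operator

# QualCond code -> comparison applied as (QualEnbCondition, QualEnb)
_QUAL_OPS = {
    1: operator.eq,   # Equal to
    2: operator.gt,   # Greater than
    3: operator.ge,   # Greater than or Equal to
    4: operator.lt,   # Less than
    5: operator.ne,   # not Equal to
}


def q_enable(qual_cond, qual_enb_condition, qual_enb):
    """Return True when the Match/Lookup Unit should be enabled."""
    if qual_cond == 0:            # unconditional enable
        return True
    op = _QUAL_OPS.get(qual_cond)
    # Codes 6 and 7 hit the default branch, which leaves QEnable at zero.
    return bool(op and op(qual_enb_condition, qual_enb))
```

A table-driven form is used here only to keep the sketch short; a hardware description would keep the explicit case statement.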
APPENDIX K
//
// This file contains header extraction block (HLU)
//
Object HLU
PARAMETERS (
  NoOfLookups = 0;
  SobBits = 1;
  EobBits = 4;
  DataBits = 32;
  Log2DataBits = 5;
  MaxHdrWords = 8;
  Log2MaxHdrWords = 3;
  PartSize = 16;
)
INTERFACES (
  // reset and clock
  input Rst;
  input Clk;
  // decoding rules
  input [Log2MaxHdrWords:0] WordNo [NoOfLookups-1:0];
  input [Log2DataBits-1:0] StartIndex [NoOfLookups-1:0];
  // input data
  input InVal;
  input [SobBits-1:0] InSob; // Top bit is SOP
  input [EobBits-1:0] InEob; // Top bit is EOP
  input [DataBits-1:0] InDat;
  // Decoded Header Values.
  output HdrValid [NoOfLookups-1:0];
  output [PartSize-1:0] HdrValue [NoOfLookups-1:0];
  // output data
  output OutVal;
  output [SobBits-1:0] OutSob;
  output [EobBits-1:0] OutEob;
  output [DataBits-1:0] OutDat;
);
{
// configuration parameter
//////////////////////////////////////////////////////////
// internal signals
//////////////////////////////////////////////////////////
signal [Log2MaxHdrWords:0] HdrWordCntr;
PROCESS (on rising edge of Clk)
IF Rst is high
  assign initial value to HdrValid
  assign initial value to HdrValue
  assign initial value to HdrWordCntr
// reset
ELSE
  // get header word counter ticking
  IF InVal is high
    // EOP seen
    IF InEob is high
      set HdrWordCntr to zero
    ELSE IF HdrWordCntr is not equal to MaxHdrWords
      increment HdrWordCntr by one
    ENDIF
    // header extraction loop
    for lp = 0 to (NoOfLookups-1) increment lp by one
      // IF header word is reached and data is valid
      IF current word is valid header word
        // issue valid word indication command
        declare HdrValid[lp] valid
        // adjust the data to start index
        assign HdrValue[lp] to (InDat LEFTSHIFT StartIndex[lp])
      ENDIF
    ENDLOOP
  ENDIF
// Clock
ENDPROCESS // always
// output assignment
OutVal = InVal;
OutSob = InSob;
OutEob = InEob;
OutDat = InDat;
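The extraction loop of the HLU counts header words as they arrive and, when the counter reaches a requested WordNo, left-shifts the word by the corresponding StartIndex. A minimal Python model is sketched below. The function name `extract_fields` is hypothetical, and one detail is an assumption: the appendix does not say which PartSize bits of the shifted word are kept, so this sketch keeps the most significant ones.

```python
def extract_fields(words, word_no, start_index, data_bits=32, part_size=16):
    """Return {lookup index: extracted value} for one packet header.

    words:       header words in arrival order (each data_bits wide)
    word_no:     per lookup, the header word to sample (WordNo)
    start_index: per lookup, the left shift applied to that word (StartIndex)
    """
    mask = (1 << data_bits) - 1
    fields = {}
    for cntr, word in enumerate(words):        # cntr plays the role of HdrWordCntr
        for lp, wanted in enumerate(word_no):
            if cntr == wanted:
                shifted = (word << start_index[lp]) & mask
                # Assumption: HdrValue takes the top part_size bits.
                fields[lp] = shifted >> (data_bits - part_size)
    return fields
```

With the second header word 0x11223344 and a start index of 8, the sketch yields the 16-bit field 0x2233.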
APPENDIX L
//
//
// This file contains Match/Lookup Unit (MLU)
//
//
Object MLU
PARAMETERS (
  FieldWidth = 16;
  Log2FieldWidth = 4;
  LookupAvaliable = 0;
  LookupDepth = 8;
  Log2LookupDepth = 3;
  NoParallelLookup = 0;
  Log2NoParallelLookup = 1;
)
INTERFACES (
  // reset and clock
  input Rst;
  input Clk;
  // control signals
  input QEnable;
  input [2:0] Opcode;
  input [Log2FieldWidth:0] Width;
  input [FieldWidth-1:0] Param1;
  input [FieldWidth-1:0] Param2;
  input LkWrEnb;
  input [Log2LookupDepth-1:0] LkWrAddr;
  input [FieldWidth-1:0] LkWrData;
  output [FieldWidth-1:0] LkWrRBData;
  input [31:0] SPReg_0;
  input [31:0] SPReg_1;
  input [31:0] SPReg_2;
  input [31:0] SPReg_3;
  // input check values
  input InVal;
  input [FieldWidth-1:0] InData;
  input [FieldWidth-1:0] InMask;
  // results
  output OutVal;
  output OutMatch;
);
{
// internal block generation
signal [Log2LookupDepth-1:0] LkRdAddr;
signal [FieldWidth-1:0] LkRdData;
signal [FieldWidth-1:0] iLkWrRBData;
signal LkInStart;
signal LkInProgress;
signal LnkProcessState;
constant Idle = 1'b0;
constant Check = 1'b1;
// adjust the comparison to the width user requested
// The actual process
PROCESS (on rising edge of Clk)
IF Rst is high
  assign reset values to OutVal
  assign reset values to OutMatch
  assign reset values to LkInStart
  assign reset values to LkInProgress
  assign reset values to LnkProcessState
ELSE
  // default
  Clear OutVal & LkInStart
  // if the Match unit is enabled then try to do something
  IF QEnable is active high
    CASE (Opcode)
      3'd0 : // EQ: Equal to Param1
        IF InVal observed
          OutVal = 1;
          OutMatch = ((InData & InMask) == (Param1 & InMask));
        ENDIF
      3'd1 : // LT: Less Than Param1
        IF InVal observed
          OutVal = 1;
          OutMatch = ((InData & InMask) < (Param1 & InMask));
        ENDIF
      3'd2 : // LE: Less Than or Equal to Param1
        IF InVal observed
          OutVal = 1;
          OutMatch = ((InData & InMask) <= (Param1 & InMask));
        ENDIF
      3'd3 : // GT: Greater Than Param1
        IF InVal observed
          OutVal = 1;
          OutMatch = ((InData & InMask) > (Param1 & InMask));
        ENDIF
      3'd4 : // GE: Greater Than or Equal to Param1
        IF InVal observed
          OutVal = 1;
          OutMatch = ((InData & InMask) >= (Param1 & InMask));
        ENDIF
      3'd5 : // RNG: Check if within range <Param1, Param2>
        IF InVal observed
          OutVal = 1;
          OutMatch = (((InData & InMask) >= (Param1 & InMask)) && ((InData & InMask) <= (Param2 & InMask)));
        ENDIF
      3'd6 : // LUP: Look up
        IF InVal observed
          IF LookupAvaliable is active
            OutVal = 1;
            OutMatch = 1;
          // start lookup if the module is enabled for look up
          ELSE
            LkInStart = 1;
          ENDIF
        ENDIF
      3'd7 : // EXTR: Extract
        IF InVal observed
          OutVal = 1;
          OutMatch = 1;
        ENDIF
      default :
        IF InVal observed
          OutVal = 1;
          OutMatch = 1;
        ENDIF
    ENDCASE
  // else just return okay
  ELSE IF InVal observed
    OutVal = 1;
    OutMatch = 0;
  ENDIF
ENDIF
ENDPROCESS

Claims

What is claimed is:
1. A system for designing packet processing products, comprising: a user interface for allowing a user to define a desired packet processing algorithm using a plurality of discrete packet processing blocks, each of said blocks corresponding to a portion of said desired packet processing algorithm; means for allowing the user to define connections between said plurality of packet processing blocks; and means for processing said plurality of packet processing blocks and said connections to provide a list of instructions in a hardware description language for producing an integrated circuit capable of executing said desired packet processing algorithm.
2. The system of Claim 1, further comprising an integrated circuit constructed using said list of instructions.
3. The system of Claim 1, further comprising a NETLIST generated using said list of instructions.
4. The system of Claim 1, wherein said plurality of packet processing blocks further includes a Packet Processing Unit (PPU) for: extracting a header of a packet; pointing to a portion of the header of a predetermined width using a predetermined index of a bit location in the header; comparing the data represented by the portion of the header with at least one predetermined value; and declaring a match when the result of the comparison is true.
5. The system of Claim 4, wherein said Packet Processing Unit further includes means for accessing an external Content-Addressable Memory (CAM) or Random-Access Memory (RAM).
6. The system of Claim 5, wherein said Packet Processing Unit further includes a Hardware Lookup Unit for extracting a desired portion of header of a packet based on determining when a start of header bit is active; determining the number of bits for which the data stream of the packet is valid after said start of header bit is active; and determining when the end of header bit is active; and extracting said header based on said start of header bit, said number of bits, said end of header bit, said index, and said width.
7. The system of Claim 6, wherein said Packet Processing Unit further includes a Delay/FIFO module for delaying the extracted header by the sum of a predetermined number of clock cycles and a variable number of clock cycles based on said predetermined index.
8. The system of Claim 7, wherein said Delay/FIFO module is implemented using a delay line.
9. The system of Claim 7, wherein said Delay/FIFO module is implemented using a FIFO.
10. The system of Claim 7, wherein said Packet Processing Unit further includes a Match and Lookup Unit for determining if a user defined match of a condition is true to generate a match output based on a comparison of said desired portion of header of a packet with one or more user defined parameters and a predetermined logical condition.
11. The system of Claim 10, wherein said Packet Processing Unit further includes a Result Generation process for generating a result output based on one of a fixed expression, an arithmetic expression, and a logical expression.
12. The system of Claim 11, wherein said result output is used as an input to another Packet Parsing Unit.
13. The system of Claim 11, wherein said Packet Processing Unit further includes a Sequence Generation process for generating a sequence number for use by a Decision and Forwarding Unit.
14. The system of Claim 13, wherein said Packet Parsing Unit further includes an Output Alignment process for aligning said packet header with said result output and said match output.
15. The system of Claim 14, wherein said packet header, said result output, and said match output are aligned on a start of packet boundary.
16. The system of Claim 14, wherein said packet header, said result output, and said match output are aligned on an end of packet boundary.
17. The system of Claim 14, wherein said Packet Parsing Unit has a plurality of internal programmable registers.
18. The system of Claim 14, wherein said Packet Parsing Unit is programmable from an external microprocessor.
19. The system of Claim 1, wherein said plurality of packet processing blocks further includes a Packet Modification Unit (PMU) for: extracting a packet; pointing to a portion of the packet of a predetermined width using a predetermined index of a bit location in the packet; and modifying the portion of the packet.
20. The system of Claim 19, wherein modifying the portion of the packet further includes means for deleting the portion of the packet.
21. The system of Claim 19, wherein modifying the portion of the packet further includes means for overwriting the portion of the packet.
22. The system of Claim 19, wherein modifying the portion of the packet further includes means for inserting data at the position in the portion of the packet pointed to by index.
23. The system of Claim 19, wherein said Packet Modification Unit further includes a Delay/FIFO module for delaying the packet by the sum of a predetermined number of clock cycles and a variable number of clock cycles based on a number of bytes to be inserted.
24. The system of Claim 23, wherein said Delay/FIFO module is implemented using a delay line.
25. The system of Claim 23, wherein said Delay/FIFO module is implemented using a FIFO.
26. The system of Claim 23, wherein said Packet Modification Unit further includes a Modification Unit for modifying said portion of the packet based on a ByteOffset input indicating said index and ByteValid input indicating the number of clock cycles needed for modifying the packet.
27. The system of Claim 26, wherein said Modification Unit further includes a ByteData input for providing bytes to be inserted into the packet.
28. The system of Claim 26, wherein said Packet Modification Unit further includes a Result Generation process for generating a result output based on a number of bytes inserted into the packet.
29. The system of Claim 28, wherein said result output is used as an input to one of another Packet Parsing Unit and another Packet Modification Unit.
30. The system of Claim 28, wherein said Packet Modification Unit further includes a Sequence Generation process for generating a sequence number for use by a Decision and Forwarding Unit.
31. The system of Claim 30, wherein said Packet Modification Unit further includes an Output Alignment process for aligning said packet with said result output.
32. The system of Claim 31, wherein said packet and said result output are aligned on a start of packet boundary.
33. The system of Claim 31, wherein said packet header and said result output are aligned on an end of packet boundary.
34. The system of Claim 19, wherein said Packet Modification Unit is programmable from an external microprocessor.
35. The system of Claim 1, wherein said plurality of packet processing blocks further includes a Decision and Forwarding Unit (DFU) for performing one of drop, queue, and forwarding operations on at least one packet.
36. The system of Claim 35, wherein said Decision and Forwarding Unit performs one of drop, queue, and forwarding operations on at least one packet based on at least one match output and at least one result output of a Packet Processing Unit.
37. The system of Claim 36, wherein said Decision and Forwarding Unit further includes a first Latch for latching said at least one incoming packet and a second Latch for latching a result output associated with said at least one incoming packet.
38. The system of Claim 37, wherein said Decision and Forwarding Unit further includes a Data Selection Multiplexer for selecting one of said at least one incoming packet for output to one of a drop, queue, and forwarding port.
39. The system of Claim 38, wherein said Decision and Forwarding Unit further includes a Result Generation process for selecting a result output and a match output associated with said at least one incoming packet.
40. The system of Claim 39, wherein said match output and said result output determine to which port said packet is forwarded.
41. The system of Claim 39, wherein said result output is based on one of a fixed expression, an arithmetic expression, and a logical expression.
42. The system of Claim 39, wherein said Decision and Forwarding Unit further includes an Output Alignment process for aligning said packet with said result output.
43. The system of Claim 42, wherein said packet and said result output are aligned on a start of packet boundary.
44. The system of Claim 42, wherein said packet header and said result output are aligned on an end of packet boundary.
45. The system of Claim 35, wherein said Decision and Forwarding Unit is programmable from an external microprocessor.
46. The system of Claim 1, further including a packet processing block for checksum or CRC generation or checking.
47. The system of Claim 1, further including a packet processing block for packet header removal.
48. The system of Claim 1, further including a packet processing block for packet header or trailer addition.
49. The system of Claim 1, further including a packet processing block for per flow rate control.
50. A method for designing packet parsing and classification products, comprising the steps of: providing a user interface for allowing a user to define a desired packet processing algorithm using a plurality of discrete packet processing blocks, each of said blocks corresponding to a portion of said desired packet processing algorithm; allowing the user to define connections between said plurality of packet processing blocks; and processing said plurality of packet processing blocks and said connections to provide a list of instructions in a hardware description language for producing an integrated circuit capable of executing said desired packet processing algorithm.
51. The method of Claim 50, further comprising the step of constructing an integrated circuit using said list of instructions.
52. The method of Claim 51, further comprising the step of generating a NETLIST using said list of instructions.
53. The method of Claim 51, further comprising the step of filling out a connection document based on the plurality of packet processing blocks.
54. The method of Claim 53, wherein said connection document is implemented in a graphical user interface.
55. The method of Claim 54, further including the step of configuring the plurality of packet processing blocks from a plurality of files each containing a different type of packet processing block of maximal functionality.
56. The method of Claim 55, wherein said step of configuring further includes the step of using a preprocessor to perform substitution, looping, and branching to cull a customized packet processing block from said packet processing block of maximal functionality.
57. The method of Claim 56, further including the step of instantiating the plurality of processing blocks and making connections between said packet processing blocks in a top level file.
58. The method of Claim 50, wherein said plurality of packet processing blocks further includes a Packet Processing Unit (PPU) for: extracting a header of a packet; pointing to a portion of the header of a predetermined width using a predetermined index of a bit location in the header; comparing the data represented by the portion of the header with at least one predetermined value; and declaring a match when the result of the comparison is true.
59. The method of Claim 58, wherein said Packet Processing Unit further includes means for accessing an external CAM or RAM.
60. The system of Claim 59, wherein said plurality of packet processing blocks further includes a Packet Modification Unit (PMU) for: extracting a packet; pointing to a portion of the packet of a predetermined width using a predetermined index of a bit location in the packet; and modifying the portion of the packet.
61. The system of Claim 60, wherein said plurality of packet processing blocks further includes a Decision and Forwarding Unit (DFU) for performing one of drop, queue, and forwarding operations on at least one packet.
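
For illustration only, the operations recited in Claims 19 to 22, 35 and 36, and 58 can be modeled in software. The following is a minimal Python sketch under stated assumptions, not the claimed hardware or the generated hardware description language: the function names (`extract_field`, `ppu_match`, `pmu_modify`, `dfu_decide`) are hypothetical, bits are assumed to be numbered from the most significant bit of the header, the modification offset is assumed to be in bytes (following the ByteOffset input of Claim 26), and the drop/queue/forward policy in `dfu_decide` is an invented example rather than one taken from the claims.

```python
def extract_field(header: bytes, index: int, width: int) -> int:
    """Return `width` bits of `header` starting at bit offset `index`.

    Assumption: bit 0 is the most significant bit of the first byte.
    """
    total_bits = len(header) * 8
    if index < 0 or width <= 0 or index + width > total_bits:
        raise ValueError("field lies outside the header")
    as_int = int.from_bytes(header, "big")
    shift = total_bits - index - width
    return (as_int >> shift) & ((1 << width) - 1)


def ppu_match(header: bytes, index: int, width: int, values) -> bool:
    """Declare a match when the selected field equals any predetermined value."""
    return extract_field(header, index, width) in set(values)


def pmu_modify(packet: bytes, byte_offset: int, op: str,
               data: bytes = b"", length: int = 0) -> bytes:
    """Delete, overwrite, or insert bytes at `byte_offset` (hypothetical model)."""
    if not 0 <= byte_offset <= len(packet):
        raise ValueError("offset lies outside the packet")
    if op == "delete":
        return packet[:byte_offset] + packet[byte_offset + length:]
    if op == "overwrite":
        if byte_offset + len(data) > len(packet):
            raise ValueError("overwrite would run past the end of the packet")
        return packet[:byte_offset] + data + packet[byte_offset + len(data):]
    if op == "insert":
        return packet[:byte_offset] + data + packet[byte_offset:]
    raise ValueError(f"unknown operation: {op}")


def dfu_decide(match: bool, result: int) -> str:
    """Pick drop, queue, or forward from a match output and a result output.

    The policy below is an invented example, not one recited in the claims.
    """
    if not match:
        return "drop"
    return "forward" if result == 0 else "queue"


if __name__ == "__main__":
    header = bytes([0x45, 0x00, 0x00, 0x54])
    matched = ppu_match(header, index=0, width=4, values=[4, 6])
    print(matched, dfu_decide(matched, result=0))
    print(pmu_modify(b"ABCDEF", 2, "insert", data=b"xy"))
```

In this model the match output of `ppu_match` feeds `dfu_decide` directly; in the claimed system those outputs would additionally be delayed and aligned on a start of packet or end of packet boundary, which a purely functional sketch does not capture.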
PCT/US2007/012583 2007-05-24 2007-05-24 System and method for designing and implementing packet processing products WO2008143622A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2007/012583 WO2008143622A1 (en) 2007-05-24 2007-05-24 System and method for designing and implementing packet processing products

Publications (1)

Publication Number Publication Date
WO2008143622A1 true WO2008143622A1 (en) 2008-11-27

Family

ID=40032186

Country Status (1)

Country Link
WO (1) WO2008143622A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020010886A1 (en) * 2000-01-18 2002-01-24 Daihen Corporation Recording medium storing a program for constructing scan paths, scan path constructing method, and arithmetic processing system in which said scan paths are integrated
US20030198204A1 (en) * 1999-01-13 2003-10-23 Mukesh Taneja Resource allocation in a communication system supporting application flows having quality of service requirements
US20050058149A1 (en) * 1998-08-19 2005-03-17 Howe Wayne Richard Time-scheduled and time-reservation packet switching
US20060039280A1 (en) * 1999-08-10 2006-02-23 Krishnasamy Anandakumar Systems, processes and integrated circuits for rate and/or diversity adaptation for packet communications
US20060256719A1 (en) * 2002-06-10 2006-11-16 Hsu Raymond T Packet flow processing in a communication system
US20070050603A1 (en) * 2002-08-07 2007-03-01 Martin Vorbach Data processing method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07795398

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07795398

Country of ref document: EP

Kind code of ref document: A1