WO2001075621A1 - Network dma method - Google Patents

Network dma method Download PDF

Info

Publication number
WO2001075621A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
network
computer
buffer
instruction
Application number
PCT/US2001/009125
Other languages
French (fr)
Inventor
Stephen L. Adams
Original Assignee
Baydel Limited
Application filed by Baydel Limited filed Critical Baydel Limited
Priority: AU2001249337A1
Publication: WO2001075621A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0008 High speed serial bus, e.g. Fiber channel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/324 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the data link layer [OSI layer 2], e.g. HDLC

Abstract

The invention blends Fibre Channel ('FC') hardware with networking software to produce a network that allows network data to be transferred via direct memory access ('DMA') between two application buffers in computers separated by a network. During boot up, the FC network interface card ('NIC') drivers specify MTUs greater than or equal to the segment size to the operating system so that data is not fragmented into smaller datagrams during a network data write. During the network write, a first FC NIC sets up the send end of the DMA and sends the network headers of the data to a second FC NIC. The second FC NIC passes the network headers up through the protocol stack. The protocol stack locates and passes the application buffer address to the second FC NIC. The second FC NIC sets up the receive end of the DMA and sends a signal to the first FC NIC to start a buffer-to-buffer DMA transfer of the data. At the end of the buffer-to-buffer DMA transfer, the second FC NIC sends a signal to the first FC NIC indicating the status of the transfer. The first and second FC NICs may treat the entire data transfer as a Small Computer System Interface ('SCSI') disk transaction and use existing SCSI Assist Hardware to reduce the involvement of the host software.

Description

NETWORK DMA METHOD
FIELD OF INVENTION
This invention is directed to application buffer-to-buffer transfers over a network, and more particularly to DMA transfer over a network between application buffers using Fibre Channel.
BACKGROUND
Fibre Channel is a data transport mechanism that includes hardware and a multilayer protocol. Fibre Channel is described in "Fibre Channel Physical and Signaling Interface (FC-PH)" (ANSI X3.230-1994) by the American National Standard for Information Systems, which is incorporated by reference in its entirety. Fibre Channel is used today as a communication path between computers and disks. For example, Fibre Channel is used in Storage Area Networks ("SANs"). When Fibre Channel is used as a communication path between computers and disks, the Small Computer System Interface ("SCSI") protocol runs on top of the Fibre Channel protocol so that legacy SCSI drivers can still be used to control the data flow. Since a common use of Fibre Channel protocol is to interpret SCSI commands, Fibre Channel adapter cards often have built-in SCSI Assist Hardware to accelerate this process.
Fibre Channel includes a buffer-to-buffer DMA transfer mechanism. If two computers are connected together with Fibre Channel and the Fibre Channel adapter card in the sending computer is given the address of a sending buffer and the Fibre Channel adapter card in the receiving computer is given the address of a destination buffer, the two adapter cards can transfer data across a Fibre Channel media (e.g., a copper or optical cable) from the sending buffer to the receiving buffer in a single DMA burst. This feature works whether the two nodes are connected point-to-point, through a Fibre Channel hub connecting up to 126 nodes together, or through a series of Fibre Channel switches connecting up to 16 million nodes together. When used to connect computers to disks, the disk hardware serves as one of the computers, and the buffer-to-buffer DMA transfer simply moves data between an application buffer in the computer and a buffer in the disk. The SCSI Assist Hardware in Fibre Channel adapter cards accelerates the common SCSI disk transactions. SCSI Assist Hardware lets the host driver place the SCSI command containing the SCSI disk request into the card hardware and relieves the host computer from being interrupted until the data has been transferred and the response phase of the SCSI operation completes. Thus, SCSI Assist Hardware allows a Fibre Channel adapter card to execute the SCSI command phase, the SCSI data phase, and the SCSI response phase without interrupting the host computer.
Networks today communicate by breaking application data into smaller units, called datagrams. Each datagram is sent across the network as a separate unit. Breaking long messages into smaller network units is done to share the network resource so that a long message does not dominate the bandwidth.
Network applications use a protocol stack to interface the application to the physical network. FIG. 1 shows the layers of a conventional protocol stack based on the Open System Interconnection ("OSI") Seven Layer Reference Model. FIG. 1 compacts layers 5-7 into a single Application layer for ease of reference in relation to the present disclosure. "Application" in this disclosure refers to any program residing above the transport layer, including software that services network requests for file data, such as the SRV server module in the Windows NT operating system.
The transport layer (e.g., Transmission Control Protocol, or "TCP") provides to an application in a local computer a "virtual circuit" that connects the application to an application in a remote computer even where the remote computer is halfway around the world. The transport layer maintains this virtual circuit even though the physical network may frequently lose data.
The transport layer breaks the application data into "segments" that it gives to the network layer. Segments created by the transport layer may be up to 64 Kbytes. Segments which are not acknowledged by the transport layer on the destination computer are resent.
The application data given to the transport layer may have its own application header A (FIG. 1) describing the data. File transfers under Windows NT® ("NT"), for example, have a Server Message Block ("SMB") header placed before the data. The application may divide the data into units smaller than 64 Kbytes. The file server software SRV that handles remote requests for files in NT, for example, breaks data into units of about 60 Kbytes. The transport layer adds its own header T (FIG. 1) and passes the segment down to the network layer.
The transport process that creates a virtual circuit requires an acknowledge signal ("ACK") back from the final destination for the data sent. If a specified number of ACKs is not received, the transport layer on the sending side stops sending data. If the missing ACKs are not received in a predetermined time, the data is resent. The transport layer thus implements both a flow-control mechanism and an error-control mechanism.
The network layer (e.g., Internet Protocol, or "IP") breaks the transport segment into datagrams that will fit in the Maximum Transfer Unit (MTU) of the network, which is 1500 bytes for an Ethernet physical layer. The network layer then attempts to move each of these MTU-size datagrams through the network to the destination. The network layer gives each of these 1500-byte datagrams a network header N (FIG. 1) containing the address of the final destination node. The network layer also adds a Media Access Control ("MAC") address to each datagram before passing it down to the data link layer. The MAC address is the physical address of the very next node in the network path. As the datagram makes its way through the network toward its final destination, the MAC address is replaced at each hop with the address of the next node on the route.
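To make the fragmentation cost concrete, the short standalone C program below computes how many datagrams a full 64 Kbyte segment produces at the Ethernet MTU; the 20-byte IPv4 header is our assumption for illustration, not a figure taken from this disclosure.

#include <stdio.h>

int main(void) {
    const int segment = 64 * 1024;  /* transport segment size, bytes */
    const int mtu = 1500;           /* Ethernet MTU, bytes */
    const int ip_header = 20;       /* minimal IPv4 header per datagram */
    const int payload = mtu - ip_header;

    /* Ceiling division: datagrams needed to carry the whole segment. */
    int datagrams = (segment + payload - 1) / payload;
    printf("%d-byte segment -> %d datagrams\n", segment, datagrams);
    return 0;  /* prints: 65536-byte segment -> 45 datagrams */
}

This agrees with the "40 or more small datagrams" figure cited later in this description.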
The data link layer instructs the network interface card ("NIC") to move the datagram fragment over the physical network to the next node. The data link layer includes the NIC drivers. As FIG. 1 shows, as the application data moves down the protocol stack, it accumulates headers 10. At the data link layer, the first few hundred bytes of the final datagram contain all of headers 10.
The description above for the transport, network, and data link layers applies equally to a Wide Area Network (WAN) that could span the entire globe and pass through numerous routers, and to a local area network (LAN) where the nodes may all be in the same building. In a LAN, each node is often just one hop away. That is, the MAC address also points to the final destination.

In a conventional network, a read operation can be seen as a write of the read request by a client computer to a server, followed by a write back of the data by the server to the client computer. For example, when a client computer wants to read data from a remote server, the client computer writes a request to the server asking for certain file data. The network is then quiescent with no state maintained about the read operation. When the server locates the data, it writes the data back to the client computer.
In the write back operation, the transport layer sets up a virtual circuit to the application in the destination computer, or uses a virtual circuit that already exists to this application, and passes a segment of data to the network layer. For example, if the application is a remote NT file server, the software in the NT server is SRV. After receiving the request for file data, the server locates and returns the data. The application source buffer in this case is most likely the cache in the NT server. If the data is already in cache, the cache serves the data directly. If the data is not in the cache, NT reads the data into cache before satisfying the network request.
As discussed above, the network layer fragments the segment into MTU-size datagrams which are passed to the data link layer. Since each datagram is a separate entity that may take a different route through the network, the datagrams could arrive at the destination in a different order than they were sent. Because of the possibility of receiving datagrams out of order, the receiving layers below the transport layer in the destination computer buffer and reorder the datagram fragments, if necessary, before passing them to the upper layers. While the chance of datagrams arriving out of order is small on a LAN, LAN datagrams are processed the same way as WAN datagrams.
Another reason buffering is required at the receiver is that the datagrams in a conventional network are unsolicited, i.e. the receiving network hardware does not know yet the final destinations for the data in the datagrams. The receiving node puts the unsolicited datagrams into a temporary buffer until the final application buffer is found, at which time the data is copied from the temporary buffer to the application buffer. Thus, the receiver buffering moves the data received twice.
Because of the unreliable physical network, the transport layer uses a "checksum" in one of the fields of the transport header T (FIG. 1). The checksum is recalculated at the receiving end as the data arrives and compared with the checksum sent. Computing checksums imposes a large network overhead.
On the receiving side, there are two conventional ways to handle arriving datagrams. The first puts each datagram into a temporary buffer reserved for unsolicited transmissions, reorders the datagrams as necessary, and passes them up to the protocol stack where they are copied to the application buffer. Alternatively, the first datagram received is passed up while succeeding datagrams are placed in temporary buffers. This first datagram contains headers 10, so the upper layers can locate the designated application. The application then passes down an application buffer address and the data link layer begins copying the buffered data to this address, reordering datagrams as necessary. In both cases above, the arriving data is first put into a temporary buffer and later copied to the application buffer.
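A minimal sketch of this conventional receive path follows, in plain C; temp_pool, stage_datagram, and deliver are hypothetical names chosen for illustration, not structures from this disclosure. The point is that every byte is moved twice: once into the staging buffer, once into the application buffer.

#include <string.h>
#include <stddef.h>

#define MTU 1500
#define SLOTS 64

/* Hypothetical staging area reserved for unsolicited datagrams. */
static unsigned char temp_pool[SLOTS][MTU];
static size_t temp_len[SLOTS];

/* Copy #1: stage an arriving datagram in a temporary buffer. */
void stage_datagram(int slot, const unsigned char *frame, size_t len) {
    memcpy(temp_pool[slot], frame, len);
    temp_len[slot] = len;
}

/* Copy #2: once the upper layers return the application buffer address,
   replay the (reordered) datagrams into it. */
size_t deliver(unsigned char *app_buf, int nslots) {
    size_t off = 0;
    for (int i = 0; i < nslots; i++) {
        memcpy(app_buf + off, temp_pool[i], temp_len[i]);
        off += temp_len[i];
    }
    return off;  /* total bytes that were handled twice */
}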
SUMMARY
In one embodiment, a method for transferring data over a network includes specifying a Maximum Transfer Unit ("MTU") greater than or equal to the segment size, sending the network headers of application data over the network, receiving a start-transfer signal indicating that the destination application buffer is ready to receive application data over the network, and sending the application data from the first application buffer to the second application buffer over the network. In one implementation, the network includes a Fibre Channel network. In another implementation, the network includes any network media that allows buffer-to-buffer direct memory access ("DMA") transfers of data. In yet another implementation, the sending of the network headers, the receiving of the start-transfer signal, the sending of the application data, and the receiving of the transfer status are accomplished using a single hardware SCSI exchange.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates the OSI Seven Layer Reference Model.
FIG. 2A illustrates a network in accordance with one embodiment of the present invention. FIG. 2B illustrates a method for buffer-to-buffer transfer over the network of FIG. 2A in accordance with one embodiment of the present invention.
FIG. 3 illustrates a data read process of the method in FIG. 2B.
The use of the same reference symbols in different drawings indicates similar or identical items.
DETAILED DESCRIPTION
FIG. 2A illustrates a network for transferring data between application buffers 123 and 223 of computers 100 and 200. Computer 100 includes processor 110, memory 120, and network interface card ("NIC") 130 all coupled to peripheral component interconnect ("PCI") bus 140. Computer 100 is, for example, a client computer such as a "white box" computer using an ASUS P2B motherboard with a 400 MHz Pentium II processor. Memory 120 includes an operating system ("OS") 121, application 122, application buffer 123, a protocol stack 124, and NIC driver 125. OS 121 is, for example, Windows NT® Workstation from Microsoft® Corporation of Redmond, Washington. NIC 130 includes direct memory access controller ("DMA") 131 and Small Computer System Interface ("SCSI") Assist Hardware 132. NIC 130 is coupled to Fibre Channel cable 133. NIC 130 is, for example, an HHBA-5100A Tachyon TL Fibre Channel Adapter Card available from Agilent Technologies Inc. of Palo Alto, California. Cable 133 is, for example, standard 62.5 micron multi-mode fiber optic cable used commonly with Gigabit Ethernet and Fibre Channel.
Computer 200 includes processor 210, memory 220, and network interface card ("NIC") 230 all coupled to peripheral component interconnect ("PCI") bus 240. Computer 200 is, for example, a server computer such as a Dell 6300 PowerEdge. Memory 220 includes operating system ("OS") 221, application 222, application buffer 223, protocol stack 224, and NIC driver 225. OS 221 is, for example, Windows NT® Server from Microsoft® Corporation of Redmond, Washington. NIC 230 includes direct memory access controller ("DMA") 231 and Small Computer System Interface ("SCSI") Assist Hardware 232. NIC 230 is coupled to a Fibre Channel media 233. NIC 230 is, for example, an HHBA-5100A Tachyon TL Fibre Channel Adapter Card available from Agilent Technologies Inc. of Palo Alto, California. Media 233 is, for example, standard 62.5 micron multi-mode fiber optic cable used commonly with Gigabit Ethernet and Fibre Channel.
Hub or switch 300 couples cables 133 and 233. Hub/switch 300 is, for example, an LH5000 Digital Fibre Hub from Emulex of Costa Mesa, California.
FIG. 2B illustrates a method 40 for transferring data between application buffer 123 of computer 100 and application buffer 223 of computer 200. Method 40 starts in action 400. Action 400 is followed by action 402. In action 402, NIC drivers 125 and 225 specify a maximum transfer unit ("MTU") greater than or equal to the segment size to protocol stacks 124 and 224, respectively, during boot up.
Several bottlenecks in the conventional network transfer described earlier are removed when NIC drivers 125 and 225 specify an MTU greater than or equal to the segment size. For example, Ethernet requires the network layer in the protocol stack to fragment each 64 Kbyte segment into 40 or more small datagrams because Ethernet has an MTU of 1500 bytes. NIC drivers 125 and 225 overcome this fragmentation by specifying an MTU during system boot up that is large enough for an entire segment. While Ethernet cannot accommodate an MTU this large, Fibre Channel can. All of the segment data from the transport layers of protocol stacks 124 and 224 is therefore submitted directly to respective NIC drivers 125 and 225. NIC drivers 125 and 225 can thereafter transfer the complete segment in one DMA burst using buffer-to-buffer mechanisms (described later) in respective NICs 130 and 230. Furthermore, the data received does not need to be saved in a temporary buffer for possible reordering as the Fibre Channel hardware guarantees in-order delivery of the DMA burst.
In one implementation, NICs 130 and 230 use a Fibre Channel frame of 2112 bytes. However, this frame size limitation is not visible outside of NICs 130 and 230. Thus, the MTU specified by NIC drivers 125 and 225 during boot up is not limited by the Fibre Channel frame size. Action 402 is followed by action 404.
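A minimal sketch of action 402 is shown below, assuming an NDIS-style query interface on NT; the stub typedefs and the FC_NET_MTU constant are ours, while OID_GEN_MAXIMUM_FRAME_SIZE is the standard object identifier through which a protocol stack learns a NIC's MTU.

/* Standalone sketch; in a real miniport these types come from ndis.h. */
typedef unsigned long ULONG;
typedef ULONG NDIS_OID;
typedef int NDIS_STATUS;
#define NDIS_STATUS_SUCCESS        0
#define NDIS_STATUS_NOT_SUPPORTED  (-1)   /* stub value for this sketch */
#define OID_GEN_MAXIMUM_FRAME_SIZE 0x00010106UL

#define FC_NET_MTU (64UL * 1024UL)  /* >= transport segment size */

/* Answer the stack's boot-up MTU query with a segment-size MTU so whole
   segments are handed down unfragmented. The 2112-byte Fibre Channel
   frame size stays hidden inside the NIC. */
NDIS_STATUS query_max_frame_size(NDIS_OID oid, ULONG *answer) {
    if (oid != OID_GEN_MAXIMUM_FRAME_SIZE)
        return NDIS_STATUS_NOT_SUPPORTED;
    *answer = FC_NET_MTU;
    return NDIS_STATUS_SUCCESS;
}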
In action 404, NIC driver 125 receives a read request from application 122 through protocol stack 124. The read request is a request for data stored in computer 200. Action 404 is followed by action 406. In action 406, NIC driver 125 causes NIC 130 to transmit the read request to NIC 230. In one implementation, the read request is a server message block ("SMB") read request and NIC 130 transmits the SMB read request as a Fibre Channel single frame sequence (SFS) to NIC 230. In one variation, NIC 130 transmits the read request to NIC 230 over cable 133 and cable 233 through hub/switch 300. Action 406 is followed by action 408.
In action 408, NIC driver 225 receives the read request from NIC 230 and passes the read request to application 222 through protocol stack 224. Action 408 is followed by action 410. In action 410, application 222 locates the requested data. The requested data may be located on a hard disk or in application buffer 223 (also known as "cache") in memory 220. Action 410 is followed by action 412.
In action 412, NIC driver 225 receives the buffer address (e.g., address of application buffer 223) of the requested data from application 222 through protocol stack 224 and sets up NIC 230 (more specifically, DMA controller 231) as the transmitting end of a DMA transfer between application buffer 223 in computer 200 and some buffer, as yet unknown, in computer 100. Action 412 is followed by action 414. In action 414, NIC driver 225 causes NIC 230 to transmit headers 10 (FIG. 1) of the requested data to NIC 130. In one implementation, NIC 230 transmits headers 10 to NIC 130 as a SCSI command ("FCP_CMND") in a Fibre Channel SFS. This allows NIC 230 to use SCSI Assist Hardware 232 for the pending DMA data transfer without invoking host software.
In contrast to the conventional network discussed earlier, NIC driver 225 does not continue sending requested data after sending the headers. Instead, NIC driver 225 sets up the sending end of a DMA transfer from the application buffer address received, sends one frame of a couple of hundred bytes containing all of headers 10 (FIG. 1), and then waits for NIC driver 125 to obtain the destination application buffer address from headers 10 and set up the receiving end of the DMA transfer. Thus, other than the headers, NIC driver 225 does not transmit unsolicited data (data without a destination buffer address) to computer 100, which would cause computer 100 to store the requested data in buffers reserved for unsolicited data and later copy the data to the appropriate application buffer. Action 414 is followed by action 416.
In action 416, NIC driver 125 receives headers 10 from NIC 130 and passes headers 10 to the upper layers of protocol stack 124. NIC driver 125 indicates to protocol stack 124 that there is more data to follow. This action causes protocol stack 124 to return the address of the application buffer 123 to NIC driver 125. Protocol stack 124 believes that the requested data has already been received in memory 120's unsolicited buffers (as in a conventional network described above) and proceeds to locate the application associated with headers 10 (e.g., application 122) and return the associated buffer address.
In one implementation, when NIC driver 125 receives headers 10 in an unsolicited Fibre Channel frame from NIC 130, NIC driver 125 looks at two fields in a special structure appended to the data. These fields are "LookaheadSize" and "TotalPacketSize." LookaheadSize is the amount of data in this frame. TotalPacketSize is the total amount of data in the packet including any data still sitting in application buffer 223 of computer 200. If these two fields are equal, NIC driver 125 knows that computer 200 has sent the entire message. In this case, if the OS is NT, NIC driver 125 passes the packet up to protocol stack 124 by calling "NdisMEthIndicateReceive" (described below) with "LookaheadBufferSize = PacketSize." This tells protocol stack 124 that the entire packet is being indicated up at this time.
NdisMEthIndicateReceive(
    MiniportAdapterHandle,
    MiniportReceiveContext,
    HeaderBuffer,
    HeaderBufferSize,
    LookaheadBuffer,
    LookaheadBufferSize,
    PacketSize
);

Thus, small packets (e.g., read requests) are sent between computer 100 and computer 200 without buffer-to-buffer transfers. "Small" is defined by NIC driver 125 as a length too small to justify the overhead of setting up a buffer-to-buffer transfer, e.g., 1024 bytes.
If LookaheadSize is less than TotalPacketSize in the special structure appended to the data, NIC driver 125 calls NdisMEthIndicateReceive with "LookaheadBufferSize < PacketSize." Protocol stack 124 then finds the designated application (e.g., application 122) and obtains a buffer address for the remainder of the data. If the OS is NT, protocol stack 124 passes this address back down to NIC driver 125 by calling MiniportTransferData:
MiniportTransferData(
    Packet,
    BytesTransferred,
    MiniportAdapterContext,
    MiniportReceiveContext,
    ByteOffset,
    BytesToTransfer
);
The "Packet" parameter in the MiniportTransferData call contains pointers to the destination buffer (e.g., address of application buffer 123) for the data. Action 416 is followed by action 418.
In action 418, NIC driver 125 receives the address of application buffer 123 from application 122 through protocol stack 124 and sets up NIC 130 (more specifically DMA controller 131) as the receiving end of a DMA transfer between computers 100 and 200. Action 418 is followed by action 420. In action 420, NIC driver 125 causes NIC 130 to transmit a start-transfer signal to NIC 230 to start the DMA transfer. In one implementation, NIC 130 transmits the start-transfer signal as a SCSI "FCP_XFER_RDY" in a Fibre Channel SFS to NIC 230. This allows NIC 230 to use SCSI Assist Hardware 232 for the pending DMA data transfer without invoking host software.
Action 420 is followed by action 422. In action 422, DMA controllers 231 and 131 transfer the requested data from application buffer 223 to application buffer 123 in a single DMA burst. DMA controllers 231 and 131 move the requested data from application buffer 223 to application buffer 123 with no intermediate copies and very little processor overhead. In one implementation, the DMA transfer accrues little processor overhead from processors 110 and 210 because NIC drivers 125 and 225 configure the transport layers in respective protocol stacks 124 and 224 to forego conventional checksums. Instead, NICs 130 and 230 rely on the internal Fibre Channel hardware already performing a data integrity check. For example, each 2112-byte Fibre Channel frame includes a 32-bit cyclical redundancy check ("CRC") that detects all one and two bit errors in the frame and most other errors, including all errors over an odd number of bits. Action 422 is followed by action 424.
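For reference, the hardware integrity check that makes the software checksum unnecessary is a conventional CRC-32. The standalone sketch below shows the bitwise form using the reflected IEEE 802.3 polynomial (0xEDB88320), which to our knowledge is the same polynomial Fibre Channel specifies; the NIC computes this in hardware for every frame, so it is shown here only for illustration.

#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-32 over a frame (reflected IEEE 802.3 polynomial). */
uint32_t crc32(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;  /* appended to the frame; receiver recomputes and compares */
}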
In action 424, NIC driver 125 causes NIC 130 to transmit a status signal to NIC 230 to acknowledge that the requested data has been received. In one implementation, NIC 130 transmits the status signal as SCSI "FCP_RSP" in a Fibre Channel SFS to NIC 230. The status signal causes NIC 230 to drop out of its SCSI Assist Hardware mode and inform NIC driver 225 that the transfer is complete. Action 424 is followed by action 426, which ends method 40.
As described above, method 40 does not change the programming interface seen by applications accessing the network. Thus, application programs in the network computers using this invention see only the conventional programming interface. Since method 40 does not change this interface, method 40 operates identically to legacy networks and transparently to existing applications (except that method 40 provides significantly faster data transfer than conventional networks).
FIG. 3 shows a data read between computers 100 (e.g., client) and 200 (e.g., server) from the viewpoint of NICs 130 and 230. In phase 1, NIC 130 sends the SMB read request to NIC 230. In phase 2, NIC 230 sets up the send end of the DMA and sends the first couple of hundred bytes of the SMB read response. NIC 130 gets the destination address from its application and writes it into DMA controller 131. In phase 3, DMA controllers 131 and 231 send the data by DMA from application buffer 223 to application buffer 123. The phases in FIG. 3 include one or more lettered actions A, B, C, D, E, F, and G, which are now explained in further detail.
In phase 1, action A, NIC 130 sends the SMB Read request (e.g., "R_SMB") to NIC 230 in a Fibre Channel SFS. The request goes across the network and is put into an SFS Buffer 504 reserved at computer 200 for unsolicited arriving frames. NIC 130 sends an interrupt 506 to NIC driver 125 to indicate that the SFS (e.g., "R_SMB") has been sent successfully. NIC 230 sends an interrupt 509 to indicate to NIC driver 225 the arrival of the unsolicited SFS.
In phase 1, action B, NIC 230 passes the SMB read request up to protocol stack 224 to application 222. Application 222 is, for example, an NT server module SRV. At the completion of action B, there is no state information remaining in the network regarding the read operation. The read response that comes back from computer 200 with the data is a completely independent network event.
In phase 2, action C, NIC 230 receives an SMB read response (e.g., "R_SMB_RSP + large data") from protocol stack 224. The SMB read response includes the SMB information and pointers to the requested data. NIC 230 sets up to send by DMA the requested data onto media 233 to computer 100. NIC 230 sends headers 10, which include the SMB read response header (A in FIG. 1), as a SCSI command ("FCP_CMND") in a Fibre Channel SFS to NIC 130. This "Lookahead information" goes across the network and is put into an SFS Buffer 505 reserved at computer 100 for unsolicited arriving frames. Treating the Lookahead information as a SCSI command allows NIC 230 to invoke SCSI Assist Hardware 232, which avoids host interrupts for the pending DMA transfer. NIC 130 sends an interrupt 507 to NIC driver 125 to indicate the arrival of an unsolicited SFS (e.g., "FCP_CMND").
In phase 2, action D, NIC 130 passes headers 10 up to protocol stack 124 with an indication that more data is available (e.g., this is a partial packet where LookaheadBufferSize < PacketSize).
In phase 3, action E, protocol stack 124 has processed headers 10 (e.g., the partial packet) that was passed up and returns the address of the application buffer (e.g., application buffer 123) to receive the requested data. NIC 130 sets up a DMA from media 133 to this buffer (e.g., application buffer 123). NIC 130 sends a SCSI signal ("FCP_XFER_RDY") in a Fibre Channel SFS to the waiting NIC 230 to start the DMA transfer.
In phase 3, action F, NIC 130 and NIC 230 DMA the requested data from application buffer 223 (FIG. 2A; e.g., NT cache) to application buffer 123 in a single burst as a SCSI data transfer ("FCP_DATA"). NIC 130 then sends an interrupt 511 to NIC driver 125 to indicate the end of the DMA transfer. In phase 3, action G, NIC 130 sends a SCSI signal (e.g., "FCP_RSP") to NIC 230 to return status for the buffer-to-buffer DMA transfer. NIC 130 sends an interrupt 508 to NIC driver 125 to indicate that the SCSI signal (e.g., "FCP_RSP") has been successfully sent. NIC 230 sends an interrupt 510 to NIC driver 225 to indicate that it received a SCSI signal (e.g., "FCP_RSP") from NIC 130, indicating the DMA completed.
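Pulling phases 1 through 3 together, the read response rides on a single SCSI FCP exchange, with the usual SCSI roles reversed: the server issues FCP_CMND, and the client issues FCP_XFER_RDY and FCP_RSP. The standalone C sketch below simply prints the four information units in the order they cross the media, as described above.

#include <stdio.h>

/* The four FCP information units of the buffer-to-buffer read response. */
static const struct { const char *iu; const char *meaning; } exchange[] = {
    { "FCP_CMND",     "server sends headers 10 (Lookahead) to the client" },
    { "FCP_XFER_RDY", "client has armed DMA controller 131; start the burst" },
    { "FCP_DATA",     "single DMA burst, application buffer 223 -> 123" },
    { "FCP_RSP",      "client returns status; server leaves SCSI assist mode" },
};

int main(void) {
    for (int i = 0; i < 4; i++)
        printf("%d. %-12s %s\n", i + 1, exchange[i].iu, exchange[i].meaning);
    return 0;
}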
Although the present disclosure describes the use of Fibre Channel technology as the network media, one skilled in the art recognizes that the disclosed methods could benefit any network media, including Ethernet. Specifically, any network media can benefit from (1) specifying, during boot-up, an MTU greater than or equal to the segment size, to avoid fragmentation of the data by the protocol stack; (2) pre-fetching the destination address on the computer receiving the data by sending over just the network headers while the data to send remains on the sending computer; and (3) sending data from the sending computer directly to this destination address (instead of to an intermediate buffer in the receiving computer), thereby avoiding repeatedly copying the data. If the network media supports a buffer-to-buffer DMA transfer, the sending of the data in step (3) reduces to a single DMA burst.
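The three techniques can be condensed, independent of any particular network media, into the following C sketch; net_send, wait_for_destination, and dma_burst are illustrative stubs for whatever transmit, rendezvous, and transfer primitives a given media provides, and the segment size is a hypothetical value.

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEGMENT_SIZE (64 * 1024)   /* hypothetical segment size */

static void net_send(const void *p, size_t n) { (void)p; (void)n; }
static uint64_t wait_for_destination(void) { return 0; }
static void dma_burst(uint64_t dest, const void *p, size_t n)
{ (void)dest; (void)p; (void)n; }

static size_t negotiated_mtu;

/* (1) At boot-up, advertise an MTU no smaller than the segment size
 * so the protocol stack never fragments a segment. */
void boot_configure(size_t mtu)
{
    assert(mtu >= SEGMENT_SIZE);
    negotiated_mtu = mtu;
}

/* (2) Send only the headers ahead, letting the receiver pre-fetch
 * its application buffer address; (3) then send the payload straight
 * to that address -- one burst, no intermediate copies. */
void send_segment(const void *headers, size_t hdr_len,
                  const void *payload, size_t payload_len)
{
    assert(payload_len <= negotiated_mtu);  /* never fragmented */
    net_send(headers, hdr_len);             /* lookahead only   */
    uint64_t dest = wait_for_destination(); /* peer's buffer    */
    dma_burst(dest, payload, payload_len);  /* single DMA burst */
}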
Numerous modifications and adaptations of the embodiments described herein will be apparent to the skilled artisan in view of the disclosure. For example, method 40 is not platform specific and can work on other platforms such as Linux, other forms of the Unix operating system, Apple operating systems, or any other operating system that allows, or can be modified to allow, the passing up of the headers and the passing down of the application's buffer address. As previously discussed, although Fibre Channel may be used as the network media, other network media may be used and benefit from method 40. Such changes and modifications are encompassed by the attached claims.

Claims

I CLAIM:
1. A method for transferring data over a network, comprising the acts of:
specifying an MTU greater than or equal to the segment size of a first computer;
sending, by the first computer to a second computer over the network, headers of data located in a first application buffer in the first computer;
receiving, by the first computer from the second computer over the network, a start-transfer signal indicating that the second computer is ready to receive the data in a second application buffer in the second computer; and
sending, by the first computer to the second computer over the network, the data from the first application buffer to the second application buffer.
2. The method of Claim 1, wherein the network comprises a Fibre Channel network.
3. The method of Claim 1, wherein the network comprises network hardware that allows buffer-to-buffer DMA transfer of data.
4. The method of Claim 1, wherein said sending of the data comprises a buffer-to-buffer DMA transfer between the first application buffer and the second application buffer without intermediate copies.
5. The method of Claim 1, wherein the sending of the headers comprises transmitting the headers as a SCSI command of FCP_CMND.
6. The method of Claim 1, wherein the receiving of the start-transfer signal comprises accepting the start-transfer signal as a SCSI signal of FCP_XFER_RDY.
7. The method of Claim 6, further comprising utilizing a SCSI assist hardware of a network interface card in the first computer to process the start-transfer signal without involving a host processor of the first computer.
8. The method of Claim 1, wherein said sending of the data comprises transmitting the data as SCSI data.
9. The method of Claim 1, further comprising receiving, by the first computer from the second computer over the network, a status signal indicating the data has been received.
10. The method of Claim 9, wherein said receiving the status signal comprises receiving the status signal as a SCSI signal of FCP_RSP.
11. A method for data transfer over a network, comprising the acts of:
receiving headers for data from a first computer over the network;
sending the headers up a protocol stack of a second computer;
indicating to the protocol stack that more data is to follow;
receiving an application buffer address for storing the data; and
sending a start-transfer signal for transmission of the data to the first computer over the network.
12. The method of Claim 11, wherein the network comprises a Fibre Channel network.
13. The method of Claim 11, wherein the network comprises network hardware that allows buffer-to-buffer DMA transfer of data.
14. The method of Claim 11, further comprising the act of receiving the data in a buffer-to-buffer DMA transfer between a first application buffer in the first computer and a second application buffer in the second computer without intermediate copies, the second application buffer corresponding to the application buffer address.
15. The method of Claim 11, wherein said receiving the headers comprises accepting the headers as a SCSI command of FCP_CMND.
16. The method of Claim 11, wherein said sending the start-transfer signal comprises transmitting the start-transfer signal as a SCSI signal of FCP_XFER_RDY.
17. The method of Claim 11, further comprising receiving the data by the second computer from the first computer over the network.
18. The method of Claim 17, wherein said receiving the data comprises accepting the data as SCSI data.
19. The method of Claim 17, further comprising sending a status signal to the first computer indicating receipt of the data by the second computer over the network.
20. The method of Claim 19, wherein said sending the status signal comprises transmitting the status signal as a SCSI signal of FCP_RSP.
21. A computer-readable medium carrying a program for data transfer over a network, comprising:
a first instruction to specify an MTU greater than or equal to the segment size of a first computer;
a second instruction to send, by the first computer to a second computer over the network, headers of data located in a first application buffer in the first computer;
a third instruction to set up a hardware of the first computer to receive from the second computer over the network a start-transfer signal indicating that the second computer is ready to receive the data in a second application buffer in the second computer; and
a fourth instruction to send, by the first computer to the second computer over the network, the data from the first application buffer to the second application buffer.
22. The medium of Claim 21, wherein the network comprises a Fibre Channel network.
23. The medium of Claim 21, wherein the network comprises network hardware that allows buffer-to-buffer DMA transfer of data.
24. The medium of Claim 21, wherein the hardware is a SCSI assist hardware.
25. The medium of Claim 21, wherein the second instruction comprises a fifth instruction to transmit the headers as a SCSI command of FCP_CMND.
26. The medium of Claim 21, further comprising a fifth instruction to utilize a SCSI assist hardware of a network interface card in the first computer to process the start-transfer signal without involving a host processor of the first computer.
27. The medium of Claim 21, wherein the fourth instruction comprises a fifth instruction to transmit the data as SCSI data.
28. The medium of Claim 21, further comprising a fifth instruction to receive, by the first computer from the second computer over the network, a status signal indicating the data has been received.
29. The medium of Claim 28, wherein the fifth instruction comprises a sixth instruction to receive the status signal as a SCSI signal of FCP_RSP.
30. A computer-readable medium carrying a program for transferring data over a network, comprising:
a first instruction to receive headers for data from a first computer over the network;
a second instruction to send the headers up a protocol stack of a second computer;
a third instruction to indicate to the protocol stack that more data is to follow;
a fourth instruction to receive an application buffer address for storing the data; and
a fifth instruction to send a start-transfer signal for transmission of the data to the first computer over the network.
31. The medium of Claim 30, wherein the network comprises a Fibre Channel network.
32. The medium of Claim 30, wherein the network comprises network hardware that allows buffer-to-buffer DMA transfer of data.
33. The medium of Claim 30, further comprising a sixth instruction to receive the data in a buffer-to-buffer DMA transfer between a first application buffer in the first computer and a second application buffer in the second computer without intermediate copies, the second application buffer corresponding to the application buffer address.
34. The medium of Claim 30, wherein the first instruction comprises a sixth instruction to accept the headers as a SCSI command of FCP_CMND.
35. The medium of Claim 30, wherein the fifth instruction comprises a sixth instruction to transmit the start-transfer signal as a SCSI signal of FCP_XFER_RDY.
36. The medium of Claim 30, further comprising a sixth instruction to receive the data by the second computer from the first computer over the network.
37. The medium of Claim 36, wherein the sixth instruction comprises a seventh instruction to accept the data as SCSI data.
38. The medium of Claim 36, further comprising a seventh instruction to send a status signal to the first computer indicating receipt of the data by the second computer over the network.
39. The medium of Claim 38, wherein the seventh instruction comprises an eighth instruction to transmit the status signal as a SCSI signal of FCP_RSP.