US20060271639A1 - Multipath control device and system - Google Patents

Multipath control device and system

Publication number
US20060271639A1
Authority
US
United States
Prior art keywords
input
storage device
command
transmission
queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/178,509
Inventor
Atsuya Kumagai
Toshihiko Murakami
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Assigned to HITACHI, LTD. Assignment of assignors interest (see document for details). Assignors: KUMAGAI, ATSUYA; MURAKAMI, TOSHIHIKO
Publication of US20060271639A1
Legal status: Abandoned

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/14 Multichannel or multilink protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers

Definitions

  • the present invention relates to a load distribution method for a computer system, and more particularly to a load distribution method for ports related to storage devices.
  • SAN storage area network
  • multipath technologies have been used in which redundant paths are used for issuing input/output requests.
  • With multipath technologies it becomes possible to issue an input/output request even if trouble occurs on the path in use, by switching to another path, and to improve input/output throughput by issuing input/output requests to a plurality of paths in accordance with predetermined rules.
  • an apparatus which issues an input/output request to storage devices via a plurality of paths selects the paths
  • a Round Robin algorithm of issuing an input/output request in accordance with an issue order decided beforehand for each path.
  • Other examples are a Least Queue Depth algorithm, which issues an input/output request to the path having the minimum number of input/output requests stored in the queue assigned to each path, and a Least Blocks algorithm, which issues a write request to the path having the minimum total sum of write blocks stored in the queue assigned to each path.
  • the Least Blocks algorithm, among others, is characterized in that the amount of future transmission data is predicted from the number of write request blocks stored in the queue, so that the transmission data amounts on paths can be smoothed. Refer to “iSCSI Management API” by SNIA.
  • None of the conventional techniques predicts the reception data amount on each path. As a result, a large difference in data amounts may occur among paths, or, if the transmission load caused by write requests is heavy, a read request cannot be issued even though the reception load is low.
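The three conventional policies named above can be sketched as follows. This is a minimal illustration, not code from the patent or from the SNIA API: the request dictionaries, the path names "a" and "b", and the function names are all hypothetical.

```python
from itertools import cycle

def make_round_robin(paths):
    """Return a selector that hands out paths in a fixed, pre-decided order."""
    order = cycle(paths)
    return lambda: next(order)

def least_queue_depth(queues):
    """Pick the path whose queue holds the fewest pending requests."""
    return min(queues, key=lambda path: len(queues[path]))

def least_blocks(queues):
    """Pick the path whose queued write requests total the fewest blocks."""
    def write_blocks(path):
        return sum(r["blocks"] for r in queues[path] if r["op"] == "write")
    return min(queues, key=write_blocks)

# Hypothetical per-path queues: path "b" has a large read pending.
queues = {
    "a": [{"op": "write", "blocks": 8}],
    "b": [{"op": "read", "blocks": 64}, {"op": "write", "blocks": 2}],
}
```

With these example queues, Least Blocks prefers path "b" because it counts only write blocks, even though "b" already has a 64-block read pending. That blind spot is the reception-side imbalance the description points out.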
  • an apparatus which issues an input/output request predicts not only a transmission data amount to be formed by write requests in a transmission queue but also a reception data amount to be formed by read requests in the transmission queue.
  • the apparatus which issues an input/output request stores a newly generated write request in the queue having the minimum predicted transmission data amount, and stores a newly generated read request in the queue having the minimum predicted reception data amount.
  • the apparatus which issues an input/output request predicts the data transmission amount and data reception amount at each port to be formed by a received write request and read request, respectively, and adds the predicted amounts to a data transmission amount and data reception amount at each port predicted to be formed by a write request and read request to be issued from the apparatus.
  • the transmission data amounts and reception data amounts on paths can be smoothed at the same time so that data input/output throughput can be improved. Even if there is only a single path, a read request can be issued to a storage device so as not to exceed the data reception capacity at the port of the apparatus which issues the input/output request.
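The selection rule summarized above (writes go to the queue with the smallest predicted transmission amount, reads to the queue with the smallest predicted reception amount) might look like the sketch below. The `PathLoad` class and both function names are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class PathLoad:
    tx_bytes: int = 0  # predicted transmission amount from queued writes
    rx_bytes: int = 0  # predicted reception amount from queued reads

def choose_path(loads, op):
    """Writes go to the smallest predicted tx load, reads to the smallest rx load."""
    if op == "write":
        return min(loads, key=lambda p: loads[p].tx_bytes)
    return min(loads, key=lambda p: loads[p].rx_bytes)

def enqueue(loads, op, nbytes):
    """Choose a path for a new request and account for its predicted data."""
    path = choose_path(loads, op)
    if op == "write":
        loads[path].tx_bytes += nbytes
    else:
        loads[path].rx_bytes += nbytes
    return path
```

Keeping two counters per path is what lets a read be placed on a path that is busy transmitting but idle on the receive side.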
  • FIG. 1 is a diagram showing the configuration of a computer system according to a first embodiment of the invention.
  • FIG. 2 is a diagram showing the contents of a memory of a storage device of the embodiment.
  • FIG. 3 is a diagram showing examples of a transmission queue.
  • FIG. 4 is a diagram showing examples of a reception queue.
  • FIG. 5 is a diagram showing an example of data transmission/reception amount information.
  • FIG. 6 is a diagram showing an example of command management information.
  • FIG. 7 is a diagram showing an example of target information.
  • FIG. 8 is a diagram showing the structure of a memory of a management terminal of the embodiment.
  • FIG. 9 is a flow chart illustrating a command forwarding process to be executed by a command forwarding program of the embodiment.
  • FIG. 10 is a flow chart illustrating a process to be executed by an initiator program of the embodiment.
  • FIG. 11 is a flow chart illustrating another command forwarding process to be executed by the command forwarding program of the embodiment.
  • FIG. 12 is a flow chart illustrating a process to be executed by a target program of the embodiment.
  • FIG. 13 is a diagram showing the configuration of a computer system according to a second embodiment.
  • FIG. 14 is a flow chart illustrating a command reception process to be executed by a target program of the second embodiment.
  • FIG. 15 is a flow chart illustrating a response transmission process to be executed by the target program of the second embodiment.
  • FIG. 16 is a flow chart illustrating a command transmission process to be executed by a command issue program according to a third embodiment.
  • the present invention is applied to a computer system in which a storage device transfers a SCSI command received from a host computer to an external storage device.
  • FIG. 1 is a diagram showing the configuration of a computer system of the first embodiment.
  • the computer system of the first embodiment has a storage device 100 , an external storage device 110 , a plurality of hosts 130 , and a management terminal 150 .
  • the storage device 100 and external storage device 110 are interconnected via a network 120 such as the Internet.
  • the storage device 100 and a plurality of hosts 130 are connected via a network 140 .
  • the storage device 100 is connected to the management terminal 150 .
  • the host 130 is an information processing apparatus (host computer) which executes an application involving data input/output of the storage device 100 .
  • the storage device 100 has a CPU 101 , a memory 102 , a cache 103 for temporarily storing data to speed up accesses, a disk controller 104 , one or more disks 105 , ports 106 , a management port 108 , and a bus 109 interconnecting these devices.
  • the CPU 101 performs various processes to be described later, by executing programs stored in the memory 102 .
  • the memory 102 stores programs and data to be described later.
  • the cache 103 temporarily stores write data.
  • the disk controller 104 controls data input/output of the disks 105 .
  • the disk controller 104 may perform processes corresponding to Redundant Array of Independent Disks (RAID).
  • the disk 105 stores data read/written by the host 130 .
  • a non-volatile memory 107 stores programs and data to be stored into the memory 102 when the storage device 100 is activated.
  • the ports 106 are mechanisms such as network cards for connecting local area network (LAN) cables to the storage device 100 , and execute data transmission/reception processes relative to external devices via the networks 120 and 140 .
  • LAN local area network
  • the storage device 100 may have three or more ports 106 .
  • the management port 108 connects the management terminal 150 to the storage device 100 .
  • the storage device 100 has a relay function of transferring an input/output request issued from the host 130 to the external storage device 110 via the network 120 and transferring a response and data received from the external storage device 110 to the host 130 .
  • the external storage device 110 has a structure similar to that of the storage device 100 , except for the relay function.
  • the host 130 has an initiator function of the iSCSI protocol.
  • the storage device 100 has a target function and an initiator function.
  • the external storage device 110 has a target function.
  • FIG. 2 shows programs and data stored in the memory 102 .
  • the memory 102 stores an initiator program 201 , a target program 202 , a command forwarding program 203 , a transmission queue 204 , a reception queue 205 , data transmission/reception amount information 206 , command management information 207 , target information 208 , a redundant path control program 209 and an initializing program 210 .
  • the initiator program 201 is a program for encapsulating a SCSI command and data into an iSCSI PDU, extracting a SCSI response from an iSCSI PDU, and transmitting/receiving an iSCSI PDU to/from an external iSCSI target, in accordance with the iSCSI protocol.
  • the initiator program 201 extracts the SCSI response from the iSCSI PDU and stores it in the reception queue 205 .
  • the transmission operation of an iSCSI command will be detailed later.
  • the target program 202 performs mutual exchange between the SCSI command and data and the iSCSI PDU and transmits/receives an iSCSI PDU.
  • the target program 202 extracts the SCSI command from the iSCSI PDU and stores it in the reception queue 205 , and further the target program 202 adds an iSCSI header to the SCSI response stored in the top entry of the transmission queue 204 to be described later and transmits it to the host 130 . This operation will be detailed later.
  • the command forwarding program 203 stores the SCSI command stored in the top entry of the reception queue 205 in the transmission queue 204 , and stores the SCSI response received by the initiator program 201 in the transmission queue 204 . This operation will be detailed later.
  • the redundant path control program 209 and initializing program 210 will be described later.
  • the transmission queue 204 is an area in the memory 102 for storing the SCSI command or SCSI response to be transmitted, and defined at each port.
  • the storage device 100 since the storage device 100 has three ports 106 , there are three transmission queues 204 a , 204 b and 204 c corresponding to the ports 106 a , 106 b and 106 c , respectively.
  • FIG. 3 shows examples of the transmission queues 204 a , 204 b and 204 c .
  • an area 301 in the transmission queue 204 is the top entry in the memory area, and entries 302 , 303 , and 304 are defined in this order.
  • the initiator program 201 or target program 202 reads the SCSI command or SCSI response stored in the top entry of the transmission queue 204 and deletes it, and each SCSI command or SCSI response stored in the second and subsequent entries moves up by one entry.
  • a write request for two blocks is stored in the top entry of the transmission queue 204 a .
  • the block size is set to 512 bytes.
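As a worked example of the arithmetic implied above, a write request for two blocks at 512 bytes per block predicts 1024 bytes of transmission data. The sketch below models one per-port transmission queue as a FIFO. The entry layout is hypothetical, and the read entry is added only to show the reception side.

```python
from collections import deque

BLOCK_SIZE = 512  # bytes, as set in the embodiment

def predicted_bytes(queue, op):
    """Sum the data amount of queued requests of one kind ("read" or "write")."""
    return sum(e["blocks"] * BLOCK_SIZE for e in queue if e["op"] == op)

# Hypothetical contents of one per-port transmission queue (top entry first).
tx_queue = deque([
    {"op": "write", "blocks": 2},
    {"op": "read", "blocks": 4},
])

write_bytes = predicted_bytes(tx_queue, "write")  # 2 * 512
read_bytes = predicted_bytes(tx_queue, "read")    # 4 * 512

# Reading the top entry removes it; the remaining entries move up by one.
top = tx_queue.popleft()
```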
  • the reception queue 205 is an area in the memory 102 for storing the received SCSI command or SCSI response defined at each port.
  • FIG. 4 shows examples of the reception queues 205 a , 205 b and 205 c .
  • an area 401 in the reception queue 205 is the top entry in the memory area, and entries 402 , 403 , and 404 are defined in this order.
  • the initiator program 201 or target program 202 reads the SCSI command or SCSI response stored in the top entry of the reception queue 205 and deletes it, and each SCSI command or SCSI response stored in the second and subsequent entries moves up by one entry.
  • a Read command and Write commands for the external storage device 110 are stored in the transmission queues 204 a and 204 b , and the data reception amount of the Read command and the data transmission amounts of the Write commands are shown.
  • a Read response and Write responses received from the external storage device 110 are stored in the reception queues 205 a and 205 b .
  • the Read response is response data to the Read command.
  • Write commands received from the host 130 are stored in the reception queue 205 c , and data reception amounts of the Write commands are shown.
  • a Read response to be transmitted to the host 130 is stored in the transmission queue 204 c.
  • although the maximum number of SCSI commands or SCSI responses capable of being stored in each queue is 4 in these examples, it is sufficient that this maximum number is 1 or more.
  • FIG. 5 is a diagram showing examples of the data transmission/reception amount information 206 .
  • the data transmission/reception amount information 206 is stored in a table constituted of a combination of information on a port identifier 501 , a transmission byte number 502 , a reception byte number 503 and initiator assignment information 504 .
  • the port identifier 501 is a name for identifying the port.
  • the transmission byte number 502 indicates the number of bytes of transmission data formed by the SCSI Write stored in the queue.
  • the reception byte number 503 indicates the number of bytes of reception data formed by the SCSI Read stored in the queue.
  • the initiator assignment information 504 indicates whether the initiator program 201 is assigned.
  • the value “1” in a cell 505 means that the initiator program 201 can issue an input/output request from the port b.
  • the value “0” in a cell 506 means that the initiator program 201 cannot issue an input/output request from the port c.
  • a cell 507 indicates that the total sum of the requested data amount by the SCSI Read stored in the transmission queue 204 b is 2048 bytes.
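One possible in-memory shape for this table is sketched below. Only the values marked "from the text" are stated in the description (the assignment flags for ports b and c, and the 2048-byte read total for port b). Every other number, and the class and function names, are placeholders assumed for illustration.

```python
from dataclasses import dataclass

@dataclass
class PortAmount:
    port: str        # port identifier 501
    tx_bytes: int    # transmission byte number 502
    rx_bytes: int    # reception byte number 503
    initiator: int   # initiator assignment information 504 (1 or 0)

table = [
    PortAmount("a", tx_bytes=1024, rx_bytes=512, initiator=1),  # byte counts are placeholders
    PortAmount("b", tx_bytes=0, rx_bytes=2048, initiator=1),    # rx_bytes and flag from the text
    PortAmount("c", tx_bytes=0, rx_bytes=0, initiator=0),       # flag from the text
]

def initiator_ports(rows):
    """Ports from which the initiator program may issue input/output requests."""
    return [row.port for row in rows if row.initiator == 1]
```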
  • FIG. 6 is a diagram showing examples of the command management information 207 .
  • the command management information 207 is stored in a table constituted of a combination of information on a command tag 601 , an initiator name 602 and a target name 603 .
  • the command tag 601 is a number for identifying the SCSI command.
  • the initiator name 602 is a name of an initiator issuing the SCSI command.
  • the target name 603 is a name of the target to which the SCSI command is issued.
  • the examples shown in FIG. 6 show that an initiator I 1 issues SCSI commands 11 and 12 to a target T 1 .
  • the item corresponding to the SCSI command for which the response is completed is deleted from the command management information 207 .
  • An input/output request for which the SCSI command managed by the command management information 207 has already been transmitted but the corresponding SCSI response has not yet been received is called an outstanding I/O.
  • An upper limit of the number of outstanding I/Os at the same time instant is preset, and the initiator program 201 controls so that the number of outstanding I/Os at the same instant does not exceed the upper limit. This upper limit is called the maximum number of outstanding I/Os.
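The outstanding-I/O bookkeeping described above can be sketched as a small table keyed by command tag. The class and method names are hypothetical, and the default limit of 4 is the example value the embodiment uses.

```python
MAX_OUTSTANDING = 4  # example value used in the embodiment

class CommandTable:
    """Tracks commands that were sent but not yet answered (outstanding I/Os)."""

    def __init__(self, limit=MAX_OUTSTANDING):
        self.limit = limit
        self.entries = {}  # command tag -> (initiator name, target name)

    def can_send(self):
        """True while the number of outstanding I/Os is below the limit."""
        return len(self.entries) < self.limit

    def sent(self, tag, initiator, target):
        """Record a transmitted command; refuse if the limit is reached."""
        if not self.can_send():
            raise RuntimeError("outstanding I/O limit reached")
        self.entries[tag] = (initiator, target)

    def completed(self, tag):
        """Drop the entry once the response for this command is complete."""
        del self.entries[tag]
```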
  • FIG. 7 shows examples of the target information 208 .
  • the target information 208 is stored in a table constituted of a combination of information on a target name 701 and a location 702 .
  • the target name 701 is a name for identifying the target.
  • the location 702 is the location of the target, identified by a host name, an IP address, a TCP port number and the like. The examples show that a target “localtarget” operates at the position identified by an IP address of 192.168.1.1 and a TCP port number 3260 , i.e., at the storage device 100 , and that a target “remotetarget” existing in the external storage device 110 operates at the position identified by an IP address of 192.168.2.2 and a TCP port number 3260 and at the position identified by an IP address of 192.168.3.2 and a TCP port number 3260 .
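The target information can be pictured as a mapping from target name to one or more locations, which a forwarding step could consult to decide whether a command is bound for the external storage device. The addresses below are the ones given in the text. The `LOCAL_ADDRESSES` set and the `is_external` helper are assumptions for illustration.

```python
# Target name 701 -> list of locations 702 as (IP address, TCP port).
TARGETS = {
    "localtarget": [("192.168.1.1", 3260)],
    "remotetarget": [("192.168.2.2", 3260), ("192.168.3.2", 3260)],
}

# Assumption: addresses that belong to the storage device 100 itself.
LOCAL_ADDRESSES = {"192.168.1.1"}

def is_external(target_name):
    """True if none of the target's locations is on the local storage device."""
    return all(ip not in LOCAL_ADDRESSES for ip, _port in TARGETS[target_name])
```

The two locations listed for "remotetarget" are what make it reachable over more than one path.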
  • the redundant path control program 209 allows the management terminal 150 to set a load distribution algorithm or the like via the management port 108 .
  • the redundant path control program 209 can set the algorithm of the present invention as well as other algorithms such as Round Robin, Least Queue Depth and Least Blocks.
  • the initializing program 210 initializes the data transmission/reception amount information 206 shown in FIG. 5 , the command management information 207 shown in FIG. 6 and the target information 208 shown in FIG. 7 .
  • CPU 101 executes the initializing program 210 stored in the memory 102 to thereby initialize the data transmission/reception amount information 206 , command management information 207 and target information 208 .
  • the management terminal 150 is a personal computer or the like for performing setting works for the storage device 100 .
  • the management terminal 150 has a CPU 151 , a memory 152 , a non-volatile memory 153 , an input unit 154 , an output unit 155 , a port 156 and a bus 157 interconnecting these devices.
  • CPU 151 performs processes to be described later, by executing programs stored in the memory 152 .
  • the memory 152 stores programs and data to be described later.
  • the non-volatile memory 153 stores programs and data to be stored in the memory 152 when the management terminal 150 is activated.
  • the port 156 is a mechanism such as a network card for connecting a local area network (LAN) cable to the management terminal 150 , and performs data transmission/reception processes relative to the storage device 100 via a LAN.
  • FIG. 8 shows a program stored in the memory 152 of the management terminal 150 .
  • a redundant path setting program 901 is stored in the memory 152 .
  • the redundant path setting program 901 sets the load distribution algorithm or the like to the storage device 100 .
  • the redundant path setting program 901 notifies the redundant path control program 209 of the load distribution algorithm selected from the input unit 154 .
  • FIG. 9 is a flow chart illustrating a process to be executed when the command forwarding program 203 transfers a SCSI command. This process starts when CPU 101 executes the command forwarding program 203 stored in the memory 102 . If a SCSI command is not stored in the top entry of the reception queue 205 c (S 801 : No), the command forwarding program 203 does not perform the command forwarding process until a SCSI command is stored in the top entry of the reception queue 205 c . If a SCSI command is stored in the top entry of the reception queue 205 c (S 801 : Yes), the command forwarding program 203 refers to the target information 208 to judge whether the SCSI command is destined for the external storage device 110 (S 802 ). If the SCSI command is not destined for the external storage device 110 (S 802 : No), the command forwarding program 203 transfers the SCSI command to the disk controller 104 (S 803 ) and then advances to S 808 .
  • the SCSI command is destined to the external storage device 110 (S 802 : Yes)
  • the command forwarding program 203 refers to the data transmission/reception amount information 206 to store the SCSI command in the transmission queue corresponding to the port having the minimum transmission byte number 502 , among the ports having the initiator assignment information 504 of “1” (S 806 ).
  • the command forwarding program 203 updates the data transmission/reception amount information 206 in accordance with the data transmission/reception amount of the SCSI command stored in the transmission queue 204 (S 807 ). Namely, in the case of the SCSI Read command, a data reception amount to be received by this command is added to the reception byte number 503 at the port corresponding to the transmission queue selected by the data transmission/reception amount information 206 . In the case of the SCSI Write command, a data transmission amount to be transmitted by this command is added to the transmission byte number 502 at the port corresponding to the transmission queue selected by the data transmission/reception amount information 206 .
  • the command forwarding program 203 further erases the top entry of the reception queue 205 , which held the SCSI command now stored in the transmission queue, and advances, by one entry toward the top entry side, the storage location of each command stored in the second and subsequent entries (S 808 ).
  • the command forwarding program 203 stores the SCSI Read command in the transmission queue 204 a at Step S 805 .
  • the command forwarding program 203 stores the SCSI Write command in the transmission queue 204 b at Step S 806 .
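One pass of the forwarding process of FIG. 9 could be sketched as below. The step numbers in the comments follow the text. The handling of Read commands (placing them on the port with the minimum reception byte number at S 805) is an assumption drawn from the summary of the invention, since that branch is not spelled out in this passage. All names and the dictionary layout are hypothetical.

```python
from collections import deque

def forward_command(rx_queue, tx_queues, amounts, is_external, to_disk):
    """Forward one command; return the chosen port, or None if nothing was queued out."""
    if not rx_queue:                        # S801: No, nothing to forward yet
        return None
    cmd = rx_queue[0]
    port = None
    if not is_external(cmd["target"]):      # S802: No
        to_disk(cmd)                        # S803: hand over to the disk controller
    else:
        eligible = [p for p, a in amounts.items() if a["initiator"] == 1]
        field = "rx" if cmd["op"] == "read" else "tx"
        port = min(eligible, key=lambda p: amounts[p][field])  # S805 (assumed) / S806
        tx_queues[port].append(cmd)
        amounts[port][field] += cmd["nbytes"]                  # S807
    rx_queue.popleft()                      # S808: remaining entries move up
    return port

# Hypothetical state: port "a" is busy sending, port "b" is busy receiving.
amounts = {
    "a": {"tx": 2048, "rx": 0, "initiator": 1},
    "b": {"tx": 0, "rx": 2048, "initiator": 1},
    "c": {"tx": 0, "rx": 0, "initiator": 0},
}
```

With this starting state a Read lands on port "a" and a Write on port "b", which matches the placement of the Read command in queue 204 a and the Write command in queue 204 b described above.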
  • FIG. 10 is a flow chart illustrating a process to be executed when the initiator program 201 transmits a SCSI command. This process starts when CPU 101 executes the initiator program 201 stored in the memory 102 . If a SCSI command is stored in the top entry of the transmission queue 204 a or 204 b (S 1001 : Yes), the initiator program 201 refers to the command management information 207 to judge whether the current number of outstanding I/Os is smaller than the maximum number of outstanding I/Os (S 1002 ). If a SCSI command is not stored in the top entry of the transmission queue 204 (S 1001 : No), the initiator program 201 does not perform the command transmission process until a SCSI command is stored in the top entry of the transmission queue 204 .
  • the initiator program 201 adds a header to the SCSI command and data to generate an iSCSI PDU (S 1003 ), divides the iSCSI PDU into Ethernet frames, transmits the Ethernet frames from the port 106 corresponding to the transmission queue 204 (S 1004 ), and adds an entry of the SCSI command to the command management information 207 (S 1005 ).
  • the initiator program enters a standby state until the current number of outstanding I/Os becomes smaller than the maximum number of outstanding I/Os.
  • the maximum number of outstanding I/Os is set to 4
  • the maximum number of outstanding I/Os is not limited to this value, as long as it does not exceed the maximum number of commands capable of being stored in the transmission queue.
  • the initiator program 201 deletes the transmitted SCSI command from the transmission queue 204 , and the storage location of each SCSI command stored in the second and subsequent entries is advanced by one entry (S 1006 ).
  • the initiator program 201 updates the data transmission/reception amount information 206 (S 1007 ).
  • the transmitted data amount is subtracted from the transmission byte number 502 in the data transmission/reception amount information 206 corresponding to the port from which the command was transmitted.
  • the data transmission/reception amount information 206 is not updated.
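The transmission process of FIG. 10 can be sketched as follows. The sketch assumes that the subtraction at S 1007 applies to Write commands and that Read commands leave the table unchanged, because their data has not arrived yet. The waiting states are modeled simply by returning False. The function name, arguments and example state are hypothetical.

```python
from collections import deque

def transmit_next(port, tx_queue, amounts, outstanding, max_outstanding, send):
    """Try to send the top command of one port's transmission queue."""
    if not tx_queue:                            # S1001: No
        return False
    if len(outstanding) >= max_outstanding:     # S1002: limit reached, stand by
        return False
    cmd = tx_queue[0]
    send(port, cmd)                             # S1003-S1004: build and send the PDU
    outstanding[cmd["tag"]] = cmd               # S1005: record as outstanding
    tx_queue.popleft()                          # S1006: remaining entries move up
    if cmd["op"] == "write":                    # S1007: write data has now been sent
        amounts[port]["tx"] -= cmd["nbytes"]
    return True

# Hypothetical state for port "a": one queued Write, then one queued Read.
amounts = {"a": {"tx": 1024, "rx": 2048}}
queue_a = deque([
    {"tag": 1, "op": "write", "nbytes": 1024},
    {"tag": 2, "op": "read", "nbytes": 2048},
])
outstanding, wire = {}, []
send = lambda port, cmd: wire.append((port, cmd["tag"]))
first = transmit_next("a", queue_a, amounts, outstanding, 1, send)
second = transmit_next("a", queue_a, amounts, outstanding, 1, send)  # blocked by the limit
```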
  • FIG. 11 is a flow chart illustrating a process to be executed when the command forwarding program 203 transfers a SCSI response and data. This process starts when CPU 101 executes the command forwarding program 203 stored in the memory 102 . If a SCSI response and data are stored in the top entry of the reception queue 205 a or 205 b (S 1101 : Yes), the command forwarding program 203 stores the SCSI response in the transmission queue 204 corresponding to the port whereat the corresponding SCSI command was received (S 1102 ).
  • the command forwarding program 203 further deletes the SCSI response from the reception queue 205 a or 205 b from which it was extracted, and advances, by one entry toward the top entry, the storage location of each SCSI response in the second and subsequent entries (S 1103 ).
  • the command forwarding program 203 updates the data transmission/reception amount information 206 (S 1104 ).
  • the received data amount is subtracted from the reception byte number 503 in the data transmission/reception amount information 206 corresponding to the port at which the response was received.
  • the data transmission/reception amount information 206 is not updated.
  • the command forwarding program 203 does not perform the response transfer process until a SCSI response is stored in the top entry of the reception queue 205 a or 205 b .
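The response path of FIG. 11 might be sketched as below. The sketch assumes the subtraction at S 1104 applies to Read responses, which carry the data that was predicted when the Read command was queued. The `origin_port` mapping, from command tag to the port at which the command was originally received, is a hypothetical helper.

```python
from collections import deque

def forward_response(port, rx_queue, tx_queues, amounts, origin_port):
    """Move one response from an initiator-side port toward the host."""
    if not rx_queue:                                   # S1101: No
        return False
    resp = rx_queue[0]
    tx_queues[origin_port[resp["tag"]]].append(resp)   # S1102
    rx_queue.popleft()                                 # S1103
    if resp["op"] == "read":                           # S1104: read data has arrived
        amounts[port]["rx"] -= resp["nbytes"]
    return True

# Hypothetical state: a 1024-byte Read response has arrived on port "a"
# for a command that the host originally sent to port "c".
amounts = {"a": {"tx": 0, "rx": 1024}}
rx_a = deque([{"tag": 11, "op": "read", "nbytes": 1024}])
tx_queues = {"c": deque()}
moved = forward_response("a", rx_a, tx_queues, amounts, {11: "c"})
```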
  • FIG. 12 is a flow chart illustrating a process to be executed when the target program 202 transmits a SCSI response and Read data. This process starts when CPU 101 executes the target program 202 stored in the memory 102 . If a SCSI response and data are stored in the top entry of the transmission queue 204 (S 1201 : Yes), the target program 202 generates an iSCSI PDU from the SCSI response and data (S 1202 ), transmits the generated iSCSI PDU from the port (S 1203 ) and deletes the entry of the SCSI command corresponding to the SCSI response from the command management information 207 (S 1204 ). Next, the target program 202 deletes the SCSI response transmitted from the transmission queue 204 , and advances by one entry toward the top entry the storage location of each SCSI response stored in the second and subsequent entries (S 1205 ).
  • the target program 202 does not perform the response transmission process until a SCSI response is stored in the top entry of the transmission queue 204 .
  • in the first embodiment, it is possible to smooth the amounts of data transmitted and received per unit time at the ports used by the iSCSI initiator operating in the storage device 100 .
  • the storage device 100 transmits/receives the SCSI command and SCSI response to/from the host 130 via a single port 106 c .
  • the present invention is also applicable to the case in which the storage device 100 transmits/receives the SCSI command and SCSI response to/from the host 130 via two or more ports. This will be detailed in the third embodiment.
  • the storage device 100 uses the port 106 only for transmitting a SCSI command and receiving a SCSI response.
  • the ports 106 a and 106 b are used only by an initiator and do not receive a SCSI command
  • the port 106 c is used only for a target and does not transmit a SCSI command, limiting the role of each port.
  • a load distribution can be conducted by considering only the load of the transmission port.
  • the storage device 100 uses the port 106 for transmission/reception of a SCSI command and a SCSI response.
  • FIG. 13 is a diagram showing the configuration of a second embodiment of a computer system.
  • the devices and programs constituting this system are similar to those of the first embodiment, except that the same network 120 interconnecting the storage device 100 and hosts 130 is used for interconnecting the storage device 100 and external storage device 110 and that the operation of the target program 202 is modified.
  • the role of each port 106 is not limited as in the case of the first embodiment. In the second embodiment, therefore, the load distribution among the ports is conducted by considering the loads of both the transmission and reception ports.
  • FIG. 14 is a flow chart illustrating a process to be executed when the target program 202 receives an iSCSI PDU. This process starts when CPU 101 executes the target program 202 stored in the memory 102 .
  • the target program 202 extracts an SCSI command and data from the iSCSI PDU (S 1402 ).
  • the target program 202 further adds an entry of the SCSI command to the command management information 207 (S 1403 ) and adds the SCSI command to the bottom entry of the reception queue 205 (S 1404 ).
  • the target program 202 updates the data transmission/reception amount information 206 (S 1405 ).
  • the received SCSI command is a SCSI Read command
  • a data transmission amount to be transmitted by the command is added to the transmission byte number 502 in the data transmission/reception amount information 206 corresponding to the port that received the command.
  • a data reception amount to be received by the command is added to the reception byte number 503 in the data transmission/reception amount information 206 corresponding to the port that received the command.
  • for example, if the port 106 a receives a SCSI Read command requesting data of 1024 bytes, the value “2048” of the transmission byte number 502 is rewritten to “3072”.
  • the target program 202 does not perform the command reception process until an iSCSI PDU is received.
  • FIG. 15 is a flow chart illustrating a process to be executed when the target program 202 transmits a SCSI response and Read data. This process starts when CPU 101 executes the target program 202 stored in the memory 102 . If a SCSI response is stored in the top entry of the transmission queue 204 (S 1501 : Yes), the target program 202 generates an iSCSI PDU from the SCSI response and data (S 1502 ), transmits the generated iSCSI PDU from the corresponding port (S 1503 ), and deletes the entry of the SCSI command corresponding to the SCSI response from the command management information 207 (S 1504 ).
  • the target program 202 further deletes the SCSI response stored in the top entry of the transmission queue 204 and advances by one entry toward the top entry the storage location of each SCSI response stored in the second and subsequent entries (S 1505 ). Then, the target program 202 updates the data transmission/reception amount information 206 in accordance with the contents of the SCSI command stored in the transmission queue 204 (S 1506 ). Namely, in the case of a Read response, a data transmission amount by the command is subtracted from the transmission byte number 502 in the data transmission/reception amount information 206 corresponding to the port. In the case of a Write response, a data reception amount by the command is subtracted from the reception byte number 503 in the data transmission/reception amount information 206 corresponding to the port.
  • the target program 202 does not perform the response transmission process until a SCSI response is stored in the top entry of the transmission queue.
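The target-side accounting of the second embodiment (S 1405 on reception, S 1506 on response) can be sketched as a pair of updates that mirror each other. The function names and the dictionary layout are hypothetical. The starting value of 2048 and the 1024-byte Read follow the worked example in the text, read here as applying to the transmission byte count, which is the counter the rule for Read commands updates.

```python
def on_command_received(amounts, port, cmd):
    """S1405: a received Read will cause transmission; a received Write, reception."""
    field = "tx" if cmd["op"] == "read" else "rx"
    amounts[port][field] += cmd["nbytes"]

def on_response_sent(amounts, port, cmd):
    """S1506: remove the predicted amount once the response has gone out."""
    field = "tx" if cmd["op"] == "read" else "rx"
    amounts[port][field] -= cmd["nbytes"]

amounts = {"a": {"tx": 2048, "rx": 0}}
read_cmd = {"op": "read", "nbytes": 1024}

on_command_received(amounts, "a", read_cmd)
after_receive = amounts["a"]["tx"]   # 2048 + 1024

on_response_sent(amounts, "a", read_cmd)
after_send = amounts["a"]["tx"]      # back to the starting value
```

Because the target side writes to the same per-port counters as the initiator side, a port serving host Reads looks busier to the forwarding logic when it picks a port for outgoing Writes.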
  • According to the second embodiment, it is possible to smooth the transmission/reception data amounts at ports per unit time to be transmitted/received by an iSCSI initiator and an iSCSI target operating in the storage device 100 .
  • the third embodiment is characterized in a port load distribution control on the side of a host 130 when the storage device 100 transmits/receives a SCSI command and a SCSI response to/from the host via two or more ports.
  • the host 130 is provided with a command issue program 211 in place of the command forwarding program 203 .
  • the initiator program 201 performs the process shown in FIG. 10 , excluding S 1002 and S 1005 .
  • the programs and control information constituting the second embodiment are used without modification.
  • FIG. 16 is a flow chart illustrating a process to be executed when the command issue program 211 issues a SCSI command. This process starts when the host 130 executes the command issue program 211 stored in a memory. If a SCSI command is not stored in the top entry of a SCSI buffer (S 1601 : No), the command issue program 211 does not perform the command transmission process until a SCSI command is stored in the top entry of the SCSI buffer. If a SCSI command is stored in the top entry of a SCSI buffer (S 1601 : Yes), the command issue program 211 judges whether the SCSI command is a SCSI Read (S 1602 ).
  • If the SCSI command is a SCSI Read (S 1602 : Yes), the command issue program 211 refers to the data transmission/reception amount information 206 and stores the SCSI Read command in the transmission queue 204 corresponding to the port having the minimum reception byte number 503 among the ports having the initiator assignment information 504 of “1” (S 1603 ). If the SCSI command is not a SCSI Read (S 1602 : No), the command issue program 211 refers to the data transmission/reception amount information 206 and stores the SCSI command in the transmission queue 204 corresponding to the port having the minimum transmission byte number 502 among the ports having the initiator assignment information 504 of “1” (S 1604 ).
  • the command issue program 211 updates the data transmission/reception amount information 206 in accordance with the contents of the SCSI command stored in the transmission queue 204 (S 1605 ).
  • the process S 1605 is similar to the process S 807 .
  • the command issue program 211 further deletes the top entry of the SCSI buffer storing the transferred SCSI command, and advances by one entry toward the top entry the storage location of each SCSI command stored in the second and subsequent entries (S 1606 ).
  • the command issue program 211 executes the processes S 1103 and S 1104 .
  • SAN is configured by an IP network, and a SCSI command and data are transmitted/received in accordance with the iSCSI protocol.
  • The present invention is not limited thereto, however, and may adopt other protocols such as Fibre Channel, provided that the protocol can perform data input/output relative to the storage device.

Abstract

In a storage device having redundant input/output paths, both a transmission data amount and a reception data amount are smoothed among paths. A storage device predicts not only a transmission data amount to be formed by an output request in a transmission queue but also a reception data amount to be formed by an input request in the transmission queue. The storage device stores a newly occurred output request in a queue having a minimum predicted transmission data amount and stores a newly occurred input request in a queue having a minimum predicted reception data amount. In a storage device having redundant input/output paths, transmission data amounts and reception data amounts can be smoothed among the paths.

Description

    INCORPORATION BY REFERENCE
  • The present application claims priority from Japanese application JP2005-147799 filed on May 20, 2005, the content of which is hereby incorporated by reference into this application.
  • BACKGROUND OF THE INVENTION
  • The present invention relates to a load distribution method for a computer system, and more particularly to a load distribution method for ports related to storage devices.
  • In a conventional storage area network (SAN) connecting server computers and storage devices via a dedicated network, multipath technologies have been used in which redundant paths are used for issuing input/output requests. With multipath technologies, it becomes possible to continue issuing input/output requests even if trouble occurs on the path in use, by switching to another path, and to improve input/output throughput by issuing input/output requests to a plurality of paths in accordance with predetermined rules.
  • One example of an algorithm by which an apparatus that issues input/output requests to storage devices via a plurality of paths selects a path is the Round Robin algorithm, which issues input/output requests in accordance with an issue order decided beforehand for each path. Other examples are the Least Queue Depth algorithm, which issues an input/output request to the path having the minimum number of input/output requests stored in the queue assigned to each path, and the Least Blocks algorithm, which issues a write request to the path having the minimum total sum of write blocks stored in the queue assigned to each path. The Least Blocks algorithm, among others, is characterized in that the amount of future transmission data is predicted from the number of write request blocks stored in the queue, so that the transmission data amounts on paths can be smoothed. Refer to “iSCSI Management API” by SNIA.
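  • The three conventional selection policies named above can be sketched as follows. This is an illustrative model only: the queue contents, the path names and the function names are assumptions, and the code is not taken from the SNIA document.

```python
from itertools import cycle

# Hypothetical per-path queues; each request is a (kind, blocks) pair.
queues = {
    "path_a": [("write", 2), ("read", 4)],
    "path_b": [("write", 1)],
    "path_c": [("read", 8), ("read", 1), ("write", 3)],
}

def make_round_robin(paths):
    """Return a selector that cycles through the paths in a fixed order."""
    order = cycle(paths)
    return lambda: next(order)

def least_queue_depth(queues):
    """Select the path whose queue holds the fewest requests."""
    return min(queues, key=lambda path: len(queues[path]))

def least_blocks(queues):
    """Select the path with the smallest total of queued write blocks."""
    def queued_write_blocks(path):
        return sum(blocks for kind, blocks in queues[path] if kind == "write")
    return min(queues, key=queued_write_blocks)

next_path = make_round_robin(list(queues))
first_four = [next_path() for _ in range(4)]
```

  • Note that `least_blocks` counts only write blocks, so the nine read blocks queued on `path_c` in this example have no influence on its choice.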
  • SUMMARY OF THE INVENTION
  • None of the conventional techniques predicts a reception data amount on each path. As a result, a large difference in data amounts may occur among paths, or, if the transmission load caused by write requests is heavy, a read request cannot be issued even though the reception load is low.
  • In order to solve these issues, an apparatus which issues an input/output request predicts not only a transmission data amount to be formed by write requests in a transmission queue but also a reception data amount to be formed by read requests in the transmission queue. The apparatus which issues an input/output request stores a newly generated write request in the queue having the minimum predicted transmission data amount, and stores a newly generated read request in the queue having the minimum predicted reception data amount.
  • The apparatus which issues an input/output request predicts the data transmission amount and data reception amount at each port to be formed by a received write request and read request, respectively, and adds the predicted amounts to a data transmission amount and data reception amount at each port predicted to be formed by a write request and read request to be issued from the apparatus.
  • According to the present invention, the transmission data amounts and reception data amounts on paths can be smoothed at the same time so that data input/output throughput can be improved. Even if there is only a single path, a read request can be issued to a storage device so as not to exceed the data reception capacity at the port of the apparatus which issues an input/output request.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a diagram showing the configuration of a computer system according to a first embodiment of the invention.
  • FIG. 2 is a diagram showing the contents of a memory of a storage device of the embodiment.
  • FIG. 3 is a diagram showing examples of a transmission queue.
  • FIG. 4 is a diagram showing examples of a reception queue.
  • FIG. 5 is a diagram showing an example of data transmission/reception amount information.
  • FIG. 6 is a diagram showing an example of command management information.
  • FIG. 7 is a diagram showing an example of target information.
  • FIG. 8 is a diagram showing the structure of a memory of a management terminal of the embodiment.
  • FIG. 9 is a flow chart illustrating a command forwarding process to be executed by a command forwarding program of the embodiment.
  • FIG. 10 is a flow chart illustrating a process to be executed by an initiator program of the embodiment.
  • FIG. 11 is a flow chart illustrating another command forwarding process to be executed by the command forwarding program of the embodiment.
  • FIG. 12 is a flow chart illustrating a process to be executed by a target program of the embodiment.
  • FIG. 13 is a diagram showing the configuration of a computer system according to a second embodiment.
  • FIG. 14 is a flow chart illustrating a command reception process to be executed by a target program of the second embodiment.
  • FIG. 15 is a flow chart illustrating a response transmission process to be executed by the target program of the second embodiment.
  • FIG. 16 is a flow chart illustrating a command transmission process to be executed by a command issue program according to a third embodiment.
  • DESCRIPTION OF THE EMBODIMENTS
  • Embodiments of the invention will be described with reference to the accompanying drawings.
  • First Embodiment
  • In the first embodiment, the present invention is applied to a computer system in which a storage device transfers a SCSI command received from a host computer to an external storage device.
  • FIG. 1 is a diagram showing the configuration of a computer system of the first embodiment. As shown, the computer system of the first embodiment has a storage device 100, an external storage device 110, a plurality of hosts 130, and a management terminal 150. The storage device 100 and external storage device 110 are interconnected via a network 120 such as the Internet. The storage device 100 and a plurality of hosts 130 are connected via a network 140. The storage device 100 is connected to the management terminal 150.
  • The host 130 is an information processing apparatus (host computer) which executes an application involving data input/output of the storage device 100.
  • The storage device 100 has a CPU 101, a memory 102, a cache 103 for temporarily storing data to speed up accesses, a disk controller 104, one or more disks 105, ports 106, a management port 108, and a bus 109 interconnecting these devices.
  • CPU 101 performs various processes to be described later, by executing programs stored in the memory 102. The memory 102 stores programs and data to be described later. The cache 103 temporarily stores write data. The disk controller 104 controls data input/output of the disks 105. The disk controller 104 may perform processes corresponding to Redundant Array of Independent Disks (RAID). The disk 105 stores data read/written by the host 130. A non-volatile memory 107 stores programs and data to be stored into the memory 102 when the storage device 100 is activated.
  • The ports 106 are mechanisms such as network cards for connecting local area network (LAN) cables to the storage device 100, and execute data transmission/reception processes relative to external devices via the networks 120 and 140. In this embodiment, although the storage device 100 has three ports 106 a, 106 b and 106 c, the storage device 100 may have three or more ports 106. The management port 108 connects the management terminal 150 to the storage device 100.
  • The storage device 100 has a relay function of transferring an input/output request issued from the host 130 to the external storage device 110 via the network 120 and transferring a response and data received from the external storage device 110 to the host 130. The external storage device 110 has a structure similar to that of the storage device 100, except for the relay function.
  • The host 130 has an initiator function of the iSCSI protocol. The storage device 100 has a target function and an initiator function. The external storage device 110 has a target function.
  • FIG. 2 shows programs and data stored in the memory 102. The memory 102 stores an initiator program 201, a target program 202, a command forwarding program 203, a transmission queue 204, a reception queue 205, data transmission/reception amount information 206, command management information 207, target information 208, a redundant path control program 209 and an initializing program 210.
  • The initiator program 201 is a program for encapsulating a SCSI command and data into an iSCSI PDU, extracting a SCSI response from an iSCSI PDU, and transmitting/receiving an iSCSI PDU to/from an external iSCSI target, in accordance with the iSCSI protocol. When the port 106 receives an iSCSI PDU including a SCSI response, the initiator program 201 extracts the SCSI response from the iSCSI PDU and stores it in the reception queue 205. The transmission operation of an iSCSI command will be later detailed.
  • The target program 202 performs mutual exchange between the SCSI command and data and the iSCSI PDU and transmits/receives an iSCSI PDU. When the port 106 receives an iSCSI PDU, the target program 202 extracts the SCSI command from the iSCSI PDU and stores it in the reception queue 205, and further the target program 202 adds an iSCSI header to the SCSI response stored in the top entry of the transmission queue 204 to be described later and transmits it to the host 130. This operation will be detailed later.
  • The command forwarding program 203 stores the SCSI command stored in the top entry of the reception queue 205 in the transmission queue 204, and stores the SCSI response received by the initiator program 201 in the transmission queue 204. This operation will be detailed later.
  • The redundant path control program 209 and initializing program 210 will be described later.
  • The transmission queue 204 is an area in the memory 102 for storing the SCSI command or SCSI response to be transmitted, and is defined for each port. In this embodiment, since the storage device 100 has three ports 106, there are three transmission queues 204 a, 204 b and 204 c corresponding to the ports 106 a, 106 b and 106 c, respectively. FIG. 3 shows examples of the transmission queues 204 a, 204 b and 204 c. In the examples shown in FIG. 3, an area 301 in the transmission queue 204 is the top entry in the memory area, and entries 302, 303, and 304 are defined in this order. The initiator program 201 or target program 202 reads the SCSI command or SCSI response stored in the top entry of the transmission queue 204 and deletes it, and each SCSI command or SCSI response stored in the second and subsequent entries moves up by one entry. In the examples shown in FIG. 3, a write request for two blocks is stored in the top entry of the transmission queue 204 a. In this embodiment, the block size is set to 512 bytes.
  • The reception queue 205 is an area in the memory 102 for storing the received SCSI command or SCSI response, and is defined for each port. In this embodiment, similar to the transmission queue, there are three reception queues 205 a, 205 b and 205 c corresponding to the ports 106 a, 106 b and 106 c, respectively. FIG. 4 shows examples of the reception queues 205 a, 205 b and 205 c. Similar to the transmission queues 204, an area 401 in the reception queue 205 is the top entry in the memory area, and entries 402, 403, and 404 are defined in this order. The initiator program 201 or target program 202 reads the SCSI command or SCSI response stored in the top entry of the reception queue 205 and deletes it, and each SCSI command or SCSI response stored in the second and subsequent entries moves up by one entry.
  • In the examples, a Read command and Write commands for the external storage device 110 are stored in the transmission queues 204 a and 204 b, and the data reception amount of the Read command and the data transmission amounts of the Write commands are shown. A Read response and Write responses received from the external storage device 110 are stored in the reception queues 205 a and 205 b. The Read response is response data to the Read command. Write commands received from the host 130 are stored in the reception queue 205 c, and data reception amounts of the Write commands are shown. A Read response to be transmitted to the host 130 is stored in the transmission queue 204 c.
  • In the queues shown in FIGS. 3 and 4, although the maximum number of SCSI commands or SCSI responses capable of being stored in each queue is 4, it is sufficient that the maximum number of SCSI commands or SCSI responses capable of being stored in each queue is 1 or more.
  • FIG. 5 is a diagram showing examples of the data transmission/reception amount information 206. The data transmission/reception amount information 206 is stored in a table constituted of a combination of information on a port identifier 501, a transmission byte number 502, a reception byte number 503 and initiator assignment information 504. The port identifier 501 is a name for identifying the port. The transmission byte number 502 indicates the number of bytes of transmission data formed by the SCSI Write stored in the queue. The reception byte number 503 indicates the number of bytes of reception data formed by the SCSI Read stored in the queue. The initiator assignment information 504 indicates whether the initiator program 201 is assigned. The value “1” in a cell 505 means that the initiator program 201 can issue an input/output request from the port b. The value “0” in a cell 506 means that the initiator program 201 cannot issue an input/output request from the port c. A cell 507 indicates that the total sum of the requested data amount by the SCSI Read stored in the transmission queue 204 b is 2048 bytes.
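  • How the transmission byte number 502 and reception byte number 503 relate to the contents of a transmission queue can be sketched as follows. The function name `predicted_bytes`, the tuple representation of a queue entry and the queue contents are assumptions for illustration; only the 512-byte block size and the 2048-byte Read total for the transmission queue 204 b come from the description.

```python
BLOCK_SIZE = 512  # bytes per block, as stated for this embodiment

def predicted_bytes(queue):
    """Return (transmission_bytes, reception_bytes) for one transmission queue.

    queue is a list of (kind, blocks) pairs; kind is "read", "write" or "other".
    Queued Writes will cause outgoing data, queued Reads will cause incoming data.
    """
    tx = sum(blocks * BLOCK_SIZE for kind, blocks in queue if kind == "write")
    rx = sum(blocks * BLOCK_SIZE for kind, blocks in queue if kind == "read")
    return tx, rx

# Hypothetical contents of transmission queue 204 b: one Read of 4 blocks.
queue_b = [("read", 4)]
tx_b, rx_b = predicted_bytes(queue_b)

# One row of the table: port identifier, byte numbers, initiator assignment.
row_b = {"port": "b", "tx_bytes": tx_b, "rx_bytes": rx_b, "initiator": 1}
```

  • In the embodiment the table is maintained incrementally as commands enter and leave the queues, rather than being recomputed from the queue each time as this sketch does.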
  • FIG. 6 is a diagram showing examples of the command management information 207. The command management information 207 is stored in a table constituted of a combination of information on a command tag 601, an initiator name 602 and a target name 603. The command tag 601 is a number for identifying the SCSI command. The initiator name 602 is a name of an initiator issuing the SCSI command. The target name 603 is a name of a target to which the SCSI command is issued. The examples shown in FIG. 6 show that an initiator I1 issues SCSI commands 11 and 12 to a target T1. The item corresponding to the SCSI command for which the response is completed is deleted from the command management information 207. An input/output request for which the SCSI command managed by the command management information has already been transmitted and the corresponding SCSI response has not yet been received is called an outstanding I/O. An upper limit of the number of outstanding I/Os at the same time instant is preset, and the initiator program 201 performs control so that the number of outstanding I/Os at the same instant does not exceed the upper limit. This upper limit is called the maximum number of outstanding I/Os.
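  • The command management information and the outstanding I/O limit can be modelled with a small table keyed by command tag. The class name `CommandTable` and its methods are hypothetical; the limit of 4 is the value this embodiment uses, and the tags and names follow the FIG. 6 example.

```python
MAX_OUTSTANDING = 4  # maximum number of outstanding I/Os used in this embodiment

class CommandTable:
    """Tracks commands that were transmitted but not yet answered."""

    def __init__(self, max_outstanding=MAX_OUTSTANDING):
        self.max_outstanding = max_outstanding
        self.entries = {}  # command tag -> (initiator name, target name)

    def can_issue(self):
        """True while another command may be sent without exceeding the limit."""
        return len(self.entries) < self.max_outstanding

    def add(self, tag, initiator, target):
        """Register a transmitted command as an outstanding I/O."""
        if not self.can_issue():
            raise RuntimeError("maximum number of outstanding I/Os reached")
        self.entries[tag] = (initiator, target)

    def complete(self, tag):
        """Remove the entry once the response for this command is completed."""
        del self.entries[tag]

commands = CommandTable()
commands.add(11, "I1", "T1")
commands.add(12, "I1", "T1")
```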
  • FIG. 7 shows examples of the target information 208. The target information 208 is stored in a table constituted of a combination of information on a target name 701 and a location 702. The target name 701 is a name for identifying the target. The location 702 is a location of the target identified by a host name, an IP address, a TCP port number and the like. The examples shown in FIG. 7 show that a target “localtarget” operates at the position identified by an IP address of 192.168.1.1 and a TCP port number 3260, i.e., at the storage device 100 and that a target “remotetarget” existing in the external storage operates at the position identified by an IP address of 192.168.2.2 and a TCP port number 3260 and at the position identified by an IP address of 192.168.3.2 and a TCP port number 3260.
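  • The target information of FIG. 7 can be represented as a mapping from target name to one or more locations. The sketch below uses the addresses given in the example; the set `LOCAL_ADDRESSES` and the function `is_external` are assumptions showing one way the destination check could be made.

```python
# Target name -> list of (IP address, TCP port) locations, as in the FIG. 7 example.
TARGET_INFO = {
    "localtarget": [("192.168.1.1", 3260)],
    "remotetarget": [("192.168.2.2", 3260), ("192.168.3.2", 3260)],
}

# Assumption: the addresses that belong to the storage device 100 itself.
LOCAL_ADDRESSES = {"192.168.1.1"}

def is_external(target_name):
    """Return True if no location of the target is on the local storage device."""
    locations = TARGET_INFO[target_name]
    return all(ip not in LOCAL_ADDRESSES for ip, _tcp_port in locations)
```

  • A target with several locations, such as "remotetarget" here, is what makes more than one path available for the same destination.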
  • The redundant path control program 209 allows the management terminal 150 to set a load distribution algorithm or the like via the management port 108. The redundant path control program 209 can set the algorithm of the present invention as well as other algorithms such as Round Robin, Least Queue Depth and Least Blocks.
  • The initializing program 210 initializes the data transmission/reception amount information 206 shown in FIG. 5, the command management information 207 shown in FIG. 6 and the target information 208 shown in FIG. 7. In executing an initializing process for the storage device at the time when a power supply of the storage device 100 is turned on or at other times, CPU 101 executes the initializing program 210 stored in the memory 102 to thereby initialize the data transmission/reception amount information 206, command management information 207 and target information 208.
  • The management terminal 150 is a personal computer or the like for performing setting work for the storage device 100. The management terminal 150 has a CPU 151, a memory 152, a non-volatile memory 153, an input unit 154, an output unit 155, a port 156 and a bus 157 interconnecting these devices. CPU 151 performs processes to be described later, by executing programs stored in the memory 152. The memory 152 stores programs and data to be described later. The non-volatile memory 153 stores programs and data to be stored in the memory 152 when the management terminal 150 is activated. The port 156 is a mechanism such as a network card for connecting a local area network (LAN) cable to the management terminal 150, and performs data transmission/reception processes relative to the storage device 100 via a LAN.
  • FIG. 8 shows a program stored in the memory 152 of the management terminal 150. A redundant path setting program 901 is stored in the memory 152.
  • The redundant path setting program 901 sets the load distribution algorithm or the like to the storage device 100. The redundant path setting program 901 notifies the redundant path control program 209 of the load distribution algorithm selected from the input unit 154.
  • Next, description will be made on the operation of the computer system and each process to be executed by the storage device 100.
  • First, with reference to FIG. 9, description will be made on an operation to be performed when the storage device 100 transfers a SCSI command and data received from the host 130 to the external storage device 110.
  • FIG. 9 is a flow chart illustrating a process to be executed when the command forwarding program 203 transfers a SCSI command. This process starts when CPU 101 executes the command forwarding program 203 stored in the memory 102. If a SCSI command is not stored in the top entry of the reception queue 205 c (S801: No), the command forwarding program 203 does not perform the command forwarding process until a SCSI command is stored in the top entry of the reception queue 205 c. If a SCSI command is stored in the top entry of the reception queue 205 c (S801: Yes), the command forwarding program 203 refers to the target information 208 to judge whether the SCSI command is destined for the external storage device 110 (S802). If the SCSI command is not destined for the external storage device 110 (S802: No), the command forwarding program 203 transfers the SCSI command to the disk controller 104 (S803) and then advances to S808.
  • If the SCSI command is destined to the external storage device 110 (S802: Yes), it is checked whether the SCSI command is a SCSI Read (S804). If the SCSI command is the SCSI Read (S804: Yes), the command forwarding program 203 refers to the data transmission/reception amount information 206 to store the SCSI Read command in the transmission queue 204 corresponding to the port having the minimum reception byte number 503, among the ports having the initiator assignment information 504 of “1” (S805). If the SCSI command is not the SCSI Read (S804: No), it is either a SCSI Write command or other commands. Therefore, the command forwarding program 203 refers to the data transmission/reception amount information 206 to store the SCSI command in the transmission queue corresponding to the port having the minimum transmission byte number 502, among the ports having the initiator assignment information 504 of “1” (S806).
  • After the process S805 or S806, the command forwarding program 203 updates the data transmission/reception amount information 206 in accordance with the data transmission/reception amount of the SCSI command stored in the transmission queue 204 (S807). Namely, in the case of the SCSI Read command, the data reception amount to be received by this command is added to the reception byte number 503 at the port corresponding to the transmission queue selected by referring to the data transmission/reception amount information 206. In the case of the SCSI Write command, the data transmission amount to be transmitted by this command is added to the transmission byte number 502 at the port corresponding to the selected transmission queue. In the case of the other commands, since the data transmission/reception amount can be neglected, the data transmission/reception amount information 206 is not updated. The command forwarding program 203 further erases the top entry of the reception queue 205, which held the SCSI command now stored in the transmission queue, and advances, by one entry toward the top entry, the storage location of each command stored in the second and subsequent entries (S808).
  • For example, assuming that the SCSI Read command of 1024 bytes is stored in the top entry of the reception queue 205 c, since the reception byte number 503 corresponding to the transmission queue 204 a and shown in FIG. 5 is minimum, the command forwarding program 203 stores the SCSI Read command in the transmission queue 204 a at Step S805.
  • For example, assuming that the SCSI Write command of 1024 bytes is stored in the top entry of the reception queue 205 c, since the transmission byte number 502 corresponding to the transmission queue 204 b and shown in FIG. 5 is minimum, the command forwarding program 203 stores the SCSI Write command in the transmission queue 204 b at Step S806.
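  • The port selection and table update of steps S804 to S807 can be sketched as follows. The function name `forward_command`, the dictionary layout and all byte values in the table are assumptions chosen for illustration; they are not the values of FIG. 5.

```python
def forward_command(cmd, table, queues):
    """Choose a transmission queue for cmd and update the predicted byte counts.

    cmd    : {"kind": "read" | "write" | "other", "bytes": int}
    table  : port -> {"tx": int, "rx": int, "initiator": 0 or 1}
    queues : port -> list acting as that port's transmission queue
    """
    candidates = [p for p, row in table.items() if row["initiator"] == 1]
    if cmd["kind"] == "read":
        # S805: a Read goes to the port with the least predicted reception data.
        port = min(candidates, key=lambda p: table[p]["rx"])
        table[port]["rx"] += cmd["bytes"]          # S807
    else:
        # S806: Writes and other commands go to the port with the least
        # predicted transmission data.
        port = min(candidates, key=lambda p: table[p]["tx"])
        if cmd["kind"] == "write":
            table[port]["tx"] += cmd["bytes"]      # S807 (other commands: no update)
    queues[port].append(cmd)
    return port

# Illustrative values only.
table = {
    "a": {"tx": 2048, "rx": 0, "initiator": 1},
    "b": {"tx": 0, "rx": 2048, "initiator": 1},
    "c": {"tx": 0, "rx": 0, "initiator": 0},
}
queues = {"a": [], "b": [], "c": []}
read_port = forward_command({"kind": "read", "bytes": 1024}, table, queues)
write_port = forward_command({"kind": "write", "bytes": 1024}, table, queues)
```

  • With these assumed values the Read is queued on port a and the Write on port b, and port c is never chosen because its initiator assignment is 0.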
  • FIG. 10 is a flow chart illustrating a process to be executed when the initiator program 201 transmits a SCSI command. This process starts when CPU 101 executes the initiator program 201 stored in the memory 102. If a SCSI command is stored in the top entry of the transmission queue 204 a or 204 b (S1001: Yes), the initiator program 201 refers to the command management information 207 to judge whether the current number of outstanding I/Os is smaller than the maximum number of outstanding I/Os (S1002). If a SCSI command is not stored in the top entry of the transmission queue 204 (S1001: No), the initiator program 201 does not perform the command transmission process until a SCSI command is stored in the top entry of the transmission queue 204. If it is judged in the process S1002 that the current number of outstanding I/Os is smaller than the maximum number of outstanding I/Os (S1002: Yes), the initiator program 201 adds a header to the SCSI command and data to generate an iSCSI PDU (S1003), divides the iSCSI PDU into Ethernet frames, transmits the Ethernet frames from the port 106 corresponding to the transmission queue 204 (S1004), and adds an entry of the SCSI command to the command management information 207 (S1005).
  • If the current number of outstanding I/Os is equal to the maximum number of outstanding I/Os (S1002: No), the initiator program 201 enters a standby state until the current number of outstanding I/Os becomes smaller than the maximum number of outstanding I/Os. In this embodiment, the maximum number of outstanding I/Os is set to 4, but it may be set to another value as long as it does not exceed the maximum number of commands capable of being stored in the transmission queue.
  • After the process S1005, the initiator program 201 deletes the transmitted SCSI command from the transmission queue 204 and advances, by one entry toward the top entry, the storage location of each SCSI command stored in the second and subsequent entries (S1006). Next, the initiator program 201 updates the data transmission/reception amount information 206 (S1007). In the case of a SCSI Write command, the transmitted data amount is subtracted from the transmission byte number 502 in the data transmission/reception amount information 206 corresponding to the port from which the command was transmitted. In the case of the SCSI Read command, the data transmission/reception amount information 206 is not updated.
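  • The command transmission process of FIG. 10 can be sketched as a single step function. The names `transmit_next` and `send`, the use of a set for the outstanding commands, and the sample values are assumptions; `send` stands in for iSCSI PDU generation and Ethernet frame transmission.

```python
def transmit_next(port, queue, table, outstanding, send, max_outstanding=4):
    """Try to transmit the command at the top of one transmission queue.

    Returns the transmitted command, or None if nothing was sent.
    """
    if not queue:                              # S1001: No, queue is empty
        return None
    if len(outstanding) >= max_outstanding:    # S1002: No, stay on standby
        return None
    cmd = queue[0]
    send(port, cmd)                            # S1003-S1004 (stand-in)
    outstanding.add(cmd["tag"])                # S1005: now an outstanding I/O
    queue.pop(0)                               # S1006: remaining entries move up
    if cmd["kind"] == "write":                 # S1007: Write data has been sent
        table[port]["tx"] -= cmd["bytes"]
    return cmd

# Hypothetical state for port a.
sent = []
table = {"a": {"tx": 1024, "rx": 2048}}
queue_a = [
    {"tag": 11, "kind": "write", "bytes": 1024},
    {"tag": 12, "kind": "read", "bytes": 2048},
]
outstanding = set()
record = lambda p, c: sent.append((p, c["tag"]))
transmit_next("a", queue_a, table, outstanding, record)
transmit_next("a", queue_a, table, outstanding, record)
```

  • The reception byte number is left untouched when the Read is transmitted, because the Read data has not yet arrived at the port.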
  • Next, with reference to FIG. 11, description will be made on an operation to be performed when the storage device 100 transfers the SCSI response and Read data received from the external storage device 110 to the host 130.
  • FIG. 11 is a flow chart illustrating a process to be executed when the command forwarding program 203 transfers a SCSI response and data. This process starts when CPU 101 executes the command forwarding program 203 stored in the memory 102. If a SCSI response and data are stored in the top entry of the reception queue 205 a or 205 b (S1101: Yes), the command forwarding program 203 stores the SCSI response in the transmission queue 204 corresponding to the port whereat the corresponding SCSI command was received (S1102). The command forwarding program 203 further deletes the SCSI response from the reception queue 205 a or 205 b from which it was extracted, and advances, by one entry toward the top entry, the storage location of each SCSI response in the second and subsequent entries (S1103). Next, the command forwarding program 203 updates the data transmission/reception amount information 206 (S1104). In the case of a Read response, the received data amount is subtracted from the reception byte number 503 in the data transmission/reception amount information 206 corresponding to the port at which the response was received. In the case of a Write response, the data transmission/reception amount information 206 is not updated.
  • If a SCSI response is not stored in the top entry of the reception queue 205 (S1101: No), the command forwarding program 203 does not perform the response transfer process until a SCSI response is stored in the top entry of the reception queue 205.
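  • The response forwarding process of FIG. 11 can be sketched as follows. The function name `forward_response`, the `origin_port` lookup from command tag to the port at which the command was received, and the sample values are assumptions made for this example.

```python
def forward_response(rx_port, reception_queue, transmission_queues, table, origin_port):
    """Move the response at the top of a reception queue toward the host.

    Returns the forwarded response, or None if the reception queue is empty.
    """
    if not reception_queue:                                     # S1101: No
        return None
    resp = reception_queue[0]
    transmission_queues[origin_port[resp["tag"]]].append(resp)  # S1102
    reception_queue.pop(0)                                      # S1103
    if resp["kind"] == "read":                                  # S1104: Read data arrived
        table[rx_port]["rx"] -= resp["bytes"]
    return resp

# Hypothetical state: a Read response arrived at port a; the command came in on port c.
table = {"a": {"tx": 0, "rx": 2048}}
reception_queue_a = [{"tag": 12, "kind": "read", "bytes": 2048}]
transmission_queues = {"c": []}
origin_port = {12: "c"}
forwarded = forward_response("a", reception_queue_a, transmission_queues, table, origin_port)
```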
  • FIG. 12 is a flow chart illustrating a process to be executed when the target program 202 transmits a SCSI response and Read data. This process starts when CPU 101 executes the target program 202 stored in the memory 102. If a SCSI response and data are stored in the top entry of the transmission queue 204 (S1201: Yes), the target program 202 generates an iSCSI PDU from the SCSI response and data (S1202), transmits the generated iSCSI PDU from the port (S1203) and deletes the entry of the SCSI command corresponding to the SCSI response from the command management information 207 (S1204). Next, the target program 202 deletes the SCSI response transmitted from the transmission queue 204, and advances by one entry toward the top entry the storage location of each SCSI response stored in the second and subsequent entries (S1205).
  • If a SCSI response is not stored in the top entry of the transmission queue 204 (S1201: No), the target program 202 does not perform the response transmission process until a SCSI response is stored in the top entry of the transmission queue 204.
  • According to the first embodiment, it is possible to smooth transmission/reception data amounts at ports per unit time to be transmitted/received by the iSCSI initiator operating in the storage device 100.
  • In the description of the first embodiment, the storage device 100 transmits/receives the SCSI command and SCSI response to/from the host 130 via a single port 106 c. The present invention is also applicable to the case in which the storage device 100 transmits/receives the SCSI command and SCSI response to/from the host 130 via two or more ports. This will be detailed in the third embodiment.
  • Second Embodiment
  • In the description of the first embodiment, the storage device 100 uses the port 106 only for transmitting a SCSI command and receiving a SCSI response. In other words, the ports 106 a and 106 b are used only by an initiator and do not receive a SCSI command, whereas the port 106 c is used only for a target and does not transmit a SCSI command, limiting the role of each port. In the first embodiment, therefore, a load distribution can be conducted by considering only the load of the transmission port. In the second embodiment, the storage device 100 uses the port 106 for transmission/reception of a SCSI command and a SCSI response.
  • FIG. 13 is a diagram showing the configuration of a second embodiment of a computer system. The devices and programs constituting this system are similar to those of the first embodiment, except that the same network 120 interconnecting the storage device 100 and hosts 130 is also used for interconnecting the storage device 100 and external storage device 110, and that the operation of the target program 202 is modified. In the second embodiment, the role of each port 106 is not limited as it is in the first embodiment. In the second embodiment, therefore, the load distribution among the ports is conducted by considering the loads of both the transmission and reception ports.
  • The operation of the computer system and the modified process in the storage device 100 are described below.
  • FIG. 14 is a flow chart illustrating a process to be executed when the target program 202 receives an iSCSI PDU. This process starts when CPU 101 executes the target program 202 stored in the memory 102. When the port 106 receives an iSCSI PDU (S1401: Yes), the target program 202 extracts a SCSI command and data from the iSCSI PDU (S1402). The target program 202 further adds an entry of the SCSI command to the command management information 207 (S1403) and adds the SCSI command to the bottom entry of the reception queue 205 (S1404). Then, the target program 202 updates the data transmission/reception amount information 206 (S1405). If the received SCSI command is a SCSI Read command, the amount of data to be transmitted for the command is added to the transmission byte number 502 in the data transmission/reception amount information 206 corresponding to the port that received the command. If the received SCSI command is a SCSI Write command, the amount of data to be received for the command is added to the reception byte number 503 in the data transmission/reception amount information 206 corresponding to the port that received the command.
  • For example, if the port 106 a receives a SCSI Read command requesting data of 1024 bytes, the value “2048” of the transmission byte number 502 is rewritten to “3072”.
  • If no iSCSI PDU has been received (S1401: No), the target program 202 does not perform the PDU reception process until an iSCSI PDU is received.
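The counter update of S1405 can be sketched in Python as follows. This is only an illustration under stated assumptions: the table `port_load`, its keys `tx_bytes` and `rx_bytes` (standing in for the transmission byte number 502 and the reception byte number 503), the command dictionary layout, and the function name `on_command_received` are hypothetical. The sketch follows the rule stated for FIG. 14: a Read command received at a target port increases that port's transmission byte number, and a Write command increases its reception byte number.

```python
from collections import deque

# Hypothetical stand-in for the data transmission/reception amount information 206.
port_load = {"106a": {"tx_bytes": 2048, "rx_bytes": 0}}
reception_queue = deque()      # stand-in for the reception queue 205
command_management = {}        # stand-in for the command management information 207


def on_command_received(port, command):
    """Record a SCSI command extracted from a received iSCSI PDU (S1403 to S1405)."""
    command_management[command["tag"]] = command       # S1403
    reception_queue.append(command)                    # S1404: bottom entry
    if command["op"] == "READ":                        # S1405
        # Read data will later flow out of this port.
        port_load[port]["tx_bytes"] += command["length"]
    elif command["op"] == "WRITE":
        # Write data will flow into this port.
        port_load[port]["rx_bytes"] += command["length"]


on_command_received("106a", {"tag": 1, "op": "READ", "length": 1024})
print(port_load["106a"]["tx_bytes"])  # 3072
```

The starting value 2048 and the 1024-byte Read mirror the numbers used in the example for port 106 a.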
  • FIG. 15 is a flow chart illustrating a process to be executed when the target program 202 transmits a SCSI response and Read data. This process starts when CPU 101 executes the target program 202 stored in the memory 102. If a SCSI response is stored in the top entry of the transmission queue 204 (S1501: Yes), the target program 202 generates an iSCSI PDU from the SCSI response and data (S1502), transmits the generated iSCSI PDU from the corresponding port (S1503), and deletes the entry of the SCSI command corresponding to the SCSI response from the command management information 207 (S1504). The target program 202 further deletes the SCSI response stored in the top entry of the transmission queue 204 and advances by one entry toward the top entry the storage location of each SCSI response stored in the second and subsequent entries (S1505). Then, the target program 202 updates the data transmission/reception amount information 206 in accordance with the contents of the SCSI command stored in the transmission queue 204 (S1506). Namely, in the case of a Read response, the data transmission amount of the command is subtracted from the transmission byte number 502 in the data transmission/reception amount information 206 corresponding to the port. In the case of a Write response, the data reception amount of the command is subtracted from the reception byte number 503 in the data transmission/reception amount information 206 corresponding to the port.
  • If a SCSI response is not stored in the top entry of the transmission queue 204 (S1501: No), the target program 202 does not perform the response transmission process until a SCSI response is stored in the top entry of the transmission queue.
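A minimal Python sketch of the FIG. 15 flow is given below. It is an illustration, not the patent's implementation: the function name `send_response`, the `send_pdu` callback, and the record layouts are assumptions. In particular, the sketch assumes that each entry in the command management information 207 remembers the port, the operation, and the data length of the pending command, so that S1506 can find the counter to decrease.

```python
from collections import deque


def send_response(transmission_queue, command_management, port_load, send_pdu):
    """Run one pass of the FIG. 15 flow; return True if a response was sent."""
    if not transmission_queue:                       # S1501: No
        return False
    response = transmission_queue[0]
    send_pdu({"tag": response["tag"], "payload": response.get("data", b"")})  # S1502, S1503
    command = command_management.pop(response["tag"])   # S1504
    transmission_queue.popleft()                         # S1505
    counters = port_load[command["port"]]                # S1506
    if command["op"] == "READ":
        counters["tx_bytes"] -= command["length"]        # Read data has left the port
    elif command["op"] == "WRITE":
        counters["rx_bytes"] -= command["length"]        # Write data has been received
    return True


# Illustrative data only.
port_load = {"106a": {"tx_bytes": 3072, "rx_bytes": 512}}
pending = {1: {"port": "106a", "op": "READ", "length": 1024}}
queue = deque([{"tag": 1, "data": b"\x00" * 1024}])
sent = []
send_response(queue, pending, port_load, sent.append)
```

Together with the additions made when a command is received, the subtraction keeps each counter equal to the amount of data still outstanding at that port.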
  • According to the second embodiment, the amounts of data transmitted and received per unit time by the iSCSI initiator and the iSCSI target operating in the storage device 100 can be evened out across the ports.
  • Third Embodiment
  • The third embodiment is characterized by port load distribution control on the host 130 side when the storage device 100 transmits/receives a SCSI command and a SCSI response to/from the host via two or more ports. The host 130 is provided with a command issue program 211 in place of the command forwarding program 203. The initiator program 201 performs the process shown in FIG. 10, excluding S1002 and S1005. There is no target program 202. As in the first embodiment, the transmission queue 204, reception queue 205 and data transmission/reception amount information 206 are provided.
  • As the structure on the storage device 100 side of the third embodiment, the programs and control information constituting the second embodiment are used without modification.
  • FIG. 16 is a flow chart illustrating a process to be executed when the command issue program 211 issues a SCSI command. This process starts when the host 130 executes the command issue program 211 stored in a memory. If a SCSI command is not stored in the top entry of a SCSI buffer (S1601: No), the command issue program 211 does not perform the command transmission process until a SCSI command is stored in the top entry of the SCSI buffer. If a SCSI command is stored in the top entry of a SCSI buffer (S1601: Yes), the command issue program 211 judges whether the SCSI command is a SCSI Read (S1602). If the SCSI command is a SCSI Read (S1602: Yes), the command issue program 211 refers to the data transmission/reception amount information 206 and stores the SCSI Read command in the transmission queue 204 corresponding to the port having the minimum reception byte number 503 among the ports having the initiator assignment information 504 of “1” (S1603). If the SCSI command is not a SCSI Read (S1602: No), the command issue program 211 refers to the data transmission/reception amount information 206 and stores the SCSI command in the transmission queue 204 corresponding to the port having the minimum transmission byte number 502 among the ports having the initiator assignment information 504 of “1” (S1604). After the process S1603 or S1604, the command issue program 211 updates the data transmission/reception amount information 206 in accordance with the contents of the SCSI command stored in the transmission queue 204 (S1605). The process S1605 is similar to the process S807. The command issue program 211 further deletes the top entry of the SCSI buffer storing the transferred SCSI command, and advances by one entry toward the top entry the storage location of each SCSI command stored in the second and subsequent entries (S1606).
  • If a SCSI response exists in the top entry of the reception queue 205, the command issue program 211 executes the processes S1103 and S1104.
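The port selection of FIG. 16 can be sketched in Python as follows. The names `issue_command`, `ports`, `initiator`, `rx_bytes`, `tx_bytes` and the port labels are hypothetical; they stand in for the initiator assignment information 504, the reception byte number 503 and the transmission byte number 502. How S1605 updates the counters is an assumption here: the sketch adds the command's data length to the counter that was used for the selection, which matches the wording of claim 7 (the selected queue's data reception amount or data transmission amount is increased by the command's data amount).

```python
from collections import deque


def issue_command(scsi_buffer, ports):
    """Run one pass of the FIG. 16 flow; return the chosen port name or None."""
    if not scsi_buffer:                                   # S1601: No
        return None
    command = scsi_buffer[0]
    candidates = [name for name, p in ports.items() if p["initiator"] == 1]
    # S1602: a Read brings data in, so compare pending reception amounts (S1603);
    # any other command is compared on pending transmission amounts (S1604).
    field = "rx_bytes" if command["op"] == "READ" else "tx_bytes"
    chosen = min(candidates, key=lambda name: ports[name][field])
    ports[chosen]["queue"].append(command)
    ports[chosen][field] += command["length"]             # S1605 (assumed update rule)
    scsi_buffer.popleft()                                 # S1606
    return chosen


# Illustrative data only; port "c" is not assigned to the initiator.
ports = {
    "a": {"initiator": 1, "rx_bytes": 4096, "tx_bytes": 0, "queue": deque()},
    "b": {"initiator": 1, "rx_bytes": 1024, "tx_bytes": 2048, "queue": deque()},
    "c": {"initiator": 0, "rx_bytes": 0, "tx_bytes": 0, "queue": deque()},
}
buffer = deque([
    {"op": "READ", "length": 512},
    {"op": "WRITE", "length": 256},
])
first = issue_command(buffer, ports)
second = issue_command(buffer, ports)
```

In this data, the Read goes to port "b" (smallest pending reception amount among initiator ports) and the Write goes to port "a" (smallest pending transmission amount). When two ports tie, `min` returns the first candidate in dictionary order, which is an arbitrary tie-break chosen for the sketch.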
  • In the description of the above embodiments, the SAN is configured by an IP network, and a SCSI command and data are transmitted/received in accordance with the iSCSI protocol. The present invention is not limited thereto and may adopt another protocol, such as Fibre Channel, provided that the protocol can perform data input/output with respect to the storage device.

Claims (15)

1. A storage device connected to another storage device via a network and having a plurality of transmission queues for temporarily storing an input/output request to be transmitted to said other storage device, the storage device comprising:
selecting means for selecting, if said input/output request is an input request, a transmission queue having a minimum data reception amount to be formed by an input/output request or requests already stored in the transmission queue, and if said input/output request is an output request, a transmission queue having a minimum data transmission amount to be formed by an input/output request or requests already stored in the transmission queue; and
means for storing said input/output request to be transmitted in said selected transmission queue.
2. The storage device according to claim 1, wherein said transmission queue is provided in correspondence with a port for connecting said storage device to said other storage device.
3. The storage device according to claim 2, further comprising a table, provided in a memory of the storage device, for storing said data transmission amount and said data reception amount at each of said ports.
4. A storage device connected to another storage device and a host computer via a network and having a plurality of transmission queues for temporarily storing an input/output request to be transmitted to said other storage device, and a reception queue paired with each of said transmission queues for temporarily storing an input/output request received from said host computer, the storage device comprising:
selecting means for selecting a transmission queue having a minimum total sum of a data transmission amount to be formed by an input request or requests stored in said reception queue and a data transmission amount to be formed by an output request or requests stored in said transmission queue; and
means for storing said input/output request to be transmitted in said selected transmission queue.
5. The storage device according to claim 4, wherein said transmission queue is provided in correspondence with a port for connecting said storage device to said other storage device.
6. The storage device according to claim 5, further comprising a table, provided in a memory of the storage device, for storing said data transmission amount and said data reception amount at each of said ports.
7. A storage device comprising:
a CPU and a memory;
a plurality of ports, connected to an external storage device via a network, for transmitting/receiving an input/output command and a response;
a port, connected to a host computer via the network, for transmitting/receiving an input/output command and a response;
a plurality of transmission queues, provided in correspondence with said ports and in said memory, for temporarily storing an input/output command to be transmitted to said external storage device;
a plurality of reception queues, provided in correspondence with said ports and in said memory, for temporarily storing an input/output command received from said host computer;
data transmission/reception information provided in said memory and being representative of a data transmission amount and a data reception amount of data to be formed by said input/output command or commands stored in said transmission queue, for each of said ports; and
a command forwarding program, stored in said memory to be executed by said CPU, for selecting, if said input/output command to be stored in said reception queue is an input command, a transmission queue having a minimum data reception, and if said input/output command is an output command, a transmission queue having a minimum data transmission, forwarding said input/output command from said reception queue to said selected transmission queue, and increasing said data reception amount or said data transmission amount corresponding to said selected transmission queue by a data amount increased by said input/output command,
wherein said data transmission amount is reduced by a data amount increased if said output command is transmitted from said selected transmission queue via said port, and said data reception amount is reduced by a data amount increased if a response to said input command is received.
8. The storage device according to claim 7, further comprising command management information, provided in said memory, being representative of an identifier of a pending input/output command, wherein if said input/output command is transmitted from said transmission queue via said port, the storage device registers said input/output command in said command management information, and if a response to said input/output command is received, the storage device deletes the identifier of said input/output command.
9. A host computer connected to a storage device via a network and having a plurality of transmission queues for temporarily storing an input/output request to be transmitted to said storage device, the host computer comprising:
selecting means for selecting, if said input/output request is an input request, a transmission queue having a minimum data reception amount to be formed by an input/output request or requests already stored in the transmission queue, and if said input/output request is an output request, a transmission queue having a minimum data transmission amount to be formed by an input/output request or requests already stored in the transmission queue; and
means for storing said input/output request to be transmitted in said selected transmission queue.
10. The host computer according to claim 9, wherein said transmission queue is provided in correspondence with a port for connecting said host computer to said storage device.
11. The host computer according to claim 10, further comprising a table, provided in a memory of the host computer, for storing said data transmission amount and said data reception amount at each of said ports.
12. A storage device comprising:
a CPU and a memory;
a plurality of ports, connected to a host computer and an external storage device via a network, for transmitting/receiving an input/output command and a response;
a plurality of transmission queues, provided in correspondence with said ports and in said memory, for temporarily storing an input/output command to be transmitted to said external storage device;
a plurality of reception queues, provided in correspondence with said ports and in said memory, for temporarily storing an input/output command received from said host computer;
data transmission/reception information provided in said memory and being representative of a data transmission amount and a data reception amount of data to be formed by said input/output command or commands stored in said transmission queue, for each of said ports; and
a command forwarding program, stored in said memory to be executed by said CPU, for selecting, if said input/output command to be stored in said reception queue is an input command, a transmission queue having a minimum data reception, and if said input/output command is an output command, a transmission queue having a minimum data transmission, forwarding said input/output command from said reception queue to said selected transmission queue, and increasing said data reception amount or said data transmission amount corresponding to said selected transmission queue by a data amount increased by said input/output command,
wherein said data transmission amount is reduced by a data amount increased when said output command is transmitted from said selected transmission queue via said port, and said data reception amount is reduced by a data amount increased when a response corresponding to said input command is received.
13. The storage device according to claim 12, further comprising command management information, provided in said memory, being representative of an identifier of a pending input/output command, wherein if said input/output command is transmitted from said transmission queue via said port, the storage device registers said input/output command in said command management information, and if a response to said input/output command is received, the storage device deletes the identifier of said input/output command.
14. A computer system comprising:
a host computer, connected to a storage device via a network, for transmitting an input/output request to said storage device, said storage device connected via the network to another storage device and having a plurality of transmission queues for temporarily storing an input/output request to be transmitted to said other storage device, said storage device comprising:
selecting means for selecting, if said input/output request received in said reception queue is an input request, a transmission queue having a minimum data reception amount to be formed by an input/output request or requests already stored in the transmission queue, and if said input/output request received in said reception queue is an output request, a transmission queue having a minimum data transmission amount to be formed by an input/output request or requests already stored in the transmission queue; and
means for storing said input/output request to be transmitted in said selected transmission queue.
15. The computer system according to claim 14, wherein said transmission queue is provided in correspondence with a port for connecting said storage device to said other storage device, and said reception queue is provided in correspondence with a port for connecting said storage device to said host computer.
US11/178,509 2005-05-20 2005-07-12 Multipath control device and system Abandoned US20060271639A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2005147799A JP2006323729A (en) 2005-05-20 2005-05-20 Device and system for performing multipath control
JP2005-147799 2005-05-20

Publications (1)

Publication Number Publication Date
US20060271639A1 true US20060271639A1 (en) 2006-11-30

Family

ID=37464754

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/178,509 Abandoned US20060271639A1 (en) 2005-05-20 2005-07-12 Multipath control device and system

Country Status (2)

Country Link
US (1) US20060271639A1 (en)
JP (1) JP2006323729A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5533887B2 (en) * 2010-02-10 2014-06-25 日本電気株式会社 Storage device

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7055059B2 (en) * 1993-04-23 2006-05-30 Emc Corporation Remote data mirroring
US6115387A (en) * 1997-02-14 2000-09-05 Advanced Micro Devices, Inc. Method and apparatus for controlling initiation of transmission of data as a function of received data
US6145028A (en) * 1997-12-11 2000-11-07 Ncr Corporation Enhanced multi-pathing to an array of storage devices
US6341315B1 (en) * 1999-02-26 2002-01-22 Crossroads Systems, Inc. Streaming method and system for fiber channel network devices
US6711170B1 (en) * 1999-08-31 2004-03-23 Mosaid Technologies, Inc. Method and apparatus for an interleaved non-blocking packet buffer
US20020129143A1 (en) * 2000-05-19 2002-09-12 Mckinnon Martin W. Solicitations for allocations of access across a shared communications medium
US7292589B2 (en) * 2002-08-13 2007-11-06 Narendra Kumar Dhara Flow based dynamic load balancing for cost effective switching systems
US7307948B2 (en) * 2002-10-21 2007-12-11 Emulex Design & Manufacturing Corporation System with multiple path fail over, fail back and load balancing
US7103890B2 (en) * 2003-03-24 2006-09-05 Microsoft Corporation Non-blocking buffered inter-machine data transfer with acknowledgement
US7080168B2 (en) * 2003-07-18 2006-07-18 Intel Corporation Maintaining aggregate data counts for flow controllable queues
US20050053077A1 (en) * 2003-07-23 2005-03-10 International Business Machines Corporation System and method for collapsing VOQ'S of a packet switch fabric
US20060221974A1 (en) * 2005-04-02 2006-10-05 Cisco Technology, Inc. Method and apparatus for dynamic load balancing over a network link bundle

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080141256A1 (en) * 2006-12-08 2008-06-12 Forrer Jr Thomas R System and Method to Improve Sequential Serial Attached Small Computer System Interface Storage Device Performance
US8307128B2 (en) * 2006-12-08 2012-11-06 International Business Machines Corporation System and method to improve sequential serial attached small computer system interface storage device performance
US20090003361A1 (en) * 2007-06-27 2009-01-01 Emulex Design & Manufacturing Corporation Multi-protocol controller that supports PCle, SAS and enhanced ethernet
US7917682B2 (en) * 2007-06-27 2011-03-29 Emulex Design & Manufacturing Corporation Multi-protocol controller that supports PCIe, SAS and enhanced Ethernet
US8793399B1 (en) * 2008-08-06 2014-07-29 Qlogic, Corporation Method and system for accelerating network packet processing
US11050660B2 (en) * 2018-09-28 2021-06-29 EMC IP Holding Company LLC Host device with multi-path layer implementing path selection based at least in part on fabric identifiers
US11044313B2 (en) 2018-10-09 2021-06-22 EMC IP Holding Company LLC Categorizing host IO load pattern and communicating categorization to storage system

Also Published As

Publication number Publication date
JP2006323729A (en) 2006-11-30

Legal Events

Date Code Title Description
AS Assignment

Owner name: HITACHI, LTD., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KUMAGAI, ATSUYA;MURAKAMI, TOSHIHIKO;REEL/FRAME:016796/0102

Effective date: 20050622

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION