US20060271639A1 - Multipath control device and system - Google Patents
- Publication number
- US20060271639A1 (application US11/178,509)
- Authority
- US
- United States
- Prior art keywords
- input
- storage device
- command
- transmission
- queue
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/14—Multichannel or multilink protocols
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
Definitions
- the present invention relates to a load distribution method for a computer system, and more particularly to a load distribution method for ports related to storage devices.
- In a storage area network (SAN), multipath technologies have been used in which redundant paths are used for issuing input/output requests.
- With multipath technologies, an input/output request can still be issued when trouble occurs on the path in use, by switching to another path, and input/output throughput can be improved by issuing input/output requests to a plurality of paths in accordance with predetermined rules.
- an apparatus which issues an input/output request to storage devices via a plurality of paths selects the paths
- a Round Robin algorithm of issuing an input/output request in accordance with an issue order decided beforehand for each path.
- Other examples are a Least Queue Depth algorithm of issuing an input/output request to the path having the minimum number of input/output requests stored in the queue assigned to each path, and a Least Blocks algorithm of issuing a write request to the path having the minimum total sum of write blocks stored in the queue assigned to each path.
- the Least Blocks algorithm, among others, is characterized in that the amount of future transmission data is predicted from the number of write request blocks stored in the queue, so that the transmission data amounts on the paths can be smoothed. Refer to “iSCSI Management API” by SNIA.
- None of the conventional techniques predicts the reception data amount on each path. As a result, a large difference in data amounts may occur among paths, or, if the transmission load caused by write requests is heavy, a read request cannot be issued even though the reception load is low.
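The three conventional selection rules named above can be sketched as follows. This is a minimal illustration, not code from the patent or from the SNIA API: the queue layout (a list of `(kind, blocks)` tuples per path), the path names and the function names are all assumptions made for the example.

```python
from itertools import cycle

# Hypothetical per-path queues: each entry is (kind, blocks).
queues = {
    "path_a": [("write", 8), ("read", 2)],
    "path_b": [("write", 1), ("write", 1), ("write", 1)],
}

def round_robin(paths):
    """Return an iterator that yields paths in a fixed, predetermined order."""
    return cycle(paths)

def least_queue_depth(queues):
    """Pick the path whose queue holds the fewest requests."""
    return min(queues, key=lambda p: len(queues[p]))

def least_blocks(queues):
    """Pick the path with the smallest total of queued write blocks."""
    def write_blocks(path):
        return sum(n for kind, n in queues[path] if kind == "write")
    return min(queues, key=write_blocks)

rr = round_robin(sorted(queues))
print(next(rr), next(rr), next(rr))  # path_a path_b path_a
print(least_queue_depth(queues))     # path_a (2 requests vs. 3)
print(least_blocks(queues))          # path_b (3 write blocks vs. 8)
```

Note that none of the three rules looks at how much data the queued read requests will bring back, which is the gap the text goes on to address.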
- an apparatus which issues an input/output request predicts not only a transmission data amount to be formed by write requests in a transmission queue but also a reception data amount to be formed by read requests in the transmission queue.
- the apparatus which issues an input/output request stores a newly generated write request in the queue having the minimum predicted transmission data amount, and stores a newly generated read request in the queue having the minimum predicted reception data amount.
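The selection rule described above can be sketched in a few lines. This is an illustrative sketch only: the `PathQueue` class, the `(kind, nbytes)` request tuples and the function name `enqueue` are assumptions introduced here, and the patent does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field

@dataclass
class PathQueue:
    name: str
    requests: list = field(default_factory=list)  # entries are (kind, nbytes)

    def predicted_tx(self):
        """Bytes this path is expected to transmit (queued writes)."""
        return sum(n for kind, n in self.requests if kind == "write")

    def predicted_rx(self):
        """Bytes this path is expected to receive (queued reads)."""
        return sum(n for kind, n in self.requests if kind == "read")

def enqueue(queues, kind, nbytes):
    """Store a write in the queue with the least predicted transmission,
    and a read in the queue with the least predicted reception."""
    if kind == "write":
        chosen = min(queues, key=PathQueue.predicted_tx)
    else:
        chosen = min(queues, key=PathQueue.predicted_rx)
    chosen.requests.append((kind, nbytes))
    return chosen.name

paths = [
    PathQueue("a", [("write", 4096)]),
    PathQueue("b", [("read", 8192)]),
]
print(enqueue(paths, "read", 1024))   # a (no reads queued there yet)
print(enqueue(paths, "write", 512))   # b (no writes queued there yet)
```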
- the apparatus which issues an input/output request predicts the data transmission amount and data reception amount at each port to be formed by a received write request and read request, respectively, and adds the predicted amounts to a data transmission amount and data reception amount at each port predicted to be formed by a write request and read request to be issued from the apparatus.
- the transmission data amounts and reception data amounts on paths can be smoothed at the same time so that data input/output throughput can be improved. Even if there is only a single path, a read request can be issued to a storage device so as not to exceed the data reception capability of the port of the apparatus which issues the input/output request.
- FIG. 1 is a diagram showing the configuration of a computer system according to a first embodiment of the invention.
- FIG. 2 is a diagram showing the contents of a memory of a storage device of the embodiment.
- FIG. 3 is a diagram showing examples of a transmission queue.
- FIG. 4 is a diagram showing examples of a reception queue.
- FIG. 5 is a diagram showing an example of data transmission/reception amount information.
- FIG. 6 is a diagram showing an example of command management information.
- FIG. 7 is a diagram showing an example of target information.
- FIG. 8 is a diagram showing the structure of a memory of a management terminal of the embodiment.
- FIG. 9 is a flow chart illustrating a command forwarding process to be executed by a command forwarding program of the embodiment.
- FIG. 10 is a flow chart illustrating a process to be executed by an initiator program of the embodiment.
- FIG. 11 is a flow chart illustrating another command forwarding process to be executed by the command forwarding program of the embodiment.
- FIG. 12 is a flow chart illustrating a process to be executed by a target program of the embodiment.
- FIG. 13 is a diagram showing the configuration of a computer system according to a second embodiment.
- FIG. 14 is a flow chart illustrating a command reception process to be executed by a target program of the second embodiment.
- FIG. 15 is a flow chart illustrating a response transmission process to be executed by the target program of the second embodiment.
- FIG. 16 is a flow chart illustrating a command transmission process to be executed by a command issue program according to a third embodiment.
- the present invention is applied to a computer system in which a storage device transfers a SCSI command received from a host computer to an external storage device.
- FIG. 1 is a diagram showing the configuration of a computer system of the first embodiment.
- the computer system of the first embodiment has a storage device 100 , an external storage device 110 , a plurality of hosts 130 , and a management terminal 150 .
- the storage device 100 and external storage device 110 are interconnected via a network 120 such as the Internet.
- the storage device 100 and a plurality of hosts 130 are connected via a network 140 .
- the storage device 100 is connected to the management terminal 150 .
- the host 130 is an information processing apparatus (host computer) which executes an application involving data input/output of the storage device 100 .
- the storage device 100 has a CPU 101 , a memory 102 , a cache 103 for temporarily storing data to speed up accesses, a disk controller 104 , one or more disks 105 , ports 106 , a management port 108 , and a bus 109 interconnecting these devices.
- the CPU 101 performs various processes to be described later, by executing programs stored in the memory 102 .
- the memory 102 stores programs and data to be described later.
- the cache 103 temporarily stores write data.
- the disk controller 104 controls data input/output of the disks 105 .
- the disk controller 104 may perform processes corresponding to Redundant Array of Independent Disks (RAID).
- the disk 105 stores data read/written by the host 130 .
- a non-volatile memory 107 stores programs and data to be stored into the memory 102 when the storage device 100 is activated.
- the ports 106 are mechanisms such as network cards for connecting local area network (LAN) cables to the storage device 100 , and execute data transmission/reception processes relative to external devices via the networks 120 and 140 .
- the storage device 100 may have three or more ports 106 .
- the management port 108 connects the management terminal 150 to the storage device 100 .
- the storage device 100 has a relay function of transferring an input/output request issued from the host 130 to the external storage device 110 via the network 120 and transferring a response and data received from the external storage device 110 to the host 130 .
- the external storage device 110 has a structure similar to that of the storage device 100 , except for the relay function.
- the host 130 has an initiator function of the iSCSI protocol.
- the storage device 100 has a target function and an initiator function.
- the external storage device 110 has a target function.
- FIG. 2 shows programs and data stored in the memory 102 .
- the memory 102 stores an initiator program 201 , a target program 202 , a command forwarding program 203 , a transmission queue 204 , a reception queue 205 , data transmission/reception amount information 206 , command management information 207 , target information 208 , a redundant path control program 209 and an initializing program 210 .
- the initiator program 201 is a program for encapsulating a SCSI command and data into an iSCSI PDU, extracting a SCSI response from an iSCSI PDU, and transmitting/receiving an iSCSI PDU to/from an external iSCSI target, in accordance with the iSCSI protocol.
- the initiator program 201 extracts the SCSI response from the iSCSI PDU and stores it in the reception queue 205 .
- the transmission operation of an iSCSI command will be later detailed.
- the target program 202 performs mutual exchange between the SCSI command and data and the iSCSI PDU and transmits/receives an iSCSI PDU.
- the target program 202 extracts the SCSI command from the iSCSI PDU and stores it in the reception queue 205 , and further the target program 202 adds an iSCSI header to the SCSI response stored in the top entry of the transmission queue 204 to be described later and transmits it to the host 130 . This operation will be detailed later.
- the command forwarding program 203 stores the SCSI command stored in the top entry of the reception queue 205 in the transmission queue 204 , and stores the SCSI response received by the initiator program 201 in the transmission queue 204 . This operation will be detailed later.
- the redundant path control program 209 and initializing program 210 will be described later.
- the transmission queue 204 is an area in the memory 102 for storing the SCSI command or SCSI response to be transmitted, and defined at each port.
- since the storage device 100 has three ports 106 , there are three transmission queues 204 a , 204 b and 204 c corresponding to the ports 106 a , 106 b and 106 c , respectively.
- FIG. 3 shows examples of the transmission queues 204 a , 204 b and 204 c .
- an area 301 in the transmission queue 204 is the top entry in the memory area, and entries 302 , 303 , and 304 are defined in this order.
- the initiator program 201 or target program 202 reads the SCSI command or SCSI response stored in the top entry of the transmission queue 204 and deletes it, and each SCSI command or SCSI response stored in the second and subsequent entries moves up by one entry.
- a write request for two blocks is stored in the top entry of the transmission queue 204 a .
- the block size is set to 512 bytes.
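The queue behaviour and the block-to-byte conversion described above can be illustrated with a short sketch. The top entry (a two-block write) and the 512-byte block size come from the text; the second queue entry, the tuple layout and the helper names are assumptions made for the example.

```python
from collections import deque

BLOCK_SIZE = 512  # bytes per block, as stated in the text

# Top entry (a two-block write) follows the text; the second entry is made up.
queue_204a = deque([("write", 2), ("read", 4)])

def pop_top(queue):
    """Read and delete the top entry; the remaining entries move up by one."""
    return queue.popleft()

def request_bytes(blocks, block_size=BLOCK_SIZE):
    """Convert a block count into the predicted number of bytes."""
    return blocks * block_size

kind, blocks = pop_top(queue_204a)
print(kind, request_bytes(blocks))  # write 1024
print(list(queue_204a))             # [('read', 4)]
```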
- the reception queue 205 is an area in the memory 102 for storing the received SCSI command or SCSI response defined at each port.
- FIG. 4 shows examples of the reception queues 205 a , 205 b and 205 c .
- an area 401 in the reception queue 205 is the top entry in the memory area, and entries 402 , 403 , and 404 are defined in this order.
- the initiator program 201 or target program 202 reads the SCSI command or SCSI response stored in the top entry of the reception queue 205 and deletes it, and each SCSI command or SCSI response stored in the second and subsequent entries moves up by one entry.
- a Read command and Write commands for the external storage device 110 are stored in the transmission queues 204 a and 204 b , and the data reception amount of the Read command and the data transmission amounts of the Write commands are shown.
- a Read response and Write responses received from the external storage device 110 are stored in the reception queues 205 a and 205 b .
- the Read response is response data to the Read command.
- Write commands received from the host 130 are stored in the reception queue 205 c , and data reception amounts of the Write commands are shown.
- a Read response to be transmitted to the host 130 is stored in the transmission queue 204 c.
- although the maximum number of SCSI commands or SCSI responses capable of being stored in each queue is 4 in these examples, it is sufficient that this maximum number is 1 or more.
- FIG. 5 is a diagram showing examples of the data transmission/reception amount information 206 .
- the data transmission/reception amount information 206 is stored in a table constituted of a combination of information on a port identifier 501 , a transmission byte number 502 , a reception byte number 503 and initiator assignment information 504 .
- the port identifier 501 is a name for identifying the port.
- the transmission byte number 502 indicates the number of bytes of transmission data formed by the SCSI Write stored in the queue.
- the reception byte number 503 indicates the number of bytes of reception data formed by the SCSI Read stored in the queue.
- the initiator assignment information 504 indicates whether the initiator program 201 is assigned.
- the value “1” in a cell 505 means that the initiator program 201 can issue an input/output request from the port b.
- the value “0” in a cell 506 means that the initiator program 201 cannot issue an input/output request from the port c.
- a cell 507 indicates that the total sum of the requested data amount by the SCSI Read stored in the transmission queue 204 b is 2048 bytes.
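One way to picture the table of FIG. 5 is as a list of records keyed by the four columns. In the sketch below only three values are taken from the text (the initiator flags of ports b and c, and the 2048 reception bytes of port b); every other number, the field names and the helper `initiator_ports` are placeholders assumed for illustration.

```python
# Columns: port identifier 501, transmission byte number 502,
# reception byte number 503, initiator assignment information 504.
table = [
    {"port": "a", "tx_bytes": 1024, "rx_bytes": 0,    "initiator": 1},  # all assumed
    {"port": "b", "tx_bytes": 0,    "rx_bytes": 2048, "initiator": 1},  # rx and flag from the text
    {"port": "c", "tx_bytes": 0,    "rx_bytes": 0,    "initiator": 0},  # flag from the text
]

def initiator_ports(rows):
    """Ports from which the initiator program may issue requests."""
    return [row["port"] for row in rows if row["initiator"] == 1]

print(initiator_ports(table))  # ['a', 'b']
```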
- FIG. 6 is a diagram showing examples of the command management information 207 .
- the command management information 207 is stored in a table constituted of a combination of information on a command tag 601 , an initiator name 602 and a target name 603 .
- the command tag 601 is a number for identifying the SCSI command.
- the initiator name 602 is a name of an initiator issuing the SCSI command.
- the target name 603 is a name of the target to which the SCSI command is issued.
- the examples shown in FIG. 6 show that an initiator I 1 issues SCSI commands 11 and 12 to a target T 1 .
- the item corresponding to the SCSI command for which the response is completed is deleted from the command management information 207 .
- An input/output request for which the SCSI command managed by the command management information has already been transmitted but the corresponding SCSI response has not yet been received is called an outstanding I/O.
- An upper limit of the number of outstanding I/Os at the same time instant is preset, and the initiator program 201 controls so that the number of outstanding I/Os at the same instant does not exceed the upper limit. This upper limit is called the maximum number of outstanding I/Os.
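The bookkeeping for outstanding I/Os can be sketched as a small table keyed by command tag. The tags 11 and 12 and the names I1 and T1 follow the FIG. 6 example; the class name, method names and the limit of 2 used here are assumptions for illustration.

```python
class CommandManagement:
    """Tracks commands that were transmitted but not yet answered."""

    def __init__(self, max_outstanding):
        self.max_outstanding = max_outstanding
        self.entries = {}  # command tag -> (initiator name, target name)

    def can_issue(self):
        """True while the number of outstanding I/Os is below the limit."""
        return len(self.entries) < self.max_outstanding

    def add(self, tag, initiator, target):
        if not self.can_issue():
            raise RuntimeError("maximum number of outstanding I/Os reached")
        self.entries[tag] = (initiator, target)

    def complete(self, tag):
        """Delete the entry once the response has been handled."""
        self.entries.pop(tag, None)

mgmt = CommandManagement(max_outstanding=2)
mgmt.add(11, "I1", "T1")
mgmt.add(12, "I1", "T1")
print(mgmt.can_issue())  # False
mgmt.complete(11)
print(mgmt.can_issue())  # True
```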
- FIG. 7 shows examples of the target information 208 .
- the target information 208 is stored in a table constituted of a combination of information on a target name 701 and a location 702 .
- the target name 701 is a name for identifying the target.
- the location 702 is a location of the target identified by a host name, an IP address, a TCP port number and the like.
- the examples shown in the figure indicate that a target “localtarget” operates at the position identified by an IP address of 192.168.1.1 and a TCP port number 3260 , i.e., at the storage device 100 , and that a target “remotetarget” existing in the external storage device 110 operates at the positions identified by an IP address of 192.168.2.2 and a TCP port number 3260 and by an IP address of 192.168.3.2 and a TCP port number 3260 .
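The target information can be represented as a mapping from target name to its locations, which is enough to decide whether a command is bound for the external storage device. The names and addresses below come from the text; the set of local addresses and the function `is_external` are assumptions, since the text does not say how the comparison is made.

```python
# Target name 701 -> list of locations 702 as (IP address, TCP port).
TARGET_INFO = {
    "localtarget": [("192.168.1.1", 3260)],
    "remotetarget": [("192.168.2.2", 3260), ("192.168.3.2", 3260)],
}

# Assumption: addresses that belong to the storage device 100 itself.
LOCAL_ADDRESSES = {"192.168.1.1"}

def is_external(target_name, target_info=TARGET_INFO, local=LOCAL_ADDRESSES):
    """True if none of the target's locations is a local address."""
    locations = target_info[target_name]
    return all(ip not in local for ip, _port in locations)

print(is_external("localtarget"))   # False
print(is_external("remotetarget"))  # True
print(len(TARGET_INFO["remotetarget"]))  # 2 (two paths to the same target)
```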
- the redundant path control program 209 allows the management terminal 150 to set a load distribution algorithm or the like via the management port 108 .
- the redundant path control program 209 can set the algorithm of the present invention as well as other algorithms such as Round Robin, Least Queue Depth and Least Blocks.
- the initializing program 210 initializes the data transmission/reception amount information 206 shown in FIG. 5 , the command management information 207 shown in FIG. 6 and the target information 208 shown in FIG. 7 .
- CPU 101 executes the initializing program 210 stored in the memory 102 to thereby initialize the data transmission/reception amount information 206 , command management information 207 and target information 208 .
- the management terminal 150 is a personal computer or the like for performing setting works for the storage device 100 .
- the management terminal 150 has a CPU 151 , a memory 152 , a non-volatile memory 153 , an input unit 154 , an output unit 155 , a port 156 and a bus 157 interconnecting these devices.
- CPU 151 performs processes to be described later, by executing programs stored in the memory 152 .
- the memory 152 stores programs and data to be described later.
- the non-volatile memory 153 stores programs and data to be stored in the memory 152 when the management terminal 150 is activated.
- the port 156 is a mechanism such as a network card for connecting a local area network (LAN) cable to the management terminal 150 , and performs data transmission/reception processes relative to the storage device 100 via a LAN.
- FIG. 8 shows a program stored in the memory 152 of the management terminal 150 .
- a redundant path setting program 901 is stored in the memory 152 .
- the redundant path setting program 901 sets the load distribution algorithm or the like to the storage device 100 .
- the redundant path setting program 901 notifies the redundant path control program 209 of the load distribution algorithm selected from the input unit 154 .
- FIG. 9 is a flow chart illustrating a process to be executed when the command forwarding program 203 transfers a SCSI command. This process starts when CPU 101 executes the command forwarding program 203 stored in the memory 102 . If a SCSI command is not stored in the top entry of the reception queue 205 c (S 801 : No), the command forwarding program 203 does not perform the command forwarding process until a SCSI command is stored in the top entry of the reception queue 205 c . If a SCSI command is stored in the top entry of the reception queue 205 c (S 801 : Yes), the command forwarding program 203 refers to the target information 208 to judge whether the SCSI command is destined for the external storage device 110 (S 802 ). If the SCSI command is not destined for the external storage device 110 (S 802 : No), the command forwarding program 203 transfers the SCSI command to the disk controller 104 (S 803 ) and then advances to S 808 .
- the SCSI command is destined to the external storage device 110 (S 802 : Yes)
- the command forwarding program 203 refers to the data transmission/reception amount information 206 to store the SCSI command in the transmission queue corresponding to the port having the minimum transmission byte number 502 , among the ports having the initiator assignment information 504 of “1” (S 806 ).
- the command forwarding program 203 updates the data transmission/reception amount information 206 in accordance with the data transmission/reception amount of the SCSI command stored in the transmission queue 204 (S 807 ). Namely, in the case of the SCSI Read command, a data reception amount to be received by this command is added to the reception byte number 503 at the port corresponding to the transmission queue selected by the data transmission/reception amount information 206 . In the case of the SCSI Write command, a data transmission amount to be transmitted by this command is added to the transmission byte number 502 at the port corresponding to the transmission queue selected by the data transmission/reception amount information 206 .
- the command forwarding program 203 further erases the top entry of the reception queue 205 , which held the SCSI command now stored in the transmission queue, and advances, by one entry toward the top entry, the storage location of each command stored in the second and subsequent entries (S 808 ).
- the command forwarding program 203 stores the SCSI Read command in the transmission queue 204 a at Step S 805 .
- the command forwarding program 203 stores the SCSI Write command in the transmission queue 204 b at Step S 806 .
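The queue selection and counter update of the forwarding process (S 805 to S 807 ) can be sketched as below. The Write rule (minimum transmission byte number among initiator-assigned ports) is stated in the text; the Read rule (minimum reception byte number) is inferred from the summary of the invention and from the third embodiment, so treat it as an assumption here. All field names, the function name and the sample numbers are hypothetical.

```python
def forward_command(cmd, info, tx_queues):
    """Choose a transmission queue for cmd and update the byte counters.

    cmd is {"op": "read" | "write", "nbytes": int}.
    """
    # Only ports whose initiator assignment information is 1 are eligible.
    candidates = [p for p, row in info.items() if row["initiator"] == 1]
    # Reads load the reception side, writes load the transmission side.
    key = "rx_bytes" if cmd["op"] == "read" else "tx_bytes"
    port = min(candidates, key=lambda p: info[p][key])
    tx_queues[port].append(cmd)        # store the command (S 805 / S 806)
    info[port][key] += cmd["nbytes"]   # add the predicted amount (S 807)
    return port

info = {
    "a": {"tx_bytes": 1024, "rx_bytes": 0,    "initiator": 1},
    "b": {"tx_bytes": 0,    "rx_bytes": 2048, "initiator": 1},
    "c": {"tx_bytes": 0,    "rx_bytes": 0,    "initiator": 0},
}
tx_queues = {port: [] for port in info}
print(forward_command({"op": "read", "nbytes": 512}, info, tx_queues))    # a
print(forward_command({"op": "write", "nbytes": 1024}, info, tx_queues))  # b
```

Port c is never chosen even though both of its counters are zero, because it is reserved for the target role.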
- FIG. 10 is a flow chart illustrating a process to be executed when the initiator program 201 transmits a SCSI command. This process starts when CPU 101 executes the initiator program 201 stored in the memory 102 . If a SCSI command is stored in the top entry of the transmission queue 204 a or 204 b (S 1001 : Yes), the initiator program 201 refers to the command management information 207 to judge whether the current number of outstanding I/Os is smaller than the maximum number of outstanding I/Os (S 1002 ). If a SCSI command is not stored in the top entry of the transmission queue 204 (S 1001 : No), the initiator program 201 does not perform the command transmission process until a SCSI command is stored in the top entry of the transmission queue 204 .
- the initiator program 201 adds a header to the SCSI command and data to generate an iSCSI PDU (S 1003 ), divides the iSCSI PDU into Ethernet frames, transmits the Ethernet frames from the port 106 corresponding to the transmission queue 204 (S 1004 ), and adds an entry of the SCSI command to the command management information 207 (S 1005 ).
- the initiator program enters a standby state until the current number of outstanding I/Os becomes smaller than the maximum number of outstanding I/Os.
- the maximum number of outstanding I/Os is set to 4
- the maximum number of outstanding I/Os is not limited to this value, provided that it does not exceed the maximum number of commands capable of being stored in the transmission queue.
- the initiator program 201 deletes the transmitted SCSI command from the transmission queue 204 , and the storage location of each SCSI command stored in the second and subsequent entries is advanced by one entry toward the top entry (S 1006 ).
- the initiator program 201 updates the data transmission/reception amount information 206 (S 1007 ).
- the transmitted data amount is subtracted from the transmission byte number 502 in the data transmission/reception amount information 206 corresponding to the port from which the command was transmitted.
- the data transmission/reception amount information 206 is not updated.
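The transmission step of the initiator program can be sketched as follows. The text here does not state which command type triggers the subtraction; this sketch assumes that a Write reduces the transmission byte number once its data has been sent, while a Read leaves the table unchanged until its response arrives. The function name, the dictionary fields and the use of a set for outstanding tags are also assumptions, and PDU generation and actual transmission are omitted.

```python
def transmit_top(port, tx_queues, info, outstanding, max_outstanding=4):
    """Send the top command of a port's queue if the outstanding limit allows."""
    queue = tx_queues[port]
    if not queue or len(outstanding) >= max_outstanding:
        return None                      # nothing to send, or must wait
    cmd = queue.pop(0)                   # remaining entries move up by one
    outstanding.add(cmd["tag"])          # record the command as outstanding
    if cmd["op"] == "write":
        # Assumed rule: the write data has now left the port.
        info[port]["tx_bytes"] -= cmd["nbytes"]
    return cmd

info = {"a": {"tx_bytes": 1024, "rx_bytes": 2048}}
tx_queues = {"a": [
    {"tag": 11, "op": "write", "nbytes": 1024},
    {"tag": 12, "op": "read", "nbytes": 2048},
]}
outstanding = set()
transmit_top("a", tx_queues, info, outstanding)
transmit_top("a", tx_queues, info, outstanding)
print(info["a"])            # {'tx_bytes': 0, 'rx_bytes': 2048}
print(sorted(outstanding))  # [11, 12]
```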
- FIG. 11 is a flow chart illustrating a process to be executed when the command forwarding program 203 transfers a SCSI response and data. This process starts when CPU 101 executes the command forwarding program 203 stored in the memory 102 . If a SCSI response and data are stored in the top entry of the reception queue 205 a or 205 b (S 1101 : Yes), the command forwarding program 203 stores the SCSI response in the transmission queue 204 corresponding to the port whereat the corresponding SCSI command was received (S 1102 ).
- the command forwarding program 203 further deletes the SCSI response from the reception queue 205 a or 205 b from which it was extracted, and advances, by one entry toward the top entry, the storage location of each SCSI response in the second and subsequent entries (S 1103 ).
- the command forwarding program 203 updates the data transmission/reception amount information 206 (S 1104 ).
- the received data amount is subtracted from the reception byte number 503 in the data transmission/reception amount information 206 corresponding to the port at which the response was received.
- the data transmission/reception amount information 206 is not updated.
- the command forwarding program 203 does not perform the response transfer process until a SCSI response is stored in the top entry of the reception queue 205 a or 205 b .
- FIG. 12 is a flow chart illustrating a process to be executed when the target program 202 transmits a SCSI response and Read data. This process starts when CPU 101 executes the target program 202 stored in the memory 102 . If a SCSI response and data are stored in the top entry of the transmission queue 204 (S 1201 : Yes), the target program 202 generates an iSCSI PDU from the SCSI response and data (S 1202 ), transmits the generated iSCSI PDU from the port (S 1203 ) and deletes the entry of the SCSI command corresponding to the SCSI response from the command management information 207 (S 1204 ). Next, the target program 202 deletes the SCSI response transmitted from the transmission queue 204 , and advances by one entry toward the top entry the storage location of each SCSI response stored in the second and subsequent entries (S 1205 ).
- the target program 202 does not perform the response transmission process until a SCSI response is stored in the top entry of the transmission queue 204 .
- according to the first embodiment, it is possible to smooth the transmission/reception data amounts per unit time at the ports used by the iSCSI initiator operating in the storage device 100 .
- the storage device 100 transmits/receives the SCSI command and SCSI response to/from the host 130 via a single port 106 c .
- the present invention is also applicable to the case in which the storage device 100 transmits/receives the SCSI command and SCSI response to/from the host 130 via two or more ports. This will be detailed in the third embodiment.
- the storage device 100 uses the port 106 only for transmitting a SCSI command and receiving a SCSI response.
- the ports 106 a and 106 b are used only by an initiator and do not receive a SCSI command
- the port 106 c is used only for a target and does not transmit a SCSI command, limiting the role of each port.
- a load distribution can be conducted by considering only the load of the transmission port.
- the storage device 100 uses the port 106 for transmission/reception of a SCSI command and a SCSI response.
- FIG. 13 is a diagram showing the configuration of a second embodiment of a computer system.
- the devices and programs constituting this system are similar to those of the first embodiment, excepting that the same network 120 interconnecting the storage device 100 and hosts 130 is used for interconnecting the storage device 100 and external storage device 110 and that the operation of the target program 202 is modified.
- the role of each port 106 is not limited as in the case of the first embodiment. In the second embodiment, therefore, the load distribution among the ports is conducted by considering the loads of both the transmission and reception ports.
- FIG. 14 is a flow chart illustrating a process to be executed when the target program 202 receives an iSCSI PDU. This process starts when CPU 101 executes the target program 202 stored in the memory 102 .
- the target program 202 extracts an SCSI command and data from the iSCSI PDU (S 1402 ).
- the target program 202 further adds an entry of the SCSI command to the command management information 207 (S 1403 ) and adds the SCSI command to the bottom entry of the reception queue 205 (S 1404 ).
- the target program 202 updates the data transmission/reception amount information 206 (S 1405 ).
- the received SCSI command is a SCSI Read command
- a data transmission amount to be transmitted by the command is added to the transmission byte number 502 in the data transmission/reception amount information 206 corresponding to the port that received the command.
- a data reception amount to be received by the command is added to the reception byte number 503 in the data transmission/reception amount information 206 corresponding to the port that received the command.
- for example, if the port 106 a receives a SCSI Read command requesting data of 1024 bytes, the value “2048” of the transmission byte number 502 is rewritten to “3072”.
- the target program 202 does not perform the PDU transmission process until an iSCSI PDU is received.
- FIG. 15 is a flow chart illustrating a process to be executed when the target program 202 transmits a SCSI response and Read data. This process starts when CPU 101 executes the target program 202 stored in the memory 102 . If a SCSI response is stored in the top entry of the transmission queue 204 (S 1501 : Yes), the target program 202 generates an iSCSI PDU from the SCSI response and data (S 1502 ), transmits the generated iSCSI PDU from the corresponding port (S 1503 ), and deletes the entry of the SCSI command corresponding to the SCSI response from the command management information 207 (S 1504 ).
- the target program 202 further deletes the SCSI response stored in the top entry of the transmission queue 204 and advances by one entry toward the top entry the storage location of each SCSI response stored in the second and subsequent entries (S 1505 ). Then, the target program 202 updates the data transmission/reception amount information 206 in accordance with the contents of the SCSI command stored in the transmission queue 204 (S 1506 ). Namely, in the case of a Read response, a data transmission amount by the command is subtracted from the transmission byte number 502 in the data transmission/reception amount information 206 corresponding to the port. In the case of a Write response, a data reception amount by the command is subtracted from the reception byte number 503 in the data transmission/reception amount information 206 corresponding to the port.
- the target program 202 does not perform the response transmission process until a SCSI response is stored in the top entry of the transmission queue.
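On the target side the direction of the data is reversed: a received Read means the port will transmit data, and a received Write means it will receive data. The sketch below illustrates the counter updates of S 1405 and S 1506 under that reading. The function and field names are assumptions, and the starting value of 2048 with a 1024-byte Read mirrors the numeric example in the text.

```python
def on_command_received(info, port, op, nbytes):
    """S 1405: add the predicted amount when the target receives a command."""
    key = "tx_bytes" if op == "read" else "rx_bytes"
    info[port][key] += nbytes

def on_response_sent(info, port, op, nbytes):
    """S 1506: subtract the amount once the response has been transmitted."""
    key = "tx_bytes" if op == "read" else "rx_bytes"
    info[port][key] -= nbytes

info = {"a": {"tx_bytes": 2048, "rx_bytes": 0}}
on_command_received(info, "a", "read", 1024)
print(info["a"]["tx_bytes"])  # 3072
on_response_sent(info, "a", "read", 1024)
print(info["a"]["tx_bytes"])  # 2048
```

Because the initiator-side and target-side amounts are accumulated in the same two counters per port, a port that is busy serving reads as a target becomes less attractive for outgoing writes as an initiator.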
- according to the second embodiment, it is possible to smooth the transmission/reception data amounts per unit time at the ports used by an iSCSI initiator and an iSCSI target operating in the storage device 100 .
- the third embodiment is characterized by port load distribution control on the side of a host 130 when the storage device 100 transmits/receives a SCSI command and a SCSI response to/from the host via two or more ports.
- the host 130 is provided with a command issue program 211 in place of the command forwarding program 203 .
- the initiator program 201 performs the process shown in FIG. 10 , excluding S 1002 and S 1005 .
- the programs and control information constituting the second embodiment are used without modification.
- FIG. 16 is a flow chart illustrating a process to be executed when the command issue program 211 issues a SCSI command. This process starts when the host 130 executes the command issue program 211 stored in a memory. If a SCSI command is not stored in the top entry of a SCSI buffer (S 1601 : No), the command issue program 211 does not perform the command transmission process until a SCSI command is stored in the top entry of the SCSI buffer. If a SCSI command is stored in the top entry of a SCSI buffer (S 1601 : Yes), the command issue program 211 judges whether the SCSI command is a SCSI Read (S 1602 ).
- the command issue program 211 refers to the data transmission/reception amount information 206 and stores the SCSI Read command in the transmission queue 204 corresponding to the port having the minimum reception byte number 503 among the ports having the initiator assignment information 504 of “1” (S 1603 ). If the SCSI command is not a SCSI Read (S 1602 : No), the command issue program 211 refers to the data transmission/reception amount information 206 and stores the SCSI command in the transmission queue 204 corresponding to the port having the minimum transmission byte number 502 among the ports having the initiator assignment information 504 of “1” (S 1604 ).
- the command issue program 211 updates the data transmission/reception amount information 206 in accordance with the contents of the SCSI command stored in the transmission queue 204 (S 1605 ).
- the process S 1605 is similar to the process S 807 .
- the command issue program 211 further deletes the top entry of the SCSI buffer storing the transferred SCSI command, and advances by one entry toward the top entry the storage location of each SCSI command stored in the second and subsequent entries (S 1606 ).
- the command issue program 211 executes the processes S 1103 and S 1104 .
- SAN is configured by an IP network, and a SCSI command and data are transmitted/received in accordance with the iSCSI protocol.
- the present invention is not limited thereto, but the present invention may adopt other protocols such as a Fibre Channel if the protocol can perform data input/output relative to the storage device.
Abstract
In a storage device having redundant input/output paths, both a transmission data amount and a reception data amount are smoothed among paths. A storage device predicts not only a transmission data amount to be formed by an output request in a transmission queue but also a reception data amount to be formed by an input request in the transmission queue. The storage device stores a newly generated output request in the queue having the minimum predicted transmission data amount and stores a newly generated input request in the queue having the minimum predicted reception data amount. In a storage device having redundant input/output paths, transmission data amounts and reception data amounts can thus be smoothed among the paths.
Description
- The present application claims priority from Japanese application JP2005-147799 filed on May 20, 2005, the content of which is hereby incorporated by reference into this application.
- The present invention relates to a load distribution method for a computer system, and more particularly to a load distribution method for ports related to storage devices.
- In a conventional storage area network (SAN) connecting server computers and storage devices via a dedicated network, multipath technologies have been used in which redundant paths are used for issuing input/output requests. With multipath technologies, an input/output request can still be issued when a failure occurs on the path in use, by switching to another path, and input/output throughput can be improved by issuing input/output requests to a plurality of paths in accordance with predetermined rules.
- One example of an algorithm by which an apparatus that issues input/output requests to storage devices via a plurality of paths selects a path is the Round Robin algorithm, which issues input/output requests in an issue order decided beforehand for the paths. Other examples are the Least Queue Depth algorithm, which issues an input/output request to the path having the minimum number of input/output requests stored in the queue assigned to each path, and the Least Blocks algorithm, which issues a write request to the path having the minimum total sum of write blocks stored in the queue assigned to each path. The Least Blocks algorithm in particular is characterized in that the amount of future transmission data is predicted from the number of write request blocks stored in the queue, so that the transmission data amounts on the paths can be smoothed. Refer to “iSCSI Management API” by SNIA.
- None of the conventional techniques predicts a reception data amount on each path. As a result, a large difference in data amounts may occur among paths, or, if the transmission load caused by write requests is heavy, a read request cannot be issued even though the reception load is low.
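The three conventional selectors described above can be sketched as follows. This is only an illustration under stated assumptions: the queue representation (a list of `(kind, blocks)` tuples per path) and all function names are hypothetical and do not come from the SNIA document.

```python
from itertools import cycle

def make_round_robin(paths):
    """Return a selector that walks the paths in a fixed, pre-decided order."""
    order = cycle(paths)
    return lambda: next(order)

def least_queue_depth(queues):
    """Pick the path whose queue holds the fewest pending requests."""
    return min(queues, key=lambda path: len(queues[path]))

def least_blocks(queues):
    """Pick the path with the smallest total of queued write blocks."""
    def pending_write_blocks(path):
        return sum(blocks for kind, blocks in queues[path] if kind == "write")
    return min(queues, key=pending_write_blocks)

# Hypothetical queue contents: (request kind, number of blocks).
queues = {
    "a": [("write", 8)],
    "b": [("read", 2), ("write", 1), ("read", 4)],
}
rr = make_round_robin(["a", "b"])
```

With these sample queues, Least Queue Depth and Least Blocks disagree: path "a" has fewer entries, while path "b" has fewer pending write blocks. None of the three looks at the data that queued read requests will bring back.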
- In order to solve these issues, an apparatus which issues an input/output request predicts not only a transmission data amount to be formed by write requests in a transmission queue but also a reception data amount to be formed by read requests in the transmission queue. The apparatus which issues an input/output request stores a newly generated write request in the queue having the minimum predicted transmission data amount, and stores a newly generated read request in the queue having the minimum predicted reception data amount.
- The apparatus which issues an input/output request predicts the data transmission amount and data reception amount at each port to be formed by a received write request and read request, respectively, and adds the predicted amounts to a data transmission amount and data reception amount at each port predicted to be formed by a write request and read request to be issued from the apparatus.
- According to the present invention, the transmission data amounts and reception data amounts on paths can be smoothed at the same time, so that data input/output throughput can be improved. Even when there is only a single path, read requests can be issued to a storage device so as not to exceed the data reception capacity at the port of the apparatus which issues the input/output requests.
- FIG. 1 is a diagram showing the configuration of a computer system according to a first embodiment of the invention.
- FIG. 2 is a diagram showing the contents of a memory of a storage device of the embodiment.
- FIG. 3 is a diagram showing examples of a transmission queue.
- FIG. 4 is a diagram showing examples of a reception queue.
- FIG. 5 is a diagram showing an example of data transmission/reception amount information.
- FIG. 6 is a diagram showing an example of command management information.
- FIG. 7 is a diagram showing an example of target information.
- FIG. 8 is a diagram showing the structure of a memory of a management terminal of the embodiment.
- FIG. 9 is a flow chart illustrating a command forwarding process to be executed by a command forwarding program of the embodiment.
- FIG. 10 is a flow chart illustrating a process to be executed by an initiator program of the embodiment.
- FIG. 11 is a flow chart illustrating another command forwarding process to be executed by the command forwarding program of the embodiment.
- FIG. 12 is a flow chart illustrating a process to be executed by a target program of the embodiment.
- FIG. 13 is a diagram showing the configuration of a computer system according to a second embodiment.
- FIG. 14 is a flow chart illustrating a command reception process to be executed by a target program of the second embodiment.
- FIG. 15 is a flow chart illustrating a response transmission process to be executed by the target program of the second embodiment.
- FIG. 16 is a flow chart illustrating a command transmission process to be executed by a command issue program according to a third embodiment.
- Embodiments of the invention will be described with reference to the accompanying drawings.
- In the first embodiment, the present invention is applied to a computer system in which a storage device transfers a SCSI command received from a host computer to an external storage device.
- FIG. 1 is a diagram showing the configuration of a computer system of the first embodiment. As shown, the computer system of the first embodiment has a storage device 100, an external storage device 110, a plurality of hosts 130, and a management terminal 150. The storage device 100 and external storage device 110 are interconnected via a network 120 such as the Internet. The storage device 100 and the plurality of hosts 130 are connected via a network 140. The storage device 100 is connected to the management terminal 150.
- The host 130 is an information processing apparatus (host computer) which executes an application involving data input/output of the storage device 100.
- The storage device 100 has a CPU 101, a memory 102, a cache 103 for temporarily storing data to speed up accesses, a disk controller 104, one or more disks 105, ports 106, a management port 108, and a bus 109 interconnecting these devices.
- The CPU 101 performs various processes to be described later, by executing programs stored in the memory 102. The memory 102 stores programs and data to be described later. The cache 103 temporarily stores write data. The disk controller 104 controls data input/output of the disks 105. The disk controller 104 may perform processes corresponding to Redundant Array of Independent Disks (RAID). The disk 105 stores data read/written by the host 130. A non-volatile memory 107 stores programs and data to be loaded into the memory 102 when the storage device 100 is activated.
- The ports 106 are mechanisms such as network cards for connecting local area network (LAN) cables to the storage device 100, and execute data transmission/reception processes relative to external devices via the networks 120 and 140. In this embodiment, the storage device 100 has three ports 106a, 106b and 106c, although the storage device 100 may have three or more ports 106. The management port 108 connects the management terminal 150 to the storage device 100.
- The storage device 100 has a relay function of transferring an input/output request issued from the host 130 to the external storage device 110 via the network 120 and transferring a response and data received from the external storage device 110 to the host 130. The external storage device 110 has a structure similar to that of the storage device 100, except for the relay function.
- The host 130 has an initiator function of the iSCSI protocol. The storage device 100 has a target function and an initiator function. The external storage device 110 has a target function.
- FIG. 2 shows programs and data stored in the memory 102. The memory 102 stores an initiator program 201, a target program 202, a command forwarding program 203, a transmission queue 204, a reception queue 205, data transmission/reception amount information 206, command management information 207, target information 208, a redundant path control program 209 and an initializing program 210.
- The initiator program 201 is a program for encapsulating a SCSI command and data into an iSCSI PDU, extracting a SCSI response from an iSCSI PDU, and transmitting/receiving an iSCSI PDU to/from an external iSCSI target, in accordance with the iSCSI protocol. When the port 106 receives an iSCSI PDU including a SCSI response, the initiator program 201 extracts the SCSI response from the iSCSI PDU and stores it in the reception queue 205. The transmission operation of an iSCSI command will be detailed later.
- The target program 202 converts between SCSI commands and data on the one hand and iSCSI PDUs on the other, and transmits/receives iSCSI PDUs. When the port 106 receives an iSCSI PDU, the target program 202 extracts the SCSI command from the iSCSI PDU and stores it in the reception queue 205. The target program 202 also adds an iSCSI header to the SCSI response stored in the top entry of the transmission queue 204, described later, and transmits it to the host 130. This operation will be detailed later.
- The command forwarding program 203 stores the SCSI command stored in the top entry of the reception queue 205 in the transmission queue 204, and stores the SCSI response received by the initiator program 201 in the transmission queue 204. This operation will be detailed later.
- The redundant path control program 209 and initializing program 210 will be described later.
- The transmission queue 204 is an area in the memory 102 for storing the SCSI command or SCSI response to be transmitted, and is defined for each port. In this embodiment, since the storage device 100 has three ports 106, there are three transmission queues 204a, 204b and 204c corresponding to the ports 106a, 106b and 106c. FIG. 3 shows examples of the transmission queues. In FIG. 3, an area 301 in the transmission queue 204 is the top entry in the memory area, and the entries below it are the second and subsequent entries. The initiator program 201 or target program 202 reads the SCSI command or SCSI response stored in the top entry of the transmission queue 204 and deletes it, and each SCSI command or SCSI response stored in the second and subsequent entries moves up by one entry. In the examples shown in FIG. 3, a write request for two blocks is stored in the top entry of the transmission queue 204a. In this embodiment, the block size is set to 512 bytes.
- The reception queue 205 is an area in the memory 102 for storing the received SCSI command or SCSI response, and is defined for each port. In this embodiment, similar to the transmission queue, there are three reception queues 205a, 205b and 205c corresponding to the ports 106a, 106b and 106c. FIG. 4 shows examples of the reception queues. Similar to the transmission queues 204, an area 401 in the reception queue 205 is the top entry in the memory area, and the entries below it are the second and subsequent entries. The initiator program 201 or target program 202 reads the SCSI command or SCSI response stored in the top entry of the reception queue 205 and deletes it, and each SCSI command or SCSI response stored in the second and subsequent entries moves up by one entry.
- In the examples, a Read command and Write commands for the external storage device 110 are stored in the transmission queues 204a and 204b. SCSI responses from the external storage device 110 are stored in the reception queues 205a and 205b. SCSI commands from the host 130 are stored in the reception queue 205c, and data reception amounts of the Write commands are shown. A Read response to be transmitted to the host 130 is stored in the transmission queue 204c.
- In the queues shown in FIGS. 3 and 4, the maximum number of SCSI commands or SCSI responses capable of being stored in each queue is 4, but it is sufficient that this maximum number is 1 or more.
- FIG. 5 is a diagram showing examples of the data transmission/reception amount information 206. The data transmission/reception amount information 206 is stored in a table in which each row combines a port identifier 501, a transmission byte number 502, a reception byte number 503 and initiator assignment information 504. The port identifier 501 is a name for identifying the port. The transmission byte number 502 indicates the number of bytes of transmission data formed by the SCSI Write commands stored in the queue. The reception byte number 503 indicates the number of bytes of reception data formed by the SCSI Read commands stored in the queue. The initiator assignment information 504 indicates whether the initiator program 201 is assigned. The value “1” in a cell 505 means that the initiator program 201 can issue an input/output request from the port b. The value “0” in a cell 506 means that the initiator program 201 cannot issue an input/output request from the port c. A cell 507 indicates that the total sum of the data amount requested by the SCSI Read commands stored in the transmission queue 204b is 2048 bytes.
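One way to picture this table is as one record per port. The sketch below is an assumption-laden illustration: the class and field names are hypothetical, and only the values the text states (initiator assignment "1" for port b, "0" for port c, and 2048 reception bytes for port b) are taken from the description; the remaining numbers are placeholders.

```python
from dataclasses import dataclass

@dataclass
class PortLoad:
    port_id: str                      # port identifier 501
    tx_bytes: int = 0                 # transmission byte number 502 (queued SCSI Writes)
    rx_bytes: int = 0                 # reception byte number 503 (queued SCSI Reads)
    initiator_assigned: bool = False  # initiator assignment information 504

# Placeholder table; only port b's rx_bytes and the b/c flags follow the text.
table = {
    "a": PortLoad("a", tx_bytes=2048, rx_bytes=1024, initiator_assigned=True),
    "b": PortLoad("b", tx_bytes=4096, rx_bytes=2048, initiator_assigned=True),
    "c": PortLoad("c", initiator_assigned=False),
}

def initiator_ports(table):
    """Ports from which the initiator program may issue requests (flag is 1)."""
    return [row for row in table.values() if row.initiator_assigned]
```

Keeping the two byte counters separate per port is what lets a read and a write be steered independently.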
- FIG. 6 is a diagram showing examples of the command management information 207. The command management information 207 is stored in a table in which each row combines a command tag 601, an initiator name 602 and a target name 603. The command tag 601 is a number for identifying the SCSI command. The initiator name 602 is the name of the initiator issuing the SCSI command. The target name 603 is the name of the target to which the SCSI command is issued. The examples shown in FIG. 6 show that an initiator I1 issues SCSI commands 11 and 12 to a target T1. The item corresponding to a SCSI command for which the response is completed is deleted from the command management information 207. An input/output request whose SCSI command, managed by the command management information, has already been transmitted but whose corresponding SCSI response has not yet been received is called an outstanding I/O. An upper limit on the number of simultaneously outstanding I/Os is preset, and the initiator program 201 performs control so that the number of outstanding I/Os at any instant does not exceed the upper limit. This upper limit is called the maximum number of outstanding I/Os.
- FIG. 7 shows examples of the target information 208. The target information 208 is stored in a table in which each row combines a target name 701 and a location 702. The target name 701 is a name for identifying the target. The location 702 is the location of the target, identified by a host name, an IP address, a TCP port number and the like. The examples shown in FIG. 7 show that a target “localtarget” operates at the position identified by an IP address of 192.168.1.1 and a TCP port number of 3260, i.e., at the storage device 100, and that a target “remotetarget” existing in the external storage device operates at the position identified by an IP address of 192.168.2.2 and a TCP port number of 3260 and at the position identified by an IP address of 192.168.3.2 and a TCP port number of 3260.
- The redundant path control program 209 allows the management terminal 150 to set a load distribution algorithm or the like via the management port 108. The redundant path control program 209 can set the algorithm of the present invention as well as other algorithms such as Round Robin, Least Queue Depth and Least Blocks.
- The initializing program 210 initializes the data transmission/reception amount information 206 shown in FIG. 5, the command management information 207 shown in FIG. 6 and the target information 208 shown in FIG. 7. In executing an initializing process for the storage device when the power supply of the storage device 100 is turned on or at other times, the CPU 101 executes the initializing program 210 stored in the memory 102 to thereby initialize the data transmission/reception amount information 206, command management information 207 and target information 208.
- The management terminal 150 is a personal computer or the like for configuring the storage device 100. The management terminal 150 has a CPU 151, a memory 152, a non-volatile memory 153, an input unit 154, an output unit 155, a port 156 and a bus 157 interconnecting these devices. The CPU 151 performs processes to be described later, by executing programs stored in the memory 152. The memory 152 stores programs and data to be described later. The non-volatile memory 153 stores programs and data to be loaded into the memory 152 when the management terminal 150 is activated. The port 156 is a mechanism such as a network card for connecting a local area network (LAN) cable to the management terminal 150, and performs data transmission/reception processes relative to the storage device 100 via a LAN.
- FIG. 8 shows a program stored in the memory 152 of the management terminal 150. A redundant path setting program 901 is stored in the memory 152.
- The redundant path setting program 901 sets the load distribution algorithm or the like in the storage device 100. The redundant path setting program 901 notifies the redundant path control program 209 of the load distribution algorithm selected from the input unit 154.
- Next, description will be made on the operation of the computer system and each process to be executed by the storage device 100.
- First, with reference to FIG. 9, description will be made on an operation to be performed when the storage device 100 transfers a SCSI command and data received from the host 130 to the external storage device 110.
- FIG. 9 is a flow chart illustrating a process to be executed when the command forwarding program 203 transfers a SCSI command. This process starts when the CPU 101 executes the command forwarding program 203 stored in the memory 102. If a SCSI command is not stored in the top entry of the reception queue 205c (S801: No), the command forwarding program 203 does not perform the command forwarding process until a SCSI command is stored in the top entry of the reception queue 205c. If a SCSI command is stored in the top entry of the reception queue 205c (S801: Yes), the command forwarding program 203 refers to the target information 208 to judge whether the SCSI command is destined for the external storage device 110 (S802). If the SCSI command is not destined for the external storage device 110 (S802: No), the command forwarding program transfers the SCSI command to the disk controller 104 (S803) and then advances to S808.
- If the SCSI command is destined for the external storage device 110 (S802: Yes), it is checked whether the SCSI command is a SCSI Read (S804). If the SCSI command is a SCSI Read (S804: Yes), the command forwarding program 203 refers to the data transmission/reception amount information 206 and stores the SCSI Read command in the transmission queue 204 corresponding to the port having the minimum reception byte number 503 among the ports having the initiator assignment information 504 of “1” (S805). If the SCSI command is not a SCSI Read (S804: No), it is either a SCSI Write command or another command. In this case, the command forwarding program 203 refers to the data transmission/reception amount information 206 and stores the SCSI command in the transmission queue corresponding to the port having the minimum transmission byte number 502 among the ports having the initiator assignment information 504 of “1” (S806).
- After the process S805 or S806, the command forwarding program 203 updates the data transmission/reception amount information 206 in accordance with the data transmission/reception amount of the SCSI command stored in the transmission queue 204 (S807). Namely, in the case of a SCSI Read command, the data reception amount to be received by this command is added to the reception byte number 503 of the port corresponding to the transmission queue selected by referring to the data transmission/reception amount information 206. In the case of a SCSI Write command, the data transmission amount to be transmitted by this command is added to the transmission byte number 502 of the port corresponding to the selected transmission queue. In the case of other commands, since the data transmission/reception amount can be neglected, the data transmission/reception amount information 206 is not updated. The command forwarding program 203 further erases the top entry of the reception queue 205 that held the SCSI command now stored in the transmission queue, and advances, by one entry toward the top entry, the storage location of each command stored in the second and subsequent entries (S808).
- For example, assuming that a SCSI Read command of 1024 bytes is stored in the top entry of the reception queue 205c, since the reception byte number 503 corresponding to the transmission queue 204a and shown in FIG. 5 is minimum, the command forwarding program 203 stores the SCSI Read command in the transmission queue 204a at Step S805.
- For example, assuming that the SCSI Write command of 1024 bytes is stored in the top entry of the reception queue 205c, since the transmission byte number 502 corresponding to the transmission queue 204a and shown in FIG. 5 is minimum, the command forwarding program 203 stores the SCSI Write command in the transmission queue 204b at Step S806.
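The selection and update steps S804 to S807 can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the dictionary layout, the function names and the byte counts in `table` are hypothetical, and the contents of FIG. 5 are not reproduced here.

```python
def select_port(table, kind):
    """S804-S806: among ports whose initiator flag is 1, pick the one with the
    smallest reception bytes for a Read, else the smallest transmission bytes."""
    candidates = [port for port, row in table.items() if row["initiator"] == 1]
    key = "rx_bytes" if kind == "read" else "tx_bytes"
    return min(candidates, key=lambda port: table[port][key])

def enqueue(table, queues, kind, nbytes):
    """Store the command in the selected transmission queue, then apply S807."""
    port = select_port(table, kind)
    queues[port].append((kind, nbytes))
    if kind == "read":
        table[port]["rx_bytes"] += nbytes   # data expected back on this port
    elif kind == "write":
        table[port]["tx_bytes"] += nbytes   # data to be sent from this port
    # Other commands carry negligible data, so the table is left unchanged.
    return port

# Hypothetical per-port state (not the values of FIG. 5).
table = {
    "a": {"tx_bytes": 2048, "rx_bytes": 1024, "initiator": 1},
    "b": {"tx_bytes": 4096, "rx_bytes": 2048, "initiator": 1},
    "c": {"tx_bytes": 0, "rx_bytes": 0, "initiator": 0},
}
queues = {port: [] for port in table}
```

Port c is never chosen even though both of its counters are zero, because its initiator assignment flag is 0. When two ports tie, `min` returns the first candidate; the text does not specify a tie-breaking rule, so that behaviour is an artifact of this sketch.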
- FIG. 10 is a flow chart illustrating a process to be executed when the initiator program 201 transmits a SCSI command. This process starts when the CPU 101 executes the initiator program 201 stored in the memory 102. If a SCSI command is stored in the top entry of the transmission queue 204 (S1001: Yes), the initiator program 201 refers to the command management information 207 to judge whether the current number of outstanding I/Os is smaller than the maximum number of outstanding I/Os (S1002). If a SCSI command is not stored in the top entry of the transmission queue 204 (S1001: No), the initiator program 201 does not perform the command transmission process until a SCSI command is stored in the top entry of the transmission queue 204. If it is judged in the process S1002 that the current number of outstanding I/Os is smaller than the maximum number of outstanding I/Os (S1002: Yes), the initiator program 201 adds a header to the SCSI command and data to generate an iSCSI PDU (S1003), divides the iSCSI PDU into Ethernet frames, transmits the Ethernet frames from the port 106 corresponding to the transmission queue 204 (S1004), and adds an entry of the SCSI command to the command management information 207 (S1005).
- If the current number of outstanding I/Os is equal to the maximum number of outstanding I/Os (S1002: No), the initiator program enters a standby state until the current number of outstanding I/Os becomes smaller than the maximum number of outstanding I/Os. In this embodiment, the maximum number of outstanding I/Os is set to 4, but any value may be used as long as it does not exceed the maximum number of commands capable of being stored in the transmission queue.
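The outstanding I/O check can be sketched as a non-blocking step that is polled repeatedly. This is a simplification under stated assumptions: the text describes a standby state rather than polling, and `try_transmit`, the `send` callback and the `(tag, command)` queue entries are hypothetical names standing in for PDU generation and transmission.

```python
from collections import deque

MAX_OUTSTANDING = 4  # the value used in this embodiment

def try_transmit(tx_queue, command_table, send):
    """One pass over S1001-S1005 for a single port.

    Returns True if a command was sent, False if the queue was empty or the
    outstanding I/O limit was reached.
    """
    if not tx_queue:                              # S1001: No
        return False
    if len(command_table) >= MAX_OUTSTANDING:     # S1002: No, stay on standby
        return False
    tag, command = tx_queue[0]
    send(command)                                 # stand-in for S1003-S1004
    command_table[tag] = command                  # S1005: now an outstanding I/O
    tx_queue.popleft()                            # remove the transmitted entry
    return True
```

Here the size of `command_table` plays the role of the current number of outstanding I/Os; removing an entry when its response completes frees a slot for the next queued command.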
- After the process S1005, the initiator program 201 deletes the transmitted SCSI command from the transmission queue 204, and the storage location of each SCSI command stored in the second and subsequent entries is advanced by one entry (S1006). Next, the initiator program 201 updates the data transmission/reception amount information 206 (S1007). In the case of a SCSI Write command, the transmitted data amount is subtracted from the transmission byte number 502 in the data transmission/reception amount information 206 corresponding to the port from which the command was transmitted. In the case of a SCSI Read command, the data transmission/reception amount information 206 is not updated.
- Next, with reference to FIG. 11, description will be made on an operation to be performed when the storage device 100 transfers the SCSI response and Read data received from the external storage device 110 to the host 130.
- FIG. 11 is a flow chart illustrating a process to be executed when the command forwarding program 203 transfers a SCSI response and data. This process starts when the CPU 101 executes the command forwarding program 203 stored in the memory 102. If a SCSI response and data are stored in the top entry of the reception queue 205 (S1101: Yes), the command forwarding program 203 stores the SCSI response in the transmission queue 204 corresponding to the port at which the corresponding SCSI command was received (S1102). The command forwarding program 203 further deletes the SCSI response stored in the reception queue 205 (S1103). Next, the command forwarding program 203 updates the data transmission/reception amount information 206 (S1104). In the case of a Read response, the received data amount is subtracted from the reception byte number 503 in the data transmission/reception amount information 206 corresponding to the port at which the response was received. In the case of a Write response, the data transmission/reception amount information 206 is not updated.
- If a SCSI response is not stored in the top entry of the reception queue 205 (S1101: No), the command forwarding program 203 does not perform the response transfer process until a SCSI response is stored in the top entry of the reception queue 205.
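In the first embodiment the two counters are therefore released at different moments: the transmission bytes of a Write when the command leaves the port (S1007), and the reception bytes of a Read when its response arrives (S1104). A minimal sketch of the two updates, with hypothetical function names and a plain dictionary standing in for one row of the table:

```python
def on_command_transmitted(row, kind, nbytes):
    """S1007: a transmitted Write no longer counts as pending transmission."""
    if kind == "write":
        row["tx_bytes"] -= nbytes
    # A transmitted Read leaves the table unchanged; its data is still to come.

def on_response_received(row, kind, nbytes):
    """S1104: a received Read response no longer counts as pending reception."""
    if kind == "read":
        row["rx_bytes"] -= nbytes
    # A Write response carries no data, so the table is unchanged.
```

For a row holding 1024 pending transmission bytes and 2048 pending reception bytes, transmitting the 1024-byte Write and then receiving the 2048-byte Read response returns both counters to zero.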
- FIG. 12 is a flow chart illustrating a process to be executed when the target program 202 transmits a SCSI response and Read data. This process starts when the CPU 101 executes the target program 202 stored in the memory 102. If a SCSI response and data are stored in the top entry of the transmission queue 204 (S1201: Yes), the target program 202 generates an iSCSI PDU from the SCSI response and data (S1202), transmits the generated iSCSI PDU from the port (S1203) and deletes the entry of the SCSI command corresponding to the SCSI response from the command management information 207 (S1204). Next, the target program 202 deletes the transmitted SCSI response from the transmission queue 204, and advances by one entry toward the top entry the storage location of each SCSI response stored in the second and subsequent entries (S1205).
- If a SCSI response is not stored in the top entry of the transmission queue 204 (S1201: No), the target program 202 does not perform the response transmission process until a SCSI response is stored in the top entry of the transmission queue 204.
- According to the first embodiment, it is possible to smooth the amounts of data transmitted and received per unit time at the ports by the iSCSI initiator operating in the storage device 100.
- In the description of the first embodiment, the storage device 100 transmits/receives the SCSI command and SCSI response to/from the host 130 via a single port 106c. The present invention is also applicable to the case in which the storage device 100 transmits/receives the SCSI command and SCSI response to/from the host 130 via two or more ports. This will be detailed in the third embodiment.
- In the description of the first embodiment, the storage device 100 uses the port 106 only for transmitting a SCSI command and receiving a SCSI response. In other words, the ports 106a and 106b are used only for the initiator, and the port 106c is used only for a target and does not transmit a SCSI command, limiting the role of each port. In the first embodiment, therefore, a load distribution can be conducted by considering only the load of the transmission port. In the second embodiment, the storage device 100 uses the port 106 for transmission/reception of both a SCSI command and a SCSI response.
- FIG. 13 is a diagram showing the configuration of a second embodiment of a computer system. The devices and programs constituting this system are similar to those of the first embodiment, except that the same network 120 interconnecting the storage device 100 and hosts 130 is used for interconnecting the storage device 100 and external storage device 110, and that the operation of the target program 202 is modified. In the second embodiment, the role of each port 106 is not limited as it is in the first embodiment. In the second embodiment, therefore, the load distribution among the ports is conducted by considering the loads of both the transmission and reception ports.
- In the following, description will be made on the operation of the computer system and a modified process in the storage device 100.
- FIG. 14 is a flow chart illustrating a process to be executed when the target program 202 receives an iSCSI PDU. This process starts when the CPU 101 executes the target program 202 stored in the memory 102. When the port 106 receives an iSCSI PDU (S1401: Yes), the target program 202 extracts a SCSI command and data from the iSCSI PDU (S1402). The target program 202 further adds an entry of the SCSI command to the command management information 207 (S1403) and adds the SCSI command to the bottom entry of the reception queue 205 (S1404). Then, the target program 202 updates the data transmission/reception amount information 206 (S1405). If the received SCSI command is a SCSI Read command, the data transmission amount to be transmitted by the command is added to the transmission byte number 502 in the data transmission/reception amount information 206 corresponding to the port that received the command. If the received SCSI command is a SCSI Write command, the data reception amount to be received by the command is added to the reception byte number 503 in the data transmission/reception amount information 206 corresponding to the port that received the command.
- For example, if the port 106a receives a SCSI Read command requesting data of 1024 bytes, the value “2048” of the transmission byte number 502 is rewritten to “3072”.
- The target program 202 does not perform the command reception process until an iSCSI PDU is received.
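On the target side the directions are reversed relative to the initiator: a received Read means data will be sent from the port, and a received Write means data will arrive at it. The update in S1405 can be sketched as below; the function name and dictionary layout are hypothetical, and the reception value of the sample row is a placeholder.

```python
def on_command_received(row, kind, nbytes):
    """S1405: account for a SCSI command received at a target port."""
    if kind == "read":
        row["tx_bytes"] += nbytes   # Read data will be transmitted to the initiator
    elif kind == "write":
        row["rx_bytes"] += nbytes   # Write data will be received from the initiator

# Worked example from the text: port 106a starts at 2048 transmission bytes
# and receives a Read command requesting 1024 bytes.
port_a = {"tx_bytes": 2048, "rx_bytes": 0}
on_command_received(port_a, "read", 1024)
```

After the call, the transmission byte number of port 106a is 3072, matching the rewritten value in the example.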
FIG. 15 is a flow chart illustrating a process to be executed when the target program 202 transmits a SCSI response and Read data. This process starts when the CPU 101 executes the target program 202 stored in the memory 102. If a SCSI response is stored in the top entry of the transmission queue 204 (S1501: Yes), the target program 202 generates an iSCSI PDU from the SCSI response and data (S1502), transmits the generated iSCSI PDU from the corresponding port (S1503), and deletes the entry of the SCSI command corresponding to the SCSI response from the command management information 207 (S1504). The target program 202 further deletes the SCSI response stored in the top entry of the transmission queue 204 and advances each SCSI response stored in the second and subsequent entries by one entry toward the top (S1505). Then, the target program 202 updates the data transmission/reception amount information 206 in accordance with the contents of the SCSI command stored in the transmission queue 204 (S1506). Namely, in the case of a Read response, the amount of data transmitted for the command is subtracted from the transmission byte number 502 in the data transmission/reception amount information 206 corresponding to the port. In the case of a Write response, the amount of data received for the command is subtracted from the reception byte number 503 in the data transmission/reception amount information 206 corresponding to the port.
If a SCSI response is not stored in the top entry of the transmission queue 204 (S1501: No), the target program 202 does not perform the response transmission process until a SCSI response is stored in the top entry of the transmission queue.
According to the second embodiment, it is possible to smooth, across the ports, the amounts of data transmitted and received per unit time by an iSCSI initiator and an iSCSI target operating in the storage device 100.
The third embodiment is characterized by port load distribution control on the side of a host 130 when the storage device 100 transmits/receives a SCSI command and a SCSI response to/from the host via two or more ports. The host 130 is provided with a command issue program 211 in place of the command forwarding program 203. The initiator program 201 performs the process shown in FIG. 10, excluding S1002 and S1005. There is no target program 202. As in the first embodiment, the transmission queue 204, the reception queue 205 and the data transmission/reception amount information 206 exist.
On the storage device 100 side of the third embodiment, the programs and control information constituting the second embodiment are used without modification.
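The response transmission process of FIG. 15 (S1501 to S1506) can be sketched in the same spirit. This is an illustrative Python sketch under stated assumptions, not the patent's implementation: `ScsiResponse`, `TargetPort`, `transmit_next_response` and the `send` callback are hypothetical names, and building the iSCSI PDU (S1502) is reduced to handing the response to `send`.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ScsiResponse:
    tag: int      # hypothetical identifier of the command being answered
    op: str       # "read" or "write"
    length: int   # bytes transferred for the command


@dataclass
class TargetPort:
    transmission_bytes: int = 0   # plays the role of the transmission byte number 502
    reception_bytes: int = 0      # plays the role of the reception byte number 503
    transmission_queue: deque = field(default_factory=deque)


def transmit_next_response(port: TargetPort, pending: dict, send) -> bool:
    """Send the response at the top of the queue, if any, and release its accounted bytes."""
    if not port.transmission_queue:       # S1501: No, nothing to transmit yet
        return False
    resp = port.transmission_queue[0]
    send(resp)                            # S1502-S1503: PDU generation and transmission (simplified)
    pending.pop(resp.tag, None)           # S1504: delete the command management entry
    port.transmission_queue.popleft()     # S1505: remove the top entry; the rest move up
    if resp.op == "read":                 # S1506: Read data has now been transmitted
        port.transmission_bytes -= resp.length
    elif resp.op == "write":              # S1506: Write data has now been received
        port.reception_bytes -= resp.length
    return True


# Usage: one outstanding 1024-byte Read on a port that accounts for 3072 bytes.
sent = []
port = TargetPort(transmission_bytes=3072)
port.transmission_queue.append(ScsiResponse(tag=1, op="read", length=1024))
pending = {1: "read command entry"}
transmit_next_response(port, pending, sent.append)
print(port.transmission_bytes)  # 2048
```

Each subtraction here mirrors the addition made when the command was received, so a port's counters return to their earlier values once all of its outstanding commands have been answered.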
FIG. 16 is a flow chart illustrating a process to be executed when the command issue program 211 issues a SCSI command. This process starts when the host 130 executes the command issue program 211 stored in a memory. If a SCSI command is not stored in the top entry of a SCSI buffer (S1601: No), the command issue program 211 does not perform the command transmission process until a SCSI command is stored in the top entry of the SCSI buffer. If a SCSI command is stored in the top entry of the SCSI buffer (S1601: Yes), the command issue program 211 judges whether the SCSI command is a SCSI Read (S1602). If the SCSI command is a SCSI Read (S1602: Yes), the command issue program 211 refers to the data transmission/reception amount information 206 and stores the SCSI Read command in the transmission queue 204 corresponding to the port having the minimum reception byte number 503 among the ports having the initiator assignment information 504 of “1” (S1603). If the SCSI command is not a SCSI Read (S1602: No), the command issue program 211 refers to the data transmission/reception amount information 206 and stores the SCSI command in the transmission queue 204 corresponding to the port having the minimum transmission byte number 502 among the ports having the initiator assignment information 504 of “1” (S1604). After the process S1603 or S1604, the command issue program 211 updates the data transmission/reception amount information 206 in accordance with the contents of the SCSI command stored in the transmission queue 204 (S1605). The process S1605 is similar to the process S807. The command issue program 211 further deletes the top entry of the SCSI buffer storing the transferred SCSI command, and advances each SCSI command stored in the second and subsequent entries by one entry toward the top (S1606).
If a SCSI response exists in the top entry of the reception queue 205, the command issue program 211 executes the processes S1103 and S1104.
In the description of the above embodiments, the SAN is configured as an IP network, and a SCSI command and data are transmitted/received in accordance with the iSCSI protocol. The present invention is not limited thereto and may adopt other protocols, such as Fibre Channel, provided that the protocol can perform data input/output with respect to the storage device.
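The host-side port selection of FIG. 16 (S1601 to S1606) can be sketched as follows. This is a hedged illustration rather than the patent's implementation: `HostPort` and `issue_command` are hypothetical names, and commands are modeled as `(op, length)` tuples. Every command that is not a Read is treated as a Write of `length` bytes, and S1605 is assumed to add the command's byte count to the counter used for selection. Both points are simplifying assumptions. The sketch also assumes that at least one port is assigned as an initiator.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HostPort:
    name: str
    transmission_bytes: int = 0   # plays the role of the transmission byte number 502
    reception_bytes: int = 0      # plays the role of the reception byte number 503
    initiator: bool = True        # initiator assignment information 504 equal to "1"
    transmission_queue: deque = field(default_factory=deque)


def issue_command(ports: list, scsi_buffer: deque):
    """Move the command at the top of the SCSI buffer to the least-loaded initiator port."""
    if not scsi_buffer:                           # S1601: No, nothing to issue
        return None
    op, length = scsi_buffer[0]
    candidates = [p for p in ports if p.initiator]
    if op == "read":                              # S1602: Yes
        port = min(candidates, key=lambda p: p.reception_bytes)      # S1603
        port.reception_bytes += length            # S1605 (assumed): Read data will be received
    else:                                         # S1602: No
        port = min(candidates, key=lambda p: p.transmission_bytes)   # S1604
        port.transmission_bytes += length         # S1605 (assumed): Write data will be sent
    port.transmission_queue.append((op, length))
    scsi_buffer.popleft()                         # S1606: remove the top entry; the rest move up
    return port


# Usage: port "a" is busy receiving, port "b" is busy transmitting.
ports = [
    HostPort("a", transmission_bytes=512, reception_bytes=4096),
    HostPort("b", transmission_bytes=2048, reception_bytes=1024),
]
scsi_buffer = deque([("read", 512), ("write", 256)])
first = issue_command(ports, scsi_buffer)    # the Read goes to the port with the least data to receive
second = issue_command(ports, scsi_buffer)   # the Write goes to the port with the least data to transmit
print(first.name, second.name)  # b a
```

Selecting on the reception counter for Reads and the transmission counter for Writes balances each direction of a port independently, which is the distinction the flow chart draws at S1602.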
Claims (15)
1. A storage device connected to another storage device via a network and having a plurality of transmission queues for temporarily storing an input/output request to be transmitted to said other storage device, the storage device comprising:
selecting means for selecting, if said input/output request is an input request, a transmission queue having a minimum data reception amount to be formed by an input/output request or requests already stored in the transmission queue, and if said input/output request is an output request, a transmission queue having a minimum data transmission amount to be formed by an input/output request or requests already stored in the transmission queue; and
means for storing said input/output request to be transmitted in said selected transmission queue.
2. The storage device according to claim 1, wherein said transmission queue is provided in correspondence with a port for connecting said storage device to said other storage device.
3. The storage device according to claim 2, further comprising a table, provided in a memory of the storage device, for storing said data transmission amount and said data reception amount at each of said ports.
4. A storage device connected to another storage device and a host computer via a network and having a plurality of transmission queues for temporarily storing an input/output request to be transmitted to said other storage device, and a reception queue paired with each of said transmission queues for temporarily storing an input/output request received from said host computer, the storage device comprising:
selecting means for selecting a transmission queue having a minimum total sum of a data transmission amount to be formed by an input request or requests stored in said reception queue and a data transmission amount to be formed by an output request or requests stored in said transmission queue; and
means for storing said input/output request to be transmitted in said selected transmission queue.
5. The storage device according to claim 4, wherein said transmission queue is provided in correspondence with a port for connecting said storage device to said other storage device.
6. The storage device according to claim 5, further comprising a table, provided in a memory of the storage device, for storing said data transmission amount and said data reception amount at each of said ports.
7. A storage device comprising:
a CPU and a memory;
a plurality of ports, connected to an external storage device via a network, for transmitting/receiving an input/output command and a response;
a port, connected to a host computer via the network, for transmitting/receiving an input/output command and a response;
a plurality of transmission queues, provided in correspondence with said ports and in said memory, for temporarily storing an input/output command to be transmitted to said external storage device;
a plurality of reception queues, provided in correspondence with said ports and in said memory, for temporarily storing an input/output command received from said host computer;
data transmission/reception information provided in said memory and being representative of a data transmission amount and a data reception amount of data to be formed by said input/output command or commands stored in said transmission queue, for each of said ports; and
a command forwarding program, stored in said memory to be executed by said CPU, for selecting, if said input/output command to be stored in said reception queue is an input command, a transmission queue having a minimum data reception, and if said input/output command is an output command, a transmission queue having a minimum data transmission, forwarding said input/output command from said reception queue to said selected transmission queue, and increasing said data reception amount or said data transmission amount corresponding to said selected transmission queue by a data amount increased by said input/output command,
wherein said data transmission amount is reduced by a data amount increased if said output command is transmitted from said selected transmission queue via said port, and said data reception amount is reduced by a data amount increased if a response to said input command is received.
8. The storage device according to claim 7, further comprising command management information, provided in said memory, being representative of an identifier of a pending input/output command, wherein if said input/output command is transmitted from said transmission queue via said port, the storage device registers said input/output command in said command management information, and if a response to said input/output command is received, the storage device deletes the identifier of said input/output command.
9. A host computer connected to a storage device via a network and having a plurality of transmission queues for temporarily storing an input/output request to be transmitted to said storage device, the host computer comprising:
selecting means for selecting, if said input/output request is an input request, a transmission queue having a minimum data reception amount to be formed by an input/output request or requests already stored in the transmission queue, and if said input/output request is an output request, a transmission queue having a minimum data transmission amount to be formed by an input/output request or requests already stored in the transmission queue; and
means for storing said input/output request to be transmitted in said selected transmission queue.
10. The host computer according to claim 9, wherein said transmission queue is provided in correspondence with a port for connecting said host computer to said storage device.
11. The host computer according to claim 10, further comprising a table, provided in a memory of the host computer, for storing said data transmission amount and said data reception amount at each of said ports.
12. A storage device comprising:
a CPU and a memory;
a plurality of ports, connected to a host computer and an external storage device via a network, for transmitting/receiving an input/output command and a response;
a plurality of transmission queues, provided in correspondence with said ports and in said memory, for temporarily storing an input/output command to be transmitted to said external storage device;
a plurality of reception queues, provided in correspondence with said ports and in said memory, for temporarily storing an input/output command received from said host computer;
data transmission/reception information provided in said memory and being representative of a data transmission amount and a data reception amount of data to be formed by said input/output command or commands stored in said transmission queue, for each of said ports; and
a command forwarding program, stored in said memory to be executed by said CPU, for selecting, if said input/output command to be stored in said reception queue is an input command, a transmission queue having a minimum data reception, and if said input/output command is an output command, a transmission queue having a minimum data transmission, forwarding said input/output command from said reception queue to said selected transmission queue, and increasing said data reception amount or said data transmission amount corresponding to said selected transmission queue by a data amount increased by said input/output command,
wherein said data transmission amount is reduced by a data amount increased when said output command is transmitted from said selected transmission queue via said port, and said data reception amount is reduced by a data amount increased when a response corresponding to said input command is received.
13. The storage device according to claim 12, further comprising command management information, provided in said memory, being representative of an identifier of a pending input/output command, wherein if said input/output command is transmitted from said transmission queue via said port, the storage device registers said input/output command in said command management information, and if a response to said input/output command is received, the storage device deletes the identifier of said input/output command.
14. A computer system comprising:
a host computer, connected to a storage device via a network, for transmitting an input/output request to said storage device, said storage device connected via the network to another storage device and having a plurality of transmission queues for temporarily storing an input/output request to be transmitted to said other storage device, said storage device comprising:
selecting means for selecting, if said input/output request received in said reception queue is an input request, a transmission queue having a minimum data reception amount to be formed by an input/output request or requests already stored in the transmission queue, and if said input/output request received in said reception queue is an output request, a transmission queue having a minimum data transmission amount to be formed by an input/output request or requests already stored in the transmission queue; and
means for storing said input/output request to be transmitted in said selected transmission queue.
15. The computer system according to claim 14, wherein said transmission queue is provided in correspondence with a port for connecting said storage device to said other storage device, and said reception queue is provided in correspondence with a port for connecting said storage device to said host computer.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2005147799A JP2006323729A (en) | 2005-05-20 | 2005-05-20 | Device and system for performing multipath control |
JP2005-147799 | 2005-05-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
US20060271639A1 (en) | 2006-11-30 |
Family
ID=37464754
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/178,509 Abandoned US20060271639A1 (en) | 2005-05-20 | 2005-07-12 | Multipath control device and system |
Country Status (2)
Country | Link |
---|---|
US (1) | US20060271639A1 (en) |
JP (1) | JP2006323729A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080141256A1 (en) * | 2006-12-08 | 2008-06-12 | Forrer Jr Thomas R | System and Method to Improve Sequential Serial Attached Small Computer System Interface Storage Device Performance |
US20090003361A1 (en) * | 2007-06-27 | 2009-01-01 | Emulex Design & Manufacturing Corporation | Multi-protocol controller that supports PCle, SAS and enhanced ethernet |
US8793399B1 (en) * | 2008-08-06 | 2014-07-29 | Qlogic, Corporation | Method and system for accelerating network packet processing |
US11044313B2 (en) | 2018-10-09 | 2021-06-22 | EMC IP Holding Company LLC | Categorizing host IO load pattern and communicating categorization to storage system |
US11050660B2 (en) * | 2018-09-28 | 2021-06-29 | EMC IP Holding Company LLC | Host device with multi-path layer implementing path selection based at least in part on fabric identifiers |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5533887B2 (en) * | 2010-02-10 | 2014-06-25 | 日本電気株式会社 | Storage device |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6115387A (en) * | 1997-02-14 | 2000-09-05 | Advanced Micro Devices, Inc. | Method and apparatus for controlling initiation of transmission of data as a function of received data |
US6145028A (en) * | 1997-12-11 | 2000-11-07 | Ncr Corporation | Enhanced multi-pathing to an array of storage devices |
US6341315B1 (en) * | 1999-02-26 | 2002-01-22 | Crossroads Systems, Inc. | Streaming method and system for fiber channel network devices |
US20020129143A1 (en) * | 2000-05-19 | 2002-09-12 | Mckinnon Martin W. | Solicitations for allocations of access across a shared communications medium |
US6711170B1 (en) * | 1999-08-31 | 2004-03-23 | Mosaid Technologies, Inc. | Method and apparatus for an interleaved non-blocking packet buffer |
US20050053077A1 (en) * | 2003-07-23 | 2005-03-10 | International Business Machines Corporation | System and method for collapsing VOQ'S of a packet switch fabric |
US7055059B2 (en) * | 1993-04-23 | 2006-05-30 | Emc Corporation | Remote data mirroring |
US7080168B2 (en) * | 2003-07-18 | 2006-07-18 | Intel Corporation | Maintaining aggregate data counts for flow controllable queues |
US7103890B2 (en) * | 2003-03-24 | 2006-09-05 | Microsoft Corporation | Non-blocking buffered inter-machine data transfer with acknowledgement |
US20060221974A1 (en) * | 2005-04-02 | 2006-10-05 | Cisco Technology, Inc. | Method and apparatus for dynamic load balancing over a network link bundle |
US7292589B2 (en) * | 2002-08-13 | 2007-11-06 | Narendra Kumar Dhara | Flow based dynamic load balancing for cost effective switching systems |
US7307948B2 (en) * | 2002-10-21 | 2007-12-11 | Emulex Design & Manufacturing Corporation | System with multiple path fail over, fail back and load balancing |
-
2005
- 2005-05-20 JP JP2005147799A patent/JP2006323729A/en active Pending
- 2005-07-12 US US11/178,509 patent/US20060271639A1/en not_active Abandoned
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080141256A1 (en) * | 2006-12-08 | 2008-06-12 | Forrer Jr Thomas R | System and Method to Improve Sequential Serial Attached Small Computer System Interface Storage Device Performance |
US8307128B2 (en) * | 2006-12-08 | 2012-11-06 | International Business Machines Corporation | System and method to improve sequential serial attached small computer system interface storage device performance |
US20090003361A1 (en) * | 2007-06-27 | 2009-01-01 | Emulex Design & Manufacturing Corporation | Multi-protocol controller that supports PCle, SAS and enhanced ethernet |
US7917682B2 (en) * | 2007-06-27 | 2011-03-29 | Emulex Design & Manufacturing Corporation | Multi-protocol controller that supports PCIe, SAS and enhanced Ethernet |
US8793399B1 (en) * | 2008-08-06 | 2014-07-29 | Qlogic, Corporation | Method and system for accelerating network packet processing |
US11050660B2 (en) * | 2018-09-28 | 2021-06-29 | EMC IP Holding Company LLC | Host device with multi-path layer implementing path selection based at least in part on fabric identifiers |
US11044313B2 (en) | 2018-10-09 | 2021-06-22 | EMC IP Holding Company LLC | Categorizing host IO load pattern and communicating categorization to storage system |
Also Published As
Publication number | Publication date |
---|---|
JP2006323729A (en) | 2006-11-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: HITACHI, LTD., JAPAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: KUMAGAI, ATSUYA; MURAKAMI, TOSHIHIKO; REEL/FRAME: 016796/0102. Effective date: 20050622 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |