WO2011078998A1 - Adaptive resource management

Adaptive resource management

Info

Publication number
WO2011078998A1
Authority
WO
WIPO (PCT)
Prior art keywords
consumers
network resource
resources
consumer
database
Application number
PCT/US2010/060536
Other languages
French (fr)
Inventor
Boris Klots
Subhadeep Sinha
Satish Kumar
Original Assignee
Delphix Corp.
Application filed by Delphix Corp.
Priority to EP10839995.7A (EP2517115A4)
Publication of WO2011078998A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/70 Admission control; Resource allocation
    • H04L 47/82 Miscellaneous aspects
    • H04L 47/822 Collecting or measuring resource availability data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/25 Integrating or interfacing systems involving database management systems
    • G06F 16/256 Integrating or interfacing systems involving database management systems in federated or virtual databases
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L 41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L 41/5009 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L 41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L 41/5019 Ensuring fulfilment of SLA
    • H04L 41/5022 Ensuring fulfilment of SLA by giving priorities, e.g. assigning classes of service
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 43/00 Arrangements for monitoring or testing data switching networks
    • H04L 43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 43/00 Arrangements for monitoring or testing data switching networks
    • H04L 43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L 43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L 43/0888 Throughput
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/10 Flow control; Congestion control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/70 Admission control; Resource allocation
    • H04L 47/80 Actions related to the user profile or the type of traffic
    • H04L 47/805 QOS or priority aware
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/70 Admission control; Resource allocation
    • H04L 47/82 Miscellaneous aspects
    • H04L 47/828 Allocation of resources per group of connections, e.g. per group of users
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/70 Admission control; Resource allocation
    • H04L 47/83 Admission control; Resource allocation based on usage prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 Network services
    • H04L 67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L 67/62 Establishing a time schedule for servicing the requests
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 15/00 Arrangements for metering, time-control or time indication; Metering, charging or billing arrangements for voice wireline or wireless communications, e.g. VoIP
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08 Configuration management of networks or network elements
    • H04L 41/0803 Configuration setting
    • H04L 41/0813 Configuration setting characterised by the conditions triggering a change of settings
    • H04L 41/0816 Configuration setting characterised by the conditions triggering a change of settings the condition being an adaptation, e.g. in response to network events

Definitions

  • This invention relates generally to resource management for storage systems, and in particular to adaptive management of resources shared by multiple consumers.
  • Virtualization technologies allow hardware resources to be used and shared by multiple consumers.
  • a consumer can be a process running on a computer system that accesses resources to perform certain tasks.
  • An example of a consumer is a task related to database operations on a system hosting databases, for example, query processing, data manipulation, reporting, replication, backup, restore, or export. These tasks can require a significant amount of system resources.
  • An example of a shared hardware resource is a network resource that allows consumers to communicate with external systems. Another example is the bandwidth of a storage subsystem. Shared resources are allocated between various consumers. The allocation of resources to individual consumers determines the overall utilization of the hardware resources in a system.
  • Consumers of resources may be associated with priorities based on the relative importance of the tasks they perform.
  • a fixed resource allocation strategy can allocate a fixed amount of resources to different consumers based on their priorities. In many cases these fixed amounts are determined upfront or are the result of explicit operator input. Fixed resource allocation strategies may not be able to automatically adjust to dynamic changes in consumer needs.
  • a proportional fairness based resource allocation strategy allocates an amount of resources for each consumer proportionate to its anticipated resource consumption.
  • Another resource allocation strategy is a round robin strategy that iterates through consumers in a round robin fashion to allocate resources.
  • Other types of allocation strategies include first-come-first-served allocation, fair queuing (max-min fairness), and weighted queuing.
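  • The strategies above can be sketched in a few lines of Python; the following is a minimal illustration and not part of the patent text, and the function names and example figures are assumptions.

```python
def proportional_fair_allocation(total, anticipated):
    """Allocate `total` units of a resource in proportion to each
    consumer's anticipated consumption (proportional fairness)."""
    demand = sum(anticipated.values())
    if demand == 0:
        return {c: 0.0 for c in anticipated}
    return {c: total * need / demand for c, need in anticipated.items()}

def weighted_allocation(total, weights):
    """Allocate `total` in proportion to fixed per-consumer weights
    (a simple weighted-queuing style split)."""
    w = sum(weights.values())
    return {c: total * weights[c] / w for c in weights}

# Example: 100 Mbps split between three tasks (illustrative numbers).
print(proportional_fair_allocation(100, {"backup": 30, "export": 10, "query": 60}))
print(weighted_allocation(100, {"backup": 1, "export": 1, "query": 2}))
```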
  • Virtualization of databases allows consolidation of multiple virtual databases on the same database storage system.
  • Multiple tasks associated with the virtual databases may execute on the storage system including, loading of the databases, provisioning of the virtual databases, and serving of requests and tasks related to the virtual databases.
  • These tasks are consumers of system and hardware resources, for example, network resources and storage bandwidth.
  • the goal is an allocation of resources for the consumers that optimizes the overall utilization of the resources for the system across multiple virtual databases with respect to their SLAs and priorities. Resources are distributed among various consumers depending on their dynamic needs and required SLAs.
  • Embodiments of the invention enable allocation of network resources to consumers of different priorities in a computer system.
  • a metric representing the aggregate needs of a low priority set of consumers of the network resources is determined based on observed usage of the network resources by the consumers.
  • the metric representing the needs of the low priority set of consumers is compared to a threshold value. If the needs of the low priority consumers are above a threshold value, allocations of the network resource are first determined for a high priority set of consumers. After allocating the resources to the high priority set of consumers, a remaining amount of left over allocations is determined and allocated to the low priority set of consumers.
  • resources can be allocated to the high-priority consumers up to the total amount of resources minus the amounts guaranteed to the lower priority consumers.
  • the allocations of the low priority consumers are determined first and the remaining leftover resources are allocated to the high-priority consumers. Any resources still remaining are distributed over all the consumers.
  • Embodiments of the invention enable computation of the total throughput of network resources used by consumers. Multiple usage values of the network resource that are cumulative over time are determined. Each cumulative usage value is associated with a time interval and is based on observed usages of the network resource by consumers over the time interval. The total throughput of the network resource is determined based on an aggregate value derived from the multiple cumulative usage values. The total throughput value is increased by a predetermined factor. Allocations of the network resource for each consumer of the network resource are determined based on the increased total throughput value. Each allocation for a consumer determines the availability of the network resource to the consumer for a subsequent time interval. The system assumes certain guarantees for individual consumers and for priority groups. If these guaranteed amounts are unlikely to be consumed based on the forecasting of the described method, the surplus part of the resource is allocated to other consumers.
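  • A minimal sketch of one such allocation run, assuming a simple proportional split of the boosted throughput estimate after guarantees are met; all names and numbers are illustrative, not the patent's implementation.

```python
def allocation_run(observed_usage, guarantees, prev_estimate, boost=1.10):
    """One allocation run as described above: aggregate the cumulative
    usage observed over the last interval, inflate the throughput
    estimate by a predetermined factor, then split it across consumers
    (guarantees first, the rest in proportion to observed usage)."""
    aggregate = sum(observed_usage.values())
    estimate = max(prev_estimate, aggregate) * boost
    allocations = {c: guarantees.get(c, 0.0) for c in observed_usage}
    leftover = estimate - sum(allocations.values())
    if leftover > 0 and aggregate > 0:
        for c, usage in observed_usage.items():
            allocations[c] += leftover * usage / aggregate
    return estimate, allocations

estimate, alloc = allocation_run({"backup": 40, "export": 60},
                                 {"backup": 20, "export": 20}, prev_estimate=80)
print(round(estimate, 1), alloc)
```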
  • FIG. 1 is diagram illustrating how information is copied from a production database to a database storage system and provisioned as virtual databases using a file sharing system, in accordance with an embodiment of the invention.
  • FIG. 2 is a schematic diagram of the architecture of a system that stores virtual databases and optimizes the shared resources for tasks related to the virtual databases, in accordance with an embodiment of the invention.
  • FIG. 3 illustrates a hierarchy of priority groups and assignment of consumers to priority groups, in accordance with an embodiment of the invention.
  • FIG. 4 illustrates network links and flows associated with consumers, in accordance with an embodiment of the invention.
  • FIG. 5 shows a flowchart of the process used for computing the total throughput of a link, in accordance with an embodiment of the invention.
  • FIG. 6 shows a flowchart of the process used for allocating the resources among consumers, in accordance with an embodiment of the invention.
  • FIG. 7 shows a flowchart of the process used for allocating the resources among consumers of a priority group based on a greedy or a fair share strategy, in accordance with an embodiment of the invention.
  • FIG. 8 illustrates an embodiment of a computing machine that can read instructions from a machine-readable medium and execute the instructions in a processor or controller.
  • Creation of virtual databases allows storage of multiple virtual databases in a database storage system.
  • Storage of multiple virtual databases on a database storage system requires execution of multiple tasks related to the virtual databases on the database storage system.
  • These tasks include creation of virtual databases, tasks related to use of virtual databases including query processing, data manipulations, replication, backup, restore, export of virtual databases and the like.
  • These tasks share hardware resources available on the database storage systems and act as consumers of the shared resources. Different tasks can be associated with different priority levels, which may be determined by a system administrator.
  • the resources shared by different consumers need to be allocated between the consumers appropriately; for example, higher priority consumers may be given a larger share of resources compared to lower priority consumers.
  • the allocation of resources ensures that lower priority tasks are not starved of resources.
  • some lower priority tasks may be starved but are allowed to continue to exist in the system.
  • the system aims at optimizing the overall usage of the shared resources across various consumers with respect to their priorities.
  • usage of shared resources is optimized across multiple modules of virtual database systems stored on a database storage system.
  • Virtual databases can be created based on the state of a production database at a particular point in time, and the virtual databases can then be individually accessed and modified as desired.
  • a database comprises data stored in a computer or storage subsystem for use by computer implemented applications.
  • a database server is a computer program that can interact with the database and provides database services, for example, access to the data stored in the database.
  • Database servers include commercially available programs, for example, database servers included with database management systems provided by ORACLE, SYBASE, MICROSOFT SQL SERVER, IBM's DB2, MYSQL, and the like.
  • The term production database is used in particular examples to illustrate a useful application of the technology; however, it can be appreciated that the techniques disclosed can be used for any database, regardless of whether the database is used as a production database.
  • the virtual databases are "virtual" in the sense that the physical implementation of the database files is decoupled from the logical use of the database files by a database server. Systems and methods for creating virtual databases and using them in workflows are disclosed in U.S. Application No. 12/603,545 filed on October 21, 2009. In one embodiment, information from the production database is copied to a storage system at various times, such as periodically. This enables reconstruction of the database files associated with the production database for these different points in time.
  • the information may be managed in the storage system in an efficient manner so that copies of information are made only if necessary. For example, if a portion of the database is unchanged from a version that was previously copied, that unchanged portion need not be copied.
  • a virtual database created for a point in time is stored as a set of files that contain the information of the database as available at that point in time. Each file includes a set of database blocks and the data structures for referring to the database blocks stored for earlier copies.
  • a virtual database may be created on a database server by creating the database files for the production database corresponding to the state of the production database at a previous point in time, as required for the database server.
  • provisioning the virtual database includes managing the process of creating a running database server based on a virtual database. Multiple VDBs can be provisioned based on the state of the production database at the same point in time. On the other hand, different VDBs can be based on different point-in-time states of the same production database or of different production databases. The database server on which a virtual database has been provisioned can then read from and write to the files stored on the storage system. A database block may be shared between different files, each file associated with a different VDB.
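  • The block-sharing idea can be illustrated with a small sketch; the class and method names below are hypothetical and not the patent's implementation.

```python
class BlockStore:
    """Stores database blocks once; point-in-time copies and VDB files
    only keep references, so unchanged blocks are never duplicated."""

    def __init__(self):
        self._blocks = {}          # block_id -> bytes

    def save_copy(self, prior_copy, changed_blocks):
        """Create a new point-in-time copy: reuse references from the
        prior copy and store only the blocks that changed."""
        copy = dict(prior_copy)    # block number -> block_id
        for block_no, data in changed_blocks.items():
            block_id = (block_no, len(self._blocks))
            self._blocks[block_id] = data
            copy[block_no] = block_id
        return copy

    def read(self, copy, block_no):
        return self._blocks[copy[block_no]]

store = BlockStore()
t1 = store.save_copy({}, {0: b"alpha", 1: b"beta"})
t2 = store.save_copy(t1, {1: b"beta'"})        # block 0 is shared, not copied
assert store.read(t1, 0) is store.read(t2, 0)  # same underlying block
```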
  • FIG. 1 illustrates one embodiment illustrating how information may be copied from a production database to a database storage system and provisioned as virtual databases using a file sharing system.
  • the production database systems 110 manage data for an organization.
  • the database storage system 100 retrieves data associated with databases from one or more production database systems 110 and stores the data in an efficient manner, further described below.
  • a database administrator user interface 140 allows a database administrator to perform various actions supported by the database storage system 100.
  • the database storage system 100 may send a request 150 for data to a production database system 110.
  • the production database system 110 responds by sending information stored in the production database as a stream of data 160.
  • the request 150 is sent periodically and the production database system 110 responds by sending information representing changes of data stored in the production database since the last response 160 sent by the production database system 110.
  • the database storage system 100 receives the data 160 sent by the production database system 110 and stores the data.
  • the database storage system 100 may analyze the data 160 received to determine whether to store the information or skip the information if the information is not useful for reconstructing the database at previous time points.
  • the database storage system 100 stores the information efficiently, for example, by keeping versions of database blocks that have changed and reusing database blocks that have not changed.
  • the database storage system 100 creates files that represent the information corresponding to the production database system 110 at a given point in time.
  • the database storage system 100 exposes 170 the corresponding files to a virtual database system 130 using a file sharing system 120.
  • the virtual database system 130 runs a database server that can operate with the files exposed 170 by the database storage system 100.
  • a virtual copy of the production database is created for the virtual database system 130 for a given point in time in a storage efficient manner.
  • Modules in the database storage system 100 require resources to perform tasks.
  • the resources can be network resources for communicating with external systems, computing resources or other resources.
  • the virtual database manager 275 may need resources for provisioning a VDB
  • the point-in-time copy manager 210 may need network resources for retrieving a point-in-time copy of a database from the production database system 110
  • the transaction log manager 220 may need network resources for retrieving log updates from a production database system 110
  • the virtual database manager 275 may need resources for exporting the data in a VDB to an external system.
  • a task performed by a module utilizing a resource is a consumer of the resource.
  • FIG. 2 is a high level block diagram illustrating a system environment suitable for managing virtual databases on a database storage system 100 and optimizing overall resources used by the VDBs stored on the database storage system 100.
  • the system environment comprises one or more production database systems 110, a database storage system 100, an administration system 140, and one or more virtual database systems 130.
  • Systems shown in FIG. 2 can communicate with each other if necessary via a network.
  • a production database system 110 is typically used by an organization for maintaining its daily transactions. For example, an online bookstore may save all the ongoing transactions related to book purchases, book returns, or inventory control in a production system 110.
  • the production system 110 includes a database server 245 and a production DB data store 250.
  • the production DB data store 250 stores data associated with a database, for example, information representing daily transactions of an enterprise.
  • the database server 245 processes requests that access data stored in the production DB data store 250.
  • different and/or additional modules can be included in a production database system 110.
  • the database storage system 100 retrieves information available in the production database systems 110 and stores it.
  • the information retrieved includes database blocks comprising data stored in the database, transaction log information, metadata information related to the database, information related to users of the database and the like.
  • the information retrieved may also include configuration files associated with the databases. For example, databases may use vendor specific configuration files to specify various configuration parameters including initialization parameters associated with the databases.
  • the data stored in the storage system data store 290 can be exposed to a virtual database system 130 allowing the virtual database system 130 to treat the data as a copy of the production database stored in the production database system 110.
  • the database storage system 100 includes a point-in-time copy manager 210, a transaction log manager 220, an interface manager 230, a file sharing manager 270, a virtual database manager 275, a storage system data store 290, and an adaptive resource manager 215.
  • the adaptive resource manager 215 comprises various modules including an allocation manager 225, a scheduler 235, a consumer store 255, a metrics manager 265 and a resource usage store 270. In alternative configurations, different and/or additional modules can be included in the database storage system 100.
  • the point-in-time copy manager 210 interacts with the production database system 110 by sending a request to retrieve information representing a point-in-time copy (also referred to as a "PIT copy") of a database stored in the production DB data store 250.
  • the point-in-time copy manager 210 stores the data obtained from the production database system 110 in the storage system data store 290.
  • the data retrieved by the point-in-time copy manager 210 corresponds to database blocks (or pages) of the database being copied from the production DB data store 250.
  • a subsequent PIT copy request may need to retrieve only the data that changed in the database since the previous request.
  • the data collected in the first request can be combined with the data collected in a second request to reconstruct a copy of the database corresponding to a point in time at which the data was retrieved from the production DB data store 250 for the second request.
  • the transaction log manager 220 sends a request to the production database system 110 for retrieving portions of the transaction logs stored in the production database system 110.
  • the data obtained by the transaction log manager 220 is stored in the storage system data store 290.
  • a request for transaction logs retrieves only the changes in the transaction logs in the production database system 110 since a previous request for the transaction logs was processed.
  • the database blocks retrieved by the point-in-time copy manager 210 combined with the transaction logs retrieved by the transaction log manager 220 can be used to reconstruct a copy of a database in the production system 110 corresponding to times in the past in between the times at which point-in-time copies are made.
  • the file sharing manager 270 allows files stored in the storage system data store 290 to be shared across computers that may be connected with the database storage system 100 over the network.
  • the file sharing manager 270 uses the file sharing system 120 for sharing files.
  • An example of a system for sharing files is a network file system (NFS).
  • a system for sharing files may utilize fibre channel Storage area networks (FC-SAN) or network attached storage (NAS) or combinations and variations thereof.
  • the system for sharing files may be based on small computer system interface (SCSI) protocol, internet small computer system interface (iSCSI) protocol, fibre channel protocols or other similar and related protocols.
  • the virtual database manager 275 receives requests for creation of a virtual database for a virtual database system 130.
  • the request for creation of a virtual database may be sent by a database administrator using the administration system 140 and identifies a production database system 110, a virtual database system 130, and includes a past point-in-time corresponding to which a virtual database needs to be created.
  • the virtual database manager 275 creates the necessary files corresponding to the virtual database being created and shares the files with the virtual database system 130 using the file sharing manager 270.
  • the interface manager 230 renders information for display using the administration system 140.
  • a database administrator user can see information available in the storage system data store 290 as well as take actions executed by the database storage system.
  • the database administrator can request the database storage system 100 to make a PIT copy of a database stored on a production database system 110 at a particular point-in-time.
  • the interface manager allows a system administrator to set various priorities associated with different tasks. The system administrator can also set minimum and maximum guarantees of allocation associated with various tasks.
  • the adaptive resource manager 215 contains various modules necessary to allocate shared resources between tasks representing consumers of the shared resources.
  • the consumer store 255 maintains data structures representing consumers in the database storage system 100.
  • the consumer store 255 stores the priority and sub-priority associated with each consumer. Consumers may be added to or deleted from the consumer store 255.
  • a consumer may have a status, for example, pending or active.
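  • A tiny sketch of the kind of record the consumer store 255 might hold; the field names and example values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Consumer:
    """Entry in the consumer store: identity, placement in the
    priority hierarchy, and current lifecycle status."""
    name: str
    priority: str             # e.g. "P1" or "P2"
    sub_priority: str         # e.g. "high", "medium", "low"
    status: str = "pending"   # "pending" or "active"

consumer_store = {}

def add_consumer(c: Consumer):
    consumer_store[c.name] = c

def delete_consumer(name: str):
    consumer_store.pop(name, None)

add_consumer(Consumer("pit-copy-prod-db-1", "P1", "high", "active"))
```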
  • the resource usage store 270 stores information related to various resources available to the consumers in the database storage system 100 and information representing the usage of the resources.
  • the allocation manager 225 determines the allocations of various consumers for a given time interval.
  • the allocation manager performs an allocation run comprising analysis of usage of resources based on information available in the resource usage store 270 and of consumer information available in consumer store 255 to determine allocations of resources across different consumers.
  • the allocation manager determines allocations of resources periodically, where results of each allocation run are used for a subsequent time interval.
  • the scheduler 235 periodically invokes the allocation manager 225 to execute a run of the allocation including collection and analysis of usages of resources by various consumers and to determine allocation of the resources for the next time interval.
  • the allocation manager 225 invokes the scheduler to schedule the next run of the allocation manager 225.
  • the scheduler may get scheduling requests from other modules, for example, from the interface manager 230 that forwards requests made by a system administrator using the administration system 140.
  • the scheduler 235 may be implicitly invoked by execution of specific tasks, for example, when a consumer is created or deleted.
  • the metrics manager 265 gathers statistics for use by other modules or for reporting via the user interface 295. Examples of data reported include observed usage per consumer, 'unhappiness' index associated with consumers described herein, overall resource usage and the like. In an embodiment, the metrics manager maintains a cache that stores frequently accessed information for fast access. The metrics manager 265 may receive and process requests for information from the user interface 295 for display via the user interface 295.
  • a virtual database system 130 includes a database server 260.
  • the database server 260 is similar in functionality to the database server 245 and is a computer program that provides database services and application programming interfaces (APIs) for managing data stored on a data store 250.
  • the data managed by the database server 260 may be stored on the storage system data store 290 that is shared by the database storage system 100 using a file sharing system 120.
  • different and/or additional modules can be included in a virtual database system 130. Some data can be stored on local storage.
  • a consumer is assigned to a priority group that determines the preference in allocation of resources for the consumer.
  • the assignment of priority groups can be performed based on a default priority group when the consumer is added to the system or by a database administrator using the user interface 295. Alternatively, consumers can be automatically mapped to priority groups based on attributes of the consumer. Automatic assignments can be subject to change by a database administrator.
  • FIG. 3 illustrates an embodiment in which a consumer can be assigned to one of two priority groups, P1 (high priority group 310) and P2 (regular priority group 315). By default all consumers can be assigned to the priority group P2. A database administrator can reassign a consumer from the P2 to the P1 priority group if necessary. Each of the priority groups may be sub-divided into sub-groups.
  • each priority group is divided into sub-groups, for example, high sub-group 320, medium sub-group 325, and low sub-group 330.
  • the high sub-group 320 includes consumers with priority higher than the consumers in medium sub-group 325 which in turn have priority higher than consumers in low subgroup 330.
  • a default sub-group within the priority group can be assigned to each consumer.
  • a database administrator can reassign the sub-group of a consumer if necessary.
  • FIG. 3 shows a root group 305 that includes all priority groups underneath. In some embodiments, the root group 305 can be used as the default priority group for the resources. Note that other embodiments can have a hierarchy of priority groups and sub-groups of arbitrary depth and width.
  • a consumer 350(e) is assigned to the lowest level of priority group in the hierarchy of priority-groups as shown in FIG. 3.
  • the consumer 350 can be assigned to any priority group in the hierarchy.
  • a consumer can be assigned to the P1 group 310, and may be assigned to a sub-group assigned by default.
  • the parent of a consumer 350 is the group that the consumer belongs to in the priority group hierarchy.
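  • The priority-group hierarchy of FIG. 3 can be sketched as a small tree structure; the class and helper names below are assumptions for illustration only.

```python
class PriorityGroup:
    """Node in the priority-group hierarchy of FIG. 3: a group can hold
    child groups and the consumers assigned directly to it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.consumers = []
        if parent is not None:
            parent.children.append(self)

# Hierarchy sketched in FIG. 3: root -> P1/P2 -> high/medium/low.
root = PriorityGroup("root")
p1 = PriorityGroup("P1", root)
p2 = PriorityGroup("P2", root)
subgroups = {(g.name, s): PriorityGroup(s, g)
             for g in (p1, p2) for s in ("high", "medium", "low")}

def assign(consumer_name, group):
    """Assign a consumer to any group in the hierarchy."""
    group.consumers.append(consumer_name)

assign("vdb-export-42", subgroups[("P2", "medium")])
```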
  • FIG. 4 illustrates network and bandwidth resources used by consumers in the database storage system 100, for example, network links and flows associated with consumers.
  • the database storage system 100 is connected to one or more external consumers 420.
  • a consumer can be the task of retrieving a point-in-time copy from a production database system 110 or the task of exporting a virtual database system 130.
  • a network resource of the database storage system 100 that is shared by multiple external consumers is called a network link 410.
  • Multiple external consumers that share a network link 410 can be executing on the same remote computer or on different remote computers. Although a remote computer can share multiple network links, each external consumer 420 is assigned to a single network link 410.
  • each network link 410 can be associated with an aggregate of network interface controllers (NICs) or a single NIC.
  • Each network link 410 has a stated link capacity that specifies the bandwidth supported by the network link 410.
  • the stated link capacity of the network link 410 may be specified by the vendor of the network link 410.
  • the actual bandwidth that is obtained when the network link 410 is used in a system can be different from the stated bandwidth since the actual bandwidth may depend on several factors, including network configurations, configuration and capacity of storage of the database storage system 100, nature of the workload, and the caching properties of the consumer tasks.
  • a flow 430 is associated with attributes including, a network link 410 used by the flow, a priority value associated with the flow, and a network port on the database storage system 100 used by the flow.
  • the database storage system 100 can enforce limits on the bandwidth available to a flow 430.
  • the priority associated with a flow 430 typically depends on the priority of the associated consumer.
  • the database storage system 100 throttles the network traffic through each flow to guarantee specific bandwidth to each consumer.
  • for each external consumer 420 task, there is a corresponding consumer task executing on the database storage system 100.
  • the information related to the consumer in the database storage system 100 is stored in the consumer store 255.
  • Information related to the resources including network links is stored in the resource usage store 270.
  • a link's total throughput is the aggregated network bandwidth available to all consumers using this particular link. Portions of the network bandwidth available on a link are allocated to the consumers associated with the link. The appropriate portion allocated to a consumer is calculated based on the total throughput. However, as described above, the total throughput depends on the actual bandwidth available on the link, which depends on several factors and needs to be estimated. Also, the total throughput can change over time based on changes in the factors that affect the overall bandwidth of the link.
  • the metrics manager 265 of the adaptive resource manager 215 stores the previously estimated resource usages of the network links 410 in the resource usage store 270.
  • the previously estimated resource usage data is used to estimate the total throughput for network links 410.
  • the significance and influence of the values of the past observations of resource usage are diminished over time to accommodate for changes in workloads, and storage or network configurations that affect the total throughput.
  • a predetermined parameter lookback determines the length of historic time interval used to estimate the total throughput. All observed resource usages between the present time t and the previous time point (t-lookback) are used to determine the total throughput. However resource usage data prior to the time (t-lookback) is not considered.
  • a decay parameter is considered that reduces the contribution of older values of resource usage. The decay parameter may reduce the importance of previous values by a factor depending on the age of the data. For example, the older the data is, the smaller the contribution of the data.
  • FIG. 5 shows a flowchart of the process used for computing the total throughput of a link.
  • the allocation manager 225 initially assigns 505 total throughput to a value determined to be a low estimate of the stated link capacity LowEstimateBW.
  • the low estimate of the stated link capacity is determined to be a fraction of the stated capacity of the network link, for example, half of the stated capacity of the network link.
  • the total throughput value is estimated periodically. Accordingly, the scheduler 235 causes the allocation manager 225 to wait 510 a predetermined interval of time before recomputing the observed usage of links and the value of the total throughput.
  • the observed usage of a link is determined by estimating the usage of the link by each consumer served by the link.
  • the usage may be estimated based on the consumer's inbound as well as outbound usage of the link. For example, the usage may be based on the total amount of data sent using the link in either direction during a time interval.
  • the time interval for measuring the usage of a link by a consumer can be the predetermined time interval that the allocation manager 225 waits 510 before re-computing the observed usages and the total throughput.
  • For example, the time interval for measuring the usage of a link by a consumer can be 30 seconds, and the data transferred can be measured in kilobytes.
  • the observed usage for a link during a time interval is the total of the current usage of all consumers of the link during the time interval. In case of resources that are network links, the usage is measured in both directions, sending and receiving.
  • the allocation manager 225 re-computes 520 the total throughput value using the following equation:
  • variable lookback is a parameter to determine the length of historic time interval over which the observed usages are considered for evaluating the total throughput for a link for the current time.
  • the variable t is the present time, and the variable s represents any time point between t - lookback and t for which an observed usage was determined.
  • the DiscountedValue function is DiscountedValue(ObservedUsage(s), t) = ObservedUsage(s) * e^(-a*(t-s)).
  • the value e is a constant, and a corresponds to the decay parameter described above.
  • Historical values determined earlier than t - lookback time are not considered in the above equation (1) for evaluation of TotalThroughput.
  • equation (1) computes the TotalThroughput of a link based on all observed usage values ObservedUsage over the previous time interval of size lookback.
  • Alternative embodiments may utilize other functions to reduce the weight of older observed usages, for example a linear function or non-linear functions can be used.
  • In another embodiment, the weight of all previous observed usages considered is the same, and the older observed usages are eliminated after the lookback time.
  • the equation (1) ensures that even if observed usage values reduce significantly, the value of TotalThroughput is not reduced below LowEstimateBW .
  • the value of the lookback parameter can be dynamically adjusted. The value of the lookback parameter can be manually changed by a system administrator or determined by the allocation manager 225.
  • For example, if the observed usages in the system are changing very slowly, the value of lookback can be increased, whereas if the observed usages in the system are changing more frequently, the value of the lookback parameter can be reduced.
  • changes to the lookback parameter can be driven by various 'lookback policies,' for example by absolute time (e.g., lookback covers a month, quarter, or year worth of data) and/or by the amount of data processed (e.g., lookback goes as far back as needed to account for 100TB of data). These lookback policies can be either manual or automatic.
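  • The estimation described above can be sketched as follows; the exact form of equation (1) is not reproduced in this text, so taking the maximum of the discounted observations is an assumption, and all names and constants are illustrative.

```python
import math, time

def observed_usage(per_consumer_bytes):
    """Observed usage of a link over one interval: total data moved by
    all consumers of the link, counting both directions."""
    return sum(sent + received for sent, received in per_consumer_bytes)

def total_throughput(history, now, low_estimate_bw, lookback=3600.0, a=0.001):
    """Estimate TotalThroughput(link, now) from (timestamp, usage)
    observations: discount each observation exponentially with its age,
    ignore anything older than `lookback`, and never fall below
    LowEstimateBW. Taking the max of the discounted observations is an
    assumed reading of equation (1)."""
    discounted = [usage * math.exp(-a * (now - s))
                  for s, usage in history if now - lookback <= s <= now]
    return max([low_estimate_bw] + discounted)

now = time.time()
history = [(now - 120, observed_usage([(30_000, 10_000)])),   # 40 KB moved
           (now - 60, observed_usage([(45_000, 10_000)])),    # 55 KB moved
           (now - 30, observed_usage([(40_000, 12_000)]))]
print(round(total_throughput(history, now, low_estimate_bw=25_000)))
```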
  • An alternative embodiment uses the following recursive equation for computing the TotalThroughput for the current time indicated by time t and the computation of TotalThroughput for a previous time s.
  • TotalThroughput(link, t0) = LowEstimateBW(link) (3)
  • the equation (2) computes the TotalThroughput value for time t based on the TotalThroughput value for a previous time point weighted by an exponential factor depending on the time difference between t and s.
  • Alternative embodiments can use a different function to determine weight applied to the previous TotalThroughput value.
  • the weight applied to the previous TotalThroughput value can be a linear function of the time difference between the present time and the previous time, a non-linear function, or even a constant value.
  • Typical functions used for computing the weights applied to the TotalThroughput values of previous time points attempt to reduce the significance of previous TotalThroughput values in the computation of TotalThroughput for the current time point.
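  • The recursive form of equation (2) is likewise not legible here; the sketch below shows one plausible reading, in which the previous estimate decays exponentially and is refreshed by the latest observation, floored at LowEstimateBW per equation (3). The constants and the max-based combination are assumptions.

```python
import math

def recursive_total_throughput(prev_estimate, prev_time, usage_now, now,
                               low_estimate_bw, a=0.001):
    """One plausible reading of the recursive update (2)/(3): decay the
    previous TotalThroughput by e^(-a*(t - s)), refresh it with the
    newly observed usage, and never drop below LowEstimateBW."""
    decayed = prev_estimate * math.exp(-a * (now - prev_time))
    return max(low_estimate_bw, usage_now, decayed)

# Base case (3): start from the low estimate of the stated capacity.
estimate, last_t = 25_000, 0.0
for t, usage in [(30.0, 40_000), (60.0, 52_000), (90.0, 10_000)]:
    estimate = recursive_total_throughput(estimate, last_t, usage, t, 25_000)
    last_t = t
    print(t, round(estimate))
```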
  • an estimate of the true total throughput for the link, TrueTotalThroughput(link), is computed based on the following equation:
  • the TotalThroughput(link) value obtained from equation (1) can be higher than the value computed using equation (4).
  • the TrueTotalThroughput(link) value can be used for reporting purposes.
  • the value of all allocations is increased 530 by a factor (called fudge factor), for example, by 10%.
  • the increase of the allocations is intended to cause the allocations to increase and reach a true maximum value of the allocations.
  • the additional amount of resource allocated by the fudge factor may cause the ObservedUsage for the next iteration to increase compared to the previous iteration if the increase in allocation can be consumed. If each iteration increases the allocations by the fudge factor, the TotalThroughput increases in each iteration until the aggregate needs of all consumers of the resources are satisfied or the actual maximum throughput value based on the constraints of the resources is reached.
  • if the increase in allocation cannot be consumed, the additional resources introduced by the fudge factor are not consumed, and the observed TotalThroughput is not increased at time t.
  • if the TotalThroughput value determined by increasing 530 the TotalThroughput by the fudge factor is determined 535 to be higher than an upper estimate of the stated link capacity, the TotalThroughput value is assigned 540 to the upper estimate of the stated link capacity.
  • the upper estimate of the stated link capacity may be determined from the stated link capacity, for example, as 90% of the stated link capacity for each link. Typical inefficiencies of any practical system prevent the system from reaching the stated link capacities of the available links. Therefore, the TotalThroughput value for a link is limited to a maximum value based on the upper estimate of the stated link capacity.
  • the allocation manager 225 allocates 545 resources to consumers based on the total throughput. Since the total throughput is increased by a predetermined factor, the consumers may receive additional resources compared to their observed usage. The allocation manager 225 waits 510 for the predetermined interval and determines 515 the observed usages for the link and also determines 520 the TotalThroughput value. Some consumers may be able to utilize the additional allocated resources whereas other consumers may not need the additional allocated resources.
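  • A compact sketch of the FIG. 5 loop as described above, using the example values from the text (half of the stated capacity as the low estimate, 90% as the upper estimate, a 10% fudge factor); the function name and demo numbers are illustrative.

```python
def allocation_budget(observed_usage, stated_capacity,
                      fudge=1.10, low_frac=0.5, high_frac=0.9):
    """One step of the FIG. 5 loop as sketched here: estimate the total
    throughput from what was actually observed (never below a low
    estimate of the stated capacity), inflate it by a fudge factor so
    consumers can grow into extra headroom, and cap the result at an
    upper estimate of the stated capacity."""
    low_estimate = low_frac * stated_capacity
    high_estimate = high_frac * stated_capacity
    estimate = max(low_estimate, observed_usage)
    return min(estimate * fudge, high_estimate)

# If consumers keep absorbing the extra 10%, the budget climbs until it
# hits the 90% cap; if usage stalls, the budget stops growing.
for usage in [520, 570, 630, 693, 762, 838, 838]:
    print(round(allocation_budget(usage, stated_capacity=1000), 1))
```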
  • a change in load distribution may occur, for example, if the load is switched from sequential input/output (IO) used for analytical applications to a transactional load dominated by smaller, randomly occurring IO operations. Since TotalThroughput is determined based on historical observations, the estimated TotalThroughput value may be larger than the changed throughput value available to the resources on a link.
  • the overestimate of the available resources may lead to additional resources being allocated to the consumers, based on a phantom portion of the resource that does not actually exist.
  • the decay of historical TotalThroughput values over time, accounted for in equations (1) and (2), and the elimination of historical values prior to the lookback time interval cause the extra allocation of resources to shrink and eventually be eliminated over time, causing the TotalThroughput value to reach a realistic estimate.
  • a system administrator is allowed to reset the TotalThroughput value to initial default value, causing the allocation manager 225 to re-compute the TotalThroughput value from scratch.
  • An embodiment allows the allocation manager 225 to automatically reset the TotalThroughput value to the initial default value either periodically or based on detection of particular events, for example, changes in network configurations or events that indicate significant load changes, such as addition or deletion of a production database system 110 from the database storage system 100 configuration.
  • Typical consumers of resources in a system similar to the system illustrated in FIG. 1 may require a minimum amount of resources to operate.
  • a module acting as a consumer may be required to send a periodic message stating its status.
  • the status signal may be required to detect system failures, for example, modules may send a signal that indicates "I am alive" to another module in-charge of monitoring the health of various subsystems or modules. If no signal is received from a module or sub-system, the system 100 may activate procedures to detect hardware or software failures in order to take appropriate action.
  • a virtual database manager 275 interacting with a virtual database system 130 may need a minimum amount of resources to continue a meaningful mode of processing for a particular task.
  • While the allocation manager 225 allocates a minimum amount of resources to specific consumers, the usage of these consumers may need to be minimized to favor higher priority consumers.
  • a survival level resource allocation may be guaranteed to each consumer process created in the system and the consumer process needs to be suspended or deleted to reclaim the survival minimum resources allocated to the consumer. Note that suspension of a consumer process only stops real time activity of this consumer (data access, network traffic, etc) and frees all resources associated/guaranteed to this consumer but does not destroy storage of data associated with this consumer. For example, deleting a consumer process associated with a virtual database does not require deletion of the storage associated with the VDB.
  • the survival minimum resource allocation guaranteed to a consumer is configurable by a system administrator. In another embodiment, certain default values may be assigned to different categories of consumers based on their priorities in the system.
  • the minimal resource guarantee for a consumer in the system 100 is the minimal amount of resource that is made available by the allocation manager 225 to the consumer. If the consumer does not need its allocated minimal resources, the leftover portions of the resources are allocated by the allocation manager to other consumers based on their priority. On the other hand, if the allocation manager 225 determines after allocating higher priority consumers that there are leftover resources for lower priority consumers, the allocation manager 225 can provide additional allocations to the lower priority consumers, over and above the guaranteed minimum allocation.
  • a system administrator is also allowed to set maximum allocation values for individual consumers. A default value for the minimum allocation of consumer resources can be zero, and a default value for the maximum allocation of consumer resources can be infinity.
  • the system can be configured to have a minimum guarantee for an entire set of consumers as a group, for example, the P2 group 315 shown in FIG. 3.
  • the overall minimum guarantee for the P2 group corresponds to an amount of resources to be distributed among P2 consumers, if the P2 consumers are able to consume the resources. If the P2 consumers are unable to consume all the resources allocated by group minimum guarantee, the unused resources may be allocated to other consumers.
  • the benefit of being able to configure a minimum guarantee for a group of consumers is to prevent the group of consumers (for example, the P2 group) from getting starved of resources by another group of consumers that has higher priority (for example, P1).
  • the value of the minimum guarantee for a group of consumers can be specified by a system administrator or predetermined to a default value, for example, zero.
  • An embodiment derives the minimum guarantee automatically based on historical data. For example, the group guarantee can be set as a fixed percentage of the historically observed total group usage. Alternatively, the resource needs of the group are observed in the time periods when the workload is not dominated by the high priority consumers (unconstrained periods). Based on those resource needs of the group, the group guarantee is determined so as to always provide the group with at least 65% of its estimated total need.
  • the overall minimum guarantee for a group may be either set individually for each link or set globally and then distributed across links. In the latter case, the embodiment does this in proportion to the group traffic on the link.
  • the GroupGuarantee(link) is the minimum guarantee for a group, for a specific link.
  • the GroupGuarantee is the overall minimum guarantee for the group.
  • the GroupThroughput(link) is the total throughput of the traffic generated by the group for a specific link.
  • the value Σ GroupThroughput(link), summed over all links in the set LINKS, represents the total group throughput across links, so that GroupGuarantee(link) = GroupGuarantee * GroupThroughput(link) / Σ GroupThroughput(link).
  • the allocation manager 225 may check various constraints including the following: (1) The sum of individual guarantees and survival guarantees for all the consumers in a group (for example P2) does not exceed the overall guarantee for the group. (2) The sum of the individual guarantees and survival guarantees for all the consumers in a group is below the low estimate for bandwidth for the link, LowEstimateBW(link), which is determined as a predetermined fraction of the stated capacity of the link. (3) The overall guarantee specified for the group is below Σ LowEstimateBW(link), the sum of the low bandwidth estimates over all links.
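  • A small sketch of the group-guarantee handling described above: proportional distribution of a global guarantee across links, plus the constraint checks shown for a single link; helper names and numbers are assumptions.

```python
def distribute_group_guarantee(group_guarantee, group_throughput):
    """Spread a globally specified group guarantee across links in
    proportion to the group's observed traffic on each link."""
    total = sum(group_throughput.values())
    if total == 0:
        share = group_guarantee / len(group_throughput)
        return {link: share for link in group_throughput}
    return {link: group_guarantee * t / total
            for link, t in group_throughput.items()}

def check_constraints(per_consumer_guarantees, survival_minimums,
                      group_guarantee, low_estimate_bw):
    """Sanity checks resembling the ones listed above, for one link."""
    total_individual = (sum(per_consumer_guarantees.values()) +
                        sum(survival_minimums.values()))
    return (total_individual <= group_guarantee and
            total_individual <= low_estimate_bw and
            group_guarantee <= low_estimate_bw)

print(distribute_group_guarantee(300, {"link-a": 600, "link-b": 200}))
print(check_constraints({"c1": 50, "c2": 80}, {"c1": 5, "c2": 5},
                        group_guarantee=300, low_estimate_bw=500))
```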
  • FIG. 6 shows a flowchart of the process used for allocating the resources among consumers.
  • the process illustrated in FIG. 6 assumes consumers are classified into two sets, P1 and P2.
  • the consumers in priority group P1 are higher priority consumers compared to consumers in priority group P2.
  • the process in FIG. 6 guarantees that the P1 consumers are given priority over P2 consumers while the guarantees for the P2 consumers are preserved. Allocations are determined for P1 consumers before P2 consumers, unless it is known that the needs of the P2 consumers are very low and do not pose any risk to the allocations of P1 consumers.
  • the survival guarantees of the consumers in set P2 are allocated 605.
  • the group guarantees of the consumers of P2 priority group represent the amount of resources available to the consumers collectively if they can use the resources made available. The unused amount is returned to a common allocation pool.
  • the guarantees of the consumers of the P2 priority group are designed to protect the consumers of the lower priority P2 group from being starved by the consumers of the higher priority P1 group.
  • the needs of the P2 consumers are determined 605 to check 610 if the needs of the P2 consumers are below the P2 guarantees.
  • the needs of a consumer are determined based on the observed usage of the consumer. In an embodiment, a consumer is given an allocation based on its observed usage increased by a margin.
  • the value of the margin by which the observed usage is increased for a consumer depends on the priority and sub-priority of the consumer as shown in FIG. 3.
  • the following table shows an example of margins determined based on the priority and sub-priority of a consumer.
  • the margin can be a function depending on observed usage.
  • NewUsage(C) of a consumer C is determined by increasing the observed usage of the consumer by the margin.
  • the NewUsage(C) value for a particular consumer can be limited by a maximum value MAX(C).
  • the calculation is shown in equations (6, 6a) below.
  • the components of the formula are: the minimum individual guarantee for the consumer, the survival minimum value assigned to the consumer and projected need of the consumer with the margin.
  • the margin for priority P and sub-priority S is indicated by MARGIN(P,S).
  • NewUsage(C) = MAX{ MinGuarantee(C), MinSurvival(C), ObservedUsage(C) x (1 + MARGIN(P, S)) } (6)
  • NewUsage(C) = MIN{ NewUsage(C), MAX(C) } (6a)
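  • Equations (6) and (6a) translate directly into a short helper; the margin table below is illustrative, since the patent's example margins are not reproduced in this text.

```python
def new_usage(observed_usage, min_guarantee, min_survival,
              margin, max_allocation=float("inf")):
    """Equations (6) and (6a): project a consumer's need from its
    observed usage plus a priority-dependent margin, never below its
    guarantees and never above its configured maximum."""
    projected = max(min_guarantee,
                    min_survival,
                    observed_usage * (1.0 + margin))   # (6)
    return min(projected, max_allocation)              # (6a)

# Example margins by (priority, sub-priority); the numbers are
# illustrative assumptions, not values from the patent.
MARGIN = {("P1", "high"): 0.30, ("P1", "medium"): 0.20, ("P1", "low"): 0.10,
          ("P2", "high"): 0.15, ("P2", "medium"): 0.10, ("P2", "low"): 0.05}

print(new_usage(observed_usage=80, min_guarantee=50, min_survival=5,
                margin=MARGIN[("P2", "medium")], max_allocation=120))
```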
  • the allocations for P2 consumers are determined 615 based on their needs, followed by the allocations of P1 consumers determined 620 based on their needs. Since the needs of the P2 consumers are known to be below their guarantees, their allocations can be determined before the allocations of P1 consumers. Since the P2 consumers are expected to consume fewer resources than the amount they are guaranteed, they do not pose any risk of P1 consumers being allocated fewer resources.
  • a greedy algorithm described herein is used for determining 615 the allocations for P2 consumers. The greedy algorithm sequentially allocates the resources to the different consumers, going through the list of the consumers in order of decreasing priority.
  • the remaining (leftover) resources are allocated 645 to all the consumers.
  • the remaining resources may be allocated 645 proportional to the needs of the consumers.
  • the allocation 645 of the remaining resources may be weighted by the priority/sub-priority of the consumer.
  • if the P2 needs are above the P2 guarantees, first the amount of resources equal to (TotalThroughput - P2Guarantees - AllSurvivalMinimumAllocations) is allocated to P1 consumers 625 based on their needs. Since the needs of the P2 consumers are higher than their guarantees, it is possible that if the P2 consumers are allocated resources before the P1 consumers, there may not be sufficient resources left for P1 consumers. After the resources required for P1 consumers are determined 625, the remaining resources are checked 630 to determine if there are sufficient resources left for P2 consumers. If there are sufficient resources left for the P2 consumers, the allocations for the P2 consumers are determined 635 based on their needs, for example, based on a greedy algorithm.
  • the remaining leftover resources can then be allocated 645. If, after determining 625 allocations for the P1 consumers, it is determined that the remaining resources are not sufficient for the P2 consumers, the resources are allocated to P2 consumers based on a fair share strategy described below. In this situation, it is highly likely that there are no more leftover resources. However, if any leftover resources are found, they are allocated 645. After the leftover resources are allocated, a fudge factor may be introduced to find the maximum capacity as described above for step 535 in FIG. 5.
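  • A compact sketch of the FIG. 6 decision flow described above; simple_greedy and simple_fair_share are simplified stand-ins for the FIG. 7 strategies, and all names and numbers are illustrative.

```python
def simple_greedy(budget, needs):
    """Greedy stand-in: give each consumer what it needs until the
    budget runs out (no priority ordering in this toy version)."""
    out, left = {}, max(budget, 0.0)
    for c, n in needs.items():
        out[c] = min(n, left)
        left -= out[c]
    return out

def simple_fair_share(budget, needs):
    """Fair-share stand-in: split the budget in proportion to need."""
    total = sum(needs.values()) or 1.0
    return {c: max(budget, 0.0) * n / total for c, n in needs.items()}

def allocate_two_priorities(total, p1_needs, p2_needs, p2_guarantee,
                            survival_minimums):
    """Sketch of the FIG. 6 flow: P2 is served first only when its
    aggregate needs are safely below its group guarantee; otherwise P1
    is served first out of (total - P2 guarantee - survival minimums),
    and P2 falls back to a fair-share split if resources run short."""
    allocations = dict(survival_minimums)        # survival minimums first
    remaining = total - sum(survival_minimums.values())

    def give(budget, needs, strategy):
        nonlocal remaining
        for c, amount in strategy(budget, needs).items():
            allocations[c] = allocations.get(c, 0.0) + amount
            remaining -= amount

    if sum(p2_needs.values()) <= p2_guarantee:
        give(remaining, p2_needs, simple_greedy)  # P2 poses no risk to P1
        give(remaining, p1_needs, simple_greedy)
    else:
        give(remaining - p2_guarantee, p1_needs, simple_greedy)
        enough_for_p2 = remaining >= sum(p2_needs.values())
        give(remaining, p2_needs,
             simple_greedy if enough_for_p2 else simple_fair_share)
    return allocations, remaining                # leftover distributed in 645

alloc, leftover = allocate_two_priorities(
    total=100, p1_needs={"p1-load": 40, "p1-provision": 30},
    p2_needs={"p2-backup": 50}, p2_guarantee=20,
    survival_minimums={"p1-load": 5, "p1-provision": 5, "p2-backup": 5})
print(alloc, leftover)
```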
  • FIG. 7 shows a flowchart of the process used for allocating the resources among consumers of a priority group based on a greedy or a fair share strategy.
  • RemainingResource is initialized to a value representing the remaining resources at the stage at which the process shown in FIG. 7 is executed.
  • the process iterates over all the consumers of the priority group in order of decreasing priority.
  • the consumers within the priority group P may be divided into subgroups that are assigned sub-priorities as shown in FIG. 3.
  • the consumers of priority P2 may be processed in the order: consumers with priority P2 and sub-priority high, followed by consumers with priority P2 and sub-priority medium, followed by consumers with priority P2 and sub-priority low. Accordingly, a consumer C with the highest priority/sub-priority that hasn't been processed is selected 710.
  • the allocation for the consumer selected 710 is determined 715.
  • the strategy used for determining 715 the allocation for consumer C is different for the greedy allocation compared to the fair share allocation.
  • the allocation for consumer C is determined to be the NewUsage(C) value; see formula (6). Therefore, in the greedy allocation strategy, the consumer is allocated as much as the consumer needs based on its NewUsage value, which already takes into account the guarantees.
  • the allocation for the consumer may be less than the NewUsage value determined for the consumer.
  • First a FairShare(C) value is determined for the consumer using the equation (7) below:
  • the fair share value for a consumer FairShare(C) is determined based on the fraction of resources R allocated to the consumer C compared to the total resource allocated for all consumers ci in the set Consumers of the priority group.
  • W(C) is a weight assigned to consumer C. Weights are designed to reflect the priority and sub-priority of the consumers.
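Equation (7) itself does not survive in this text. A plausible reconstruction, assuming the fair share of the remaining resource R is simply proportional to the consumer weights W described in the two preceding items, is

$$\mathrm{FairShare}(C) = R \times \frac{W(C)}{\sum_{c_i \in \mathrm{Consumers}} W(c_i)} \qquad (7)$$

If the original formula additionally weighted each consumer by its previously determined allocation, the numerator and denominator would carry that factor as well; the weight-only form above is the simplest reading of the description.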
  • the allocation for consumer C is determined to be min(NewUsage(C),FairShare(C)) .
  • the allocation of a consumer C may be limited by the FairShare(C) value computed for the consumer, even if the consumer C needs NewUsage(C) resources.
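A compact Python sketch of the FIG. 7 loop follows. It assumes consumers arrive already ordered by decreasing priority/sub-priority and that their NewUsage values (formula (6)) and weights are available; the helper names are illustrative, and the fair-share term uses the reconstruction of equation (7) given above.

```python
def allocate_priority_group(consumers, remaining_resource, strategy="greedy"):
    """Allocate within one priority group (FIG. 7 sketch).

    consumers: list of dicts with 'name', 'new_usage' and 'weight',
               ordered by decreasing priority/sub-priority (step 710)
    """
    total_weight = sum(c["weight"] for c in consumers)
    budget = remaining_resource
    allocations = {}
    for c in consumers:                                     # select next consumer, 710
        if strategy == "greedy":
            share = c["new_usage"]                          # step 715, greedy
        else:
            fair = remaining_resource * c["weight"] / total_weight   # eq. (7) sketch
            share = min(c["new_usage"], fair)               # step 715, fair share
        share = min(share, budget)                          # never exceed what is left
        allocations[c["name"]] = share
        budget -= share
    return allocations, budget
```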
  • the leftover allocation 645 divides remaining allocations after resource allocations for all consumers have been determined based on greedy or fair share allocation strategies. Leftover allocations may not be provided to consumers that have reached their maximum allocations.
  • if the resources are determined to be lightly loaded, the leftover resources are divided equally among all consumers.
  • the previously determined allocations of all consumers are incremented by the amount obtained by equally dividing the leftover resources among all consumers.
  • the system may be determined to be lightly loaded for a link if the number of consumers using the link is low and the observed usages of consumers using the link is also determined to be low. For example, the system may be considered lightly loaded for a link if there are fewer than 50 consumers using the link and the overall observed usage of the link is less than a quarter of the stated link capacity.
  • the resources are divided between consumers in proportion to the usage and weight of the consumers.
  • the weight associated with a consumer is based on priority preferences, for example, the weight may be determined based on the priority and sub-priority associated with the consumer.
  • the following equation shows how the share Share(C) of a consumer C is determined for leftover resources R.
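Equation (8) is not reproduced in this text; reconstructed from the description in the next item (shares proportional to weighted usage and summing to the remaining resource R), it reads

$$\mathrm{Share}(C) = R \times \frac{W(C) \times \mathrm{Usage}(C)}{\sum_{c_i \in \mathrm{Consumers}} W(c_i) \times \mathrm{Usage}(c_i)} \qquad (8)$$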
  • the share of a consumer Share(C) of the remaining resource R is determined based on the weighted fraction of the usage of consumer C compared to the weighted usage for all consumers ci in the set Consumers of the priority group. Based on the equation (8) above, the total of all Share(C) for all consumers adds up to the remaining total resource. Shares of consumers with the same usage are proportional to weights determined by their priorities and sub-priorities. Furthermore, shares of consumers within the same priority/sub-priority groups are proportional to their usage.
  • the allocations of resources for consumers determined previously are updated by adding the corresponding Share(C) value to each allocation. If the resulting value exceeds the maximum limit configured for the consumer, the allocation is limited to the maximum limit. Based on the above updates to allocations, the value of remaining resources is computed again. If for any reason there are still remaining resources, the above allocation can be repeated.
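The leftover distribution 645 can be sketched in Python as below, covering both the lightly loaded case (equal split) and the weighted-usage split of equation (8) reconstructed above, with the per-consumer maximum cap and the repeat-while-anything-remains behavior described in the preceding items. The names and the bound on repeat rounds are illustrative assumptions.

```python
def distribute_leftover(allocations, usage, weights, max_limits, leftover,
                        lightly_loaded=False, max_rounds=3):
    """Distribute leftover resources over existing allocations (step 645 sketch)."""
    for _ in range(max_rounds):                    # repeat if something still remains
        eligible = [c for c in allocations if allocations[c] < max_limits[c]]
        if leftover <= 0 or not eligible:
            break
        if lightly_loaded:
            shares = {c: leftover / len(eligible) for c in eligible}
        else:
            denom = sum(weights[c] * usage[c] for c in eligible)
            shares = ({c: leftover / len(eligible) for c in eligible} if denom == 0
                      else {c: leftover * weights[c] * usage[c] / denom
                            for c in eligible})    # equation (8)
        for c, share in shares.items():
            grant = min(share, max_limits[c] - allocations[c])   # cap at the maximum
            allocations[c] += grant
            leftover -= grant
    return allocations, leftover
```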
  • an unhappiness index is determined by the metrics manager 265 as a measure of the potential for starvation of a particular consumer.
  • the metric is based on the fraction of a time interval during which the usage of the consumer exceeds a threshold N%.
  • the unhappiness index is measured over a fixed time window, for example, 24 hours. During the fixed time window, there can be several allocation runs during which the allocation manager 225 re-computes the allocations for the next time interval. The time interval between two allocation runs is called an allocation interval and corresponds to the time during which the previous allocation was enforced.
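Equation (9) is missing from this text; the two items that follow describe its numerator and denominator, which gives the reconstruction

$$\mathrm{UnhappinessIndex}(C) = \frac{\sum_{t_i \,:\, \mathrm{usage}(C,\,t_i) > N\%} t_i}{\sum_{t_i \in \mathrm{TWINDOW}} t_i} \qquad (9)$$

where the t_i are the allocation intervals within the time window TWINDOW.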
  • the summation in the numerator of (9) adds the time intervals ti during which the usage of consumer C, usage(C, ti), is greater than N%.
  • the denominator of equation (9) adds all the time intervals ti within the window TWINDOW, thereby providing the duration of the entire time window TWINDOW.
  • the consumers are ordered in decreasing order of their unhappiness index to obtain their unhappiness rank.
  • the unhappiness rank and unhappiness index of the consumers may be reported by the metrics manager 265 to a system administrator, for example, via a user interface 295.
  • the system administrator may decide to change the priority or sub-priority of the consumer based on the unhappiness index combined with other criteria, for example, the type of the task.
  • the metrics manager 265 may provide the information regarding unhappiness of consumers to allow the allocation manager to make automatic adjustments to the priority or sub-priority of the consumers.
  • the unhappiness measure of a consumer is used for automatic adjustment of priority/sub-priority of the consumer. For example, if a consumer is unhappy most of the time, the consumer may be automatically promoted to a higher priority.
  • FIG. 8 is a block diagram illustrating components of an example machine able to read instructions from a machine-readable medium and execute them in a processor (or controller). Specifically, FIG. 8 shows a diagrammatic representation of a machine in the example form of a computer system 800 within which instructions 824 (e.g., software) for causing the machine to perform any one or more of the methodologies discussed herein may be executed. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
  • the machine may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a smartphone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions 824 (sequential or otherwise) that specify actions to be taken by that machine.
  • the term “machine” shall also be taken to include any collection of machines that individually or jointly execute instructions 824 to perform any one or more of the methodologies discussed herein.
  • the example computer system 800 includes a processor 802 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), one or more application specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these), a main memory 804, and a static memory 806, which are configured to communicate with each other via a bus 808.
  • the computer system 800 may further include graphics display unit 810 (e.g., a plasma display panel (PDP), a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)).
  • the computer system 800 may also include alphanumeric input device 812 (e.g., a keyboard), a cursor control device 814 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 816, a signal generation device 818 (e.g., a speaker), and a network interface device 820, which also are configured to communicate via the bus 808.
  • the storage unit 816 includes a machine-readable medium 822 on which is stored instructions 824 (e.g., software) embodying any one or more of the methodologies or functions described herein.
  • the instructions 824 (e.g., software) may also reside, completely or at least partially, within the main memory 804 or within the processor 802 (e.g., within a processor's cache memory) during execution thereof by the computer system 800, the main memory 804 and the processor 802 also constituting machine-readable media.
  • the instructions 824 (e.g., software) may be transmitted or received over a network 826 via the network interface device 820.
  • while the machine-readable medium 822 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 824).
  • the term “machine-readable medium” shall also be taken to include any medium that is capable of storing instructions (e.g., instructions 824) for execution by the machine and that cause the machine to perform any one or more of the methodologies disclosed herein.
  • the term “machine-readable medium” includes, but is not limited to, data repositories in the form of solid-state memories, optical media, and magnetic media.
Additional Configuration Considerations
  • Modules may constitute either software modules (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware modules.
  • a hardware module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner.
  • one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
  • a hardware module may be implemented mechanically or electronically.
  • a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations.
  • a hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
  • hardware module should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein.
  • “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
  • Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
  • processors may be temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions.
  • the modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
  • the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment or as a server farm), while in other embodiments the processors may be distributed across a number of locations. The one or more processors may also operate to support performance of the relevant operations in a "cloud computing" environment or as a "software as a service" (SaaS).
  • At least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).
  • the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
  • some embodiments may be described using the terms “coupled” and “connected” along with their derivatives. It should be understood that these terms are not intended as synonyms for each other. For example, some embodiments may be described using the term “connected” to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still cooperate or interact with each other. The embodiments are not limited in this context.
  • the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion.
  • a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
  • the term “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).

Abstract

Allocation of resources across multiple consumers allows efficient utilization of shared resources. Observed usages of resources by consumers over time intervals are used to determine a total throughput of resources by the consumers. The total throughput of resources is used to determine allocation of resources for a subsequent time interval. The consumers are associated with priorities used to determine their allocations. Minimum and maximum resource guarantees may be associated with consumers. The resource allocation aims to allocate resources based on the priorities of the consumers while aiming to avoid starvation by any consumer. The resource allocation allows efficient usage of network resources in a database storage system storing multiple virtual databases.

Description

ADAPTIVE RESOURCE MANAGEMENT
BACKGROUND
[001] This invention relates generally to resource management for storage systems, and in particular to adaptive management of resources shared by multiple consumers.
[002] Virtualization technologies allow hardware resources to be used and shared by multiple consumers. A consumer can be a process running on a computer system that accesses resources to perform certain tasks. An example of a consumer is a task related to database operations on a system hosting databases, for example, query processing, data manipulations, reporting, replication, backup, restore, or export. These tasks can require a significant amount of system resources. An example of a shared hardware resource is a network resource that allows consumers to communicate with external systems. Another example is the bandwidth of a storage subsystem. Shared resources are allocated between various consumers. The allocation of resources to individual consumers determines the overall utilization of the hardware resources in a system.
[003] Consumers of resources may be associated with priorities based on the
consumer's importance to an end user. For example, certain consumers perform tasks that have higher priority than other consumers or have tighter service level agreement (SLA) requirements. Allocation of hardware resources between consumers needs to consider their priorities. Allocations aim to ensure that higher priority tasks get a larger share of resources than lower priority tasks. However, even a low priority task should be able to make progress over time, although its progress may be slow compared to high priority tasks. Improper allocation of resources to consumers may result in starvation of some consumers and hoarding of resources by other consumers. Starvation of a consumer results when the consumer is perpetually denied resources that it needs.
[004] Various strategies are utilized to share resources between consumers. A fixed resource allocation strategy can allocate a fixed amount of resources to different consumers based on their priorities. In many cases these fixed amounts are determined upfront or are the result of explicit operator input. Fixed resource allocation strategies may not be able to automatically adjust to dynamic changes in consumer needs. A proportional fairness based resource allocation strategy allocates an amount of resources for each consumer proportionate to its anticipated resource consumption. Another resource allocation strategy is a round robin strategy that iterates through consumers in a round robin fashion to allocate resources. Other types of allocation strategies include first come first served type of allocation, fair queuing (max-min fairness) and weighted queuing.
SUMMARY
[005] Virtualization of databases allows consolidation of multiple virtual databases on the same database storage system. Multiple tasks associated with the virtual databases may execute on the storage system, including loading of the databases, provisioning of the virtual databases, and serving of requests and tasks related to the virtual databases. These tasks are consumers of system and hardware resources, for example, network resources and storage bandwidth. The goal is an allocation of resources for the consumers that optimizes the overall utilization of the resources for the system across multiple virtual databases with respect to their SLAs and priorities. Resources are distributed among various consumers depending on their dynamic needs and required SLAs.
[006] Embodiments of the invention enable allocation of network resources to consumers of different priorities in a computer system. A metric representing the aggregate needs of a low priority set of consumers of the network resources is determined based on observed usage of the network resources by the consumers. The metric representing the needs of the low priority set of consumers is compared to a threshold value. If the needs of the low priority consumers are above a threshold value, allocations of the network resource are first determined for a high priority set of consumers. After allocating the resources to the high priority set of consumers, a remaining amount of left over allocations is determined and allocated to the low priority set of consumers. In an embodiment, resources can be allocated to the high-priority consumers up to the total amount of resources minus the amounts guaranteed to the lower priority consumers.
[007] In an embodiment, if the metric representing the needs of the consumers is below a threshold value, the allocations of the low priority consumers are determined first and the remaining leftover resources are allocated to the high-priority consumers. Any resources still remaining are distributed over all the consumers.
[008] Embodiments of the invention enable computation of total throughput of network resources used by consumers. Multiple usage values of the network resource that are cumulative over time are determined. Each cumulative usage value is associated with a time interval and is based on observed usages of the network resource by consumers over the time interval. The total throughput of the network resource is determined as an aggregate value based on the multiple cumulative usage values. The total throughput value is increased by a predetermined factor. Allocations of the network resource for each consumer of the network resource are determined based on the increased total throughput value. [009] Each allocation for a consumer determines the availability of the network resource to the consumer for a subsequent time interval. The system assumes certain guarantees for individual users and for priority groups. If these guaranteed amounts are unlikely to be consumed based on the forecasting of the described method, the surplus part of the resource will be allocated to other consumers.
[010] The features and advantages described in this summary and the following detailed description are not all-inclusive. Many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[011] FIG. 1 is a diagram illustrating how information is copied from a production database to a database storage system and provisioned as virtual databases using a file sharing system, in accordance with an embodiment of the invention.
[012] FIG. 2 is a schematic diagram of the architecture of a system that stores virtual databases and optimizes the shared resources for tasks related to the virtual databases, in accordance with an embodiment of the invention.
[013] FIG. 3 illustrates a hierarchy of priority groups and assignment of consumers to priority groups, in accordance with an embodiment of the invention.
[014] FIG. 4 illustrates network links and flows associated with consumers, in accordance with an embodiment of the invention.
[015] FIG. 5 shows a flowchart of the process used for computing the total throughput of a link, in accordance with an embodiment of the invention.
[016] FIG. 6 shows a flowchart of the process used for allocating the resources among consumers, in accordance with an embodiment of the invention.
[017] FIG. 7 shows a flowchart of the process used for allocating the resources among consumers of a priority group based on a greedy or a fair share strategy, in accordance with an embodiment of the invention.
[018] FIG. 8 illustrates an embodiment of a computing machine that can read instructions from a machine-readable medium and execute the instructions in a processor or controller.
[019] The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
DETAILED DESCRIPTION
Virtual Databases as Consumers of Resources
[020] Creation of virtual databases allows storage of multiple virtual databases in a database storage system. Storage of multiple virtual databases on a database storage system requires execution of multiple tasks related to the virtual databases on the database storage system. These tasks include creation of virtual databases, tasks related to use of virtual databases including query processing, data manipulations, replication, backup, restore, export of virtual databases and the like. These tasks share hardware resources available on the database storage systems and act as consumers of the shared resources. Different tasks can be associated with different priority levels which may be determined by a system
administrator. The resources shared by different consumers need to be allocated between the consumers appropriately, for example, higher priority consumers may be given a larger share of resources compared to lower priority consumers. In an embodiment, the allocation of resources ensures that lower priority tasks are not starved of resources. In another embodiment, some lower priority tasks may be starved but are allowed to continue to exist in the system. The system aims at optimizing the overall usage of the shared resources across various consumers with respect to their priorities.
[021] In an embodiment, usage of shared resources is optimized across multiple modules of virtual database systems stored on a database storage system. Virtual databases can be created based on the state of a production database at a particular point in time, and the virtual databases can then be individually accessed and modified as desired. A database comprises data stored in a computer or storage subsystem for use by computer implemented applications. A database server is a computer program that can interact with the database and provides database services, for example, access to the data stored in the database. Database servers include commercially available programs, for example, database servers included with database management systems provided by ORACLE, SYBASE, MICROSOFT SQL SERVER, IBM's DB2, MYSQL, and the like. The term "production database" is used in particular examples to illustrate a useful application of the technology; however, it can be appreciated that the techniques disclosed can be used for any database, regardless of whether the database is used as a production database. The virtual databases are "virtual" in the sense that the physical implementation of the database files is decoupled from the logical use of the database files by a database server. Systems and methods for creating virtual databases and using them in workflows are disclosed in U.S. Application No. 12/603,545 filed on October 21, 2009. [022] In one embodiment, information from the production database is copied to a storage system at various times, such as periodically. This enables reconstruction of the database files associated with the production database for these different points in time. The information may be managed in the storage system in an efficient manner so that copies of information are made only if necessary. For example, if a portion of the database is unchanged from a version that was previously copied, that unchanged portion need not be copied. A virtual database created for a point in time is stored as a set of files that contain the information of the database as available at that point in time. Each file includes a set of database blocks and the data structures for referring to the database blocks stored for earlier copies. A virtual database may be created on a database server by creating the database files for the production database corresponding to the state of the production database at a previous point in time, as required for the database server. The files corresponding to the virtual database are made available to the database server using a file sharing mechanism, which links the virtual database to the appropriate database blocks stored on the storage system. The process of making the virtual database available to a database server is called "provisioning" the virtual database. In some embodiments, provisioning the virtual database includes managing the process of creating a running database server based on virtual database. Multiple VDBs can be provisioned based on the state of the production database at the same point in time. On the other hand, different VDBs can be based on different point in time state of the same production database or different production databases. The database server on which a virtual database has been provisioned can then read from and write to the files stored on the storage system. A database block may be shared between different files each file associated with a different VDB.
[023] FIG. 1 illustrates one embodiment illustrating how information may be copied from a production database to a database storage system and provisioned as virtual databases using a file sharing system. The production database systems 110 manage data for an organization. The database storage system 100 retrieves data associated with databases from one or more production database systems 110 and stores the data in an efficient manner, further described below. A database administrator user interface 140 allows a database administrator to perform various actions supported by the database storage system 100.
[024] In response to a request from the administrator system 140, or based on a predefined schedule, the database storage system 100 may send a request 150 for data to a production database system 110. The production database system 110 responds by sending information stored in the production database as a stream of data 160. The request 150 is sent periodically and the production database system 110 responds by sending information representing changes of data stored in the production database since the last response 160 sent by the production database system 110. The database storage system 100 receives the data 160 sent by the production database system 110 and stores the data. The database storage system 100 may analyze the data 160 received to determine whether to store the information or skip the information if the information is not useful for reconstructing the database at previous time points. The database storage system 100 stores the information efficiently, for example, by keeping versions of database blocks that have changed and reusing database blocks that have not changed.
[025] To create a virtual database, the database storage system 100 creates files that represent the information corresponding to the production database system 110 at a given point in time. The database storage system 100 exposes 170 the corresponding files to a virtual database system 130 using a file sharing system 120. The virtual database system 130 runs a database server that can operate with the files exposed 170 by the database storage system 100. Hence, a virtual copy of the production database is created for the virtual database system 130 for a given point in time in a storage efficient manner.
[026] Modules in the database storage system 100 require resources to perform tasks. The resources can be network resources for communicating with external systems, computing resources or other resources. For example, the virtual database manager 275 may need resources for provisioning a VDB, the point-in-time copy manager 210 may need network resources for retrieving a point-in-time copy of a database from the production database system 110, the transaction log manager 220 may need network resources for retrieving log updates from a production database system 110, the virtual database manager 275 may need resources for exporting the data in a VDB to an external system. A task performed by a module utilizing a resource is a consumer of the resource.
System Architecture
[027] FIG. 2 is a high level block diagram illustrating a system environment suitable for managing virtual databases on a database storage system 100 and optimizing overall resources used by the VDBs stored on the database storage system 100. The system environment comprises one or more production database systems 110, a database storage system 100, an administration system 140, and one or more virtual database systems 130. Systems shown in FIG. 2 can communicate with each other if necessary via a network.
[028] A production database system 110 is typically used by an organization for maintaining its daily transactions. For example, an online bookstore may save all the ongoing transactions related to book purchases, book returns, or inventory control in a production system 110. The production system 110 includes a database server 245 and a production DB data store 250. The production DB data store 250 stores data associated with a database that may represent for example, information representing daily transactions of an enterprise. The database server 245 processes requests that access data stored in the production DB data store 250. In alternative configurations, different and/or additional modules can be included in a production database system 110.
[029] The database storage system 100 retrieves information available in the production database systems 110 and stores it. The information retrieved includes database blocks comprising data stored in the database, transaction log information, metadata information related to the database, information related to users of the database and the like. The information retrieved may also include configuration files associated with the databases. For example, databases may use vendor specific configuration files to specify various
configuration parameters including initialization parameters associated with the databases.
[030] The data stored in the storage system data store 290 can be exposed to a virtual database system 130 allowing the virtual database system 130 to treat the data as a copy of the production database stored in the production database system 110. The database storage system 100 includes a point-in-time copy manager 210, a transaction log manager 220, an interface manager 230, a file sharing manager 270, a virtual database manager 275, a storage system data store 290, and an adaptive resource manager 215. The adaptive resource manager 215 comprises various modules including an allocation manager 225, a scheduler 235, a consumer store 255, a metrics manager 265 and a resource usage store 270. In alternative configurations, different and/or additional modules can be included in the database storage system 100.
[031] The point-in-time copy manager 210 interacts with the production database system 110 by sending a request to retrieve information representing a point-in-time copy (also referred to as a "PIT copy") of a database stored in the production DB data store 250. The point-in-time copy manager 210 stores the data obtained from the production database system 110 in the storage system data store 290. The data retrieved by the point-in-time copy manager 210 corresponds to database blocks (or pages) of the database being copied from the production DB data store 250. After a first PIT copy request to retrieve information from the production DB data store 250, a subsequent PIT copy request may need to retrieve only the data that changed in the database since the previous request. The data collected in the first request can be combined with the data collected in a second request to reconstruct a copy of the database corresponding to a point in time at which the data was retrieved from the production DB data store 250 for the second request.
[032] The transaction log manager 220 sends requests to the production database system 110 for retrieving portions of the transaction logs stored in the production database system 110. The data obtained by the transaction log manager 220 is stored in the storage system data store 290. In one embodiment, a request for transaction logs retrieves only the changes in the transaction logs in the production database system 110 since a previous request for the transaction logs was processed. The database blocks retrieved by a point in time copy manager 210 combined with the transaction logs retrieved by the transaction log manager 220 can be used to reconstruct a copy of a database in the production system 110 corresponding to times in the past in between the times at which point-in-time copies are made.
[033] The file sharing manager 270 allows files stored in the storage system data store 290 to be shared across computers that may be connected with the database storage system 100 over the network. The file sharing manager 270 uses the file sharing system 120 for sharing files. An example of a system for sharing files is a network file system (NFS). A system for sharing files may utilize fibre channel Storage area networks (FC-SAN) or network attached storage (NAS) or combinations and variations thereof. The system for sharing files may be based on small computer system interface (SCSI) protocol, internet small computer system interface (iSCSI) protocol, fibre channel protocols or other similar and related protocols.
[034] The virtual database manager 275 receives requests for creation of a virtual database for a virtual database system 130. The request for creation of a virtual database may be sent by a database administrator using the administration system 140 and identifies a production database system 110, a virtual database system 130, and includes a past point-in- time corresponding to which a virtual database needs to be created. The virtual database manager 275 creates the necessary files corresponding to the virtual database being created and shares the files with the virtual database system 130 using the file sharing manager 270.
[035] The interface manager 230 renders information for display using the administration system 140. A database administrator user can see information available in the storage system data store 290 as well as take actions executed by the database storage system. For example, the database administrator can request the database storage system 100 to make a PIT copy of a database stored on a production database system 110 at a particular point-in-time. In an embodiment, the interface manager allows a system administrator to set various priorities associated with different tasks. The system administrator can also set minimum and maximum guarantees of allocation associated with various tasks.
[036] The adaptive resource manager 215 contains various modules necessary to allocate shared resources between tasks representing consumers of the shared resources. The consumer store 255 maintains data structures representing consumers in the database storage system 100. The consumer store 255 stores the priority and sub-priority associated with each consumer. Consumers may be added to or deleted from the consumer store 255. A consumer may have a status, for example, pending or active. The resource usage store 270 stores information related to various resources available to the consumers in the database storage system 100 and information representing the usage of the resources.
[037] The allocation manager 225 determines the allocations of various consumers for a given time interval. The allocation manager performs an allocation run comprising analysis of usage of resources based on information available in the resource usage store 270 and of consumer information available in consumer store 255 to determine allocations of resources across different consumers. In an embodiment, the allocation manager determines allocations of resources periodically, where results of each allocation run are used for a subsequent time interval.
[038] The scheduler 235 periodically invokes the allocation manager 225 to execute a run of the allocation including collection and analysis of usages of resources by various consumers and to determine allocation of the resources for the next time interval. In an embodiment, the allocation manager 225 invokes the scheduler to schedule the next run of the allocation manager 225. The scheduler may get scheduling requests from other modules, for example, from the interface manager 230 that forwards requests made by a system administrator using the administration system 140. The scheduler 235 may be implicitly invoked by execution of specific tasks, for example, when a consumer is created or deleted.
[039] The metrics manager 265 gathers statistics for use by other modules or for reporting via the user interface 295. Examples of data reported include observed usage per consumer, 'unhappiness' index associated with consumers described herein, overall resource usage and the like. In an embodiment, the metrics manager maintains a cache that stores frequently accessed information for fast access. The metrics manager 265 may receive and process requests for information from the user interface 295 for display via the user interface 295.
[040] A virtual database system 130 includes a database server 260. The database server 260 is similar in functionality to the database server 245 and is a computer program that provides database services and application programming interfaces (APIs) for managing data stored on a data store 250. The data managed by the database server 260 may be stored on the storage system data store 290 that is shared by the database storage system 100 using a file sharing system 120. In alternative configurations, different and/or additional modules can be included in a virtual database system 130. Some data can be stored on local storage.
Consumer Priority Hierarchy
[041] A consumer is assigned to a priority group that determines the preference in allocation of resources for the consumer. There can be multiple priority groups that each consumer can be assigned to. Each consumer is assigned to only one priority group at a time. The consumer can be dynamically reassigned to a different priority group if necessary. The assignment of priority groups can be performed based on a default priority group when the consumer is added to the system or by a database administrator using the user interface 295. Alternatively consumers can be automatically mapped to priority groups based on attributes of the consumer. Automatic assignments can be subject to change by a database
administrator.
[042] FIG. 3 illustrates an embodiment in which a consumer can be assigned to one of two priority groups, P1 (high priority group 310) and P2 (regular priority group 315). By default all consumers can be assigned to the priority group P2. A database administrator can reassign a consumer from the P2 to the P1 priority group if necessary. Each of the priority groups may be sub-divided into sub-groups.
[043] As illustrated in FIG. 3, each priority is divided into sub-groups, for example, high sub-group 320, medium sub-group 325, and low subgroup 330. The high sub-group 320 includes consumers with priority higher than the consumers in medium sub-group 325 which in turn have priority higher than consumers in low subgroup 330. Similar to a default priority group being assigned to a consumer, a default sub-group within the priority group can be assigned to each consumer. A database administrator can reassign the sub-group of a consumer if necessary. FIG. 3 shows a root group 305 that includes all priority groups underneath. In some embodiments, the root group 305 can be used as the default priority group for the resources. Note that other embodiments can have a hierarchy of priority groups and sub-groups of arbitrary depth and width.
[044] In one embodiment a consumer 350(e) is assigned to the lowest level of priority group in the hierarchy of priority-groups as shown in FIG. 3. In other embodiments, the consumer 350 can be assigned to any priority group in the hierarchy. For example, a consumer can be assigned to the P1 group 310, and may be assigned to a sub-group assigned by default. The parent of a consumer 350 is the group that the consumer belongs to in the priority group hierarchy.
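A minimal Python sketch of the priority hierarchy of FIG. 3 follows; the group names mirror the figure, while the class, the default sub-group chosen here, and the field names are illustrative assumptions rather than details from the patent.

```python
class PriorityGroup:
    """A node in the priority hierarchy; consumers are attached as leaves."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

# Hierarchy from FIG. 3: root 305 -> {P1 310, P2 315} -> {high, medium, low}
root = PriorityGroup("root")
p1 = PriorityGroup("P1", root)
p2 = PriorityGroup("P2", root)
subgroups = {(p.name, s): PriorityGroup(s, p)
             for p in (p1, p2) for s in ("high", "medium", "low")}

def assign_consumer(consumer_name, group=None):
    """Attach a consumer to a priority group; the default used here is P2/medium."""
    group = group or subgroups[("P2", "medium")]
    return PriorityGroup(consumer_name, group)

vdb_export_task = assign_consumer("vdb-export")                    # defaults to P2/medium
pit_copy_task = assign_consumer("pit-copy", subgroups[("P1", "high")])
```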
Resources used by Consumers
[045] FIG. 4 illustrates network and bandwidth resources used by consumers in the database storage system 100, for example, network links and flows associated with consumers. The database storage system 100 is connected to one or more external consumers 420. For example, a consumer can be the task of retrieving a point-in-time copy from a production database system 110 or the task of exporting a virtual database system 130. A network resource of the database storage system 100 that is shared by multiple external consumers is called a network link 410. Multiple external consumers that share a network link 410 can be executing on the same remote computer or on different remote computers. Although a remote computer can share multiple network links, each external consumer 420 is assigned to a single network link 410. If there are multiple links 410 connected to a remote computer, different external consumers 420 on the remote computer can be assigned to different links. Typically, there is bidirectional network traffic between an external consumer 420 and the database storage system 100. In an embodiment, each network link 410 can be associated with an aggregate of network interface controllers (NICs) or a single NIC.
[046] Each network link 410 has a stated link capacity that specifies the bandwidth supported by the network link 410. The stated link capacity of the network link 410 may be specified by the vendor of the network link 410. However the actual bandwidth that is obtained when the network link 410 is used in a system can be different from the stated bandwidth since the actual bandwidth may depend on several factors, including network configurations, configuration and capacity of storage of the database storage system 100, nature of the workload, and the caching properties of the consumer tasks.
[047] The portion of the resource associated with a network link 410 that is assigned to a consumer is called a flow 430. Hence, each external consumer 420 is assigned a flow 430 as shown in FIG. 4. A flow 430 is associated with attributes including, a network link 410 used by the flow, a priority value associated with the flow, and a network port on the database storage system 100 used by the flow. Typically, there is bidirectional network traffic associated with the flow 430 between the external consumer 420 associated with the flow and the database storage system 100. The database storage system 100 can enforce limits on the bandwidth available to a flow 430. The priority associated with a flow 430 typically depends on the priority of the associated consumer. The database storage system 100 throttles the network traffic through each flow to guarantee specific bandwidth to each consumer.
[048] In an embodiment, corresponding to each external consumer 420 task, there is a consumer task executing on the database storage system 100. The information related to the consumer in the database storage system 100 is stored in the consumer store 255.
Information related to the resources including network links is stored in the resource usage store 270.
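The link and flow bookkeeping described in this section can be sketched in Python as below. The field names, the capacity figures and the admit helper are illustrative assumptions; admit stands in for the per-flow throttling the database storage system performs.

```python
from dataclasses import dataclass, field

@dataclass
class NetworkLink:                    # network link 410
    name: str
    stated_capacity_kbps: int         # vendor-stated capacity; actual throughput differs
    flows: list = field(default_factory=list)

@dataclass
class Flow:                           # flow 430, one per external consumer 420
    consumer: str
    link: NetworkLink
    priority: str                     # e.g. "P2/medium", from the associated consumer
    port: int
    bandwidth_limit_kbps: int         # limit enforced by the database storage system

    def admit(self, requested_kbps):
        """Throttle traffic on this flow to its enforced bandwidth limit."""
        return min(requested_kbps, self.bandwidth_limit_kbps)

link = NetworkLink("link-0", stated_capacity_kbps=1_000_000)
flow = Flow("pit-copy-task", link, "P2/medium", 2049, bandwidth_limit_kbps=200_000)
link.flows.append(flow)
print(flow.admit(350_000))            # requests above the limit are throttled to 200000
```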
Total Throughput Discovery
[049] A link's total throughput is the aggregated network bandwidth available to all consumers using this particular link. Portions of the network bandwidth available on a link are allocated to the consumers associated with the link. The appropriate portion allocated to a consumer is calculated based on the total throughput. However, as described above, the total throughput depends on the actual bandwidth available on the link, which depends on several factors and needs to be estimated. Also, the total throughput can change over time based on the changes in the factors that affect the overall bandwidth of the link.
[050] The metrics manager 265 of the adaptive resource manager 215 stores the previously estimated resource usages of the network links 410 in the resource usage store 270. The previously estimated resource usage data is used to estimate the total throughput for network links 410. The significance and influence of the values of the past observations of resource usage are diminished over time to accommodate changes in workloads, and storage or network configurations that affect the total throughput.
[051] In an embodiment, a predetermined parameter lookback determines the length of historic time interval used to estimate the total throughput. All observed resource usages between the present time t and the previous time point (t-lookback) are used to determine the total throughput. However resource usage data prior to the time (t-lookback) is not considered. In another embodiment, a decay parameter is considered that reduces the contribution due to older values of resource usage. The decay parameter may reduce the importance of previous values by a factor depending on the age of the data. For example, the older the data is, the smaller the contribution of the data.
[052] FIG. 5 shows a flowchart of the process used for computing the total throughput of a link. The allocation manager 225 initially assigns 505 total throughput to a value determined to be a low estimate of the stated link capacity LowEstimateBW. In one embodiment, the low estimate of the stated link capacity is determined to be a fraction of the stated capacity of the network link, for example, half of the stated capacity of the network link. The total throughput value is estimated periodically. Accordingly, the scheduler 235 causes the allocation manager 225 to wait 510 a predetermined interval of time before recomputing the observed usage of links and the value of the total throughput.
[053] The observed usage of a link is determined by estimating the usage of the link by each consumer served by the link. The usage may be estimated based on the consumer's inbound as well as outbound usage of the link. For example, the usage may be based on the total amount of data sent using the link in either direction during a time interval. The time interval for measuring the usage of a link by a consumer can be the predetermined time interval that the allocation manager 225 waits 510 before re-computing the
TotalThroughput value for the link. For example, the time interval for measuring the usage of a link by a consumer can be 30 seconds and the data transferred measured in kilobytes. The observed usage for a link during a time interval is the total of the current usage of all consumers of the link during the time interval. In case of resources that are network links, the usage is measured in both directions, sending and receiving.
[054] Based on the observed usage ObservedUsage of the link in the current time interval as well as previous time intervals, the allocation manager 225 re-computes 520 the total throughput value using the following equation:
TotalThroughput(link) = MAX{LowEstimateBW(link), MAX_{0 <= s <= lookback}(DiscountValue(ObservedUsage(s), t))}    (1)
[055] The variable lookback is a parameter to determine the length of the historic time interval over which the observed usages are considered for evaluating the total throughput for a link for the current time. The variable t is the present time and variable s represents any time point between (t-lookback) and t for which observed usage was determined.
An example of the DiscountValue function is
DiscountValue(ObservedUsage(s), t) = ObservedUsage(s) × e^(−α×(t−s)), where α is a constant. Historical values determined earlier than t-lookback time are not considered in the above equation (1) for evaluation of TotalThroughput. Specifically, equation (1) computes the TotalThroughput of a link based on all observed usage values ObservedUsage over the previous time interval of size lookback.
[056] The ObservedUsage values of previous time points are weighted to reduce the influence of old values on the computation of TotalThroughput. The factor e^(−α×(t−s)) exponentially reduces the weight of the older values. The above equation keeps the computation of TotalThroughput dynamic so that although the value of TotalThroughput is based on historical values, recent values have more significant impact on the value of TotalThroughput than older values. Accordingly, a temporary increase in observed usage will increase the TotalThroughput value but unless the increase is sustained over a significant period of time or observed again, its influence on the computation of TotalThroughput is exponentially reduced over time until it is completely eliminated from the computation after the lookback time interval.
[057] Alternative embodiments may utilize other functions to reduce the weight of older observed usages, for example a linear function or non-linear functions can be used. In some embodiments, the weight of all previous observed usages considered is the same and the older observed usages get eliminated after lookback time. The equation (1) ensures that even if observed usage values reduce significantly, the value of TotalThroughput is not reduced below LowEstimateBW. In some embodiments, the value of the lookback parameter can be dynamically adjusted. The value of the lookback parameter can be manually changed by a system administrator or determined by the allocation manager 225. For example, if the observed usages in the system are changing very slowly, the value of lookback can be increased, whereas if the observed usages in the system are changing more frequently, the value of the lookback parameter can be reduced. In an embodiment, changes to the lookback parameter can be driven by various 'lookback policies,' for example by absolute time (e.g. lookback for a month/quarter/year worth of data), or/and by the amount of data processed (e.g. lookback goes as far as needed to account for 100TB of data). These lookback policies can be either manual or automatic.
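A Python sketch of the estimate in equation (1), together with the TrueTotalThroughput variant of equation (4) introduced below, is given here. The decay constant α, the lookback length and the sample layout are illustrative assumptions.

```python
import math

def total_throughput(samples, now, low_estimate_bw, lookback, alpha):
    """Equation (1): decayed maximum of observed usages within the lookback window.

    samples: list of (timestamp, observed_usage) pairs for the link
    """
    discounted = [usage * math.exp(-alpha * (now - s))       # DiscountValue(...)
                  for s, usage in samples
                  if now - lookback <= s <= now]              # older data is ignored
    return max(low_estimate_bw, max(discounted, default=0.0))

def true_total_throughput(samples, now, lookback, alpha):
    """Equation (4): as above with LowEstimateBW treated as zero (reporting only)."""
    return total_throughput(samples, now, 0.0, lookback, alpha)

samples = [(0, 400.0), (30, 650.0), (60, 500.0)]              # usage observed every 30 s
print(total_throughput(samples, now=90, low_estimate_bw=500.0, lookback=120, alpha=0.01))
```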
[058] An alternative embodiment uses the following recursive equation, which computes the TotalThroughput for the current time t from the TotalThroughput computed for a previous time s.
TotalThroughput(link, t) = MAX{LowEstimateBW(link), ObservedUsage(t), TotalThroughput(link, s) × e^(−α×(t−s))}    (2)
[059] For the initial time t0, there is no time s before time t for which TotalThroughput value is available. The computation of TotalThroughput for time t0 is based on the value of LowEstimateBW for the link as follows.
TotalThroughput(link, t0) = LowEstimateBW(link)    (3)
[060] The equation (2) computes the TotalThroughput value for time t based on the TotalThroughput value for a previous time point weighted by an exponential factor depending on the time difference between t and s. Alternative embodiments can use a different function to determine the weight applied to the previous TotalThroughput value. For example, the weight applied to the previous TotalThroughput value can be a linear function of the time difference between the present time and the previous time, a non-linear function, or even a constant value. Typical functions used for computing the weights applied to the TotalThroughput values of previous time points attempt to reduce the significance of previous TotalThroughput values in the computation of TotalThroughput for the current time point.
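The recursive form of equations (2) and (3) can be sketched as follows; only the previous estimate and its timestamp need to be kept between allocation runs, and the decay constant α is again an illustrative assumption.

```python
import math

def update_total_throughput(prev_estimate, prev_time, observed_usage, now,
                            low_estimate_bw, alpha):
    """Recursive update of a link's TotalThroughput per equations (2) and (3)."""
    if prev_estimate is None:                          # initial time t0, equation (3)
        return low_estimate_bw
    decayed_previous = prev_estimate * math.exp(-alpha * (now - prev_time))
    return max(low_estimate_bw, observed_usage, decayed_previous)     # equation (2)

estimate, last_time = None, 0
for now, usage in [(0, 0.0), (30, 700.0), (60, 300.0)]:
    estimate = update_total_throughput(estimate, last_time, usage, now,
                                       low_estimate_bw=500.0, alpha=0.01)
    last_time = now
    print(now, round(estimate, 1))
```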
[061] In another embodiment, an estimate of the true total throughput for the link, TrueTotalThroughput(link), is computed based on the following equation:
TrueTotalThroughput(link) = MAX_{0 ≤ s ≤ lookback} (ObservedUsage(s) × e^(−α×(t−s)))    (4)
[062] The true total throughput value assumes LowEstimateBW(link) = 0, i.e., it ignores the effect of LowEstimateBW(link) in equation (1). Since equation (1) uses LowEstimateBW(link), if the maximum of the weighted past observed usage values is too low, the TotalThroughput(link) value obtained from equation (1) can be higher than the value computed using equation (4). The TrueTotalThroughput(link) value can be used for reporting purposes.
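The reporting-only variant of equation (4) can be sketched the same way, assuming the same mapping of sample times to observed usages as above; the names are again illustrative.

import math

def true_total_throughput(observed_usage_by_time, t, lookback, alpha):
    # Same discounted maximum as equation (1), but with LowEstimateBW(link) = 0,
    # so the value reflects only what was actually observed.
    discounted = [
        usage * math.exp(-alpha * (t - s))
        for s, usage in observed_usage_by_time.items()
        if t - lookback <= s <= t
    ]
    return max(discounted, default=0.0)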
[063] Periodically, the value of all allocations is increased 530 by a factor (called the fudge factor), for example, by 10%. The increase is intended to push the allocations toward their true maximum value. The additional amount of resource allocated by the fudge factor may cause the ObservedUsage for the next iteration to increase compared to the previous iteration if the increased allocation can be consumed. If each iteration increases the allocations by the fudge factor, the TotalThroughput increases in each iteration until the aggregate needs of all consumers of the resources are satisfied or the actual maximum throughput value based on the constraints of the resources is reached. When the needs of all consumers of the resources are satisfied or the actual maximum throughput value is reached, the additional resources introduced by the fudge factor are not consumed. As a result, the observed TotalThroughput does not increase at time t. [064] If the TotalThroughput value obtained by increasing 530 the TotalThroughput by the fudge factor is determined 535 to be higher than an upper estimate of the stated link capacity, the TotalThroughput value is assigned 540 the upper estimate of the stated link capacity. The upper estimate may be determined from the stated link capacity, for example, as 90% of the stated link capacity for each link. Typical inefficiencies of any practical system prevent the system from reaching the stated link capacities of the available links. Therefore, the TotalThroughput value for a link is limited to a maximum value based on the upper estimate of the stated link capacity. Whether the TotalThroughput is limited to the upper estimate of the stated link capacity or determined by applying the fudge factor to the re-computed 520 TotalThroughput value, the allocation manager 225 allocates 545 resources to consumers based on the total throughput. Since the total throughput is increased by a predetermined factor, the consumers may receive additional resources compared to their observed usage. The allocation manager 225 then waits 510 for the predetermined interval, determines 515 the observed usages for the link, and determines 520 the TotalThroughput value again. Some consumers may be able to utilize the additional allocated resources, whereas other consumers may not need them.
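The fudge-factor step and the cap at the upper estimate of the stated link capacity can be sketched as follows; the 10% fudge factor and 90% upper estimate are only the example values from the text, and the function and parameter names are illustrative.

def next_allocation_basis(recomputed_throughput, stated_link_capacity,
                          fudge_factor=0.10, upper_fraction=0.90):
    # Step 530: grow the re-computed throughput by the fudge factor so the
    # allocations can probe for unused capacity.
    boosted = recomputed_throughput * (1.0 + fudge_factor)
    # Steps 535/540: never exceed the upper estimate of the stated link capacity.
    upper_estimate = stated_link_capacity * upper_fraction
    return min(boosted, upper_estimate)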
[065] It is possible that the value of TotalThroughput for an iteration is overestimated. For example, the value of TotalThroughput can be overestimated if the system is reconfigured to change the network or storage resources available, or if there is a significant change in the load distribution. A change in load distribution may occur, for example, if the load is switched from sequential input/output (IO) used for analytical applications to a transactional load dominated by smaller, randomly occurring IO operations. Since TotalThroughput is determined based on historical observations, the estimated TotalThroughput value may be larger than the changed throughput value actually available to the resources on a link. The overestimate of the available resources may lead to additional resources being allocated to the consumers based on a phantom portion of the resource that does not actually exist. However, the decay of historical TotalThroughput values over time accounted for in equations (1) and (2), together with the elimination of historical values prior to the lookback time interval, causes the extra allocation of resources to shrink and eventually be eliminated, so that the TotalThroughput value converges to a realistic estimate. In an embodiment, a system administrator is allowed to reset the TotalThroughput value to an initial default value, causing the allocation manager 225 to re-compute the TotalThroughput value from scratch. An embodiment allows the allocation manager 225 to automatically reset the TotalThroughput value to an initial default value either periodically or based on detection of particular events, for example, changes in network configurations or events that indicate significant load changes, such as the addition or deletion of a production database system 110 from the database storage system 100 configuration.
Resource Guarantees
[066] Typical consumers of resources in a system similar to the system illustrated in FIG. 1 may require a minimum amount of resources to operate. For example, a module acting as a consumer may be required to send a periodic message stating its status. The status signal may be required to detect system failures; for example, modules may send a signal that indicates "I am alive" to another module in charge of monitoring the health of various subsystems or modules. If no signal is received from a module or sub-system, the system 100 may activate procedures to detect hardware or software failures in order to take appropriate action.
[067] There may be other reasons for guaranteeing minimum availability of resources to specific systems. For example, a virtual database manager 275 interacting with a virtual database system 130 may need a minimum amount of resources to continue a meaningful mode of processing for a particular task. Although the allocation manager 225 allocates a minimum amount of resources to specific consumers, the usage of these consumers may need to be minimized to favor higher priority consumers. In an embodiment, a survival level resource allocation may be guaranteed to each consumer process created in the system, and the consumer process needs to be suspended or deleted to reclaim the survival minimum resources allocated to the consumer. Note that suspension of a consumer process only stops real time activity of this consumer (data access, network traffic, etc.) and frees all resources associated with or guaranteed to this consumer, but does not destroy the storage of data associated with this consumer. For example, deleting a consumer process associated with a virtual database does not require deletion of the storage associated with the VDB.
[068] In an embodiment, the survival minimum resource allocation guaranteed to a consumer is configurable by a system administrator. In another embodiment, certain default values may be assigned to different categories of consumers based on their priorities in the system. [069] The minimal resource guarantee for a consumer in the system 100 is the minimal amount of resource that is made available by the allocation manager 225 to the consumer. If the consumer does not need its allocated minimal resources, the leftover portions of the resources are allocated by the allocation manager to other consumers based on their priority. On the other hand, if the allocation manager 225 determines, after allocating resources to higher priority consumers, that there are leftover resources for lower priority consumers, the allocation manager 225 can provide additional allocations to the lower priority consumers, over and above the guaranteed minimum allocation. In an embodiment, a system administrator is also allowed to set maximum allocation values for individual consumers. A default value for the minimum allocation of consumer resources can be zero, and a default value for the maximum allocation of consumer resources can be infinity.
[070] In an embodiment, in addition to individual guarantees, the system can be configured to have a minimum guarantee for an entire set of consumers as a group, for example, the P2 group 315 shown in FIG. 3. The overall minimum guarantee for the P2 group corresponds to an amount of resources to be distributed among P2 consumers, if the P2 consumers are able to consume the resources. If the P2 consumers are unable to consume all the resources allocated by the group minimum guarantee, the unused resources may be allocated to other consumers. The benefit of being able to configure a minimum guarantee for a group of consumers is to prevent the group of consumers (for example, the P2 group) from getting starved of resources by another group of consumers that has higher priority (for example, P1). The value of the minimum guarantee for a group of consumers can be specified by a system administrator or predetermined to a default value, for example, zero. An embodiment automatically derives the minimum guarantee based on historical data. For example, the group guarantee can be set as a fixed percentage of the historically observed total group usage. Alternatively, the resource needs of the group are observed in the time periods when the workload is not dominated by the high priority consumers (unconstrained periods). Based on those observed resource needs of the group, the group guarantee is determined so as to always provide the group with at least 65% of its estimated total need.
[071] The overall minimum guarantee for a group may be either set individually for each link or set globally and then distributed across links. In the latter case, the embodiment distributes the global guarantee in proportion to the group traffic on each link.
GroupGuarantee(link) = GroupGuarantee × GroupThroughput(link) / Σ_{link ∈ LINKS} GroupThroughput(link)    (5)
[072] In the above equation (5), GroupGuarantee(link) is the minimum guarantee for a group on a specific link. GroupGuarantee is the overall minimum guarantee for the group. GroupThroughput(link) is the total throughput of the traffic generated by the group on a specific link. The value Σ_{link ∈ LINKS} GroupThroughput(link) represents the sum of the GroupThroughput(link) values over all links, where LINKS is the entire set of links.
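A sketch of equation (5), assuming the per-link group throughputs are available as a mapping from link to throughput; the names are illustrative.

def group_guarantee_per_link(group_guarantee, group_throughput_by_link):
    # Equation (5): split the global group guarantee across links in proportion
    # to the group's observed throughput on each link.
    total = sum(group_throughput_by_link.values())
    if total == 0:
        return {link: 0.0 for link in group_throughput_by_link}
    return {link: group_guarantee * throughput / total
            for link, throughput in group_throughput_by_link.items()}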
[073] When a new guarantee value is set for a consumer, the allocation manager 225 may check various constraints, including the following: (1) the sum of the individual guarantees and survival guarantees for all the consumers in a group (for example, P2) does not exceed the overall guarantee for the group; (2) the sum of the individual guarantees and survival guarantees for all the consumers in a group is below the low estimate of bandwidth for the link, LowEstimateBW(link), which is determined as a predetermined fraction of the stated capacity of the link; and (3) the overall guarantee specified for the group is below the LowEstimateBW(link) value. If any of the above checks fails, a warning may be generated, for example, to inform the system administrator of a constraint violation related to guarantees. These checks ensure that the guaranteed resources are actually available, for example if the overall capacity estimate is reduced. In an embodiment, the guarantees are specified in absolute terms, not as a percentage of the estimate.
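A sketch of these checks, assuming the individual and survival guarantees are kept in per-consumer mappings; it returns warning messages rather than enforcing anything, and all names are illustrative.

def check_guarantee_constraints(individual_guarantees, survival_guarantees,
                                group_guarantee, low_estimate_bw):
    # One warning string per violated check from paragraph [073].
    warnings = []
    combined = sum(individual_guarantees.values()) + sum(survival_guarantees.values())
    if combined > group_guarantee:
        warnings.append("individual + survival guarantees exceed the group guarantee")
    if combined > low_estimate_bw:
        warnings.append("individual + survival guarantees exceed LowEstimateBW(link)")
    if group_guarantee > low_estimate_bw:
        warnings.append("group guarantee exceeds LowEstimateBW(link)")
    return warnings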
Resource Allocation
[074] FIG. 6 shows a flowchart of the process used for allocating the resources among consumers. The process illustrated in FIG. 6 assumes consumers are classified into two sets, P1 and P2. The consumers in priority group P1 are higher priority consumers compared to consumers in priority group P2. The process in FIG. 6 guarantees that the P1 consumers are given priority over P2 consumers while the guarantees for the P2 consumers are preserved. Allocations are determined for P1 consumers before P2 consumers, unless it is known that the needs of the P2 consumers are very low and do not pose any risk to the allocations of the P1 consumers.
[075] Initially, the survival guarantees of the consumers in set P2 are allocated 605. The group guarantees of the consumers of the P2 priority group represent the amount of resources available to the consumers collectively if they can use the resources made available. The unused amount is returned to a common allocation pool. The guarantees of the consumers of the P2 priority group are designed to protect the consumers of the lower priority P2 group from being starved by the consumers of the higher priority P1 group.
[076] The needs of the P2 consumers are determined 605 to check 610 if the needs of the P2 consumers are below the P2 guarantees. The needs of a consumer are determined based on the observed usage of the consumer. In an embodiment, a consumer is given an
additional margin over and above the observed usage. The addition of the margin allows
identification of consumers whose needs are growing. In an embodiment, the value of the margin by which the observed usage is increased for a consumer depends on the priority and sub-priority of the consumer as shown in FIG. 3. The following table shows an example of margins determined based on the priority and sub-priority of a consumer.
[Table: example margin values, shown as a percentage of increase, for each combination of priority P and sub-priority S]
[077] Each row of the above table shows the margin value (third column) as a percentage of increase for a consumer with priority P (first column) and sub-priority S (second column). The values shown in the above table are example values. Each system may determine a different set of values by tuning the parameters appropriately. In other embodiments, the margin can be a function of the observed usage. The new usage NewUsage(C) of a consumer C is determined by increasing the observed usage ObservedUsage(C) by the margin percentage. In an embodiment, the computation of the NewUsage(C) value for a particular consumer can enforce a maximum value MAX(C). The calculation is shown in equations (6) and (6a) below. The components of the formula are the minimum individual guarantee for the consumer, the survival minimum value assigned to the consumer, and the projected need of the consumer including the margin. The needs of the P2 consumers are the total of the NewUsage(C) values for all consumers of the priority group P2. The margin for priority P and sub-priority S is indicated by MARGIN(P, S).
NewUsage(C) = MAX{MinGuarantee(C), MinSurvival(C), ObservedUsage(C) × (1 + MARGIN(P, S))}    (6)
NewUsage(C) = MIN{NewUsage(C), MAX(C)}    (6a)
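Equations (6) and (6a) can be sketched as a single helper; the argument names are illustrative, and margin is the MARGIN(P, S) value expressed as a fraction (for example, 0.10 for 10%).

def new_usage(observed_usage, min_guarantee, min_survival, margin, max_allocation):
    # Equation (6): project the need from observed usage plus the margin,
    # floored by the individual guarantee and the survival minimum.
    projected = max(min_guarantee, min_survival, observed_usage * (1.0 + margin))
    # Equation (6a): cap the projection at the configured maximum MAX(C).
    return min(projected, max_allocation)

For instance, new_usage(100, 20, 5, 0.10, 500) projects a need of 110, because the margined usage exceeds both guarantees and is below the cap.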
[078] If the needs of the P2 consumers are determined to be below the guarantees of the P2 consumers, first the allocations for the P2 consumers are determined 615 based on their needs, followed by the allocations of the P1 consumers determined 620 based on their needs. Since the needs of the P2 consumers are known to be below their guarantees, their allocations can be determined before the allocations of the P1 consumers. Since the P2 consumers are expected to consume fewer resources than the amount they are guaranteed, they do not pose any risk of the P1 consumers being allocated fewer resources. In an embodiment, a greedy algorithm described herein is used for determining 615 the allocations for the P2 consumers. The greedy algorithm sequentially allocates the resources to the different consumers, going through the list of the consumers in order of decreasing priority. Since the needs of the P2 consumers were determined 610 to be less than the guarantees for the P2 consumers, it is likely that after allocating all resources for the P2 and P1 consumers, there are leftover resources. The remaining (leftover) resources are allocated 645 to all the consumers. In an embodiment, the remaining resources may be allocated 645 in proportion to the needs of the consumers. In another embodiment, the allocation 645 of the remaining resources may be weighted by the priority/sub-priority of the consumer.
[079] If the P2 needs are above the P2 guarantees, first an amount of resources equal to (TotalThroughput − P2Guarantees − AllSurvivalMinimumAllocations) is allocated 625 to the P1 consumers based on their needs. Since the needs of the P2 consumers are higher than their guarantees, it is possible that if the P2 consumers were allocated resources before the P1 consumers, there might not be sufficient resources left for the P1 consumers. After the resources required for the P1 consumers are determined 625, the remaining resources are checked 630 to determine if there are sufficient resources left for the P2 consumers. If there are sufficient resources left for the P2 consumers, the allocations for the P2 consumers are determined 635 based on their needs, for example, using a greedy algorithm. After the allocations for the P2 consumers are also determined 635, the remaining leftover resources can be allocated 645. If, after determining 625 the allocations for the P1 consumers, it is determined that the remaining resources are not sufficient for the P2 consumers, the resources are allocated to the P2 consumers based on a fair share strategy described below. In this situation, it is highly likely that there are no leftover resources. However, if any leftover resources are found, they are allocated 645. After the leftover resources are allocated, a fudge factor may be introduced to find the maximum capacity as described above for step 535 in FIG. 5.
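The decision flow of FIG. 6 can be sketched as follows; needs maps each consumer to its NewUsage value, greedy and fair_share are callables implementing the FIG. 7 strategies sketched in the next section, and all names are illustrative.

def allocate_two_groups(total_throughput, p1, p2, needs,
                        p2_group_guarantee, survival_minimums,
                        greedy, fair_share):
    allocations = dict(survival_minimums)                        # survival minimums first
    p2_needs = sum(needs[c] for c in p2)

    if p2_needs <= p2_group_guarantee:                           # check 610
        allocations.update(greedy(p2, needs))                    # step 615
        allocations.update(greedy(p1, needs))                    # step 620
    else:
        p1_budget = (total_throughput - p2_group_guarantee
                     - sum(survival_minimums.values()))
        allocations.update(greedy(p1, needs, budget=p1_budget))  # step 625
        remaining = total_throughput - sum(allocations.values())
        if remaining >= p2_needs:                                # check 630
            allocations.update(greedy(p2, needs))                # step 635
        else:
            allocations.update(fair_share(p2, needs, remaining)) # fair share fallback

    leftover = max(0.0, total_throughput - sum(allocations.values()))
    return allocations, leftover                                 # leftover handled in step 645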
Allocation Strategies
[080] FIG. 7 shows a flowchart of the process used for allocating the resources among consumers of a priority group based on a greedy or a fair share strategy. A variable
RemainingResource is initialized to a value representing the remaining resources at the stage at which the process shown in FIG. 7 is executed. The process iterates over all the consumers of the priority group in order of decreasing priority. For example, the consumers within the priority group P may be divided into subgroups that are assigned sub-priorities as shown in FIG. 3. The consumers of priority P2 may be processed in the order: consumers with priority P2 and sub-priority high, followed by consumers with priority P2 and sub-priority medium, followed by consumers with priority P2 and sub-priority low. Accordingly, a consumer C with the highest priority/sub-priority that has not yet been processed is selected 710.
[081] The allocation for the consumer selected 710 is determined 715. The strategy used for determining 715 the allocation for consumer C is different for the greedy allocation compared to the fair share allocation. For greedy allocation, the allocation for consumer C is determined to be NewUsage(C) (see equation (6)). Therefore, in the greedy allocation strategy, the consumer is allocated as much as it needs based on its NewUsage value, which already takes the guarantees into account.
[082] In the fair share allocation strategy, the allocation for the consumer may be less than the NewUsage value determined for the consumer. First, a FairShare(C) value is determined for the consumer using equation (7) below:
FairShare(C) = R × W(C) / Σ_{ci ∈ Consumers} W(ci)    (7)
[083] The fair share value FairShare(C) for a consumer is determined by dividing the resources R available for allocation in proportion to the weight of consumer C relative to the total weight of all consumers ci in the set Consumers of the priority group. W(C) is a weight assigned to consumer C. Weights are designed to reflect the priority and sub-priority of the consumers. The allocation for consumer C is determined to be min(NewUsage(C), FairShare(C)).
Therefore, the allocation of a consumer C may be limited by the FairShare(C) value computed for the consumer, even if the consumer C needs NewUsage(C) resources.
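Minimal sketches of the two strategies of FIG. 7 follow; the consumer list is assumed to be pre-sorted in order of decreasing priority/sub-priority, and the names are illustrative.

def greedy_allocation(consumers, needs, budget=None):
    # Give each consumer its full NewUsage, optionally limited by an overall budget.
    allocations = {}
    remaining = budget
    for c in consumers:
        amount = needs[c]
        if remaining is not None:
            amount = min(amount, remaining)
            remaining -= amount
        allocations[c] = amount
    return allocations

def fair_share_allocation(consumers, needs, resource, weights=None):
    # Equation (7): FairShare(C) = R * W(C) / sum of W(ci); the consumer receives
    # min(NewUsage(C), FairShare(C)). Equal weights are assumed when none are given.
    weights = weights or {c: 1.0 for c in consumers}
    total_weight = sum(weights[c] for c in consumers)
    allocations = {}
    for c in consumers:
        fair_share = resource * weights[c] / total_weight if total_weight else 0.0
        allocations[c] = min(needs[c], fair_share)
    return allocations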
[084] The leftover allocation 645 divides the remaining resources after the resource allocations for all consumers have been determined based on the greedy or fair share allocation strategies. Leftover allocations may not be provided to consumers that have reached their maximum allocations. In an embodiment, if the resources are determined to be lightly loaded, the leftover resources are divided equally among all consumers. The previously determined allocations of all consumers are incremented by the amount obtained by equally dividing the leftover resources among all consumers. In one embodiment, the system may be determined to be lightly loaded for a link if the number of consumers using the link is low and the observed usages of the consumers using the link are also determined to be low. For example, the system may be considered lightly loaded for a link if there are fewer than 50 consumers using the link and the overall observed usage of the link is less than a quarter of the stated link capacity.
[085] If the lightly loaded conditions are not met, the resources are divided between consumers in proportion to the usage and weight of the consumers. The weight associated with a consumer is based on priority preferences; for example, the weight may be determined based on the priority and sub-priority associated with the consumer. The following equation shows how the share Share(C) of a consumer C is determined for the leftover resources R.
Share(C) = R × (W(C) × NewUsage(C)) / Σ_{ci ∈ Consumers} (W(ci) × NewUsage(ci))    (8)
[086] The share Share(C) of a consumer of the remaining resource R is determined based on the weighted usage of consumer C as a fraction of the weighted usage of all consumers ci in the set Consumers of the priority group. Based on equation (8) above, the total of all Share(C) values for all consumers adds up to the remaining total resource. Shares of consumers with the same usage are proportional to the weights determined by their priorities and sub-priorities. Furthermore, shares of consumers within the same priority/sub-priority groups are proportional to their usage. The allocations of resources for consumers determined previously are updated by adding the corresponding Share(C) value to each allocation. If the resulting value exceeds the maximum limit configured for the consumer, the allocation is limited to the maximum limit. Based on the above updates to allocations, the value of remaining resources is computed again. If for any reason there are still remaining resources, the above allocation can be repeated.
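The leftover distribution of paragraphs [084]-[086] can be sketched as follows; the per-consumer needs, weights, and maximum limits are assumed to be available as mappings, and the lightly-loaded test is left to the caller. All names are illustrative.

def distribute_leftover(allocations, leftover, needs, weights, max_limits,
                        lightly_loaded=False):
    # Lightly loaded links split the leftover equally; otherwise equation (8)
    # splits it in proportion to weight * NewUsage. Each result is capped at the
    # consumer's configured maximum.
    consumers = list(allocations)
    if lightly_loaded:
        shares = {c: leftover / len(consumers) for c in consumers}
    else:
        denom = sum(weights[c] * needs[c] for c in consumers)
        shares = {c: (leftover * weights[c] * needs[c] / denom) if denom else 0.0
                  for c in consumers}
    for c in consumers:
        allocations[c] = min(allocations[c] + shares[c], max_limits[c])
    return allocations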
Metrics for Reporting
[087] In an embodiment, an unhappiness index is determined by the metrics manager 265 as a measure of the potential for starvation of a particular consumer. The metric is based on the fraction of a time interval during which the usage of the consumer exceeds a predetermined percentage of its allocation, for example, 85% of the allocation. In an embodiment, the unhappiness index is measured over a fixed time window, for example, 24 hours. During the fixed time window, there can be several allocation runs during which the allocation manager 225 re-computes the allocations for the next time interval. The time interval between two allocation runs is called an allocation interval and corresponds to the time during which the previous allocation was enforced. [088] The unhappiness index is determined as the sum of all allocation intervals ti within the time window TWINDOW during which the usage of consumer C, usage(C, ti), was greater than N%, divided by the size of TWINDOW. In an embodiment, N = 85%. Equation (9) below shows the computation of the unhappiness index Unhappiness(C, TWINDOW) for a consumer C during the time window TWINDOW.
Unhappiness(C, TWINDOW) = ( Σ_{ti ∈ TWINDOW} IF(usage(C, ti) > N%) THEN (ti) ELSE (0) ) / TWINDOW    (9)
[089] The summation in the numerator of (9) adds the time interval ti when the usage of consumer C during ti, usage(C, ti), is greater than N%. The denominator of equation (9) adds all the time intervals ti within the window TWINDOW, thereby providing the duration of the entire time window TWINDOW. In an embodiment, the consumers are ordered in decreasing order of their unhappiness index to obtain their unhappiness rank. The unhappiness rank and unhappiness index of the consumers may be reported by the metrics manager 265 to a system administrator, for example, via a user interface 295. The system administrator may decide to change the priority or sub-priority of the consumer based on the unhappiness index combined with other criteria, for example, the type of the task. In an embodiment, the metrics manager 265 may provide the information regarding unhappiness of consumers to allow the allocation manager 225 to make automatic adjustments to the priority or sub-priority of the consumers. In an embodiment, the unhappiness measure of a consumer is used for automatic adjustment of the priority/sub-priority of the consumer. For example, if a consumer is unhappy most of the time, the consumer may be automatically promoted to a higher priority.
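A sketch of the unhappiness computation; it assumes the per-consumer allocation intervals are available as (interval length, usage-to-allocation ratio) pairs, and the names are illustrative.

def unhappiness_index(allocation_intervals, twindow, threshold=0.85):
    # Equation (9): fraction of the time window during which the consumer's usage
    # exceeded the threshold share (N%) of its allocation.
    starved_time = sum(length for length, usage_fraction in allocation_intervals
                       if usage_fraction > threshold)
    return starved_time / twindow

def unhappiness_ranks(index_by_consumer):
    # Consumers ordered by decreasing unhappiness index; rank 1 is the unhappiest.
    ordered = sorted(index_by_consumer, key=index_by_consumer.get, reverse=True)
    return {consumer: rank for rank, consumer in enumerate(ordered, start=1)}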
Computing Machine Architecture
[090] FIG. 8 is a block diagram illustrating components of an example machine able to read instructions from a machine-readable medium and execute them in a processor (or controller). Specifically, FIG. 8 shows a diagrammatic representation of a machine in the example form of a computer system 800 within which instructions 824 (e.g., software) for causing the machine to perform any one or more of the methodologies discussed herein may be executed. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
[091] The machine may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a smartphone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions 824 (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term
"machine" shall also be taken to include any collection of machines that individually or jointly execute instructions 824 to perform any one or more of the methodologies discussed herein.
[092] The example computer system 800 includes a processor 802 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), one or more application specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these), a main memory 804, and a static memory 806, which are configured to communicate with each other via a bus 808. The computer system 800 may further include a graphics display unit 810 (e.g., a plasma display panel (PDP), a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)). The computer system 800 may also include an alphanumeric input device 812 (e.g., a keyboard), a cursor control device 814 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 816, a signal generation device 818 (e.g., a speaker), and a network interface device 820, which also are configured to communicate via the bus 808.
[093] The storage unit 816 includes a machine-readable medium 822 on which is stored instructions 824 (e.g., software) embodying any one or more of the methodologies or functions described herein. The instructions 824 (e.g., software) may also reside, completely or at least partially, within the main memory 804 or within the processor 802 (e.g., within a processor's cache memory) during execution thereof by the computer system 800, the main memory 804 and the processor 802 also constituting machine-readable media. The instructions 824 (e.g., software) may be transmitted or received over a network 826 via the network interface device 820.
[094] While machine-readable medium 822 is shown in an example embodiment to be a single medium, the term "machine-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 824). The term "machine-readable medium" shall also be taken to include any medium that is capable of storing instructions (e.g., instructions 824) for execution by the machine and that cause the machine to perform any one or more of the methodologies disclosed herein. The term "machine-readable medium" includes, but is not limited to, data repositories in the form of solid-state memories, optical media, and magnetic media.
Additional Configuration Considerations
[095] Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
[096] Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware modules. A hardware module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
[097] In various embodiments, a hardware module may be implemented mechanically or electronically. For example, a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
[098] Accordingly, the term "hardware module" should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, "hardware-implemented module" refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general- purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
[099] Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
[0100] The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions. The modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
[0101] Similarly, the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment or as a server farm), while in other embodiments the processors may be distributed across a number of locations. [0102] The one or more processors may also operate to support performance of the relevant operations in a "cloud computing" environment or as a "software as a service" (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).
[0103] The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
[0104] Some portions of this specification are presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a machine memory (e.g., a computer memory). These algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an "algorithm" is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to these signals using words such as "data," "content," "bits," "values," "elements," "symbols," "characters," "terms," "numbers," "numerals," or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.
[0105] Unless specifically stated otherwise, discussions herein using words such as "processing," "computing," "calculating," "determining," "presenting," "displaying," or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or a combination thereof), registers, or other machine components that receive, store, transmit, or display information. [0106] As used herein, any reference to "one embodiment" or "an embodiment" means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
[0107] Some embodiments may be described using the expression "coupled" and
"connected" along with their derivatives. It should be understood that these terms are not intended as synonyms for each other. For example, some embodiments may be described using the term "connected" to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term "coupled" to indicate that two or more elements are in direct physical or electrical contact. The term "coupled," however, may also mean that two or more elements are not in direct contact with each other, but yet still cooperate or interact with each other. The embodiments are not limited in this context.
[0108] As used herein, the terms "comprises," "comprising," "includes," "including," "has," "having" or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, "or" refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
[0109] In addition, the terms "a" or "an" are employed to describe elements and components of the embodiments herein. This is done merely for convenience and to give a general sense of the invention. This description should be read to include one or at least one, and the singular also includes the plural unless it is obvious that it is meant otherwise.
[0110] Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for a system and a process for creating virtual databases from point-in-time copies of production databases stored in a storage manager. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.

Claims

What is claimed is:
1. A method for computing total throughput of a network resource used by consumers, wherein the total throughput is a measure of the aggregated network bandwidth available to the plurality of consumers using the network resource, the method comprising:
determining a plurality of cumulative usage values of the network resource,
wherein each cumulative usage value is associated with a time interval and is based on observed usages of the network resource by a plurality of consumers over the time interval;
determining a total throughput of the network resource based on an aggregate of the plurality of cumulative usage values of the network resource;
increasing the total throughput of the network resource by a predetermined factor; and
determining allocations of the network resource for each consumer in the plurality of consumers based on the increased total throughput of the network resource, wherein an allocation of the network resource for a consumer determines availability of network resource to the consumer for a subsequent time interval.
2. The method of claim 1, wherein determining the total throughput discounts the cumulative usage values of the network resource such that cumulative usage values associated with older time intervals are discounted more than cumulative usage values associated with newer time intervals.
3. The method of claim 1, wherein the total throughput is determined based on cumulative usage values of the network resource discounted by a factor which is a function of age increasing with the age of the time interval associated with a cumulative usage value.
4. The method of claim 1, wherein the total throughput is determined based on cumulative usage values of the network resource discounted by a factor linearly increasing with the age of the time interval associated with a cumulative usage value.
5. The method of claim 1, wherein determining the total throughput value excludes cumulative usage values outside a predetermined time interval.
6. The method of claim 1, further comprising:
responsive to determining the total throughput is below a predetermined threshold value, using a low estimate of bandwidth based on a fraction of the stated link capacity as the total throughput.
7. The method of claim 1, further comprising: responsive to determining the total throughput is above a predetermined threshold value, using a high estimate of bandwidth based on a fraction of the stated link capacity as the total throughput.
8. A method for allocating a network resource to a plurality of consumers, the method comprising:
determining a metric representing needs of a low priority set of consumers of a network resource based on observed usage of the network resource by each consumer;
responsive to the metric representing the needs of the low priority set of consumers being above a threshold value, determining allocations of the network resource for a high priority set of consumers;
responsive to determining allocations of the network resource for the high priority set of consumers, determining a remaining amount of network resource not allocated to the high priority set of consumers; and
allocating the remaining amount of network resource to the low priority set of consumers.
9. The method of claim 8, wherein the determining allocations of the network resource for the second set of consumers allocates an amount of resources based on an estimate of the requirement of each consumer.
10. The method of claim 8, further comprising:
responsive to an estimate of requirements of the first set of consumers being
below the threshold value, determining allocations of the network resource for the first set of consumers based on needs of each consumer before determining allocations for the second set of consumers.
11. The method of claim 8, wherein allocating the remaining network resource to the first set of consumers further comprises:
responsive to determining that the remaining network resource is more than the needs of the first set of consumers, allocating amount of resources needed by each consumer in the first set of consumers.
12. The method of claim 8, wherein allocating the remaining network resource to the first set of consumers further comprises:
responsive to determining that the remaining network resource is less than the needs of the first set of consumers, dividing the remaining resources between the first set of consumers, wherein the resource allocated to each consumer is based on a priority of the consumer.
13. A method for controlling resources allocated to databases and database applications using a virtual database system, the method comprising:
storing on a storage system, database blocks for a plurality of different point-in- time copies of a source database, wherein at least some of the stored database blocks are associated with multiple point-in-time copies of the source database;
provisioning a plurality of virtual databases to one or more systems, wherein provisioning each virtual database to a system comprises:
creating a set of files linked to the stored database blocks on the
storage system,
mounting the set of files to the system to allow a database server running on a system to access the set of files;
determining allocation of resources of the storage system for a task associated with a virtual database and a system, wherein the task is associated with a priority and determining allocation of resources comprises:
estimating a requirement of the task for a network resource of the storage system; and
allocating the network resource to the task based on the requirements of the task and the priority of the task.
14. The method of claim 13, wherein a first subset of tasks is associated with a high priority and a second subset is associated with a low priority and the network resource allocated to the first subset of tasks is higher than the second subset of tasks.
15. The method of claim 14, wherein resources are allocated to the first subset based on an estimate of the requirements of the first subset and the remaining amount of resources are allocated to the second subset of tasks.
16. A computer program product having a computer-readable storage medium storing computer-executable code allocating a network resource to a plurality of consumers, the code comprising:
a metrics manager configured to: determine a metric representing needs of a low priority set of consumers of a network resource based on observed usage of the network resource by each consumer;
an allocations manager configured to:
responsive to the metric representing the needs of the low priority set of
consumers being above a threshold value, determine allocations of the network resource for a high priority set of consumers;
responsive to determining allocations of the network resource for the high priority set of consumers, determine a remaining amount of network resource not allocated to the high priority set of consumers; and allocate the remaining amount of network resource to the low priority set of consumers.
17. The computer program product of claim 16, wherein the allocations manager is further configured to determine allocations of the network resource for the second set of consumers and allocate an amount of resources based on an estimate of the requirement of each consumer.
18. The computer program product of claim 16, wherein the allocations manager is further configured to:
responsive to an estimate of requirements of the first set of consumers being
below the threshold value, determine allocations of the network resource for the first set of consumers based on needs of each consumer before determining allocations for the second set of consumers.
19. The computer program product of claim 16, wherein allocating the remaining network resource to the first set of consumers further comprises:
responsive to determining that the remaining network resource is more than the needs of the first set of consumers, allocating amount of resources needed by each consumer in the first set of consumers.
20. The computer program product of claim 16, wherein allocating the remaining network resource to the first set of consumers further comprises:
responsive to determining that the remaining network resource is less than the needs of the first set of consumers, dividing the remaining resources between the first set of consumers, wherein the resource allocated to each consumer is based on a priority of the consumer.
PCT/US2010/060536 2009-12-24 2010-12-15 Adaptive resource management WO2011078998A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP10839995.7A EP2517115A4 (en) 2009-12-24 2010-12-15 Adaptive resource management

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/647,337 2009-12-24
US12/647,337 US9106591B2 (en) 2009-12-24 2009-12-24 Adaptive resource management using survival minimum resources for low priority consumers

Publications (1)

Publication Number Publication Date
WO2011078998A1 true WO2011078998A1 (en) 2011-06-30

Family

ID=44189096

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2010/060536 WO2011078998A1 (en) 2009-12-24 2010-12-15 Adaptive resource management

Country Status (3)

Country Link
US (2) US9106591B2 (en)
EP (1) EP2517115A4 (en)
WO (1) WO2011078998A1 (en)

Families Citing this family (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8150808B2 (en) 2009-10-21 2012-04-03 Delphix Corp. Virtual database system
US8161077B2 (en) 2009-10-21 2012-04-17 Delphix Corp. Datacenter workflow automation scenarios using virtual databases
US8213243B2 (en) * 2009-12-15 2012-07-03 Sandisk 3D Llc Program cycle skip
KR101644800B1 (en) * 2010-01-07 2016-08-02 삼성전자주식회사 Computing system and method
US8548944B2 (en) 2010-07-15 2013-10-01 Delphix Corp. De-duplication based backup of file systems
US8429659B2 (en) * 2010-10-19 2013-04-23 International Business Machines Corporation Scheduling jobs within a cloud computing environment
US8468174B1 (en) * 2010-11-30 2013-06-18 Jedidiah Yueh Interfacing with a virtual database system
US8825840B2 (en) * 2011-02-22 2014-09-02 Intuit Inc. Systems and methods for self-adjusting logging of log messages
US8959221B2 (en) * 2011-03-01 2015-02-17 Red Hat, Inc. Metering cloud resource consumption using multiple hierarchical subscription periods
WO2012120655A1 (en) * 2011-03-08 2012-09-13 富士通株式会社 Scheduling method and scheduling system
US20130014119A1 (en) * 2011-07-07 2013-01-10 Iolo Technologies, Llc Resource Allocation Prioritization Based on Knowledge of User Intent and Process Independence
US9141887B2 (en) 2011-10-31 2015-09-22 Hewlett-Packard Development Company, L.P. Rendering permissions for rendering content
US8949197B2 (en) 2011-10-31 2015-02-03 Oracle International Corporation Virtual full backups
US9098344B2 (en) * 2011-12-27 2015-08-04 Microsoft Technology Licensing, Llc Cloud-edge topologies
KR101733117B1 (en) * 2012-01-31 2017-05-25 한국전자통신연구원 Task distribution method on multicore system and apparatus thereof
US9727383B2 (en) 2012-02-21 2017-08-08 Microsoft Technology Licensing, Llc Predicting datacenter performance to improve provisioning
EP2826209A4 (en) * 2012-03-14 2015-10-21 Hewlett Packard Development Co Allocating bandwidth in a network
US20130290511A1 (en) * 2012-04-27 2013-10-31 Susan Chuzhi Tu Managing a sustainable cloud computing service
US9462080B2 (en) * 2012-04-27 2016-10-04 Hewlett-Packard Development Company, L.P. Management service to manage a file
US9531607B1 (en) 2012-06-20 2016-12-27 Amazon Technologies, Inc. Resource manager
WO2014047912A1 (en) * 2012-09-28 2014-04-03 华为技术有限公司 User grouping method and apparatus
US9817834B1 (en) * 2012-10-01 2017-11-14 Veritas Technologies Llc Techniques for performing an incremental backup
US8788461B2 (en) 2012-10-04 2014-07-22 Delphix Corp. Creating validated database snapshots for provisioning virtual databases
GB2507338A (en) * 2012-10-26 2014-04-30 Ibm Determining system topology graph changes in a distributed computing system
US11270325B2 (en) * 2013-03-13 2022-03-08 Eversight, Inc. Systems and methods for collaborative offer generation
US10574748B2 (en) * 2013-03-21 2020-02-25 Infosys Limited Systems and methods for allocating one or more resources in a composite cloud environment
US9614794B2 (en) * 2013-07-11 2017-04-04 Apollo Education Group, Inc. Message consumer orchestration framework
US10275268B2 (en) * 2013-08-26 2019-04-30 Red Hat, Inc. Providing entropy to a guest operating system
US20150106649A1 (en) * 2013-10-11 2015-04-16 Qualcomm Innovation Center, Inc. Dynamic scaling of memory and bus frequencies
CN104750558B (en) * 2013-12-31 2018-07-03 伊姆西公司 The method and apparatus that resource allocation is managed in quota system is layered
US9471371B2 (en) 2014-02-27 2016-10-18 International Business Machines Corporation Dynamic prediction of concurrent hardware transactions resource requirements and allocation
US9621439B2 (en) * 2014-02-28 2017-04-11 International Business Machines Corporation Dynamic and adaptive quota shares
US9448843B2 (en) 2014-03-26 2016-09-20 International Business Machines Corporation Allocating a global resource to consumers across different regions of a distributed grid environment based on use data and demand data of each consumer
US9710039B2 (en) * 2014-07-17 2017-07-18 International Business Machines Corporation Calculating expected maximum CPU power available for use
WO2016018208A1 (en) * 2014-07-28 2016-02-04 Hewlett-Packard Development Company, L.P. Accessing resources across multiple tenants
JP6513984B2 (en) * 2015-03-16 2019-05-15 株式会社スクウェア・エニックス PROGRAM, RECORDING MEDIUM, INFORMATION PROCESSING DEVICE, AND CONTROL METHOD
US9507636B2 (en) * 2015-04-20 2016-11-29 International Business Machines Corporation Resource management and allocation using history information stored in application's commit signature log
US10102031B2 (en) * 2015-05-29 2018-10-16 Qualcomm Incorporated Bandwidth/resource management for multithreaded processors
CN104935657A (en) * 2015-06-15 2015-09-23 清华大学深圳研究生院 Method for actively pushing information and embedded node operating system
CN105183591A (en) * 2015-09-07 2015-12-23 浪潮(北京)电子信息产业有限公司 High-availability cluster implementation method and system
US10025718B1 (en) 2016-06-28 2018-07-17 Amazon Technologies, Inc. Modifying provisioned throughput capacity for data stores according to cache performance
CN106302020B (en) * 2016-08-18 2019-08-16 上海帝联信息科技股份有限公司 Network bandwidth statistical method and device
US10503548B2 (en) 2016-10-11 2019-12-10 Microsoft Technology Licensing, Llc Resource and latency estimation-based scheduling in a distributed computing environment
US11941659B2 (en) 2017-05-16 2024-03-26 Maplebear Inc. Systems and methods for intelligent promotion design with promotion scoring
US10445208B2 (en) * 2017-06-23 2019-10-15 Microsoft Technology Licensing, Llc Tunable, efficient monitoring of capacity usage in distributed storage systems
US11144341B2 (en) * 2017-07-13 2021-10-12 Hitachi, Ltd. Management apparatus and management method
WO2019135703A1 (en) * 2018-01-08 2019-07-11 Telefonaktiebolaget Lm Ericsson (Publ) Process placement in a cloud environment based on automatically optimized placement policies and process execution profiles
US10901806B2 (en) * 2018-05-01 2021-01-26 International Business Machines Corporation Internet of things resource optimization
US11144973B2 (en) * 2018-06-29 2021-10-12 Paypal, Inc. Optimization of data queue priority for reducing network data load speeds
US11500825B2 (en) * 2018-08-20 2022-11-15 Intel Corporation Techniques for dynamic database access modes
US11003504B2 (en) * 2019-06-28 2021-05-11 Cohesity, Inc. Scaling virtualization resource units of applications
KR102496115B1 (en) * 2019-11-28 2023-02-06 한국전자통신연구원 Apparatus and Method of Altruistic Scheduling based on Reinforcement Learning
US11544230B2 (en) * 2020-06-23 2023-01-03 Citrix Systems, Inc. Cross environment update of cloud resource tags
CN112235737A (en) * 2020-11-02 2021-01-15 北京蜂窝科技有限公司 Flow management method and system for Internet of things card
CN114442910A (en) * 2020-11-06 2022-05-06 伊姆西Ip控股有限责任公司 Method, electronic device and computer program product for managing storage system
US11539635B2 (en) 2021-05-10 2022-12-27 Oracle International Corporation Using constraint programming to set resource allocation limitations for allocating resources to consumers
US11502971B1 (en) 2021-11-15 2022-11-15 Oracle International Corporation Using multi-phase constraint programming to assign resource guarantees of consumers to hosts
CN116074262B (en) * 2023-01-07 2023-10-31 廊坊奎达信息技术有限公司 Resource optimization allocation method based on big data platform

Family Cites Families (79)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4853843A (en) * 1987-12-18 1989-08-01 Tektronix, Inc. System for merging virtual partitions of a distributed database
US5680618A (en) * 1993-05-26 1997-10-21 Borland International, Inc. Driver query and substitution for format independent native data access
US5680608A (en) * 1995-02-06 1997-10-21 International Business Machines Corporation Method and system for avoiding blocking in a data processing system having a sort-merge network
US5634053A (en) * 1995-08-29 1997-05-27 Hughes Aircraft Company Federated information management (FIM) system and method for providing data site filtering and translation for heterogeneous databases
US5842222A (en) 1996-10-04 1998-11-24 Taiwan Semiconductor Manufacturing Company, Ltd. Production information system enhanced for availability
US6304882B1 (en) 1998-05-05 2001-10-16 Informix Software, Inc. Data replication system and method
JP2000047919A (en) 1998-07-30 2000-02-18 Hitachi Ltd Virtual database replication system
ATE290743T1 (en) * 1998-07-31 2005-03-15 Cit Alcatel METHOD, SEQUENCER, INTELLIGENT BUFFER MEMORY, PROCESSOR AND TELECOMMUNICATIONS SYSTEM FOR DISTRIBUTING AVAILABLE BANDWIDTH
US6771595B1 (en) * 1999-08-31 2004-08-03 Intel Corporation Apparatus and method for dynamic resource allocation in a network environment
US7197491B1 (en) * 1999-09-21 2007-03-27 International Business Machines Corporation Architecture and implementation of a dynamic RMI server configuration hierarchy to support federated search and update across heterogeneous datastores
US6557012B1 (en) 2000-04-22 2003-04-29 Oracle Corp System and method of refreshing and posting data between versions of a database table
US6523036B1 (en) * 2000-08-01 2003-02-18 Dantz Development Corporation Internet database system
US7310653B2 (en) 2001-04-02 2007-12-18 Siebel Systems, Inc. Method, system, and product for maintaining software objects during database upgrade
US20020143764A1 (en) * 2001-04-03 2002-10-03 Martin Andrew R. Data management system and method for intercepting and changing database instructions between a database back end and an application front end
DE50101548D1 (en) * 2001-05-17 2004-04-01 Presmar Peter Virtual database of heterogeneous data structures
US6829617B2 (en) 2002-02-15 2004-12-07 International Business Machines Corporation Providing a snapshot of a subset of a file system
US7340489B2 (en) 2002-04-10 2008-03-04 Emc Corporation Virtual storage devices
JP2003316522A (en) * 2002-04-26 2003-11-07 Hitachi Ltd Computer system and method for controlling the same system
US6857001B2 (en) 2002-06-07 2005-02-15 Network Appliance, Inc. Multiple concurrent active file systems
JP4124331B2 (en) 2002-09-17 2008-07-23 Hitachi, Ltd. Virtual volume creation and management method for DBMS
US6981114B1 (en) 2002-10-16 2005-12-27 Veritas Operating Corporation Snapshot reconstruction from an existing snapshot and one or more modification logs
US7243093B2 (en) * 2002-11-27 2007-07-10 International Business Machines Corporation Federated query management
US6883083B1 (en) * 2002-12-20 2005-04-19 Veritas Operating Corporation System and method for maintaining and accessing information regarding virtual storage devices
DE60335298D1 (en) 2003-01-14 2011-01-20 Ericsson Telefon Ab L M RESOURCES ALLOCATION MANAGEMENT
US7809693B2 (en) 2003-02-10 2010-10-05 Netapp, Inc. System and method for restoring data on demand for instant volume restoration
US7457982B2 (en) 2003-04-11 2008-11-25 Network Appliance, Inc. Writable virtual disk of read-only snapshot file objects
US7539748B2 (en) * 2003-05-16 2009-05-26 Time Warner Cable, A Division Of Time Warner Entertainment Company, L.P. Data transfer application monitor and controller
US7269607B2 (en) * 2003-09-29 2007-09-11 International Business Machines Corporation Method and information technology infrastructure for establishing a log point for automatic recovery of federated databases to a prior point in time
US7346923B2 (en) * 2003-11-21 2008-03-18 International Business Machines Corporation Federated identity management within a distributed portal server
US7409511B2 (en) 2004-04-30 2008-08-05 Network Appliance, Inc. Cloning technique for efficiently creating a copy of a volume in a storage system
US7334095B1 (en) 2004-04-30 2008-02-19 Network Appliance, Inc. Writable clone of read-only volume
US7953749B2 (en) * 2004-05-11 2011-05-31 Oracle International Corporation Providing the timing of the last committed change to a row in a database table
US7653665B1 (en) * 2004-09-13 2010-01-26 Microsoft Corporation Systems and methods for avoiding database anomalies when maintaining constraints and indexes in presence of snapshot isolation
US7363444B2 (en) 2005-01-10 2008-04-22 Hewlett-Packard Development Company, L.P. Method for taking snapshots of data
US7757056B1 (en) 2005-03-16 2010-07-13 Netapp, Inc. System and method for efficiently calculating storage required to split a clone volume
US7546431B2 (en) 2005-03-21 2009-06-09 Emc Corporation Distributed open writable snapshot copy facility using file migration policies
US20060230243A1 (en) 2005-04-06 2006-10-12 Robert Cochran Cascaded snapshots
US9152823B2 (en) * 2005-04-22 2015-10-06 Storagecraft Technology Corporation Systems, methods, and computer readable media for computer data protection
US20070055710A1 (en) 2005-09-06 2007-03-08 Reldata, Inc. BLOCK SNAPSHOTS OVER iSCSI
US20070260628A1 (en) * 2006-05-02 2007-11-08 Tele Atlas North America, Inc. System and method for providing a virtual database environment and generating digital map information
US20080037553A1 (en) 2005-12-22 2008-02-14 Bellsouth Intellectual Property Corporation Systems and methods for allocating bandwidth to ports in a computer network
US7552295B2 (en) 2006-01-03 2009-06-23 International Business Machines Corporation Maintaining consistency when mirroring data using different copy technologies
JP4822889B2 (en) * 2006-03-20 2011-11-24 Fujitsu Ltd. Database integrated reference program, database integrated reference method, and database integrated reference device
US7747831B2 (en) 2006-03-20 2010-06-29 Emc Corporation High efficiency portable archive and data protection using a virtualization layer
US7653794B2 (en) * 2006-05-08 2010-01-26 Microsoft Corporation Converting physical machines to virtual machines
US7809769B2 (en) * 2006-05-18 2010-10-05 Google Inc. Database partitioning by virtual partitions
US7849114B2 (en) * 2006-06-19 2010-12-07 International Business Machines Corporation Method, system, and program product for generating a virtual database
US7587563B1 (en) 2006-07-11 2009-09-08 Network Appliance, Inc. Method and system to make a read-only file system appear to be writeable
US7953704B2 (en) 2006-08-18 2011-05-31 Emc Corporation Systems and methods for a snapshot of data
US7836267B1 (en) 2006-08-30 2010-11-16 Barracuda Networks Inc Open computer files snapshot
JP5068062B2 (en) * 2006-10-30 2012-11-07 International Business Machines Corporation System, method, and program for integrating databases
US8255915B1 (en) * 2006-10-31 2012-08-28 Hewlett-Packard Development Company, L.P. Workload management for computer system with container hierarchy and workload-group policies
US8935206B2 (en) * 2007-01-31 2015-01-13 Hewlett-Packard Development Company, L.P. Snapshots in distributed storage systems
US7792802B1 (en) 2007-03-02 2010-09-07 3Par, Inc. Archiving logs with snapshots for data recovery
US8375461B2 (en) 2007-03-14 2013-02-12 Rovi Solutions Corporation Apparatus for and a method of copy-protecting a content carrying recording medium
US7760643B2 (en) 2007-04-09 2010-07-20 Telcordia Technologies, Inc. Automatic policy change management scheme for DiffServ-enabled MPLS networks
US7953946B2 (en) * 2007-04-16 2011-05-31 Microsoft Corporation Controlled anticipation in creating a shadow copy
US8281293B2 (en) 2007-07-24 2012-10-02 International Business Machines Corporation Copy-on-write optimization of immutable objects for object oriented languages
US7949692B2 (en) 2007-08-21 2011-05-24 Emc Corporation Systems and methods for portals into snapshot data
US8315999B2 (en) * 2007-08-29 2012-11-20 Nirvanix, Inc. Policy-based file management for a storage delivery network
US7725440B2 (en) 2007-09-26 2010-05-25 Yahoo! Inc. Restoring a database using fuzzy snapshot techniques
US7779051B2 (en) * 2008-01-02 2010-08-17 International Business Machines Corporation System and method for optimizing federated and ETL'd databases with considerations of specialized data structures within an environment having multidimensional constraints
US20090177697A1 (en) * 2008-01-08 2009-07-09 International Business Machines Corporation Correlation and parallelism aware materialized view recommendation for heterogeneous, distributed database systems
JP5288334B2 (en) * 2008-02-04 2013-09-11 NEC Corporation Virtual appliance deployment system
US10127059B2 (en) 2008-05-02 2018-11-13 Skytap Multitenant hosted virtual machine infrastructure
US7937548B2 (en) 2008-05-21 2011-05-03 Hewlett-Packard Development Company, L.P. System and method for improved snapclone performance in a virtualized storage system
US20100058106A1 (en) 2008-08-27 2010-03-04 Novell, Inc. Virtual machine file system and incremental snapshot using image deltas
US8732417B1 (en) 2008-10-15 2014-05-20 Symantec Corporation Techniques for creating snapshots of a target system
US9542222B2 (en) * 2008-11-14 2017-01-10 Oracle International Corporation Resource broker system for dynamically deploying and managing software services in a virtual environment based on resource usage and service level agreement
US20100131959A1 (en) * 2008-11-26 2010-05-27 Spiers Adam Z Proactive application workload management
EP2200230B1 (en) * 2008-12-16 2014-03-12 Alcatel Lucent Method and device for performing traffic control in telecommunication networks
US8187191B2 (en) 2009-01-08 2012-05-29 Volcano Corporation System and method for equalizing received intravascular ultrasound echo signals
US8452930B2 (en) 2009-03-27 2013-05-28 Hitachi, Ltd. Methods and apparatus for backup and restore of thin provisioning volume
US8195611B2 (en) 2009-03-31 2012-06-05 International Business Machines Corporation Using a sparse file as a clone of a file
US8478801B2 (en) 2009-05-20 2013-07-02 Vmware, Inc. Efficient reconstruction of virtual disk hierarchies across storage domains
US10120767B2 (en) * 2009-07-15 2018-11-06 Idera, Inc. System, method, and computer program product for creating a virtual database
US8341119B1 (en) 2009-09-14 2012-12-25 Netapp, Inc. Flexible copies having different sub-types
US8356148B2 (en) 2009-09-22 2013-01-15 Lsi Corporation Snapshot metadata management in a storage system
US8244685B2 (en) 2010-02-24 2012-08-14 Autonomy, Inc. Data restoration utilizing forward and backward deltas

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090080398A1 (en) * 1995-06-07 2009-03-26 Mahany Ronald L Hierarchical Communication System Providing Intelligent Data, Program and Processing Migration
US20090292734A1 (en) * 2001-01-11 2009-11-26 F5 Networks, Inc. Rule based aggregation of files and transactions in a switched file system
US20090132611A1 (en) * 2007-11-19 2009-05-21 Douglas Brown Closed-loop system management method and process capable of managing workloads in a multi-system database environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2517115A4 *

Also Published As

Publication number Publication date
US20150312169A1 (en) 2015-10-29
EP2517115A4 (en) 2015-04-01
US9106591B2 (en) 2015-08-11
US20110161973A1 (en) 2011-06-30
US10333863B2 (en) 2019-06-25
EP2517115A1 (en) 2012-10-31

Similar Documents

Publication Publication Date Title
US10333863B2 (en) Adaptive resource allocation based upon observed historical usage
US11073999B2 (en) Extent migration in multi-tier storage systems
US9489137B2 (en) Dynamic storage tiering based on performance SLAs
US8856484B2 (en) Mass storage system and methods of controlling resources thereof
US10191771B2 (en) System and method for resource management
US8682955B1 (en) Fully automated cloud tiering controlled by an orchestration layer based on dynamic information
US8943269B2 (en) Apparatus and method for meeting performance metrics for users in file systems
JP5744707B2 (en) Computer-implemented method, computer program, and system for memory usage query governor
US9317207B2 (en) Cache migration
JP2015517147A (en) System, method and computer program product for scheduling processing to achieve space savings
US10419305B2 (en) Visualization of workload distribution on server resources
US8914582B1 (en) Systems and methods for pinning content in cache
US10489074B1 (en) Access rate prediction in a hybrid storage device
Batsakis et al. CA-NFS: A congestion-aware network file system
US10761726B2 (en) Resource fairness control in distributed storage systems using congestion data
US9760306B1 (en) Prioritizing business processes using hints for a storage system
US10169817B2 (en) Dynamic storage bandwidth allocation
Singh et al. Microfuge: A middleware approach to providing performance isolation in cloud storage systems
US11940923B1 (en) Cost based cache eviction
US20240069614A1 (en) Cold data storage energy consumption evaluation and response
US20240045698A1 (en) Storage device energy consumption evaluation and response
Ananthanarayanan et al. Big data analytics systems.
Murugan et al. Software Defined Energy Adaptation in Scale-Out Storage Systems
US10776030B2 (en) Quota arbitration of a distributed file system
JP2013205962A (en) Subsystem control node, control method for subsystem, program, and service provision system

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 10839995

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2010839995

Country of ref document: EP