US20150169619A1 - System and method for creating storage containers in a data storage system - Google Patents

System and method for creating storage containers in a data storage system Download PDF

Info

Publication number
US20150169619A1
US20150169619A1 (Application US14/562,611; also published as US 2015/0169619 A1)
Authority
US
United States
Prior art keywords
data storage
data
storage container
records
stored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/562,611
Inventor
Spencer Eldon Pingry
Jonathan Bartholomew Mulieri
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Optimizely North America Inc
Original Assignee
Zaius Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zaius Inc filed Critical Zaius Inc
Priority to US14/562,611 priority Critical patent/US20150169619A1/en
Publication of US20150169619A1 publication Critical patent/US20150169619A1/en
Priority to US16/204,008 priority patent/US11249991B2/en
Assigned to Zaius, Inc. reassignment Zaius, Inc. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: Mulieri, Jonathan Bartholomew, Pingry, Spencer Eldon
Assigned to OPTIMIZELY NORTH AMERICA INC. reassignment OPTIMIZELY NORTH AMERICA INC. MERGER (SEE DOCUMENT FOR DETAILS). Assignors: ZAIUS, INC.
Abandoned legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F17/30182
    • G06F17/30424

Definitions

  • the invention is generally related to data storage and more particularly, to a highly scalable, highly available online data storage system.
  • a method for creating storage containers in a data storage system receives, via a computing processor, a new data record to be stored in a data storage container, the data storage container configured to store a fixed number of stored data records, the data storage container storing a plurality of stored data records; determines whether a number of the plurality of stored data records in the data storage container is within a certain threshold of the fixed number of stored data records for the data storage container; for the data storage container that resides in a sequential data space: opens a new data storage container, stores the new data record in the new data storage container, and closes the data storage container to new data records; and for the data storage container that resides in a finite data space: opens a new data storage container, splits the plurality of stored data records in the data storage container into a first plurality of stored data records and a second plurality of stored data records, wherein each of the first plurality of stored data records has a data value within a first range, and wherein each of the second plurality of stored data records has the data value within a second range, stores the first plurality of stored data records in the data storage container, stores the second plurality of the stored data records in the new data storage container, and stores the new data record in either the data storage container or the new data storage container based on whether the data value in the new data record corresponds to the first range or the second range.
  • FIG. 1 illustrates a data storage system according to various implementations of the invention.
  • FIG. 2 illustrates an operation of a data storage system according to various implementations of the invention.
  • FIG. 3 illustrates a spill mechanism over to a new data storage container once an existing data storage container reaches its capacity according to various implementations of the invention.
  • FIGS. 4 and 5 illustrate a split mechanism over to one or more new data storage container(s) once an existing data storage container reaches its capacity according to various implementations of the invention.
  • FIG. 6 illustrates an operation of a data storage system as data storage containers approach capacity according to various implementations of the invention.
  • FIG. 7 illustrates a data storage system including a number of data storage containers hosted by a number of data storage assets according to various implementations of the invention.
  • FIG. 8 illustrates an operation of a load balancing mechanism for data storage system according to various implementations of the invention.
  • FIG. 1 illustrates a data storage system 100 according to various implementations of the invention.
  • Data storage system 100 includes a processor 120 and at least two data storage containers, illustrated in FIG. 1 as a first data storage container 135 in a first data storage space 130 and a second data storage container 155 in a second data storage space 150 .
  • a data record 110 is stored in both first data storage container 135 and second data storage container 155 based on one or more data values in data record 110 as will be described in further detail below.
  • Data storage containers 135 , 155 and data storage spaces 130 , 150 refer to logical data storage elements which may be stored on one or more physical data storage assets (not otherwise illustrated in FIG. 1 ).
  • physical data storage assets may include, but are not limited to servers, disks, memories, other non-transitory computer readable media, or other physical data storage assets including banks or farms of such physical data storage assets.
  • processor 120 may be any general purpose hardware computing processor configured via various executable programming instructions stored internally to or externally from processor 120 in a computer readable medium, where when such programming instructions are executed by the computing processor, they cause the computing processor to perform various functions as would be appreciated. When configured with such programming instructions, the general purpose hardware computing processor becomes a particular processor that performs functions attributed to processor 120 as described herein. According to various implementations of the invention, processor 120 may be a single hardware computing processor or a plurality of hardware computing processors.
  • processor 120 may be a dedicated hardware computing processor configured to perform various functions of processor 120 as described herein or a plurality of hardware computing processors distributed throughout data storage system 100 , each configured to perform one or more of the functions of processor 120 as described herein.
  • first data storage space 130 and second data storage space 150 define separate spaces for aggregating, organizing and storing data records in a manner that optimizes responses to queries that may be applied against the data records in the respective data storage spaces 130 , 150 .
  • first data storage space 130 aggregates, organizes and/or stores data records in data storage container 135 based on a sequentially changing data value in each of the data records. Such sequentially changing data value may be a sequentially increasing data value or a sequentially decreasing data value.
  • data records with sequentially increasing data values are stored in increasing sequential order; and data records with sequentially decreasing data values are stored in decreasing sequential order.
  • This first data storage space is sometimes referred to as a “sequential data storage space,” or as will become apparent below, a “spill space.”
  • a date-time stamp in a data record is a sequentially increasing data value—date-time stamps in data records will have progressively greater data values over time.
  • Other sequentially increasing data values may include, but are not limited to, other temporal data values (i.e., time-based data values other than date-time stamps), transaction numbers, order numbers, or similar numeric orderings of data values in the data records, or other sequentially increasing data values.
  • second data storage space 150 aggregates, organizes and/or stores data records in data storage container 155 based on a data value in the data record that resides in a “finite” space, and not “infinite” such as the sequentially changing data value.
  • a social security number is a data value that resides in a finite space as it has a finite range of values, namely 000-00-0000 to 999-99-9999.
  • a last name stored as alphabetic characters in a fixed data field is a data value that resides in a finite space as it has a finite range of values, namely “A” to “ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ”
  • a hash function computes a hash value based on one or more data values in a data record; the hash value typically has a fixed number of bits and hence resides in a finite space as would be appreciated.
  • This second data storage space is sometimes referred to as a “finite data storage space,” or as will become apparent below, a “split space.”
  • second data storage space 150 aggregates, organizes and/or stores data records in data storage container 155 based on those data records sharing a common data value (or sharing a common range of data values) that resides in the finite space such as, but not limited to, those data records sharing the same user name, social security number, website visited, or other common data value in the finite space.
  • second data storage space 150 aggregates, organizes and/or stores data records that share the common data value (or common range of data values) in data storage container 155 based on an ordering of another data value in the data record, typically, a date-time stamp (or similar time-based value), although any other ordering may be used as would be appreciated.
  • data records 140 stored in data storage container 135 are organized based on sequentially increasing data values xi, xi+1, xi+2, . . . , xi+n in first data storage space 130 .
  • In other words, xi < xi+1 < xi+2 < . . . < xi+n .
  • each data record in plurality of data records 140 also includes a data value that is not a sequentially increasing data value (e.g., as illustrated, a, b, c, . . . ).
  • data records 140 in data storage container 135 are ordered strictly based on the sequentially increasing data values.
  • data records 140 in data storage container 135 are ordered loosely based on the sequentially increasing data values based on when such data records are received and stored; in other words, some of the data records 140 may be received “out of order” such that the data record having the data value xi+2 is stored in the database within a few records before the data record having the data value xi+1. In some implementations of the invention, this minor misordering is tolerated because “append” operations require less processing than “insert” operations as would be appreciated and can be addressed when responding to queries. In some implementations, data records 140 received out of order are re-ordered, either periodically, or as they are received, as would be appreciated.
  • data records 160 stored in data storage container 155 are organized based on a common data value or common range of data values in the finite space of second data storage space 150 .
  • data records 160 are ordered based on the common data value or within the common range of data values.
  • the data records may be ordered, for example, alphabetically or following some other ordering.
  • data records 160 are ordered based on when such data records are received. In these implementations, the data records are ordered based on their receipt by data storage system 100 and not necessarily following some other ordering.
  • One benefit of various implementations of the invention is that having and maintaining separate data storage containers 135 , 155 from separate data storage spaces 130 , 150 , respectively, improves response to different types of queries. For example, when the sequentially changing data value of first data storage container 135 is a date-time stamp, data storage container 135 more readily services temporal-based queries. More particularly, queries for data records before or after a certain time, or within or outside of a certain time range, are more readily serviced by a data store that organizes its data records based on time. In essence, data storage container 135 is “tuned” to temporal-based queries.
  • data storage system 100 includes a plurality of data storages 135 in different data storage spaces 130 to aggregate, organize and/or store data records based on one or more different sequentially changing data value(s) in the data records.
  • data storage system 100 includes a plurality of data storages 155 in different data storage spaces 150 to aggregate, organize and/or store data records based on the one or more different data value(s) in the data records, where such different data value(s) reside in a finite data space.
  • the number and/or characteristics of data storages 135 , 155 may be selected based on the expected types of queries applied against data storage system 100 as would be appreciated.
  • processor 120 receives a query 170 from, for example, a user of data storage system 100 .
  • processor 120 determines which data storage space 130 , 150 would be best to respond to query 170 . For example, if query 170 is based on a sequentially changing data value such as a date-time stamp, for example, query 170 may be applied against first data storage container 135 . If query 170 is based on a data value in a finite data space such as a hash value, query 170 may be applied against second data storage container 155 , for example.
  • FIG. 2 illustrates an operation 200 of data storage system 100 according to various implementations of the invention.
  • processor 120 receives new data record 110 .
  • processor 120 causes new data record 110 to be stored in data storage container 135 in first data storage space 130 .
  • processor 120 causes new data record 110 to be stored in data storage container 155 in second data storage space 150 .
  • query 170 which is to be applied against data storage system 100 , is received.
  • a decision operation 250 determines whether query 170 is best evaluated in the sequential data storage space or in the finite data storage space.
  • query 170 is applied against data storage container 135 in first data storage space 130 and a retrieved plurality of data records corresponding to query 170 is retrieved from data storage container 135 . If query 170 is best evaluated in the finite data storage space, in an operation 270 , query 170 is applied against data storage container 155 in second data storage space 150 and a retrieved plurality of data records corresponding to query 170 is retrieved from data storage container 155 .
  • data storage containers 135 , 155 may eventually reach their respective capacities. Some implementations of the invention provide a mechanism for handling additional new records as data storage containers 135 , 155 approach their respective capacities.
  • new data storage containers 135 N, 155 N are created, opened or otherwise brought on-line in data storage system 100 based on whether existing data storage container 135 E, 155 E that is approaching capacity corresponds to first data storage space 130 or second data storage space 150 .
  • data storage containers 135 , 155 typically have a predetermined capacity, either physically or logically, as would be appreciated.
  • each data storage container 135 , 155 may store twenty (20) million data records or other number of data records.
  • a new data storage container 135 N, 155 N must be brought online (e.g., created, instantiated, etc.) to aggregate, organize and store new data records.
  • a new data storage container 135 N is brought online to store additional new data records. Because data records in data storage space 130 are stored sequentially based on a data value in the data record, the new data record (and each subsequent one thereafter until the new data storage container 135 N reaches its capacity) is simply stored in the new data storage container 135 N. According to various implementations of the invention, when data storage container 135 E reaches its capacity, the new data record in data storage space 130 “spills” into a new data storage container 135 N.
  • one or more new data storage containers 155 N are brought online to accommodate the new data records. Because data records in data storage space 150 are based on data values in the data record that reside in a finite data space, the finite data space is “split” into one or more contiguous “subspaces,” each of which is stored in its own data storage container 155 E, 155 N. For example, if existing data storage container 155 E presently stores data records based on a user last name and the entire finite space resides in data storage container 155 (i.e., last names beginning with ‘A’ to ‘Z’), the entire finite space is split into one or more contiguous subspaces.
  • data records 160 A having user last names beginning with ‘A’ to ‘M’ might be stored on the existing data storage container 155 E and data records 160 B having user last names beginning with ‘N’ to ‘Z’ might be stored on the new data storage container 155 N.
  • data records 160 A having user last names beginning with ‘A’ to ‘H’ might be stored on the existing data storage container 155 E
  • data records 160 B having user last names beginning with ‘I’ to ‘P’ might be stored on a first new data storage container 155 N 1
  • data records 160 C having user last names beginning with ‘Q’ to ‘Z’ might be stored on a second new data storage container 155 N 2 .
  • other numbers of subspaces may be used as well as different partitions for the subspaces.
  • the data storage containers 155 E, 155 N, and their respective subspaces themselves may be further partitioned as they reach their respective capacities into one or more sub-subspaces as would be appreciated.
  • the new data record forces data storage space 150 to be “split” into one or more data storage subspaces 150 , each stored in a separate data storage container 155 as would be appreciated.
  • each of the data records 160 in the existing data storage container 155 is positioned or moved to the appropriate data storage container 155 E, 155 N i , after the data storage space 150 is split into one or more subspaces as discussed above.
  • FIG. 6 illustrates an operation of a data storage system as data storage containers approach capacity according to various implementations of the invention.
  • data storage system 100 receives a new data record 110 to be stored in one or more data storage containers 135 E, 155 E.
  • data storage system 100 determines whether a particular data storage container 135 E, 155 E is at capacity.
  • data storage system 100 determines whether the particular data storage container 135 E, 155 E stores data records based on a sequentially changing data value in the data records (i.e., a “spill” storage container) or whether the particular data storage container 135 E, 155 E stores data records based on a data value residing in a finite data space (i.e., a “split” storage container). If the particular data storage container 135 E, 155 E is a “spill” storage container, in an operation 640 , new data record “spills” over to a new data storage container 135 N.
  • the particular data storage container 135 E, 155 E is a “split” storage container
  • the plurality of stored data records 160 in existing data storage container 155 E is “split” with new data storage container 155 N and in an operation 660 , new data record 110 is added to either existing data storage container 155 E or new data storage container 155 N as discussed above.
  • data storage system 100 may provide a load balancing mechanism for distributing data storage containers 135 E, 135 N, 155 E, and 155 N i across a plurality of data storage assets.
  • FIG. 7 illustrates data storage system 100 in different detail for purposes of describing this load balancing mechanism in accordance with various implementations of the invention.
  • Data storage system 100 includes processor 120 and a plurality of data storage assets 710 (illustrated as a data storage asset 710 A, a data storage asset 710 B, a data storage asset 710 C, and a data storage asset 710 D).
  • each data storage asset 710 comprises physical hardware configured to host one or more data storage containers 720 (illustrated in FIG.
  • data storage asset 710 A hosting a data storage container 720 A 1 , a data storage container 720 A 2 , . . . and a data storage container 720 A m
  • data storage asset 710 B hosting a data storage container 720 B 1 , a data storage container 720 B 2 , . . . and a data storage container 720 B n
  • data storage asset 710 C hosting a data storage container 720 C 1 , a data storage container 720 C 2 , . . . and a data storage container 720 C p
  • data storage asset 710 D hosting a data storage container 720 D 1 , a data storage container 720 D 2 , . . . and a data storage container 720 D q
  • Data storage containers 720 collectively refer to the various data storage containers 135 (including data storage containers 135 N and data storage containers 135 E) and data storage containers 155 (including data storage containers 155 N and data storage containers 155 E) described above.
  • data storage system 100 distributes data storage containers 720 across data storage assets to provide high availability to data records 110 stored in data storage containers 720 and/or to provide fast response time to queries made against these data records 110 .
  • data storage system 100 provides high availability and fast response time to a plurality of customers spread across data storage assets 710 without having to dedicate individual data storage assets to a given customer. By spreading out data records across data storage assets 710 , queries may become highly parallelized thereby reducing response time to the queries while simultaneously reducing an overall number of data storage assets 710 required by data storage system to meet a given performance level.
  • each data storage container 720 may have one or more corresponding replica data storage container (not otherwise illustrated) that mirrors data storage container 720 for purposes of redundancy and backup.
  • replica data storage containers may also be used to provide further parallelization of data storage system 100 .
  • replica data storage containers may be used to directly respond to the query, or portions of the query, to increase the number of potential data storage assets 710 responding to the query. This ensures that if the data storage asset 710 hosting a particular data storage container 720 is busy responding to an unrelated query, the data storage asset 710 hosting a replica of the particular data storage container 720 may respond to the query.
  • each data storage container 720 is associated with one replica data storage container.
  • each data storage container is associated with two or more replica data storage containers.
  • a number of replicas utilized by data storage system 100 is based, in part, on necessary performance requirements of data storage system 100 and costs of data storage assets 710 as would be appreciated.
  • data storage containers and replica data storage containers operate within data storage system 100 in a similar manner. In other words, no distinction is made by data storage system 100 as to whether any given data storage container 720 is a replica, and for purposes of this description, data storage containers and their replicas are referred to collectively as data storage containers 720 .
  • data storage system 100 may distribute data storage containers 720 across data storage assets 710 .
  • a new data storage container 720 such as discussed above with regard to new data storage containers 135 N, 155 N
  • data storage system 100 via processor 120 , determines which of data storage assets 710 hosts data storage containers 720 that are “farthest away” from the data storage container 720 that reached its capacity (i.e., data storage container 135 E, 155 E). This ensures that a given data storage asset 710 does not host contiguous data storage containers 720 .
  • a new data storage container 720 should preferably not be hosted by the same data storage asset 710 as the existing data storage container 720 .
  • a data storage container 720 and its replica should preferably not be hosted together by the same data storage asset 710 .
  • the data storage asset 710 deemed “farthest away” may be selected by determining which data storage asset 710 hosts “the farthest of the closest” data storage container 720 (described in further detail below). In some implementations of the invention, the data storage asset 710 deemed “farthest away” may be selected by determining which data storage asset 710 hosts the fewest number of data storage containers 720 or that hosts the least amount of data or data records 110 . Other mechanisms for determining which data storage asset 710 is deemed “farthest away” may be used as would be appreciated.
  • additional information based on the data records 110 and/or data storage containers 720 is used to determine which data storage asset 710 is deemed “farthest away.” For example, in some implementations where data storage system 100 hosts data for two or more customers, the data storage asset 710 deemed “farthest away” is determined relative to data storage containers 720 for the relevant customer. This ensures that a given customers' data records 110 (and their data storage containers 720 ) are distributed throughout data storage system 100 . Thus, in some implementations of the invention, the data storage asset 710 deemed “farthest away” may be selected by determining which data storage asset 710 hosts “a farthest of the closest” data storage containers 720 for the particular customer.
  • the data storage asset 710 deemed “farthest away” may be selected by determining which data storage asset 710 hosts the fewest number of data storage containers 720 for the particular customer or that hosts the least amount of data or data records 110 for the particular customer.
  • Determining which data storage asset 710 is deemed “farthest away” may rely on other information from data records 110 in addition to or instead of the particular customer associated with the data records 110 as in the example described above. Such information may include, but is not limited to, a buyer, a website host, a website owner, a website user, or other information by which data records 110 can be discriminated.
  • determining which data storage asset 710 is deemed “farthest away” may rely on whether data storage container 720 is a split space data storage container or a spill space data storage container.
  • the data storage asset 710 deemed “farthest away” from an existing split space data storage container 720 (i.e., data storage container 155 E) may be selected by determining which data storage asset 710 hosts the fewest number of data storage containers 720 or stores the least amount of data or data records 110 ;
  • the data storage asset 710 deemed “farthest away” from an existing spill space data storage container (i.e., data storage container 135 E) may be selected by determining which data storage asset 710 hosts “the farthest of the closest” data storage container 720 . In some implementations of the invention, this may be accomplished by first determining, for each of data storage assets 710 , which data storage container 720 hosted on the respective data storage asset 710 is closest, in terms of sequential range, to the existing data storage container 720 .
  • each data storage container in a sequential data storage space stores a sub-range of the data values in the sequential data space. These sub-ranges are more or less “distant” from the sub-range of the existing storage container 720 (for example, by the sum of the sub-ranges that lie in between).
  • the data storage containers 720 on each data storage asset 710 are ordered based on the relative “distance” of their respective sub-ranges to that of the existing data storage container 720 , from closest to farthest. More specifically, data storage containers 720 A 1 , 720 A 2 , . . . , and 720 A m hosted on data storage asset 710 A are ordered based on the distance between their respective sub-ranges and that of the existing data storage container, from closest to farthest; data storage containers 720 B 1 , 720 B 2 , . . . , and 720 B n hosted on data storage asset 710 B are ordered in the same manner; data storage containers 720 C 1 , 720 C 2 , . . . , and 720 C p hosted on data storage asset 710 C are ordered in the same manner; and data storage containers 720 D 1 , 720 D 2 , . . . , and 720 D q hosted on data storage asset 710 D are ordered in the same manner.
  • From among these closest data storage containers 720 (one per data storage asset 710 ), the farthest from the existing data storage container is then determined.
  • the data storage asset 710 that hosts the farthest of the closest data storage containers may be selected to host the new data storage container 720 and a new data storage container 720 may be created on the selected data storage asset 710 .
  • two or more data storage assets 710 may be determined as hosting the farthest of the closest data storage container 720 .
  • the data storage asset 710 that stores the least amount of data or data records, or that hosts the fewest number of data storage containers 720 is determined between these two or more data storage assets 710 .
  • the data storage asset 710 that stores the least amount of data or data records, or that hosts the fewest number of data storage containers 720 , may be selected to host the new data storage container 720 and a new data storage container 720 may be created on the selected data storage asset 710 .
  • two or more data storage assets 710 may be determined as storing the least amount of data or data records 110 , or that host the fewest number of data storage containers. In such situations, any one of the data storage assets 710 may be selected to host the new data storage container 720 and a new data storage container 720 may be created on the selected data storage asset 710 .
  • FIG. 8 illustrates an operation 800 of a load balancing mechanism for data storage system 100 according to various implementations of the invention.
  • processor 120 determines whether to create a new spill space container 135 N or a new split space container 155 N. If a new split space container 155 N is to be created, processing continues at an operation 820 ; if a new spill space container 135 N is to be created, processing continues at an operation 850 .
  • processor 120 determines which data storage asset(s) 710 hosts the fewest number of data storage containers 720 . In a decision operation 830 , processor 120 determines whether two or more data storage assets 710 host the fewest number of data storage containers 720 . If so, processing continues at an operation 835 . If not, processing continues at an operation 890 .
  • processor 120 selects one of the two or more data storage assets 710 to host the new split space data storage container 155 N.
  • processor 120 creates a new data storage container 720 (corresponding to new data storage container 155 N) on the selected data storage asset 710 .
  • processor 120 selects the single determined data storage asset 710 to host the data storage container 720 , and processing continues at operation 840 .
  • processor 120 determines which data storage asset(s) 710 hosts the farthest of the closest data storage containers 720 .
  • In a decision operation 860 , if two or more data storage assets 710 host the farthest of the closest data storage container 720 , then processing continues at an operation 870 ; otherwise processing continues at operation 890 .
  • processor 120 determines which data storage asset(s) 710 hosts the fewest number of data storage containers 720 .
  • In a decision operation 880 , if two or more data storage assets 710 host the fewest number of data storage containers 720 , then processing continues at an operation 835 where one of them is selected; otherwise processing continues at operation 890 .
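  • For illustration only, the selection logic of FIGS. 7 and 8 can be sketched as follows. The class names, the (low, high) sub-range representation, and the gap-based distance metric are assumptions of this sketch (the description above also allows measuring distance as, for example, the sum of the intervening sub-ranges); it is not the claimed implementation. A new split space container goes to the asset hosting the fewest containers; a new spill space container goes to the asset whose closest container, by sub-range distance, is farthest from the container that filled, with ties broken by container count.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Container:
    kind: str                                    # "spill" (sequential space) or "split" (finite space)
    sub_range: Optional[Tuple[int, int]] = None  # (low, high) sub-range for spill-space containers

@dataclass
class Asset:
    name: str
    containers: List[Container] = field(default_factory=list)

def range_distance(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Gap between two sequential sub-ranges (0 if they touch or overlap)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))

def closest_distance(asset: Asset, full: Container) -> float:
    """Distance from this asset's closest spill-space container to the container that filled."""
    spans = [c.sub_range for c in asset.containers if c.kind == "spill" and c.sub_range]
    if not spans:
        return float("inf")                      # hosts nothing nearby: maximally far away
    return min(range_distance(span, full.sub_range) for span in spans)

def pick_asset_for_new_container(assets: List[Asset], full: Container) -> Asset:
    """FIG. 8 sketch: choose the data storage asset deemed 'farthest away' from the full container."""
    if full.kind == "split":
        # Operations 820-835: the new split space container goes to the asset with the fewest containers.
        return min(assets, key=lambda a: len(a.containers))
    # Operations 850-880: the farthest of the closest wins; ties broken by fewest hosted containers.
    return max(assets, key=lambda a: (closest_distance(a, full), -len(a.containers)))
```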

Abstract

Various implementations of the invention create storage containers in a data storage system. A computing processor receives a new data record to be stored in a data storage container which is configured to store a fixed number of stored data records. The computing processor determines whether a number of the plurality of stored data records in the data storage container is within a certain threshold of the fixed number of stored data records for the data storage container. For data storage containers residing in a sequential data space, when the number of records is within the certain threshold, the computing processor opens a new data storage container, stores the new data record in the new data storage container, and closes the data storage container to new data records. For data storage containers residing in a finite data space, when the number of records is within the certain threshold, the computing processor opens a new data storage container and splits the plurality of stored data records in the data storage container into a first plurality of stored data records and a second plurality of stored data records, where each of the first plurality of stored data records has a data value within a first range, and where each of the second plurality of stored data records has the data value within a second range. The computing processor stores the first plurality of stored data records in the data storage container and stores the second plurality of stored data records in the new data storage container. The computing processor stores the new data record in either the data storage container or the new data storage container based on whether the data value in the new data record corresponds to the first range or the second range.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims priority to U.S. Provisional Patent Application No. 61/913,227, which was filed on Dec. 6, 2013, and is incorporated herein by reference as if reproduced below in its entirety.
  • FIELD OF THE INVENTION
  • The invention is generally related to data storage and more particularly, to a highly scalable, highly available online data storage system.
  • BACKGROUND OF THE INVENTION
  • Various conventional data storage systems attempt to manage both scalability and availability of online data storage assets. However, these conventional data storage systems typically are either overly complex or dramatically over specify the number of data storage assets required by the system.
  • Furthermore, these conventional data storage systems are typically cumbersome to query, particularly when configured to optimize aspects of the data storage system rather than ease of querying.
  • What are needed are improved systems and methods for storing and retrieving data, especially in an online, real-time data storage system. What are further needed are such systems that are optimized for various forms of querying.
  • SUMMARY OF THE INVENTION
  • In various implementations of the invention, a method for creating storage containers in a data storage system receives, via a computing processor, a new data record to be stored in a data storage container, the data storage container configured to store a fixed number of stored data records, the data storage container storing a plurality of stored data records; determines whether a number of the plurality of stored data records in the data storage container is within a certain threshold of the fixed number of stored data records for the data storage container; for the data storage container that resides in a sequential data space: opens a new data storage container, stores the new data record in the new data storage container, and closes the data storage container to new data records; and for the data storage container that resides in a finite data space: opens a new data storage container, splits the plurality of stored data records in the data storage container into a first plurality of stored data records and a second plurality of stored data records, wherein each of the first plurality of stored data records has a data value within a first range, and wherein each of the second plurality of stored data records has the data value within a second range, stores the first plurality of stored data records in the data storage container, stores the second plurality of the stored data records in the new data storage container, and stores the new data record in either the data storage container or the new data storage container based on whether the data value in the new data record corresponds to the first range or the second range.
  • These implementations, their features and other aspects of the invention are described in further detail below.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 illustrates a data storage system according to various implementations of the invention.
  • FIG. 2 illustrates an operation of a data storage system according to various implementations of the invention.
  • FIG. 3 illustrates a spill mechanism over to a new data storage container once an existing data storage container reaches its capacity according to various implementations of the invention.
  • FIGS. 4 and 5 illustrate a split mechanism over to one or more new data storage container(s) once an existing data storage container reaches its capacity according to various implementations of the invention.
  • FIG. 6 illustrates an operation of a data storage system as data storage containers approach capacity according to various implementations of the invention.
  • FIG. 7 illustrates a data storage system including a number of data storage containers hosted by a number of data storage assets according to various implementations of the invention.
  • FIG. 8 illustrates an operation of a load balancing mechanism for data storage system according to various implementations of the invention.
  • DETAILED DESCRIPTION
  • FIG. 1 illustrates a data storage system 100 according to various implementations of the invention. Data storage system 100 includes a processor 120 and at least two data storage containers, illustrated in FIG. 1 as a first data storage container 135 in a first data storage space 130 and a second data storage container 155 in a second data storage space 150. According to various implementations of the invention, a data record 110 is stored in both first data storage container 135 and second data storage container 155 based on one or more data values in data record 110 as will be described in further detail below. Data storage containers 135, 155 and data storage spaces 130, 150 refer to logical data storage elements which may be stored on one or more physical data storage assets (not otherwise illustrated in FIG. 1). According to various implementations of the invention, physical data storage assets may include, but are not limited to servers, disks, memories, other non-transitory computer readable media, or other physical data storage assets including banks or farms of such physical data storage assets.
  • According to various implementations of the invention, processor 120 may be any general purpose hardware computing processor configured via various executable programming instructions stored internally to or externally from processor 120 in a computer readable medium, where when such programming instructions are executed by the computing processor, they cause the computing processor to perform various functions as would be appreciated. When configured with such programming instructions, the general purpose hardware computing processor becomes a particular processor that performs functions attributed to processor 120 as described herein. According to various implementations of the invention, processor 120 may be a single hardware computing processor or a plurality of hardware computing processors. According to various implementations of the invention, processor 120 may be a dedicated hardware computing processor configured to perform various functions of processor 120 as described herein or a plurality of hardware computing processors distributed throughout data storage system 100, each configured to perform one or more of the functions of processor 120 as described herein.
  • According to various implementations of the invention, first data storage space 130 and second data storage space 150 define separate spaces for aggregating, organizing and storing data records in a manner that optimizes responses to queries that may be applied against the data records in the respective data storage spaces 130, 150. According to various implementations of the invention, first data storage space 130 aggregates, organizes and/or stores data records in data storage container 135 based on a sequentially changing data value in each of the data records. Such sequentially changing data value may be a sequentially increasing data value or a sequentially decreasing data value. In some implementations of the invention, data records with sequentially increasing data values are stored in increasing sequential order; and data records with sequentially decreasing data values are stored in decreasing sequential order. This first data storage space is sometimes referred to as a “sequential data storage space,” or as will become apparent below, a “spill space.”
  • For example, a date-time stamp in a data record is a sequentially increasing data value—date-time stamps in data records will have progressively greater data values over time. Other sequentially increasing data values may include, but are not limited to, other temporal data values (i.e., time-based data values other than date-time stamps), transaction numbers, order numbers, or similar numeric orderings of data values in the data records, or other sequentially increasing data values.
  • According to various implementations of the invention, second data storage space 150 aggregates, organizes and/or stores data records in data storage container 155 based on a data value in the data record that resides in a “finite” space, and not an “infinite” space such as that of the sequentially changing data value. For example, a social security number is a data value that resides in a finite space as it has a finite range of values, namely 000-00-0000 to 999-99-9999. As another example, a last name stored as alphabetic characters in a fixed data field (e.g., 15 characters wide, etc.) is a data value that resides in a finite space as it has a finite range of values, namely “A” to “ZZZZZZZZZZZZZZZ”. As another example, a hash function computes a hash value based on one or more data values in a data record; the hash value typically has a fixed number of bits and hence resides in a finite space as would be appreciated. This second data storage space is sometimes referred to as a “finite data storage space,” or as will become apparent below, a “split space.”
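  • For illustration only, the short Python sketch below shows one way a hash value can place every data record in a bounded, splittable key space; the user_name field and the 32-bit width are assumptions made for this example, not details taken from the patent.

```python
import hashlib

def finite_space_key(record: dict, bits: int = 32) -> int:
    """Map a record to a fixed-width hash value that always lies in [0, 2**bits - 1]."""
    digest = hashlib.sha256(record["user_name"].encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

# Records sharing the same user name always map to the same finite-space value.
key = finite_space_key({"user_name": "smith", "timestamp": "2013-12-06T10:15:00Z"})
```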
  • In some implementations of the invention, second data storage space 150 aggregates, organizes and/or stores data records in data storage container 155 based on those data records sharing a common data value (or sharing a common range of data values) that resides in the finite space such as, but not limited to, those data records sharing the same user name, social security number, website visited, or other common data value in the finite space. In some implementations of the invention, second data storage space 150 aggregates, organizes and/or stores data records that share the common data value (or common range of data values) in data storage container 155 based on an ordering of another data value in the data record, typically, a date-time stamp (or similar time-based value), although any other ordering may be used as would be appreciated.
  • For example, data records 140 stored in data storage container 135 are organized based on sequentially increasing data values xi, xi+1, xi+2, . . . , xi+n in first data storage space 130. In other words, xi < xi+1 < xi+2 < . . . < xi+n. As illustrated, each data record in plurality of data records 140 also includes a data value that is not a sequentially increasing data value (e.g., as illustrated, a, b, c, . . . ). In some implementations, data records 140 in data storage container 135 are ordered strictly based on the sequentially increasing data values. In some implementations, data records 140 in data storage container 135 are ordered loosely based on the sequentially increasing data values based on when such data records are received and stored; in other words, some of the data records 140 may be received “out of order” such that the data record having the data value xi+2 is stored in the database within a few records before the data record having the data value xi+1. In some implementations of the invention, this minor misordering is tolerated because “append” operations require less processing than “insert” operations as would be appreciated and can be addressed when responding to queries. In some implementations, data records 140 received out of order are re-ordered, either periodically, or as they are received, as would be appreciated.
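  • As a hedged illustration of the loose ordering just described (the SpillContainer class and its method names are invented for this sketch, not taken from the patent), records can be appended cheaply in arrival order keyed by their sequential data value, with an occasional re-ordering pass that restores strict order:

```python
class SpillContainer:
    """Sequential-space container sketch: cheap appends, occasional re-ordering."""

    def __init__(self):
        self.records = []                      # list of (sequence_value, payload) tuples

    def append(self, seq_value, payload):
        # Append in arrival order; slightly out-of-order records are tolerated,
        # because appends are cheaper than inserts into the middle of the container.
        self.records.append((seq_value, payload))

    def reorder(self):
        # Periodic pass that restores strict order on the sequential data value.
        self.records.sort(key=lambda r: r[0])

    def range_query(self, low, high):
        # Temporal/range query; a full scan keeps the sketch correct even before reorder().
        return [payload for seq, payload in self.records if low <= seq <= high]
```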
  • According to various implementations of the invention, data records 160 stored in data storage container 155 are organized based on a common data value or common range of data values in the finite space of second data storage space 150. In some implementations, data records 160 are ordered based on the common data value or within the common range of data values. In these implementations, the data records may be ordered, for example, alphabetically or following some other ordering. In some implementations of the invention, data records 160 are ordered based on when such data records are received. In these implementations, the data records are ordered based on their receipt by data storage system 100 and not necessarily following some other ordering.
  • One benefit of various implementations of the invention is that having and maintaining separate data storage containers 135, 155 from separate data storage spaces 130, 150, respectively, improves response to different types of queries. For example, when the sequentially changing data value of first data storage container 135 is a date-time stamp, data storage container 135 more readily services temporal-based queries. More particularly, queries for data records before or after a certain time, or within or outside of a certain time range, are more readily serviced by a data store that organizes its data records based on time. In essence, data storage container 135 is “tuned” to temporal-based queries. While the same query could be made against data storage container 155, such a query would undoubtedly take longer to service as each data record would have to be evaluated based on the temporal-based query. Similarly, when the other data value is, for example, a user name, and data storage container 155 stores its plurality of data records 160 based on user name, data queries based on user name are more readily serviced by such data storage container 155 as would be appreciated.
  • In some implementations of the invention, data storage system 100 includes a plurality of data storages 135 in different data storage spaces 130 to aggregate, organize and/or store data records based on one or more different sequentially changing data value(s) in the data records. In some implementations of the invention, data storage system 100 includes a plurality of data storages 155 in different data storage spaces 150 to aggregate, organize and/or store data records based on the one or more different data value(s) in the data records, where such different data value(s) reside in a finite data space. In some implementations, the number and/or characteristics of data storages 135, 155 may be selected based on the expected types of queries applied against data storage system 100 as would be appreciated.
  • According to various implementations of the invention, processor 120 receives a query 170 from, for example, a user of data storage system 100. According to various implementations of the invention, processor 120 determines which data storage space 130, 150 would be best to respond to query 170. For example, if query 170 is based on a sequentially changing data value such as a date-time stamp, for example, query 170 may be applied against first data storage container 135. If query 170 is based on a data value in a finite data space such as a hash value, query 170 may be applied against second data storage container 155, for example.
  • FIG. 2 illustrates an operation 200 of data storage system 100 according to various implementations of the invention. In an operation 210, processor 120 receives new data record 110. In an operation 220, processor 120 causes new data record 110 to be stored in data storage container 135 in first data storage space 130. In an operation 230, processor 120 causes new data record 110 to be stored in data storage container 155 in second data storage space 150. In an operation 240, query 170, which is to be applied against data storage system 100, is received. A decision operation 250 determines whether query 170 is best evaluated in the sequential data storage space or in the finite data storage space. If query 170 is best evaluated in the sequential data storage space, in an operation 260, query 170 is applied against data storage container 135 in first data storage space 130 and a retrieved plurality of data records corresponding to query 170 is retrieved from data storage container 135. If query 170 is best evaluated in the finite data storage space, in an operation 270, query 170 is applied against data storage container 155 in second data storage space 150 and a retrieved plurality of data records corresponding to query 170 is retrieved from data storage container 155.
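  • Operation 200 can be summarized as a write-to-both, route-reads scheme. The sketch below is illustrative only; the container method names (append, insert, range_query, lookup) and the is_temporal query flag are assumptions of this sketch, not the claimed interface:

```python
def store_record(record, spill_container, split_container):
    """Operations 210-230: the same record is stored in both data storage spaces."""
    spill_container.append(record["timestamp"], record)   # sequential (spill) space
    split_container.insert(record["user_name"], record)   # finite (split) space

def run_query(query, spill_container, split_container):
    """Operations 240-270: route the query to the space best suited to evaluate it."""
    if query.get("is_temporal"):                           # e.g., a date-time range query
        return spill_container.range_query(query["start"], query["end"])
    return split_container.lookup(query["key"])            # e.g., all records for one user name
```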
  • As new data records are added to data storage system 100, data storage containers 135, 155 may eventually reach their respective capacities. Some implementations of the invention provide a mechanism for handling additional new records as data storage containers 135, 155 approach their respective capacities. In reference to FIGS. 3, 4 and 5 and in accordance with various implementations of the invention, new data storage containers 135N, 155N are created, opened or otherwise brought on-line in data storage system 100 based on whether existing data storage container 135E, 155E that is approaching capacity corresponds to first data storage space 130 or second data storage space 150.
  • According to various implementations of the invention, data storage containers 135, 155 typically have a predetermined capacity, either physically or logically, as would be appreciated. For example, each data storage container 135, 155 may store twenty (20) million data records or other number of data records. When an existing data storage container 135E, 155E fills, a new data storage container 135N, 155N must be brought online (e.g., created, instantiated, etc.) to aggregate, organize and store new data records.
  • According to various implementations of the invention, when data storage container 135E reaches its capacity (i.e., at capacity, within a certain threshold of its capacity, etc.), a new data storage container 135N is brought online to store additional new data records. Because data records in data storage space 130 are stored sequentially based on a data value in the data record, the new data record (and each subsequent one thereafter until the new data storage container 135N reaches its capacity) is simply stored in the new data storage container 135N. According to various implementations of the invention, when data storage container 135E reaches its capacity, the new data record in data storage space 130 “spills” into a new data storage container 135N.
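  • A minimal sketch of the spill behavior, under the assumption of a fixed record capacity and invented helper names (create_container, closed): once the open sequential-space container is full, or within a threshold of full, it stops accepting records and a fresh container receives the new record and all subsequent ones:

```python
CAPACITY = 20_000_000   # e.g., twenty million records per container
THRESHOLD = 0           # how close to capacity a container may get before it is treated as full

def spill_store(record, open_container, create_container):
    """Store a record in the sequential space, spilling to a new container when full."""
    if len(open_container.records) >= CAPACITY - THRESHOLD:
        new_container = create_container()     # bring a new spill-space container online
        open_container.closed = True           # existing container is closed to new records
        new_container.append(record["timestamp"], record)
        return new_container
    open_container.append(record["timestamp"], record)
    return open_container
```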
  • According to various implementations of the invention, when existing data storage container 155E reaches its capacity, one or more new data storage containers 155N are brought online to accommodate the new data records. Because data records in data storage space 150 are based on data values in the data record that reside in a finite data space, the finite data space is “split” into one or more contiguous “subspaces,” each of which is stored in its own data storage container 155E, 155N. For example, if existing data storage container 155E presently stores data records based on a user last name and the entire finite space resides in data storage container 155 (i.e., last names beginning with ‘A’ to ‘Z’), the entire finite space is split into one or more contiguous subspaces. For example, as illustrated in FIG. 4, if the finite space is split into two subspaces (illustrated as a subspace 150A and a subspace 150B), data records 160A having user last names beginning with ‘A’ to ‘M’ might be stored on the existing data storage container 155E and data records 160B having user last names beginning with ‘N’ to ‘Z’ might be stored on the new data storage container 155N. In another example illustrated in FIG. 5, if the finite space is split into three contiguous subspaces (illustrated as a subspace 150A, a subspace 150B, and a subspace 150C), data records 160A having user last names beginning with ‘A’ to ‘H’ might be stored on the existing data storage container 155E, data records 160B having user last names beginning with ‘I’ to ‘P’ might be stored on a first new data storage container 155N1, and data records 160C having user last names beginning with ‘Q’ to ‘Z’ might be stored on a second new data storage container 155N2. In some implementations of the invention, other numbers of subspaces may be used as well as different partitions for the subspaces. In some implementations, the data storage containers 155E, 155N, and their respective subspaces themselves may be further partitioned as they reach their respective capacities into one or more sub-subspaces as would be appreciated. According to various implementations of the invention, when data storage container 155 reaches its capacity, the new data record forces data storage space 150 to be “split” into one or more data storage subspaces 150, each stored in a separate data storage container 155 as would be appreciated.
  • In some implementations of the invention, in order to accommodate a split of a data storage space 150 into one or more subspaces, each residing in a separate data storage container 155E, 155N, each of the data records 160 in the existing data storage container 155 is positioned or moved to the appropriate data storage container 155E, 155Ni, after the data storage space 150 is split into one or more subspaces as discussed above.
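  • The split behavior can be sketched in the same spirit (the last_name key and the helper names are assumptions of this sketch): when a finite-space container fills, its key range is cut into contiguous sub-ranges, the stored records are redistributed, and the new record is placed according to its own key:

```python
def split_container(full_container, create_container, key=lambda r: r["last_name"]):
    """Split a full finite-space container into two contiguous sub-ranges."""
    records = sorted(full_container.records, key=key)
    midpoint = len(records) // 2
    boundary = key(records[midpoint])               # e.g., 'A'-'M' stays, 'N'-'Z' moves

    new_container = create_container()              # bring a new split-space container online
    full_container.records = records[:midpoint]     # first sub-range remains in place
    new_container.records = records[midpoint:]      # second sub-range moves to the new container
    return boundary, new_container

def split_store(record, full_container, create_container, key=lambda r: r["last_name"]):
    """Split the full container, then store the new record in whichever half owns its key."""
    boundary, new_container = split_container(full_container, create_container, key)
    target = full_container if key(record) < boundary else new_container
    target.records.append(record)
    return target
```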
  • FIG. 6 illustrates an operation of a data storage system as data storage containers approach capacity according to various implementations of the invention. In an operation 610, data storage system 100 receives a new data record 110 to be stored in one or more data storage containers 135E, 155E. In a decision operation 620, data storage system 100 determines whether a particular data storage container 135E, 155E is at capacity. In a decision operation 630, data storage system 100 determines whether the particular data storage container 135E, 155E stores data records based on a sequentially changing data value in the data records (i.e., a “spill” storage container) or whether the particular data storage container 135E, 155E stores data records based on a data value residing in a finite data space (i.e., a “split” storage container). If the particular data storage container 135E, 155E is a “spill” storage container, in an operation 640, the new data record “spills” over to a new data storage container 135N. If the particular data storage container 135E, 155E is a “split” storage container, in an operation 650, the plurality of stored data records 160 in existing data storage container 155E is “split” with new data storage container 155N and, in an operation 660, new data record 110 is added to either existing data storage container 155E or new data storage container 155N as discussed above.
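  • Taken together, operations 630 through 660 amount to a small dispatcher over the two sketches above; the kind attribute used here to distinguish spill-space from split-space containers is an assumption of this illustration:

```python
def handle_full_container(record, full_container, create_container):
    """Once decision operation 620 finds the container full, spill or split as appropriate."""
    if full_container.kind == "spill":    # sequentially changing data value (operation 640)
        return spill_store(record, full_container, create_container)
    # finite data value (operations 650 and 660)
    return split_store(record, full_container, create_container)
```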
  • According to various implementations of the invention, data storage system 100 may provide a load balancing mechanism for distributing data storage containers 135E, 135N, 155E, and 155N across a plurality of data storage assets. FIG. 7 illustrates data storage system 100 in different detail for purposes of describing this load balancing mechanism in accordance with various implementations of the invention. Data storage system 100 includes processor 120 and a plurality of data storage assets 710 (illustrated as a data storage asset 710A, a data storage asset 710B, a data storage asset 710C, and a data storage asset 710D). As discussed above, each data storage asset 710 comprises physical hardware configured to host one or more data storage containers 720 (illustrated in FIG. 7 as data storage asset 710A hosting a data storage container 720A1, a data storage container 720A2, . . . and a data storage container 720Am; data storage asset 710B hosting a data storage container 720B1, a data storage container 720B2, . . . and a data storage container 720Bn; data storage asset 710C hosting a data storage container 720C1, a data storage container 720C2, . . . and a data storage container 720Cp; and data storage asset 710D hosting a data storage container 720D1, a data storage container 720D2, . . . and a data storage container 720Dq). Data storage containers 720 collectively refer to the various data storage containers 135 (including data storage containers 135N and data storage containers 135E) and data storage containers 155 (including data storage containers 155N and data storage containers 155E) described above.
  • According to various implementations of the invention, data storage system 100 distributes data storage containers 720 across data storage assets 710 to provide high availability of the data records 110 stored in data storage containers 720 and/or to provide fast response time to queries made against these data records 110. According to various implementations of the invention, data storage system 100 provides high availability and fast response time to a plurality of customers spread across data storage assets 710 without having to dedicate individual data storage assets to a given customer. By spreading data records out across data storage assets 710, queries may become highly parallelized, thereby reducing response time to the queries while simultaneously reducing the overall number of data storage assets 710 required by data storage system 100 to meet a given performance level.
  • According to various implementations of the invention, each data storage container 720 may have one or more corresponding replica data storage containers (not otherwise illustrated) that mirror data storage container 720 for purposes of redundancy and backup. In some implementations, replica data storage containers may also be used to provide further parallelization of data storage system 100. In other words, replica data storage containers may be used to directly respond to the query, or portions of the query, to increase the number of potential data storage assets 710 responding to the query. This ensures that if the data storage asset 710 hosting a particular data storage container 720 is busy responding to an unrelated query, the data storage asset 710 hosting a replica of the particular data storage container 720 may respond to the query. In some implementations of the invention, each data storage container 720 is associated with one replica data storage container. In some implementations of the invention, each data storage container is associated with two or more replica data storage containers. According to various implementations of the invention, a number of replicas utilized by data storage system 100 is based, in part, on the performance requirements of data storage system 100 and the costs of data storage assets 710 as would be appreciated. For purposes of this description, data storage containers and replica data storage containers operate within data storage system 100 in a similar manner. In other words, no distinction is made by data storage system 100 as to whether any given data storage container 720 is a replica, and for purposes of this description, data storage containers and their replicas are referred to collectively as data storage containers 720.
  • In order to be highly available and highly scalable, data storage system 100 may distribute data storage containers 720 across data storage assets 710. When a request is received to create, instantiate, or otherwise bring online a new data storage container 720 (such as discussed above with regard to new data storage containers 135N, 155N), data storage system 100, via processor 120, determines which of data storage assets 710 hosts data storage containers 720 that are “farthest away” from the data storage container 720 that reached its capacity (i.e., data storage container 135E, 155E). This ensures that a given data storage asset 710 does not host contiguous data storage containers 720. In other words, a new data storage container 720 should preferably not be hosted by the same data storage asset 710 as the existing data storage container 720. Likewise, a data storage container 720 and its replica should preferably not be hosted together by the same data storage asset 710.
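  • One way such a placement constraint might be expressed is sketched below; the asset/container model, the identifiers, and the dictionary representation are assumptions made only for this example.

```python
def candidate_assets(assets: dict[str, set[str]],
                     existing: str,
                     replicas_of: dict[str, set[str]]) -> list[str]:
    """assets maps an asset id to the container ids it hosts; replicas_of maps a container id
    to the ids of its replicas. Returns the assets allowed to host the new container, i.e.
    those not already hosting the full container or one of its replicas."""
    excluded = {existing} | replicas_of.get(existing, set())
    return [asset for asset, hosted in assets.items() if not (hosted & excluded)]

assets = {
    "710A": {"720A1", "720A2"},
    "710B": {"720B1"},
    "710C": {"720C1", "720A1-replica"},
}
replicas = {"720A1": {"720A1-replica"}}
print(candidate_assets(assets, "720A1", replicas))  # ['710B']: 710A hosts it, 710C hosts its replica
```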
  • In some implementations of the invention, the data storage asset 710 deemed “farthest away” may be selected by determining which data storage asset 710 hosts “the farthest of the closest” data storage container 720 (described in further detail below). In some implementations of the invention, the data storage asset 710 deemed “farthest away” may be selected by determining which data storage asset 710 hosts the fewest number of data storage containers 720 or that hosts the least amount of data or data records 110. Other mechanisms for determining which data storage asset 710 is deemed “farthest away” may be used as would be appreciated.
  • In some implementations of the invention, additional information based on the data records 110 and/or data storage containers 720 is used to determine which data storage asset 710 is deemed “farthest away.” For example, in some implementations where data storage system 100 hosts data for two or more customers, the data storage asset 710 deemed “farthest away” is determined relative to data storage containers 720 for the relevant customer. This ensures that a given customer's data records 110 (and their data storage containers 720) are distributed throughout data storage system 100. Thus, in some implementations of the invention, the data storage asset 710 deemed “farthest away” may be selected by determining which data storage asset 710 hosts “a farthest of the closest” data storage containers 720 for the particular customer. Similarly, in some implementations of the invention, the data storage asset 710 deemed “farthest away” may be selected by determining which data storage asset 710 hosts the fewest number of data storage containers 720 for the particular customer or that hosts the least amount of data or data records 110 for the particular customer.
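  • Such per-customer scoping might be sketched as below, where ranking considers only the relevant customer's containers; the ownership map and the fewest-containers metric are assumptions made for illustration.

```python
from collections import Counter

def asset_with_fewest_for_customer(placement: dict[str, str],
                                   owner: dict[str, str],
                                   customer: str,
                                   all_assets: list[str]) -> str:
    """placement maps a container id to its hosting asset id; owner maps a container id to a
    customer id. Returns the asset hosting the fewest of this customer's containers."""
    counts = Counter(placement[c] for c, cust in owner.items() if cust == customer)
    # Assets hosting none of the customer's containers count as zero and are preferred.
    return min(all_assets, key=lambda a: counts.get(a, 0))
```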
  • Determining which data storage asset 710 is deemed “farthest away” may rely on other information from data records 110 in addition to or instead of the particular customer associated with the data records 110 as in the example described above. Such information may include, but is not limited to, a buyer, a website host, a website owner, a website user, or other information by which data records 110 can be discriminated.
  • In some implementations of the invention, determining which data storage asset 710 is deemed “farthest away” may rely on whether data storage container 720 is a split space data storage container or a spill space data storage container. For example, in some implementations of the invention, the data storage asset 710 deemed “farthest away” from an existing split space data storage container 720 (i.e., data storage container 155E) may be selected by determining which data storage asset 710 hosts the fewest number of data storage containers 720 or stores the least amount of data or data records 110; whereas the data storage asset 710 deemed “farthest away” from an existing spill space data storage container (i.e., data storage container 135E) may be selected by determining which data storage asset 710 hosts the farthest of the closest data storage container 720.
  • As referenced above, the data storage asset 710 deemed “farthest away” may be selected by determining which data storage asset 710 hosts “the farthest of the closest” data storage container 720. In some implementations of the invention, this may be accomplished by first determining, for each of data storage assets 710, which data storage container 720 hosted on the respective data storage asset 710 is closest, in terms of sequential range, to the existing data storage container 720.
  • In some implementations of the invention, each data storage container in a sequential data storage space stores a sub-range of the data values in the sequential data space. These sub-ranges are more or less “distant” from the sub-range of the existing storage container 720 (for example, by the sum of the sub-ranges that lie in between). In this context, in some implementations of the invention, the data storage containers 720 on each data storage asset 710 are ordered based on the relative “distance” of their respective sub-ranges to that of the existing data storage container 720 from closest to farthest. More specifically, data storage containers 720A1, 720A2, . . . , and 720Am hosted on data storage asset 710A are ordered based on the distance between their respective sub-ranges and that of the existing data storage container from closest to farthest; data storage containers 720B1, 720B2, . . . , and 720Bn hosted on data storage asset 710B are ordered based on the distance between their respective sub-ranges and that of the existing data storage container from closest to farthest; data storage containers 720C1, 720C2, . . . , and 720Cp hosted on data storage asset 710C are ordered based on the distance between their respective sub-ranges and that of the existing data storage container from closest to farthest; and data storage containers 720D1, 720D2, . . . , and 720Dq hosted on data storage asset 710D are ordered based on the distance between their respective sub-ranges and that of the existing data storage container from closest to farthest. In some implementations of the invention, once the closest data storage container 720 to the existing data storage container is determined for each data storage asset 710, the farthest of these from the existing data storage container is then determined. Then, in some implementations of the invention, the data storage asset 710 that hosts the farthest of the closest data storage containers may be selected to host the new data storage container 720 and a new data storage container 720 may be created on the selected data storage asset 710.
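  • A minimal sketch of this “farthest of the closest” selection follows. The numeric gap between integer sub-ranges used here as the “distance” is only one possible measure, and the identifiers are illustrative rather than taken from the specification.

```python
def range_distance(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Gap between two inclusive sub-ranges; 0 if they touch or overlap."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return max(0, max(a_lo, b_lo) - min(a_hi, b_hi))

def farthest_of_closest(hosted_ranges: dict[str, list[tuple[int, int]]],
                        existing: tuple[int, int]) -> str:
    """hosted_ranges maps an asset id to the sub-ranges of the containers it hosts.
    Returns the asset whose *closest* hosted sub-range is farthest from `existing`."""
    closest = {asset: min(range_distance(r, existing) for r in ranges)
               for asset, ranges in hosted_ranges.items() if ranges}
    return max(closest, key=closest.get)

hosted = {
    "710A": [(0, 9), (10, 19)],
    "710B": [(40, 49)],
    "710C": [(20, 29), (90, 99)],
}
print(farthest_of_closest(hosted, existing=(10, 19)))  # '710B': its closest sub-range is farthest away
```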
  • In some implementations of the invention, two or more data storage assets 710 may be determined as hosting the farthest of the closest data storage container 720. In such situations, the data storage asset 710 that stores the least amount of data or data records, or that hosts the fewest number of data storage containers 720, is determined between these two or more data storage assets 710. Then, in some implementations of the invention, the data storage asset 710 that stores the least amount of data or data records, or that hosts the fewest number of data storage containers 720, may be selected to host the new data storage container 720 and a new data storage container 720 may be created on the selected data storage asset 710.
  • In some implementations of the invention, two or more data storage assets 710 may be determined as storing the least amount of data or data records 110, or that host the fewest number of data storage containers. In such situations, any one of the data storage assets 710 may be selected to host the new data storage container 720 and a new data storage container 720 may be created on the selected data storage asset 710.
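  • The tie-breaking chain described in the two paragraphs above might look like the following; the container-count metric and the arbitrary final choice are the only assumptions made here.

```python
def break_ties(tied_assets: list[str], containers_per_asset: dict[str, int]) -> str:
    """Among assets tied on the primary criterion, prefer the one hosting the fewest
    containers; any asset still tied after that is an acceptable choice."""
    fewest = min(containers_per_asset[a] for a in tied_assets)
    finalists = [a for a in tied_assets if containers_per_asset[a] == fewest]
    return finalists[0]  # any remaining candidate may be selected
```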
  • FIG. 8 illustrates an operation 800 of a load balancing mechanism for data storage system 100 according to various implementations of the invention. In a decision operation 810, processor 120 determines whether to create a new spill space container 135N or a new split space container 155N. If a new split space container 155N is to be created, processing continues at an operation 820; if a new spill space container 135N is to be created, processing continues at an operation 850.
  • In an operation 820, processor 120 determines which data storage asset(s) 710 hosts the fewest number of data storage containers 720. In a decision operation 830, processor 120 determines whether two or more data storage assets 710 host the fewest number of data storage containers 720. If so, processing continues at an operation 835. If not, processing continues at an operation 890.
  • In operation 835, processor 120 selects one of the two or more data storage assets 710 to host the new split space data storage container 155N. In an operation 840, processor 120 creates a new data storage container 720 (corresponding to new data storage container 155N) on the selected data storage asset 710.
  • In an operation 890, processor 120 selects the single determined data storage asset 710 to host the data storage container 720, and processing continues at operation 840.
  • In an operation 850, processor 120 determines which data storage asset(s) 710 hosts the farthest of the closest data storage containers 720. In a decision operation 860, if two or more data storage assets 710 host the farthest of the closest data storage container 720, then processing continues at an operation 870; otherwise processing continues at operation 890. In operation 870, processor 120 determines which data storage asset(s) 710 hosts the fewest number of data storage containers 720. In a decision operation 880, if two or more data storage assets 710 host the fewest number of data storage containers 720, then processing continues at operation 835 where one of them is selected; otherwise processing continues at operation 890.
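  • Pulling the pieces together, the FIG. 8 decision (operations 810 through 890) might be sketched end to end as follows. Only the order of decisions follows the description above; the helper names, the integer sub-ranges, and the container-count metric are assumptions made for the sketch.

```python
def choose_asset(kind: str,
                 hosted_ranges: dict[str, list[tuple[int, int]]],
                 containers_per_asset: dict[str, int],
                 existing_range: tuple[int, int]) -> str:
    """kind is 'split' (new container 155N) or 'spill' (new container 135N);
    returns the id of the data storage asset selected to host the new container."""
    def fewest(assets: list[str]) -> list[str]:
        low = min(containers_per_asset[a] for a in assets)
        return [a for a in assets if containers_per_asset[a] == low]

    def gap(r: tuple[int, int]) -> int:  # distance between inclusive integer sub-ranges
        return max(0, max(r[0], existing_range[0]) - min(r[1], existing_range[1]))

    if kind == "split":
        # Operations 820-835: rank all assets by how few containers they host.
        candidates = fewest(list(containers_per_asset))
    else:
        # Operations 850-880: farthest of the closest, then fewest containers on a tie.
        closest = {a: min(gap(r) for r in ranges) for a, ranges in hosted_ranges.items()}
        farthest = max(closest.values())
        candidates = [a for a, d in closest.items() if d == farthest]
        if len(candidates) > 1:
            candidates = fewest(candidates)
    return candidates[0]  # operations 835/890: the single choice, or any of the remaining ties
```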
  • While the invention has been described herein in terms of various implementations, it is not so limited and is limited only by the scope of the following claims, as would be apparent to one skilled in the art. These and other implementations of the invention will become apparent upon consideration of the disclosure provided above and the accompanying figures. In addition, various components and features described with respect to one implementation of the invention may be used in other implementations as would be understood.

Claims (8)

What is claimed is:
1. A method for creating storage containers comprising:
receiving, via a computing processor, a new data record to be stored in a data storage container, the data storage container configured to store a fixed number of stored data records, the data storage container storing a plurality of stored data records;
determining whether a number of the plurality of stored data records in the data storage container is within a certain threshold of the fixed number of stored data records for the data storage container;
for the data storage container that resides in a sequential data space:
opening a new data storage container,
storing the new data record in the new data storage container, and
closing the data storage container to new data records; and
for the data storage container that resides in a finite data space:
opening a new data storage container,
splitting the plurality of stored data records in the data storage container into a first plurality of stored data records and a second plurality of stored data records, wherein each of the first plurality of stored data records has a data value within a first range, and wherein each of the second plurality of stored data records has the data value within a second range,
storing the first plurality of stored data records in the data storage container,
storing the second plurality of the stored data records in the new data storage container, and
storing the new data record in either the data storage container or the new data storage container based on whether the data value in the new data record corresponds to the first range or the second range.
2. The method of claim 1, wherein the storing of the first plurality of the stored data records in the data storage container comprises removing the second plurality of the stored data records from the data storage container.
3. The method of claim 1, wherein splitting the plurality of stored data records in the data storage container into a first plurality of stored data records and a second plurality of stored data records comprises evenly splitting the plurality of stored data records in the data storage container into the first plurality of stored data records and the second plurality of stored data records.
4. The method of claim 1, wherein splitting the plurality of stored data records in the data storage container into a first plurality of stored data records and a second plurality of stored data records comprises splitting the plurality of stored data records in the data storage container into the first plurality of stored data records and the second plurality of stored data records along an existing organizational boundary in the plurality of stored data records in the data storage container.
5. The method of claim 1, wherein splitting the plurality of stored data records in the data storage container into a first plurality of stored data records and a second plurality of stored data records comprises splitting the plurality of stored data records in the data storage container into the first plurality of stored data records, the second plurality of stored data records, and one or more third pluralities of stored data records.
6. The method of claim 5, wherein each of the first plurality of stored data records has a data value within the first range, wherein each of the second plurality of stored data records has the data value within the second range, and wherein each of the one or more third pluralities of stored data records has the data value within a third range.
7. The method of claim 1, wherein splitting the plurality of stored data records in the data storage container into a first plurality of stored data records and a second plurality of stored data records comprises splitting a data storage space of the plurality of stored data records into a first subspace having the first range and a second subspace having the second range such that the first range and the second range correspond to an entire range of the data storage space.
8. A method for creating storage containers comprising:
receiving, via a computing processor, a first new data record to be stored in a first data storage container, the first data storage container comprising a plurality of first stored data records;
determining that a number of the plurality of first stored data records in the first data storage container is at or near capacity of the first data storage container;
determining that the first data storage container stores data records in a sequential data space and:
opening a new first data storage container,
storing the first new data record in the new first data storage container, and
closing the first data storage container to new data records; and
receiving, via a computing processor, a second new data record to be stored in a second data storage container, the second data storage container comprising a plurality of second stored data records;
determining that the second data storage container stores data records in a finite data space and:
opening a new second data storage container,
splitting the plurality of second stored data records in the second data storage container into a first portion of the plurality of second stored data records and a second portion of the plurality of second stored data records,
storing the first portion of the plurality of the second stored data records in the second data storage container,
storing the second portion of the plurality of the second stored data records in the new second data storage container, and
storing the second new data record in either the second data storage container or the new second data storage container.
US14/562,611 2013-12-06 2014-12-05 System and method for creating storage containers in a data storage system Abandoned US20150169619A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/562,611 US20150169619A1 (en) 2013-12-06 2014-12-05 System and method for creating storage containers in a data storage system
US16/204,008 US11249991B2 (en) 2013-12-06 2018-11-29 System and method for creating storage containers in a data storage system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361913227P 2013-12-06 2013-12-06
US14/562,611 US20150169619A1 (en) 2013-12-06 2014-12-05 System and method for creating storage containers in a data storage system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/204,008 Continuation US11249991B2 (en) 2013-12-06 2018-11-29 System and method for creating storage containers in a data storage system

Publications (1)

Publication Number Publication Date
US20150169619A1 true US20150169619A1 (en) 2015-06-18

Family

ID=53368688

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/562,611 Abandoned US20150169619A1 (en) 2013-12-06 2014-12-05 System and method for creating storage containers in a data storage system
US16/204,008 Active 2036-01-08 US11249991B2 (en) 2013-12-06 2018-11-29 System and method for creating storage containers in a data storage system

Family Applications After (1)

Application Number Title Priority Date Filing Date
US16/204,008 Active 2036-01-08 US11249991B2 (en) 2013-12-06 2018-11-29 System and method for creating storage containers in a data storage system

Country Status (1)

Country Link
US (2) US20150169619A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210200717A1 (en) * 2019-12-26 2021-07-01 Oath Inc. Generating full metadata from partial distributed metadata
CN116800733A (en) * 2023-08-18 2023-09-22 荣耀终端有限公司 Downloading method of differential packet and server

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030105754A1 (en) * 2001-09-26 2003-06-05 Alain De Cheveigne Method for the scaling of the indexing data of a multimedia document
US20030167380A1 (en) * 2002-01-22 2003-09-04 Green Robbie A. Persistent Snapshot Management System
US20040117572A1 (en) * 2002-01-22 2004-06-17 Columbia Data Products, Inc. Persistent Snapshot Methods
US6813312B2 (en) * 1999-01-29 2004-11-02 Axis, Ab Data storage and reduction method for digital images, and a surveillance system using said method
US20080270072A1 (en) * 2007-04-24 2008-10-30 Hiroshi Sukegawa Data remaining period management device and method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3183736B2 (en) * 1992-12-28 2001-07-09 富士通株式会社 Dynamic change method of database logical data structure
US7124124B1 (en) * 1999-05-24 2006-10-17 Quantum Corporation Data storage devices for large size data structures
US6636849B1 (en) * 1999-11-23 2003-10-21 Genmetrics, Inc. Data search employing metric spaces, multigrid indexes, and B-grid trees
US6857097B2 (en) * 2001-05-16 2005-02-15 Mitsubishi Electric Research Laboratories, Inc. Evaluating and optimizing error-correcting codes using a renormalization group transformation
US8918436B2 (en) * 2011-12-22 2014-12-23 Sap Ag Hybrid database table stored as both row and column store
US9880771B2 (en) * 2012-06-19 2018-01-30 International Business Machines Corporation Packing deduplicated data into finite-sized containers
US9235564B2 (en) * 2013-07-19 2016-01-12 International Business Machines Corporation Offloading projection of fixed and variable length database columns

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6813312B2 (en) * 1999-01-29 2004-11-02 Axis, Ab Data storage and reduction method for digital images, and a surveillance system using said method
US20030105754A1 (en) * 2001-09-26 2003-06-05 Alain De Cheveigne Method for the scaling of the indexing data of a multimedia document
US20030167380A1 (en) * 2002-01-22 2003-09-04 Green Robbie A. Persistent Snapshot Management System
US20030220949A1 (en) * 2002-01-22 2003-11-27 Columbia Data Products, Inc. Automatic deletion in data storage management
US20030220929A1 (en) * 2002-01-22 2003-11-27 Columbia Data Products, Inc. Managing finite data storage utilizing preservation weights
US20040117572A1 (en) * 2002-01-22 2004-06-17 Columbia Data Products, Inc. Persistent Snapshot Methods
US20080270072A1 (en) * 2007-04-24 2008-10-30 Hiroshi Sukegawa Data remaining period management device and method

Also Published As

Publication number Publication date
US20190171635A1 (en) 2019-06-06
US11249991B2 (en) 2022-02-15

Similar Documents

Publication Publication Date Title
US9020892B2 (en) Efficient metadata storage
CN106201771B (en) Data-storage system and data read-write method
US9256633B2 (en) Partitioning data for parallel processing
KR102564170B1 (en) Method and device for storing data object, and computer readable storage medium having a computer program using the same
US11249980B2 (en) Updating of in-memory synopsis metadata for inserts in database table
US20100312749A1 (en) Scalable lookup service for distributed database
US20140101167A1 (en) Creation of Inverted Index System, and Data Processing Method and Apparatus
US20140115252A1 (en) Block storage-based data processing methods, apparatus, and systems
US9807168B2 (en) Distributed shared log for modern storage servers
US11249991B2 (en) System and method for creating storage containers in a data storage system
US10572463B2 (en) Efficient handling of sort payload in a column organized relational database
CN107391544A (en) Processing method, device, equipment and the computer storage media of column data storage
US11544242B2 (en) System and method for storing and retrieving data in different data spaces
US10712943B2 (en) Database memory monitoring and defragmentation of database indexes
US11330054B2 (en) System and method for load balancing in a data storage system
US11531666B1 (en) Indexing partitions using distributed bloom filters
CN104883394A (en) Method and system for server load balancing
US11080301B2 (en) Storage allocation based on secure data comparisons via multiple intermediaries
CN115237960A (en) Information pushing method and device, storage medium and electronic equipment
CN108197164A (en) Business data storage method and device
CN109214884B (en) Demand matching method and device and electronic equipment
CN107491265B (en) Method and device for distributing internet protocol IP disk
US10783268B2 (en) Data allocation based on secure information retrieval
US10216748B1 (en) Segment index access management in a de-duplication system
CN108388406A (en) Data processing method and device

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: ZAIUS, INC., VIRGINIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PINGRY, SPENCER ELDON;MULIERI, JONATHAN BARTHOLOMEW;REEL/FRAME:062944/0927

Effective date: 20141205

AS Assignment

Owner name: OPTIMIZELY NORTH AMERICA INC., NEW HAMPSHIRE

Free format text: MERGER;ASSIGNOR:ZAUIS, INC.;REEL/FRAME:064612/0037

Effective date: 20230331