WO1999038094A1 - Database apparatus - Google Patents


Info

Publication number
WO1999038094A1
Authority
WO
WIPO (PCT)
Prior art keywords
index
data
the
key
block
Prior art date
Application number
PCT/IL1999/000038
Other languages
French (fr)
Inventor
Moshe Shadmon
Original Assignee
Ori Software Development Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ori Software Development Ltd. filed Critical Ori Software Development Ltd.
Priority to CA002319177A priority Critical patent/CA2319177A1/en
Priority to HU0101298A priority patent/HUP0101298A3/en
Priority to NZ505767A priority patent/NZ505767A/en
Priority to BR9907227-0A priority patent/BR9907227A/en
Priority to EP99901096A priority patent/EP1049990A4/en
Priority to JP2000528930A priority patent/JP2002501256A/en
Priority to AU20719/99A priority patent/AU759360B2/en
Priority to IL13734799A priority patent/IL137347A0/en
Publication of WO1999038094A1 publication Critical patent/WO1999038094A1/en
Priority to IL137347A priority patent/IL137347A/en
Priority to NO20003759A priority patent/NO20003759L/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees

Definitions

  • This invention relates to databases and database management systems.
  • a database system is a collection of interrelated data files, indexes and a set of programs that allow one or more users to add, retrieve and modify the data stored in these files.
  • the fundamental concept of a database system is to provide users with a so called “abstract” and simplified view of the data (referred to also as data model or conceptual structure) which exempts a conventional user from dealing with details such as how the data is physically organized and accessed.
  • Other concepts introduced by the relational model are high-level operators that operate on tables (i.e., both their parameters and results are tables) and comprehensive data languages (now called 4th generation languages) in which one specifies what the required results are rather than how these results are to be produced.
  • Such non-procedural languages (e.g. SQL, Structured Query Language) have become an industry standard.
  • the relational model suggests a very high level of data independence. There should not be any effect on the programs written in these languages due to changes in the manner data are organized, stored, indexed and ordered.
  • the relational model has become a de-facto standard for data analysts.
  • Network Model - In the relational model, data (and relationship between data) are regarded as a collection of tables. In distinction therefrom in the network model data are represented as a collection of records whereas relationship between the records (data) are represented as links.
  • a record in the network model is similar to an "entity" in the sense that it is a collection of fields each holding one type of data.
  • the links may be effectively viewed preferably (but not necessarily) as pointers.
  • a collection of records and the relation therebetween constitutes a collection of graphs.
  • Hierarchical Model resembles the network model in the manner that data and relations between data are treated, i.e. as records and links. However, in distinction from the network model, the records and the relations between them constitute a collection of trees rather than of arbitrary graphs.
  • the structure of the Hierarchical Model is simple and straightforward particularly in the case that the data that needs to be organized in a database are of inherent hierarchical nature.
  • the hierarchical model has some inherent shortcomings, e.g. in many real life scenarios data cannot be easily arranged in hierarchical manner. Moreover, even if data may be organized in hierarchical manner, it may require larger volumes as compared to other database models.
  • the object-oriented approach views all entities as objects. Each object belongs to a class, and with each class there are associated methods and fields.
  • some of the fields are private, accessible only to methods of the class, while others are public, accessible to all.
  • "Joe Smith" belongs to the class of persons.
  • the private field age can be defined.
  • Applying the class method update_age() to the object Joe will change his age.
  • the methodology allows one to define sub-classes which inherit all the methods and fields of the super-class.
  • the employee class can be defined as a subclass of the person class.
  • the employee class could support a salary field, and the get_raise() method.
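
The person/employee example above can be sketched in a few lines of Python. This is only an illustration: the accessor `get_age()`, the constructor arguments and the sample values are assumptions that do not appear in the text, and Python marks the "private" field by naming convention only.

```python
class Person:
    """A person with a private age field, as in the example in the text."""

    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self._age = age  # private by convention (leading underscore)

    def update_age(self, new_age: int) -> None:
        """Class method named in the text; changes the stored age."""
        self._age = new_age

    def get_age(self) -> int:
        """Hypothetical accessor, added here only so the age can be read."""
        return self._age


class Employee(Person):
    """Sub-class: inherits every field and method of Person."""

    def __init__(self, name: str, age: int, salary: float) -> None:
        super().__init__(name, age)
        self.salary = salary  # public field

    def get_raise(self, amount: float) -> None:
        """Method named in the text; increases the salary."""
        self.salary += amount


# Illustrative values, not taken from the source.
joe = Employee("Joe Smith", 40, 50_000.0)
joe.update_age(41)      # inherited from Person
joe.get_raise(2_500.0)  # defined only on Employee
```

Because `Employee` derives from `Person`, the object `joe` can be used wherever a person is expected while still carrying the extra salary field.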
  • Object Relational Model allows an object view on relational-organized data. Thus, one is able to operate on the data as if it is organized as objects and at the same time, support the relational approach.
  • data models deal with the conceptual or logical level of data representation and "hide" details such as how the data are physically arranged and accessed.
  • the latter characteristics are normally dealt with by a so-called database file management system.
  • the database file management system maps the logical structure (in terms of database model) to a data structure, pertinent operations and possibly other data.
  • the data structure includes index and data records.
  • the index enables accessing or updating the data records by a key. In the context of search, the term search key is used.
  • Database file management system should preferably operate on the data records so as to accomplish enhanced performance in terms of time (i.e. from the user's standpoint fast response time of the database), and space (i.e. to minimize the storage volume that is allocated for the database files). As is well known in the art, normally, there is a trade off between the time and space requirements.
  • the performance of the database depends on the efficiency of the data structures that are used to represent the data and how efficiently the system can operate on these data. A detailed discussion on conventional file and management systems is given for example in Chapters 7 (file system structure) and 8 (indexing) in "Database System Concepts", ibid.
  • Known database file management systems typically utilize indexing schemes that fall into the following main categories: multi-way tree indexes and others.
  • Multi-way tree indexes - These techniques can be used to create one or more access paths (referred to also as search paths) to the same data record.
  • the search paths form a multi-way tree.
  • Its main disadvantages are that it requires space (usually all the keys to the records plus some pointers) and maintenance (addition and/or deletion of keys whenever an update transaction (see definition below) occurs, i.e. a record is added and/or deleted).
  • the nature of the indexing scheme as well as the volume of the data held in the files determine the number of accesses that are required to find or update (update encompasses, insert, delete or modify) a given data record.
  • if the storage medium under consideration is an external memory, the number of accesses is effectively the number of I/O accesses.
  • a block of data is loaded into the memory.
  • Trie indexing scheme - An example of the latter is the trie discussed in G. Wiederhold, "File organization for Database design"; McGraw-Hill, 1987, pp. 272, 273, or in D.E. Knuth, "The Art of Computer Programming"; Addison-Wesley Publishing Company, 1973, pp. 481-505, 681-687.
  • the trie indexing scheme enables a rapid search whilst avoiding the duplication of keys as manifested for example by the B tree technique.
  • the trie indexing scheme has the general structure of a tree wherein the search is based on partitioning the search according to search key portions (e.g. search key digit or bit).
  • each node in the trie indexing file represents an offset of the search key and the link to any one of its children represents the character's value at said offset.
  • the trie structure affords an efficient data structure in terms of the memory space that is allocated therefor, since, as specified before, the search key is not held, as a whole, in internal nodes and hence the duplication that is exhibited for example in the B-tree indexing technique is avoided.
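
A minimal sketch of such a trie, assuming character-granularity keys: a node at depth d stands for offset d of the search key, and each link carries the character value at that offset, so no internal node stores a whole key. The names `TrieNode`, `trie_insert`, `trie_search` and `count_nodes`, and the sample keys, are illustrative and not taken from the source.

```python
class TrieNode:
    """A node at depth d represents offset d of the search key."""

    def __init__(self):
        self.children = {}  # character value at this offset -> child node
        self.record = None  # data record reached when a key ends here


def trie_insert(root, key, record):
    """Walk or create one link per key character, then attach the record."""
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.record = record


def trie_search(root, key):
    """Follow the link labeled with the key's character at each offset."""
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.record


def count_nodes(node):
    """Count all nodes in the trie rooted at `node`."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


root = TrieNode()
for k in ("bank", "band", "bar"):
    trie_insert(root, k, {"key": k})
```

The three keys total 11 characters, but the shared prefix "ba" is stored once, so the trie needs only 7 nodes including the root.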
  • a trie indexing file should be built by selecting the digits (or bits) from the search key such that the best possible partition of the search space is obtained, or in other words so as to accomplish a tree which is as balanced as possible. This, however, requires a priori knowledge of the data records of the trie and is accomplished at the penalty of obtaining unsorted data, which in many real-life scenarios is inapplicable. It is noteworthy that if sorted data is mandatory, a balanced structure cannot be guaranteed even if there is sufficient a priori knowledge of the data records of the trie. It should be noted that the specified trie does not support sequential sub-range processing.
  • the specified B-tree indexing scheme constitutes an inherent balanced tree structure, even after the tree has been subject to update transactions.
  • the inherent balanced (or essentially balanced) structure is accomplished, however, and as explained above, at the penalty of inflating the contents of the blocks in the tree and, consequently, unduly increasing the size of the file that holds the index, particularly insofar as large trees which hold a multitude of data records are concerned.
  • the large volume of the files adversely affects the performance of the data management system in terms of number of accesses (and consequently in terms of accessing time) to the storage medium in order to reach a sought data record, which is obviously undesired.
  • the representatives of level i constitute the nodes of level i - 1.
  • Level h+1 is the first empty level.
  • the index scheme includes here three index files. This obviously poses undesired overhead insofar as data volumes and additional integrity maintenance and checking are concerned.
  • removal of a given book from the book file requires a preliminary test to inquire whether it exists in the borrower-book index file.
  • Block - a storage unit which can be accessed by a single I/O operation.
  • a block may contain data arranged in any desired manner, e.g. nodes arranged as a tree and possibly also links to actual data records.
  • a block may reside in main (referred to also as internal) or secondary (referred to also as external) storage.
  • Tree - a data structure which is either empty or consists of a root node linked by means of d (d ≥ 0) pointers (or links) to d disjoint trees called subtrees of the root.
  • a node all the subtrees of which are empty is called a leaf node.
  • the nodes in the tree that are not leaves are designated as internal nodes.
  • leaf nodes are also nodes that are associated with data records.
  • tree encompasses also a tree of blocks wherein each node constitutes a block.
  • descendant blocks of a said block are all the blocks that can be accessed from the block.
  • tree - refer also to the book Cormen, Leiserson and Rivest, or Lewis and Denenberg, "Data structures and their algorithms".
  • the association, e.g. link
  • data record encompasses any realization which enables access to data records from leaf nodes.
  • a data record may be accessed directly (i.e. through a pointer) from the leaf node.
  • the leaf node points to a data structure (e.g. a table) which, in turn, enables access to data records.
  • Other variants are, of course, also feasible.
  • Depth of an index - is defined as the maximum number of blocks from a root block to a block associated with a data record.
  • An index is balanced if there exists a constant c such that the number of accesses needed to reach any data record is at most $c \log n$, where n is the number of records in the structure.
  • Accessing in an index would be considered as a process of moving from a node to another node within a block or to another block usually, although not necessarily, in order to reach sought data records.
  • Navigating is considered as accessing data records, usually (although not necessarily), in order to collect them in an ordered manner by their key.
  • Search scheme meaning the algorithm that is associated with an index that is used for accessing a given data record by key; intra-block search scheme meaning the algorithm that is used inside the block for accessing a given data record or another block. The data record is not necessarily accommodated within said block.
  • the common key of a block is the longest prefix of all keys of the data records that can be accessed from the block by the relevant search scheme. If desired, part or all of the common key may be held explicitly in the block.
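
As a small illustration of the common key definition, the sketch below computes the longest prefix shared by every key reachable from a block. The function name `common_key` and the sample keys are assumptions made for this example; keys are treated as character strings.

```python
def common_key(keys):
    """Longest prefix shared by all keys reachable from a block.

    The longest common prefix of a set of strings equals the longest
    common prefix of its lexicographically smallest and largest members,
    so only those two strings need to be compared.
    """
    if not keys:
        return ""
    lo, hi = min(keys), max(keys)
    i = 0
    while i < len(lo) and i < len(hi) and lo[i] == hi[i]:
        i += 1
    return lo[:i]


block_keys = ["bank", "band", "bar"]
prefix = common_key(block_keys)
```

For the sample block the common key is "ba"; a block whose keys share no leading character has an empty common key.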
  • Update transactions - transaction consisting of either inserting a new data record, or deleting an existing data record, or modifying an existing data record or portion thereof.
  • Horizontal oriented trie structure having h levels of vertical orientated trie structures, with the first level standing for the uppermost level and the h-th level standing for the lowermost level (constituting the trie that is susceptible to an unbalanced structure) which is normally associated with data records, and allows to move from a block in the i-th level to a block in the (i+1)-st level according to a common key value of the block.
  • the h upper levels constitute a representative index over the common keys of the blocks of the lowermost level tree.
  • Storage medium - Any medium that may be used to store data, including either or both of internal and external memory.
  • External memory may be one or more of the following: magnetic tape, magnetic disk, optical disk, or any other physical medium used for storing data.
  • Internal memory includes any known main memory including cache memory as well as any other physical storage medium that serves as internal memory.
  • Short link - (referred to also as near link) a link labeled k between a node a having the value r to node b in the same block such that the keys of the data records that include node b on their access path have the value k at key position r.
  • Long link - (referred to also as far link) a link between a node v in block B of level i to block B' of level i - 1 or to a data record. If v has value r and the label of the link is k, then the value of the common key of block B' or the key of the data record is k at position r.
  • the label of a short link or a far link is also referred to as the value or direction of the link.
  • After the split, the split link is the link between node a and block B (that is accommodating node b).
  • a split link is a labeled link.
  • Direct link - a link between node v in block B of level i to block B' of level i - 1, that includes a node v' such that nodes v and v' have the same value. If a search path to a data record with a key k includes node v but does not include any of its near and far links then it should contain the direct link to block B'. A direct link has no label.
  • v is considered a duplicated node of v'.
  • a duplicated node maintains a direct link to the block that includes node v' (a duplicated node is also referred to as copied node).
  • Data records consist as a rule of several fields, some of which are designated as keys. Sometimes the records are ordered by one of the keys, called the primary key. An index (or index scheme) over the keys of data records or over representative keys (for the definition of the latter see below) is a data structure that facilitates search by one or more of the keys. Examples of index are any of the specified multi-way tree index schemes. An index according to the invention may be constituted by using more than one index scheme.
  • The index may be stored in a file or files that reside partially or entirely in the internal memory or external memory.
  • an index that includes a partitioned index — a dynamic data structure - that allows search by key, and is partitioned into blocks, each of which contains a representative key.
  • the representative keys should be sufficient to find the block associated with a record whose key equals the search key (if one exists). Having located the block, the data record may easily be retrieved.
  • the representative keys are not necessarily stored physically in the block.
  • partitioned index examples are:
  • partitioned index contains its key and its link.
  • These pairs are ordered by non-decreasing value of the key.
  • a partitioned index over the keys of data records is called a basic partitioned index and is denoted index layer $I_0$.
  • This partitioned index might become non-balanced, thus giving rise to some long search paths.
  • an additional index layer (an index layer is denoted in short also index) $I_1$ is constructed over the representative keys of $I_0$.
  • If $I_1$ is also a partitioned index then an additional index $I_2$ may be constructed over the representative keys of the blocks of $I_1$. This process may be repeated until creating an index $I_h$ (hereinafter root index) which preferably is fully contained within a single block.
  • The root index $I_h$ is not necessarily a partitioned index.
  • the layered index is not necessarily a partitioned index.
  • $I_1, \ldots, I_h$ constitute a so-called representative index.
  • a search is performed as above to find the block B. Having found B in $I_0$, r is added to B.
  • The overflow of block $B_1$ in $I_1$ entails a splitting of $B_1$, and the representative of $B_1$ in $I_2$ is replaced by the representatives of the new blocks, etc. If the block of $I_h$ overflows, an additional layer $I_{h+1}$ is created and added to the layered index. It should be noted that an "overflow" state may be determined according to the particular application, and is not necessarily triggered when a block is rendered full. Thus, for example, by one embodiment overflow occurs when a block is at least half full.
  • Deletion is similar to insertion, and might involve merging, the reverse process of splitting.
  • the update or the split need not necessarily be performed on the fly, but may be delayed (i.e. performed post factum).
  • construction of the layered index preferably retains a balanced index.
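
The layering idea can be sketched as follows, under loudly stated simplifications: each block is modeled as a short sorted list of keys rather than a sub-trie, the representative key of a block is taken to be its smallest key, and the layers are bulk-built from a sorted key list instead of growing by block splits on overflow. The function names, the block capacity and the sample keys are all hypothetical.

```python
from bisect import bisect_right


def partition(keys, capacity):
    """Cut a sorted key list into consecutive blocks of at most `capacity` keys."""
    return [keys[i:i + capacity] for i in range(0, len(keys), capacity)]


def build_layers(sorted_keys, capacity):
    """Return [I_0, I_1, ..., I_h]; each layer is a list of blocks.

    I_0 is built over the record keys; every further layer is built over
    the representative keys of the blocks of the layer below it, until
    one layer fits in a single block (the root index).
    """
    if capacity < 2 or not sorted_keys:
        raise ValueError("need capacity >= 2 and at least one key")
    layers = [partition(sorted_keys, capacity)]
    while len(layers[-1]) > 1:
        representatives = [block[0] for block in layers[-1]]
        layers.append(partition(representatives, capacity))
    return layers


def search(layers, capacity, key):
    """Descend from the root layer to I_0; return (found, blocks accessed)."""
    block_no = 0
    accesses = 0
    for level in range(len(layers) - 1, 0, -1):
        block = layers[level][block_no]
        accesses += 1
        # Rightmost representative that is <= key (clamped to the first entry).
        pos = max(bisect_right(block, key) - 1, 0)
        block_no = block_no * capacity + pos
    accesses += 1  # the I_0 block itself
    return key in layers[0][block_no], accesses


keys = [f"{i:04d}" for i in range(0, 2000, 2)]  # 1000 even-numbered keys
layers = build_layers(keys, 10)
```

With 1000 keys and 10 keys per block this yields layers of 100, 10 and 1 blocks, and every search touches exactly one block per layer, regardless of which key is sought.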
  • the inherent limitations of a basic partitioned index, e.g. trie
  • more efficient means that the number of accesses to the storage medium through the layered index in order to perform an update transaction (e.g. insert, delete or modify) on a data record or to access a data record is smaller compared to the number of accesses to the storage medium through the basic partitioned index.
  • Number of accesses should be construed such that in each access a block is handled (e.g. loaded or processed) from the storage medium.
  • There may be exceptional scenarios where the latter "more efficient" provision does not apply, e.g. in the case of a very small file having only a few blocks, where accessing a data record through the basic partitioned index may require the same or even fewer operations than through said layered index.
  • each key is regarded as a character or bit string.
  • if the trie cannot be accommodated in a single block, it is partitioned into blocks, such that each block contains a single subtree of the trie.
  • the representative key of the block is the string associated with the root node of the trie in the block, i.e., the sequence of labels of the path from the root of the trie of $I_i$ to the root of the trie of the block.
  • the representative keys of $I_i$ are the keys of $I_{i+1}$.
  • To search a key k in $I_i$ one searches for the longest prefix of k in the blocks of $I_{i+1}$ and from there moves to the appropriate block of $I_i$.
  • Inserting a record entails adding its key to $I_0$, i.e., adding a value to the trie of $I_0$. If as a result a block overflows, the block is split: it is partitioned into typically two (in some implementations more) blocks, such that each block contains a (connected) trie. To accomplish this a link between a node w and its child v is severed, and the subtree rooted at v is moved to another block. The representative key of the new block is added to $I_1$. As in the general layered index scheme, this process is continued to the higher layers.
  • these savings affect the manner in which the search is performed. In such compressed tries usually only nodes of degree greater than or equal to two are maintained. If the search key k does not belong to the compressed trie, the search might terminate at some record r, and we have to check whether k is equal to the key of r. If the keys are different then the trie does not contain a record with key k.
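
The final equality check described above can be illustrated with a hand-built compressed trie. This is a sketch under assumptions: records are represented by their key strings, the `Node` class, the `search` function and the sample keys are hypothetical, and the trie is constructed by hand rather than by an insertion routine.

```python
class Node:
    """Internal node of a compressed trie (degree >= 2)."""

    def __init__(self, offset, children):
        self.offset = offset      # key position examined at this node
        self.children = children  # character at that position -> Node or record key


def search(node, key):
    """Follow one link per examined offset, then verify the whole key.

    Positions that no node examines are never compared on the way down,
    so the record reached is only a candidate until its key is checked.
    """
    while isinstance(node, Node):
        if node.offset >= len(key):
            return None
        node = node.children.get(key[node.offset])
        if node is None:
            return None
    return node if node == key else None


# Keys "bank", "band", "bar": offsets 0 and 1 ("ba") are shared by all
# three, so the root examines offset 2 directly.
ROOT = Node(2, {
    "n": Node(3, {"k": "bank", "d": "band"}),
    "r": "bar",
})
```

Searching for "tank" follows the same links as "bank" (offsets 2 and 3 agree) and terminates at the record "bank"; only the final comparison reveals that the trie holds no record with key "tank".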
  • These links do not have a direction, and are taken when the appropriate position of the search key does not agree with any one of the directions of the node.
  • The search continues from the block of $I_{i-1}$ pointed at by that direct link. (If no such node exists, we go to the first block of the index $I_{i-1}$.)
  • each layer might require one extra access.
  • 3 layers are sufficient to address billions of records and usually 2 layers can be maintained in the internal memory of a computer.
  • the split process also has to accommodate direct links. Suppose that the access path to block $B_{i-1}$ of $I_{i-1}$ consists of block $B_i$ of layer $I_i$, $B_{i-1}$ …
  • Block $B_i$ has now to contain links to all its descendant blocks in $I_{i-1}$. This can be accomplished by the following non-limiting technique:
  • Let k' be the representative key of B'; this key is inserted into $T_i$ (the compressed trie of $B_i$) so that the search for the keys of descendants of B' reaches B', and the search for the descendants of $B_{i-1}$ reaches $B_{i-1}$.
  • a non-limiting method of accomplishing the split process is as follows:
  • At least one short link among the short links of a node (hereon split node) in the block is deleted (hereon split link) in a way that at least two tries exist in the block.
  • each of the sub-trees is moved to a separate block.
  • $B_i$ is created and a copied node of the split node is created in $B_i$.
  • the far link can be replaced by a direct link from the child node to block s.
  • a split of a block in $I_k$, k>0, is performed such that the split links (of $I_k$) are links between copied nodes of
  • the invention provides for, in a storage medium used by a database file management system executed on data processing system, a data structure that includes: a layered index arranged in blocks; the layered index includes a basic partitioned index that is associated with data records; the basic partitioned index enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said layered index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
  • The invention further provides for, in a storage medium used by a database file management system executed on data processing system, a data structure that includes: an index arranged in blocks and being constructed over the keys of data records; the index includes a basic partitioned index that is associated with the data records; the basic partitioned index enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
  • Still further, the invention provides for, in a storage medium used by a database file management system executed on data processing system, a data structure that includes: an index arranged in blocks and being constructed over the keys of data records; the index includes a trie that is associated with the data records; the trie enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
  • the invention provides for, in a database file management system for accessing data records and being executed on data processing system
  • the data records are associated with a basic partitioned index arranged in blocks and being stored in a storage medium
  • the basic partitioned index enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks
  • a method for constructing a layered index arranged in blocks comprising the steps of:
  • The invention further provides for, in a database file management system for accessing data records and being executed on data processing system;
  • the data records are associated with a basic partitioned index arranged in blocks and being stored in a storage medium; the basic partitioned index enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks;
  • a method for constructing an index over the keys of the data records, the index being arranged in blocks comprising the steps of:
  • there is further provided in a database file management system for accessing data records and being executed on data processing system; the data records are associated with a trie arranged in blocks and being stored in a storage medium; the trie enables accessing or updating the data records by key or keys and being susceptible to an
  • The index is preferably, although not necessarily, constructed by one or more of the indexing schemes selected from the specified index schemes.
  • Typical, yet not exclusive, examples of multi-way tree indexes being the B-tree indexing scheme.
  • said basic partitioned search scheme being a trie that is constituted by a digital tree of the kind disclosed in U.S. patent no. 5,495,609.
  • said trie is constituted by a so-called Probabilistic Access Indexing File (PAIF).
  • PAIF - Probabilistic Access Indexing File
  • a data structure that includes at least one probabilistic access indexing file (PAIF) having a plurality of nodes and links; the leaf nodes of said PAIF are associated each with at least one data record accessible to said user application program and wherein at least portion of said data record constitutes at least one search key; selected nodes in said PAIF represent, each, a given offset of a search key portion within said search key; link(s) originated from each given node from among said selected nodes represent, each, a unique value of said search key portion; the PAIF having at least two sub-PAIFs being arranged, each, in a block;
  • said database file management system is further capable of arranging said blocks as a balanced structure of blocks.
  • one or more of said nodes may include other information, such as portions of the keys and/or other information, all as required and appropriate.
  • the indexing scheme is constituted by a search scheme substantially identical to that of the PAIF trie.
  • a database file management system that employs a layered index of the invention is advantageous, in terms of enhanced performance, as compared to hitherto known techniques, inter alia owing to the following characteristics:
  • the proposed layered index constitutes an advantage over e.g. hashing schemes and some implementations of digital trees.
  • further provides for, in a computer system having a storage medium of at least an internal memory that ranges between 10 to 20
  • The invention further provides for, in a computer system having a storage medium, a data structure that includes an index over the keys of data records; the index is arranged in a balanced structure of blocks and enables to perform sequential operations on said data records; the index size is essentially not affected by the size of said keys.
  • the data records may reside in the blocks of the layered index, or may reside in separate data files (one or more). In the latter embodiment the data records should be associated, of course, to the corre-
  • a given data record may accommodate more than one search key.
  • The index is preferably, although not necessarily, constructed by one or more of the indexing schemes selected from the specified index schemes.
  • normally data consists of records of several types (e.g. in the example above books and borrowers).
  • the type of the record determines its fields (attributes) and its keys.
  • the type of each key is not kept with the record and not considered part of the key.
  • The program "knows" the type of the record, and therefrom the fields of the data records and their structure.
  • Each type of key is assigned a designator, a string of bits (e.g. a series of one or more characters) which, normally but not necessarily, is added as a prefix to all keys of this type.
  • a designated key is a key with its designator.
  • the designator is treated as part of the key (for search or update purposes), and therefore is part of the index scheme.
  • by looking at the designator of the key one can deduce the type of the record; hence one need not know the record type a priori.
  • Data records in which the keys are designated are called designated data records.
  • a designated index is an index that enables search on designated data records.
  • there follows a description of another feature according to the second aspect: subordination of data records.
  • the designated key of R2 is the composite key K1',K2', where K2' consists of the key K2 prefixed by a designator D2.
  • the subordination relationship is extended also to records. If K2 is subordinated to K1, the designator of K2' is D2 and the designator of R2 is also D2 (or D1, D2). If R2 is subordinated to R1, the key of R2 is composed by concatenating K2' to K1. Note that in K2', D2 is prefixed to K2.
  • the type of record R1 and the type of record R2 may stand in a one-to-many relationship, meaning that several records of type R2 may be related to a single record of type R1.
  • Such a relation can be implemented by the subordination relation: several records of type R2 will be subordinated to a single record of type R1 (e.g., several books can be borrowed by the same borrower).
  • this relationship is one-to-one (e.g.
  • the subordinated record can itself have a record subordinated to it, and accordingly n levels of subordination may be accomplished.
  • for example, consider a banking database, where the account records are subordinated to the branch records, and deposit records are subordinated to accounts.
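Composite keys under subordination can be sketched for the banking example as follows. This is an illustrative sketch only: the designators "B", "A" and "D", the key values, the fixed key lengths per record type and the function names are assumptions, not details given in the text.

```python
def designated(designator: str, key: str) -> str:
    """K' = designator prefixed to the plain key K."""
    return designator + key


def subordinate(parent_designated_key: str, designator: str, key: str) -> str:
    """Composite key of a subordinated record: the parent's key followed by K2'."""
    return parent_designated_key + designated(designator, key)


# Three levels of subordination: branch -> account -> deposit (hypothetical values).
branch = designated("B", "017")
account = subordinate(branch, "A", "123456")
deposit = subordinate(account, "D", "0001")
other_account = subordinate(branch, "A", "222222")

# In key order, everything subordinated to a branch follows the branch key itself,
# so the related records can be collected with a simple prefix test.
keys = sorted([deposit, other_account, branch, account])
under_branch = [k for k in keys if k.startswith(branch) and k != branch]
```

The prefix test relies on the assumption that keys of one type have a fixed length, so that one branch key is never a prefix of another.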
  • let R be a record that is identified by either of two keys K1 and K2.
  • Then, the designated index should contain two search paths to R, one by the designated key K1' and one by the designated key K2'. Accordingly, R constitutes a multi-dimensional record.
  • a multi-dimensional index includes the designated index and the
  • the above discussion and example considered a multi-dimensional index where the data records do not necessarily exhibit a subordination relationship.
  • the multi-dimensional index may optionally be applied also to subordinated data records.
  • For example, consider a banking database, where the deposits are subordinated to both accounts and depositors.
  • a single designated index provides access to accounts (by the designated key k1' account-number), to depositors (by the designated key k2' depositor-name) and to deposits by both k1'k2' and k2'k1'. (It is possible, of course, to use different designators for k1 when it is subordinated to k2 and for k2 when it is subordinated to k1.)
  • the designator of a car record (FIAT, 127) is A when searching or updating the record by the key AFIAT, and is B when accessing it via the license plate number B127.
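The car example above can be sketched as one index holding two search paths to the same record. A plain dictionary stands in for the designated index here, and the field names `make` and `plate` are assumptions made for the illustration.

```python
# The multi-dimensional record (FIAT, 127) from the text; field names are hypothetical.
car = {"make": "FIAT", "plate": "127"}

# One designated index, two search paths to the same record object.
index = {
    "A" + car["make"]: car,   # designator A: access by the key AFIAT
    "B" + car["plate"]: car,  # designator B: access by the license plate number B127
}


def lookup(designated_key):
    """Return the record reached by a designated key, or None if there is none."""
    return index.get(designated_key)
```

Both `lookup("AFIAT")` and `lookup("B127")` reach the very same record, while a key with the wrong designator, such as "A127", reaches nothing.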
  • the meta-data includes information on the different records as a function of their type. Thus, it is needed to identify the designator and as a result the
  • The search scheme in the designated index is oblivious to the meta-data. It locates the record, identifies the designator (for example, the designator can be prefixed to the record) and constructs the (composite) designated key.
  • a data structure that includes: an index over the keys of data records; the data records being of at least two types where data records of the second type are subordinated to the data records of the first type.
  • there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: a designated index over designated keys of data records; the data records, constituting designated data records, being of at least two types where designated data records of the second type are subordinated to the designated data records of the first type.
  • the data structure that includes designated index and designated data can maintain the relations between different data items.
  • the data structure that includes designated index and designated data can link logically related items.
  • the data structure that includes designated index and designated data can support several data models simultaneously and efficiently.
  • the data structure that includes designated index and designated data allows high efficiency in retrieving related data.
  • the data records may constitute part of the PAIF, or may reside in one or more separate data files.
  • in the latter embodiment the data records should be linked, of course, to the corresponding PAIF.
  • a given data record may accommodate more than one search key.
  • a data structure that includes: an index being stored in the storage medium and constructed over the keys of said data records that are stored in blocks; the index being arranged in blocks with the leaf blocks being linked to data records by means of links; said index is characterized in that at least one of said links is shared by at least two data records stored in the same block.
  • the index being constituted by a trie.
  • the invention provides for, in a storage medium used
  • a data structure that includes: an index being stored in a storage medium and constructed over the keys of said data records that are stored in blocks; the index being arranged in blocks with the leaf blocks being linked to data records by means of links; said index is characterized in that at least one of said links is shared by at least two data records stored in the same block; said index constituting a layered index according to claim 1, and blocks of said basic partitioned index are linked to said data records.
  • Fig. 1 shows a generalized block diagram of a system employing a database file management system
  • Fig. 2 shows a sample database structure represented as an Entity Relationship Diagram (ERD), and serving for illustrative purposes;
  • Fig. 3 shows the database of Fig. 2, represented as tables in accordance with the relational data model, with each table holding few data occurrences;
  • Fig. 4 shows the "CLIENT" table of Fig. 3, in accordance with a file management system employing a conventional B+ tree index scheme;
  • Fig. 5 shows the "CLIENT" table of Fig. 3, in accordance with a file management system employing a conventional trie index scheme;
  • Figs. 6A-6C show the "CLIENT" table of Fig. 3, in accordance with a file management system employing a PAIF index scheme;
  • Figs. 7A-7H show schematic illustrations exemplifying construction of a layered index, according to one embodiment of the invention;
  • Figs. 8A-B show schematic illustrations exemplifying construction of a layered index, according to yet another embodiment of the invention;
  • Figs. 9A-G show schematic illustrations exemplifying construction of a layered index, according to yet another embodiment of the invention;
  • Figs. 10A-B show schematic illustrations exemplifying construction of a layered index, according to another embodiment of the invention;
  • Fig. 11 shows a schematic illustration exemplifying construction of a layered index, according to still yet another embodiment of the invention;
  • Fig. 12 shows a schematic illustration for exemplifying use of designators in a designated index in accordance with one embodiment of the invention;
  • Figs. 13A-E show five schematic illustrations for exemplifying the feature of subordination of data records in a designated index in accordance with one embodiment of the invention;
  • Fig. 14 shows a schematic illustration of a designated index exemplifying a multi-dimensional record according to an embodiment of the invention;
  • Fig. 15 shows a schematic illustration of a designated index according to another embodiment of the invention;
  • Fig. 16 shows a schematic illustration for exemplifying the feature of relations among data records provided in accordance with one embodiment of the invention;
  • Figs. 17A-B show a schematic illustration of a compressed representation of links to data records in accordance with one embodiment of the invention;
  • Figs. 18A-D show four benchmark graphs demonstrating the enhanced performance, in terms of response time and file size, of a database utilizing a file management system of the invention vs. a commercially available Ctree based database; and
  • Figs. 19A-D show four benchmark graphs demonstrating the enhanced performance, in terms of response time and file size, of a database utilizing a file management system of the invention vs. a commercially available Btree based database.
  • a general purpose computer, e.g. a personal computer (P.C.) employing a Pentium microprocessor 3 commercially available from Intel Corp. U.S.A., has an operating system module 5, e.g. Windows NT® commercially available from Microsoft Inc. U.S.A., which communicates with processor 3 and controls the overall operation of computer 1.
  • P.C. 1 further accommodates a plurality of user application programs, of which only three, 7, 9 and 11, respectively, are shown.
  • The user application programs are executed by processor 3 under the control of operating system 5, in a known per se manner, and are responsive to user input fed through keyboard 13 by the intermediary of I/O port 15 and the operating system 5.
  • the user application programs further communicate with monitor 16 for displaying data, by the intermediary of I/O port 17 and operating system 5.
  • the user application programs can access data stored in a database by means of database management system module 20.
  • the generalized database management system includes a high level management system 22 which views, as a rule, the underlying data in a "logical" manner and is responsive to the user application program by means known per se such as, e.g., SQL Data Definition and Data Manipulation language (DDL and DML).
  • the database management system typically exploits, in a known per se manner, a data dictionary 24 that includes meta-data which maintains information on the underlying data.
  • The underlying structure of the data is governed by database file management system 26, which is associated with the index scheme and the actual data records 28.
  • The "high-level" logical instructions (e.g. SQL commands) of the high-level management system 22 are converted into "lower level" commands that access or update the data records that are stored in the database file(s), and to this end the database file management system considers the actual structure and organization of the data records.
  • the "high level" and "low level" portions of the database file management system can communicate through a known per se Application Programmers Interface (API), e.g. the Microsoft Open Database Connectivity (ODBC) interface commercially available from Microsoft.
  • the utilization of the ODBC enables "high level" modules of the database file management system or application program to transparently communicate with different "database file management systems" that support the ODBC standard.
  • Fig. 1 further shows, schematically, a storage medium in the form of internal memory module 29 (e.g. 16 megabytes and possibly employing a cache memory sub-module) and an external memory module 29' (e.g. 1 gigabyte).
  • external memory 29' is accessed through an external, relatively slow communication bus (not shown), whereas the internal memory is normally accessed by means of a faster internal bus (not shown).
  • the database management system utilizes operating system services (i.e. an I/O operation) in order to load, through the external communication bus, one or more blocks of data from the external to the internal memory. If the sought data records are not found in the loaded blocks, additional I/O operations are required until the sought data records are targeted.
  • Computer 1 may serve as a workstation forming part of a Local Area Network (LAN) (not shown) which employs a server having essentially the same structure as in Fig. 1.
  • a predominant portion of said modules (including the database records themselves 28) reside in the server.
  • the database may be an on-line database residing in an Internet Web site.
  • The invention is, of course, not limited to the specified partition of small internal memory and large external memory.
  • large internal and external memories may be employed, and by yet another modified embodiment only internal memory is employed.
  • the ERD 30 of Fig. 2 consists of the entities "CLIENT" 32 and "ACCOUNT" 34 as well as an "n to m" "DEPOSIT" 36 relationship indicating that a given client may have more than one account and by the same token a given account may be owned by more than one client.
  • the entity "CLIENT" has the following attributes (fields): "Client_Id" 38 being a key attribute that uniquely identifies each client, "Name" 39 standing for the client's name and "Address" 40 standing for the client's address.
  • the entity "ACCOUNT" has the following attributes (fields): "Acc_No" 42 being a key attribute that uniquely identifies each account, and "Balance" 43 holding the balance of the account.
  • the relationship "DEPOSIT" consists of pairs of keys of the "CLIENT" and "ACCOUNT" entities, such that each pair is indicative of a particular account owned by a specific client.
  • in Fig. 3 there is shown the database of Fig. 2, represented as three tables 50, 51 and 52 corresponding, in the relational data model, to 32, 34 and 36, respectively, with each table holding a few data occurrences for illustrative purposes.
  • the length of the key field ("Client_ID") of the "CLIENT" table is 5 digits
  • the length of the key field ("Acc_ID") of the "ACCOUNT" table is 6 digits.
  • The client table holds 5 data occurrences 55-59
  • the account table holds 2 data occurrences 65, 66 and the deposit table holds 3 data occurrences 70-72.
  • Fig. 4 illustrates an underlying indexing file of the "CLIENT" table of Fig. 3, in accordance with a file management system employing the conventional B-tree indexing scheme.
  • the indexing file 80 consists of three blocks 80a-c, standing for a root block and two leaf blocks respectively.
  • the data records are organized randomly in a separate file 81 holding the five data records 83-87.
  • Each block consists of a succession of pairs of fields (e.g. 82a-b and 83a-b in block 80a).
  • the first field stands for a search key value
  • the second field stands for a link, such as a number that identifies the next block to search or, in the case of a leaf block, a link to the data record, such as a number identifying the data record.
  • a search for a record whose key is 12355 (82a) starts in root block 80a and is directed by the link 82b to block 80b.
  • the search key 12355 (86a) is associated with link 86b indicating the address of the data record identified by this search key in the data file 81.
  • the data record that is identified by search key "12355" (57 in Fig. 3) is the fourth in order in data file 81.
  • the B-tree indexing file of Fig. 4 exhibits one of the significant shortcomings of this approach in that the keys (i.e. search keys) are duplicated, i.e. they are held both in the internal blocks (i.e. in the index scheme) and in the data records associated with the B-tree index.
  • the search key of data record 57 (in Fig. 3) is not only held as an integral part of the data record 86 in file 81 but also in block 80b (search key 86a) and sometimes in parent blocks such as 80a (search key 82).
  • Fig. 5 illustrates a different indexing scheme of the "CLIENT" table of Fig. 3, in accordance with a file management system employing a known trie indexing scheme.
  • the trie indexing file 90 includes a plurality of nodes and links, wherein each node stands for an offset position and the link stands for a value at this offset.
  • Table 91 has four columns. The first column indicates which digit position is to be used, the second column the value of that digit. A digit value partitions the key into two subsets. Columns three and four direct the search procedure to the next step.
  • a digit at the position indicated by the root is compared to the value specified at the second column of the same line (value "5", indicated also by link 90b in the trie index). Since the digit at position 5 of the sought search key 12355 is indeed 5, control is transferred to line 2 (as indicated by the third column of line 1 of table 91).
  • the digit at position 3 of the sought search key (90c in the tree, being also the value of the first column of the second line in table 91) is compared to the value 3 (link 90d, being also the second column in the second line of the table 91). Since a match occurs, control is transferred to line 3 in the table.
  • the digit at position 4 of the sought search key does not match the value specified at the second column of line three (i.e. "5" vs. "4") and accordingly, as indicated in the fourth column of table 91 ("not equal"), a link to the sought data record 57 (86 in Fig. 4) is obtained.
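The table-driven walk just described can be sketched as follows. Only the three branches of table 91 that the text actually spells out are encoded; every other branch is unknown here and is marked `None`. The tuple layout and the names `TABLE_91` and `search` are assumptions made for the illustration.

```python
# Each row: (digit position, 1-based; digit value; target if equal; target if not equal).
# A target is ("line", n) or ("record", n). None marks branches the text does not describe.
TABLE_91 = {
    1: (5, "5", ("line", 2), None),
    2: (3, "3", ("line", 3), None),
    3: (4, "4", None, ("record", 57)),
}


def search(key, table, start=1):
    """Follow the table from line `start`; return (final target, lines visited)."""
    target = ("line", start)
    visited = []
    while target is not None and target[0] == "line":
        line = target[1]
        visited.append(line)
        position, value, if_equal, if_not_equal = table[line]
        # Positions in table 91 are counted from 1, hence the "- 1".
        target = if_equal if key[position - 1] == value else if_not_equal
    return target, visited
```

For the key "12355" the walk visits lines 1, 2 and 3 and ends at data record 57, matching the narrative above; a key that falls onto an undescribed branch simply yields `None`.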
  • the above trie is associated with some shortcomings: it retains an even distribution of the data at the cost of knowing
  • a new trie index scheme designated PAIF. As will be shown below, the PAIF is not confined to a tree structure.
  • various embodiments of the layered index are described, with reference to Figs. 7-9, which include a representative index constructed over the representative keys of the PAIF.
  • the index scheme of the representative index and that of the basic partitioned index being substantially the same PAIF.
  • there is described yet another embodiment of the layered index, with a different trie.
  • This, however, is not obligatory, as is exemplified, e.g. with reference to Fig. 11, where the trie and the representative index are different.
  • in Figs. 6A-C there is shown a succession of schematic illustrations of the "CLIENT" table of Fig. 3, in accordance with the file management system employing the PAIF.
  • the terms "transaction" and "operation" are used interchangeably.
  • The Client's data record 103 (56 in table Client of Fig. 3) having search key "12345" (i.e. a 5-byte-long search key).
  • The PAIF of Fig. 6A (100) is, of course, trivial and consists of a single node 101 (standing for both the root node and the leaf node) linked by means of a long link 102 to data record 103.
  • the data record 103 is associated with a search path being a unit that consists of a node 101 and a link 102 which defines an offset and a pertinent search key portion value that conforms to the corresponding search key portion value at that particular offset within the search key of the specified data record. More specifically, the value of the one-byte search-key-portion at offset 0 within search key "12345" is indeed
  • in Fig. 6B-1 there is shown a PAIF 108 after the termination of a successive transaction in which the data record having Client_Id_No "12445" 107 has been inserted (data occurrence 58 in table Client of Fig. 3).
  • The search keys of data records 103 and 107 are distinguished only in the third byte (offset 2), i.e. "3" and "4" respectively.
  • root node 101 and the link 102 are not sufficient to
  • Figs. 6B-2 and 6B-3 illustrate two other options of realizing the PAIF of Fig. 6B-1, where in Fig. 6B-2 the full key is represented in the PAIF (e.g. all the digits of the record 12445 are specified in the links commencing from the root node and ending at the data record).
  • The latter realization is more explicit and less efficient in terms of space, as compared to the sparse realization of Fig. 6B-3 where only the nodes which are absolutely necessary appear in the tree.
  • Other variants are, of course, applicable.
  • the preferred procedure for inserting a new data record into an existing PAIF includes the execution of the following steps: i. advancing along a reference path commencing from the root node and ending at a data record associated to a leaf node (referred to as "reference data record"); in each node in the reference path, advancing along a link originated from said node if the value represented by the link equals the value of the 1-bit-long key portion at the offset specified by said node; in the case that the offset specified in the node is beyond any corresponding key portion in the key, or if there is no link with said value, advancing along an arbitrary path to any reference data record;
  • the new node is assigned with a value of the discerning offset
  • iii.2.2 connect the reference data record and the new node (which now becomes a leaf node) and assign to the link (long link) a value of the search-key-portion at the discerning offset taken from the search key of the reference data record
  • iii.2.3 connect by means of a link the new data record and the new node and assign to the link (long link) a value of the search-key-portion at the discerning offset taken from the search key of the new data record; or iii.3 if conditions iii.0, iii.1 and iii.2 are not met, there ex
  • iii.3.2 for cases A and B, connect by means of a link (long link) the new data record and said new internal node; the value assigned to the link is that of the search-key-portion at the discerning offset, as taken from the search key of the new data record; iii.3.3 for cases A and B, connect by means of a new link the new node and, for case A, the child node, or for case B, the root node (i.e.
  • the new node becomes, for case A, a new father node, or for case B, a new root node), and the value assigned to said link is the search-key-portion at the offset indicated by the new node, taken from the search key of the reference data record.
  • It should be noted that for a different reference path a different PAIF may be obtained.
  • search key "12546" (59 in table Client of Fig. 3) is inserted to the PAIF of Fig. 6B.
  • a move is made along the reference path commencing from the root 101 and ending, for example, at data record 103 which stands for the "reference data record".
  • The comparison operation stipulated in step (ii) results in that the search key of the new data record is distinguished from the search key of the reference data record (103) at offsets 2 ("5" vs. "3") and 4 ("6" vs. "5"). The smallest offset ("discerning offset") is therefore 2.
  • in step (iii), the condition of step iii.1 is met since the discerning offset is equal to that assigned to node 104. Accordingly, and as is shown in Fig. 6C-1, new link 111 connects node 104 to the new data record 112. The value assigned to link 111 is 5, being the byte value at position 2 in the search key of the new data record 112. PAIF 110 of Fig. 6C-1 is therefore the result of inserting the data record 112 into the PAIF 108 of Fig. 6B-1.
  • the CLIENT data record having Client_Id (or search key) "12355" (57 in table Client of Fig. 3) is inserted into the PAIF of Fig. 6B-1. Steps i and ii, stipulated above, result in a reference path starting at node 101 and ending at data record 103.
  • in step iii, the condition of step iii.2 is met since the discerning offset 3 is larger than the offset 2 of leaf node 104 in the reference search path. Accordingly, in compliance with step iii.2.1 and as is shown in the resulting PAIF 120 of Fig. 6C-2, the link 106 is disconnected from reference data record 103 and is connected to a new node 121. The new node
  • in compliance with step iii.2.2, the reference data record 103 and the new node 121 are connected by means of new link 122.
  • the new link is assigned with the value 4 (being the digit value at the discerning offset 3 taken from the search key "12345" of the reference data record 103); and finally, as stipulated in step iii.2.3, the new data record 123 is connected to node 121 by means of link 124 which is assigned with the value "5" (being the digit at the discerning offset 3 taken from the search key "12355" of the new data record 123).
  • PAIF 120 of Fig. 6C-2 is, therefore, the result of inserting the data record 123 into the PAIF 108 of Fig. 6B-1.
  • the third and last example concerns inserting the CLIENT data record having Client_Id (or search key) "11346" (55 in table Client of Fig. 3) into the PAIF of Fig. 6B-1.
  • Applying the aforementioned steps i and ii results in advancing from node 101 to data record 103 (in Fig. 6B) and establishing that the discerning offset is 1.
  • in step iii, the condition of step iii.3 is met. Accordingly, in compliance with step iii.3.1 and as is shown in the resulting PAIF 130 of Fig. 6C-3, the link 102 is shifted to a new internal node 131.
  • the new internal node 131 is assigned with the value 1 (being the discerning offset).
  • the new data record 132 and node 131 are directly connected by means of new link 133.
  • the value assigned to link 133 is 1 (being the digit at the discerning offset 1 taken from the search key "11346" of the new data record 132), and finally, in compliance with step iii.3.3 the new internal node 131 is linked to node 104 by means of link 134 assigned with the value 2 (being the digit at the discerning offset (1) taken from the search key "12345" of the reference data record 103).
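The insertion steps and the three worked examples above can be sketched in code. This is a simplified illustration rather than the patented procedure: it assumes fixed-length keys compared one character at a time, represents each data record by its key string, picks the first link as the "arbitrary path", and uses hypothetical names (`Node`, `Paif`, `dump`, `fig_6b1`).

```python
class Node:
    def __init__(self, offset):
        self.offset = offset  # position within the key examined at this node
        self.links = {}       # key portion value -> child Node or record (key string)


class Paif:
    def __init__(self):
        self.root = None

    def _reference_record(self, key):
        """Step i: follow matching links; otherwise take an arbitrary (first) link."""
        cur = self.root
        while isinstance(cur, Node):
            value = key[cur.offset] if cur.offset < len(key) else None
            cur = cur.links[value] if value in cur.links else next(iter(cur.links.values()))
        return cur

    def insert(self, key):
        if self.root is None:                 # trivial PAIF, as in Fig. 6A
            self.root = Node(0)
            self.root.links[key[0]] = key
            return
        ref = self._reference_record(key)
        if ref == key:                        # record already present
            return
        # Step ii: the discerning offset is the smallest offset at which the keys differ.
        d = next(i for i, (a, b) in enumerate(zip(key, ref)) if a != b)
        # Walk the reference path down to the first node whose offset is not below d.
        parent, via, cur = None, None, self.root
        while isinstance(cur, Node) and cur.offset < d:
            parent, via = cur, ref[cur.offset]
            cur = cur.links[via]
        if isinstance(cur, Node) and cur.offset == d:
            cur.links[key[d]] = key           # step iii.1: add a link to the new record
            return
        new = Node(d)                         # steps iii.2 / iii.3: splice in a new node
        new.links[ref[d]] = cur               # link toward the reference side
        new.links[key[d]] = key               # link to the new data record
        if parent is None:
            self.root = new                   # case B: the new node becomes the root
        else:
            parent.links[via] = new           # case A, or a new leaf node


def dump(x):
    """Nested (offset, links) view of the structure, convenient for inspection."""
    if isinstance(x, Node):
        return (x.offset, {v: dump(c) for v, c in x.links.items()})
    return x


def fig_6b1():
    paif = Paif()
    paif.insert("12345")
    paif.insert("12445")
    return paif
```

Starting from `fig_6b1()`, inserting "12546", "12355" or "11346" produces structures with the same offsets and link values as those described for Figs. 6C-1, 6C-2 and 6C-3 respectively.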
  • in step i.1, the value of the digit "1" at the offset assigned to the root node (offset 0) of the sought data record is compared to the one assigned to link 102 (being the sole link originated from node 101). Since a match is found, control is shifted to node 131.
  • in step i.1, the value of the digit ("2") at the offset assigned to node 131 (offset 1) of the sought data record is compared to the one assigned to link 134.
  • a match is found so control is shifted to node 104.
  • the value of the digit "4" at the offset assigned to node 104 (offset 2) of the sought data record is compared for each link originating from node 104.
  • the comparison results in a match for link 105 and accordingly control is shifted to data record 107.
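The search just traced for key "12445" can be sketched over a hand-built copy of the PAIF of Fig. 6C-3. The nested `(offset, links)` tuples and the name `search` are assumptions for the illustration; the final comparison against the full key is likewise an added assumption, included because offsets skipped on the path are never examined during the descent.

```python
# PAIF of Fig. 6C-3: node 101 (offset 0) -> node 131 (offset 1) -> node 104 (offset 2).
FIG_6C3 = (0, {"1": (1, {"1": "11346",
                         "2": (2, {"3": "12345", "4": "12445"})})})


def search(paif, key):
    """Return (record or None, list of (offset, value) pairs followed)."""
    cur, path = paif, []
    while isinstance(cur, tuple):
        offset, links = cur
        if offset >= len(key) or key[offset] not in links:
            return None, path            # no link carries the required value
        path.append((offset, key[offset]))
        cur = links[key[offset]]
    # Offsets skipped on the way down were not checked, so verify the whole key.
    return (cur if cur == key else None), path
```

Searching "12445" follows the values "1", "2" and "4" at offsets 0, 1 and 2, as in the narrative, and ends at the record with that key.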
  • the leaf node that is linked to the sought data record is referred to as the "target node".
  • the father of the target node is referred to as the "predecessor target node".
  • the link that connects the predecessor target node to the target node is referred to as the "predecessor link" and the link that connects the target node to a child node thereof (or to a data record other than the sought data record) is referred to as the "target link".
  • the latter record is searched in the PAIF according to the procedure described above. Having found the data record 132 and in compliance with step i above, the data record as well as the link 133 leading thereto are both deleted. Since after the latter deleting step the target node 131 remains only with the sole target link 134, steps iii and iii.1 apply, and accordingly the predecessor link 102 bypasses target node 131 and is directly linked to the child node thereof, 104.
  • in step ii.2, target node 131 and the target link 134 are deleted, thereby obtaining the PAIF shown in Fig. 6B-1.
  • Another example is given with reference to the PAIF of Fig. 6C-1.
  • the latter record is searched in the PAIF according to the procedure described above.
  • the data record as well as the link (111) leading thereto are both deleted.
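The two deletion examples above can be sketched as follows, again under simplifying assumptions: nodes are plain dictionaries, records are represented by their key strings, and the names `fig_6c3`, `fig_6c1` and `delete` are hypothetical. Only the bypass of a target node left with a sole link is modelled.

```python
def fig_6c3():
    n104 = {"offset": 2, "links": {"3": "12345", "4": "12445"}}
    n131 = {"offset": 1, "links": {"1": "11346", "2": n104}}
    return {"offset": 0, "links": {"1": n131}}


def fig_6c1():
    n104 = {"offset": 2, "links": {"3": "12345", "4": "12445", "5": "12546"}}
    return {"offset": 0, "links": {"1": n104}}


def delete(root, key):
    """Delete the record `key` in place; return True if it was found."""
    pred, pred_value = None, None      # predecessor target node and predecessor link value
    target, value, cur = None, None, root
    while isinstance(cur, dict):
        offset = cur["offset"]
        if offset >= len(key) or key[offset] not in cur["links"]:
            return False
        pred, pred_value = target, value
        target, value = cur, key[offset]
        cur = cur["links"][value]
    if cur != key:
        return False
    del target["links"][value]         # remove the record and the link leading to it
    if len(target["links"]) == 1 and pred is not None:
        # The target node is left with a sole target link: the predecessor link
        # bypasses it and is connected directly to what that link pointed at.
        (remaining,) = target["links"].values()
        pred["links"][pred_value] = remaining
    return True
```

Deleting "11346" from the structure of Fig. 6C-3 removes node 131 by bypassing it, while deleting "12546" from that of Fig. 6C-1 only removes one link; both end in the structure described for Fig. 6B-1.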
  • Another common primitive is the "Modify existing data record", e.g. change the home address of an existing client.
  • the "Modify" primitive is normally realized by selectively utilizing the aforementioned primitives. For executing a "Modify" command one should distinguish between the following cases:
  • the "modify" applies to a search key field (e.g. change an account
  • each search key is represented as a series of bytes and accordingly the search procedure is performed by partitioning the search-key into search key portions each consisting of at least one byte.
  • different links in a given PAIF may be assigned with search-key-portions of different length as long as the respective search-key-portion is known to the corresponding node.
  • the data records are held in a sorted form according to search key. Navigating, for example, in the PAIF of Fig. 6C-3 (from right to left) brings about the ordered series "11346", "12345" and "12445". This characteristic constitutes yet another advantage which eases data manipulation as compared to the tree of Fig. 5 where the data records are not sorted. As specified before, a node in the PAIF is not necessarily classified uniquely.
  • node 104 is at the same time a leaf node (linked, by means of a long link 105 to data record 107) and an internal node (linked by means of a short link 106 to node 121).
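The sorted-order property mentioned above can be sketched with a traversal that visits the links of each node in ascending value order. The structure is a hand-built copy of the PAIF of Fig. 6C-3 with records represented by their key strings; the tuple layout, the function name and the assumption of fixed-length keys are choices made for this illustration.

```python
# PAIF of Fig. 6C-3; the links are deliberately written out of order.
FIG_6C3 = (0, {"1": (1, {"2": (2, {"4": "12445", "3": "12345"}),
                         "1": "11346"})})


def records_in_order(node):
    """Yield the records in ascending search-key order."""
    if not isinstance(node, tuple):
        yield node                      # a data record (its key string)
        return
    _offset, links = node
    for value in sorted(links):         # ascending key portion values
        yield from records_in_order(links[value])
```

Listing the records this way gives "11346", "12345", "12445", the ordered series named in the text.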
  • in Figs. 7A-H there are shown schematic illustrations of a layered index constructed in response to a succession of split block operations, according to one embodiment of the invention.
  • Consider, for example, a block 140 in Fig. 7A (in the basic partitioned index) which overflows in terms of memory space.
  • a "split block" procedure is invoked which results in a layered index 142 of Fig. 7B consisting of root block 144 and a duplicated node A' (155) linked to leaf block 146 by means of direct link 145 and by means of long link 147 to a leaf block 148.
  • the split point was selected to be link 149 (Fig. 7A) (hereinafter "split link"), thereby shifting nodes A, B, E, D and F to new block 146 and nodes C, G, I, J, K, L and H to a new block 148.
  • The split link is preferably selected in order to accomplish an essentially even distribution of nodes and links between the new blocks (e.g. the size of the sub PAIFs that reside in blocks 148 and 146 is essentially the same).
  • a father block 144 (constituting I1) is created with a duplicated node A' (155) of the split node A (156).
  • the node is copied
  • nodes A and C may also be linked by means of the split link, marked as dashed line 150.
  • direct link 154 connects the copied node C 153a to the block 148A of the original split node 153, whilst the link 155 is a far link to the split block 148B, and the value of the link is the same as the original value of link 152 between nodes C and G before (and after) the split.
  • the layered index 151 is constituted by the trie that includes blocks 141, 148A and 148B, and block 16 which forms a representative index over the common keys of the trie.
  • node A in block 141 and node C in block 148A are optionally disconnected, and likewise node C of 148A and node G of 148B are optionally disconnected.
  • nodes A' and C are connected in block 140 to form a (connected) trie and it is
  • the resulting layered index constitutes a balanced structure of blocks, thereby keeping the index depth to a minimum and consequently minimizing the number of accesses (normally, although not necessarily, I/O operations) that are required in order to find, insert or delete a given data record.
  • the layered index maintains a substantially logarithmic function that depends on the number of records; the layered index is more efficient in terms of the number of I/O operations required to access a given data record, as compared to the number of I/O operations required to access a data record through the trie.
  • th ⁇ r ⁇ pr ⁇ s ⁇ ntativ ⁇ ind ⁇ x and th ⁇ tri ⁇ comply with substantially th ⁇ same index sch ⁇ m ⁇ i.e. the P.AIF.
  • substantially th ⁇ sam ⁇ sch ⁇ me it is meant that th ⁇ r ⁇ ar ⁇ som ⁇ diff ⁇ r ⁇ nc ⁇ s as will ⁇ xplain ⁇ d with r f ⁇ rence to Fig. 9G b ⁇ low.
  • Node A being the lowest ancestor node of nodes B and I, and thus a (connected) trie is formed in block 402.
  • the value associated with short link 414 (between blocks A' and B' in block 402) is of the same value as link 412 (between A and B in block 405).
  • The value of the link 415 (between nodes A' and F) in block 402 is of the same value as that of link 413, which originates from node A in the direction needed to access node B.
  • the internal structure of block 402 is such that it allows a search to the representatives of blocks 405, 406 and 407.
  • The direct links 416, 417 of nodes 422 and 411 are optionally retained, since it is possible to move along direct link 418 to block 405, seeing that node 410 is maintained in the access path to both nodes 422 and 411.
  • Fig. 7G shows the resulting layered index after splitting block 407 of Fig. 7F (in link 420) and Fig. 7H shows the resulting layered index after splitting block 402 (in the link between nodes I' and N').
  • the resulting layered index in Fig. 7H has, as shown, three layers: the first consisting of block 430, the second consisting of blocks 402 and 408, and the trie consisting of blocks 405, 407, 426 and 406.
  • Figs. 8A-8B showing, respectively, two illustrations exemplifying the application of the technique of the invention to a according to another embodiment of the invention.
  • Fig. 8A illustrates a given trie structure having vertical orientation (i.e. constituting a vertical tree) which, as shown, is unbalanced, i.e. three blocks deep (260, 261 and 262) vs. two blocks deep (260 and 264).
  • the description below does not aim at explaining the search scheme of the specified vertical tree but emphasizes only those aspects which are required to obtain a balanced layered index.
  • nevertheless be noted that the nodes in trie structure 260 signify offsets in a half-byte size (the node values are presented in hexadecimal representation) of the data records (a-k) that are shown in Fig. 8A.
  • Fig. 8B illustrates one possible embodiment of the invention.
  • a representative index that consists of one block 270 (forming I₁) is constructed, with the result that a horizontal balanced tree is obtained having a root block 270 from which all the blocks of the lower-level vertical tree (the latter constitutes the unbalanced trie) are accessed through one I/O operation.
  • the common key of block 260 (in hexadecimal representation of half-byte units) is 0x4, 0x1 and 0x3, where 0x4 stands for the most significant bits of the byte of the character A, 0x1 stands for the least significant bits of the character A, and 0x3 stands for the most significant bits of the characters which reside in offset 2 of the data records.
  • block 261 can accommodate a root node with value 8; thus, the common key, hereafter k, of the block is changed to be 0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, i.e. it consists of 8 units.
  • the representative of block 261 in I₁ should be changed accordingly.
  • the representative of 261 is k, even if the root node with the value 8 does not exist.
  • the index over the common keys is accomplished in the representative index (consisting of block 270) such that it constructs a trie that addresses the common keys of the first vertical tree.
  • example, in order to find data record g, one follows node 290, link 291 to node 292. Then, one advances with the direct link 293 to block 261, which is associated with data record g. The resulting layered index is balanced.
  • the representative key of a block, being a common key, is the longest prefix of all keys of the data records that can be accessed from the block by the relevant index scheme.
  • the specified prefix size (calculated in L-bit-long units) equals the value of the root node in the block (which, as recalled, holds an offset value). If the prefix size is expressed as a number of bits, then the prefix size is calculated as the offset value multiplied by the L-bit-long value.
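  • The notion of a common key as the longest shared prefix, counted in half-byte units, can be sketched as follows. The function names and the sample keys are assumptions chosen for illustration; only the half-byte unit and the values 0x4, 0x1, 0x3 come from the text.

```python
def to_nibbles(key: bytes) -> list[int]:
    """Split each byte into its high and low half-byte (4-bit) units."""
    units = []
    for byte in key:
        units.append(byte >> 4)
        units.append(byte & 0x0F)
    return units


def common_key(keys: list[bytes]) -> list[int]:
    """Longest common prefix of all keys, expressed in half-byte units."""
    if not keys:
        return []
    rows = [to_nibbles(key) for key in keys]
    prefix = []
    for column in zip(*rows):
        if any(unit != column[0] for unit in column):
            break
        prefix.append(column[0])
    return prefix


def prefix_size_in_bits(prefix: list[int], unit_bits: int = 4) -> int:
    """Prefix size in bits: number of units multiplied by the unit length."""
    return len(prefix) * unit_bits
```

  • For hypothetical keys such as A1, A2 and A7, the common key is 0x4, 0x1, 0x3: the two halves of the character A followed by the high half of a digit character, three units or 12 bits in total.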
  • There follows now a description of yet another embodiment of constructing a layered index of the invention with reference to Figs. 9A-9G.
  • Figs. 9A-9G showing a succession of modify (insert) transactions on a PAIF tree (constituting a trie that is susceptible to an unbalanced structure) and the so obtained layered index.
  • the data records are shown as forming part of the trie.
  • the actual manner in which the data records are associated to the trie may vary depending upon the particular application.
  • the first step (Fig. 9A) record A is inserted, whereafter Block 300 includes node 301 having offset 0, being associated to first record A through link 302, having the value 0.
  • the tree consists of Block 100 having only one node.
  • The index scheme dictates that the search path to data record A is determined according to value 0 at offset 0, as depicted on link 302 and node 301, respectively.
  • Since Block 300 accommodates nodes 301 and 305, it is not required, as yet, to split the block.
  • Fig. 9D data record D is inserted, and the structure of the block following the insert operation is shown in Fig. 9D. Since, however, the data block cannot accommodate more than two nodes (overflow occurs), it is now required to split Block 300.
  • Fig. 9E illustrates the tree structure after splitting.
  • link 306 is the split link, with the motivation that approximately the contents of a half block will be retained in Block 300, and the contents of the remaining half block will be moved to another block 310.
  • other links could likewise be selected to be the split link.
  • block 300 in I₀ is replaced with two blocks 300 and
  • the basic partitioned index of Fig. 9E consists now of two blocks 300 and 310 (which in fact constitute the unbalanced trie).
  • the split node (313) is copied to the block (312) to thereby constitute a duplicated node (314).
  • Next, the duplicated node (314) is connected by means of direct link 316 to block 300, and the duplicated node 314 is linked by means of a far link 318 to the block 310.
  • This far link replaces the original split link 306 that is marked in Fig. 9E in a dashed line.
  • the value of the far link 318 is the same as the value of the split link.
  • the representative index (constituted by block 312) allows searching according to the common keys of the basic partitioned index.
  • data record E is inserted.
  • this case, advancing in the horizontal tree (being one form of the layered index) from the first node 314 of block 312 (having a value 1) is not possible by means of the far link 318, since it represents direction 1 from node 314 (having a value 1), and a link in direction 0 is required.
  • Therefore advancing by means of the direct link 316 to block 300.
  • the block that needs to be associated with the new data record is found.
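  • The choice between a far link and the direct link at a duplicated node can be sketched as follows. The `RepNode` class, the string-valued keys and the block labels are assumptions for illustration; the sketch covers a single node only, not the full search scheme.

```python
from dataclasses import dataclass, field


@dataclass
class RepNode:
    # Hypothetical duplicated node in the representative index.
    offset: int                 # position in the key examined at this node
    direct_block: str           # block reached through the direct link
    far_links: dict[str, str] = field(default_factory=dict)  # key value -> block


def choose_block(node: RepNode, key: str) -> str:
    """Follow the far link matching the key's value at the node's offset;
    otherwise fall back to the direct link."""
    value = key[node.offset] if node.offset < len(key) else None
    if value in node.far_links:
        return node.far_links[value]
    return node.direct_block
```

  • Modelling node 314 with offset 1, a direct link to block 300 and a far link of value 1 to block 310, a key with value 0 at offset 1 is routed to block 300, as in the insertion of record E.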
  • data record F is inserted, resulting in a tree structure shown in Fig. 9F.
  • node 320 is copied to block 312 (designated 323 in Fig. 9G) and, since it cannot be linked to node 314 of block 312 (since it will not retain the correct intra-block links of the nodes), node 311 of block 300 is also copied to block 312 (designated 322 in Fig. 9G) in order to create a (connected) trie that enables searching by the search scheme to blocks 300, 326, 310 according to the common keys of the blocks.
  • Figs. 9A-G and 8A-B illustrate two of many possible manners of realizing the split block mechanism that maintains the balanced structure of the invention by constructing a layered index.
  • the flexibility in adopting another non-limiting variant is shown e.g. in Fig. 8B where the near link 271 and
  • direct link 272 are represented by far link 273 (marked in dashed line) with direction as of link 271, thus rendering node 276 redundant.
  • the balance technique of the invention confers to the so obtained balanced horizontal oriented digital tree (being one form of the layered index structure) a so called "probabilistic access" characteristic.
  • a search in connection with an input data record, e.g. search for a data record A
  • For a better understanding of the foregoing consider, for example, Fig. 9E.
  • The search path will follow node 314 and link 318 (offset 1, value 1, respectively) and then at offset '6' (root node of block 310) through link 319 (value '1') to data record C.
  • The latter example exemplifies the probabilistic search characteristics of the so obtained layered index.
  • the size of the common prefix of the key of the sought data record and the key of the data record is calculated.
  • the common key of the block (310) is the prefix portion of the key of the actual data record C.
  • the size of the common prefix is zero.
  • the search path follows the direct link from a node with the largest value on the search path (that maintains a direct link).
  • a comparison to the common key (if available) or to data records associated with nodes (if available) can lead to a decision as to whether or not to advance by the index scheme or to return to a node with a direct link. It should be noted that the common key is not necessarily physically attached to the data records.
  • the criterion to know that the sought data record does not reside in the tree is that the size of the common key prefix of the sought data record and the common key of the block is greater than the value of the split node.
  • the value of the split node is 1 (of node 313), thus block 310 is not the block that accommodates record L (if such record exists). Therefore, the search for record L is continued from node 314 and link 316. This procedure applies to all modify transactions.
  • block 300 is found in the manner specified above and is associated with the new data record L.
  • Figs. 7 to 9 exemplified a layered index utilizing a PAIF based indexing scheme as the basic partitioned index and the representative index. Those versed in the art will readily appreciate that the layered index of the invention is not bound only to PAIF. Thus, for example, U.S. 5,495,609 illustrates a different trie. Consider, for example, the trie of Fig.
  • the layered index of Fig. 10B brings about, thus, a balanced tree of blocks, assuring that essentially the same number of I/O operations is required to reach each and every data record in the tree.
  • Those versed in the art will readily appreciate that preferably the number of I/O operations is a logarithmic function depending upon the number of data records and the number of links originated from a block.
  • a layered index with 3 levels allows access to 1,000,000,000 data records.
  • there follows a numerical example. Assume that every block has 1000 far links. Assuming that the size of each far link is 4 bytes, it readily arises that the size needed for representing the far links is 4000 bytes. Assuming further that the nodes and the near links within a block occupy another 4000 bytes, the resulting block
  • each block size is less than 10,000 bytes. For sake of discussion, assume that each block size is 20,000 bytes.
  • a layered index that consists of one block (e.g. block 144 in Fig. 7B) as index layer I₁ and assuming that it is linked to a thousand blocks in the layer I₀ (of which only two blocks 146 and 148 are shown in
  • the layered index amounts to a total of 1001 blocks, each having a size of 20,000 bytes. Accordingly, the total space that should be allocated for holding the blocks of the layered index is about 20 megabytes. This order of size can be easily accommodated in the internal memory of, say, a personal computer. Assuming now that each block in I₀
  • the net effect is that by utilizing a layered index of the invention (according to the latter embodiment) which is wholly accommodated in the internal memory, a million data records can be accessed without I/O index.
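  • The arithmetic of the numerical example can be checked with a short sketch. The constant names are illustrative, and the figure of 1000 data records reachable per lowest-layer block is an assumption inferred from the stated total of a million records.

```python
FAR_LINKS_PER_BLOCK = 1000
FAR_LINK_BYTES = 4
NODE_AND_NEAR_LINK_BYTES = 4000
BLOCK_BYTES = 20_000  # deliberately generous block size used in the text

far_link_bytes = FAR_LINKS_PER_BLOCK * FAR_LINK_BYTES     # 4000 bytes
used_bytes = far_link_bytes + NODE_AND_NEAR_LINK_BYTES    # 8000 bytes, under 10,000
blocks = 1 + FAR_LINKS_PER_BLOCK                          # one root block plus 1000 blocks
index_bytes = blocks * BLOCK_BYTES                        # about 20 megabytes
records_two_layers = FAR_LINKS_PER_BLOCK ** 2             # assumes 1000 records per block


def levels_needed(records: int, fanout: int) -> int:
    """Smallest number of levels whose reach (fanout ** levels) covers `records`."""
    levels, reach = 0, 1
    while reach < records:
        reach *= fanout
        levels += 1
    return levels
```

  • Under these assumptions the number of levels grows as the logarithm of the record count to the base of the per-block fan-out, so a fan-out of 1000 reaches 1,000,000,000 records in three levels.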
  • the resulting layered index of Fig. 10B includes two trees having vertical orientation, i.e. the first tree structure consisting of blocks
  • 159B and 159C (being one form of the basic partitioned index I₀) and second tree having one block 159A (being one form of the basic partitioned index I₁).
  • the trie index with which the technique of the invention is of concern is not confined to the search tree disclosed in the '609 patent, and it may encompass other types of trees as explained above.
  • intra-block structure is not necessarily balanced, i.e. nodes inside a block are not necessarily arranged in a balanced structure. Whilst this fact is seemingly a drawback, those versed in the art will readily appreciate that its implications on the overall database performance are virtually insignificant. This stems from the fact that the intra-block search scheme is normally performed in the fast internal memory of the computer system.
  • the arrangement of a block within a layered index is retained in a balanced structure, whereby the number of blocks in a search path is a logarithmic function depending on the number of data records and reflects therefore the number of I/O accesses to the external memory (an operation which is inherently slow) in order to load a desired block to the internal memory.
  • the offset size (in terms of number of bits) that is accommodated within each node may be altered, the manner of realizing empty pointers (i.e. pointers that point to null, having no children) and others.
  • the latter physical realization flexibility applies also to the inter-block portion.
  • The retention of the index scheme for both the trie and the representative index is not obligatory, as will be exemplified with reference to Fig. 11.
  • Fig. 11 illustrates another approach of balancing an unbalanced tree of Fig. 8A (i.e. constructing a layered index) using a conventional B-tree as a representative index over the representative keys of the unbalanced trie.
  • the so obtained horizontal oriented balanced tree (layered index) includes blocks 272 at the upper level (index layer I₂), 270 and 271 at a lower level (index layer I₁) and the original blocks of the unbalanced vertical oriented tree of Fig.
  • The database file management system of the invention not only copes with the drawbacks of the conventional trie indexing file but also offers
  • the invention is by no means bound to the specified storage medium.
  • the storage medium with which the present invention is applicable may also be an internal memory.
  • There follows a description of the second aspect of the invention.
  • the database file management system of the invention enables addressing different types of data records using a single index.
  • each data record belonging to a given type is associated with a given designator.
  • the latter forms part of the key of the data record, constituting a designator key.
  • the designator is unique for every type of data.
  • a data dictionary maintains meta-data information, which provides information on the data records as a function of the type of the records.
  • the data records it is needed to maintain a designator, to be able to identify the designator and, by using the meta-data information, to be able to identify or construct the designated key as well as other information such as the record size.
  • the search scheme of the index is oblivious to the meta-data. It locates the record from the designator (or composite) key without using the meta-data.
  • the meta-data is required to construct the (composite) designator key and, once the record is retrieved, to determine the properties of the record.
  • the designator -B- is identified, and information on the record designated B is available from the meta-data, for example the size of the book record, its fields and the fields that are the key fields.
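  • The interplay between a designator and the data dictionary can be sketched as follows. The `RecordType` class, the dictionary contents and the field names are hypothetical and chosen only for illustration; the sketch assumes a one-character designator prepended to the key fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordType:
    # Hypothetical meta-data entry kept in the data dictionary.
    name: str
    fields: tuple[str, ...]
    key_fields: tuple[str, ...]


# Hypothetical data dictionary: designator -> meta-data for that record type.
DATA_DICTIONARY = {
    "A": RecordType("borrower", ("borrower_id", "name"), ("borrower_id",)),
    "B": RecordType("book", ("book_id", "title"), ("book_id",)),
}


def designated_key(designator: str, record: dict[str, str]) -> str:
    """Build the designated key: the designator followed by the key fields."""
    meta = DATA_DICTIONARY[designator]
    return designator + "".join(record[name] for name in meta.key_fields)


def describe(key: str) -> RecordType:
    """Look up the record type from the designator at the start of the key."""
    return DATA_DICTIONARY[key[0]]
```

  • The index search itself would use only the resulting key string; the dictionary is consulted before the search to build the key and after retrieval to interpret the record.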
  • designated data records is not bound to only one type, but rather (preferably) more than one type may be treated by the designated index and, as will be explained below, with subordination relationship.
  • data records of different types may be addressed from the same index.
  • the keys of data records that belong to different types do
  • a layered index which is also a designated index based on a trie as its basic partitioned layered index of the kind depicted in Fig. 8A.
  • The size of the key of the records that belong to the "Borrower" entity is 6 bytes long, whereas the size of the key of the records that belong to the "Book" entity is 5 bytes long. Inserting books to the designated index of Fig.
  • the data structure of Fig. 12 that includes a designated index that addresses 2 types of data records: data records a-k, which are assigned the designator A, and data records w-x, which are assigned the designator B.
  • record of type X or record designated X are used to describe a record having a designated key and the designator is X.
  • the latter example illustrated one manner of realizing designated data (i.e. prepending as prefix a character, string or any number of bits) to the key of the data record
  • the proposed designator may be realized in any known manner provided that the designator distinguishes between different data records, is treated as part of the key, and therefore forms part of the search.
  • whether the designator (i) forms part of the data record (or key portion), (ii) is stored elsewhere (e.g. in a different data structure), or (iii) is defined elsewhere, or even defined otherwise. An example of the latter is a trie structure that is associated with data records all of the same type (for example, all are designated with a character A).
  • data record d is accessed from node 266 by link 270.
  • the first character of data record d is A - the designator.
  • Fig. 13A illustrates a designated index 800 (in the form of PAIF) with four data records 802, 804, 806 and 808 (of which only the designator keys are shown) associated thereto.
  • the data records are all of the same type, as readily arises from the designator 'A' that is prepended to each of the data records.
  • Fig. 13B there is shown the PAIF 800 with new data record (812) with a composite key A12355B940201333333 (the designator of record 812 is B). The new data record is subordinated to data record 806 whose key is A12355. According to the PAIF index, node 814 indicates that the discerning offset is 6 and that the value B links to data record 812 (having the value B at offset 6).
  • Fig. 13C illustrates the PAIF 800 in which another data record 820 is inserted.
  • Data record 820, which represents another instance of a B type data record that is subordinated to the A type data record (806), is inserted to the PAIF.
  • The discerning offset is 11 (the value of the new node 822) and the link values thereof are '0' and '1' to data records 812 and 820, respectively.
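  • The discerning offsets quoted for nodes 814 and 822 can be reproduced with a small sketch that counts offsets from zero. The function name is illustrative, and the key used for record 820 in the note below is hypothetical, since the text gives only its value at offset 11.

```python
def discerning_offset(key_a: str, key_b: str) -> int:
    """First offset (counted from 0) at which two keys differ.

    When one key is a proper prefix of the other, the offset just past the
    shorter key is returned, i.e. where the longer key continues.
    """
    for offset, (a, b) in enumerate(zip(key_a, key_b)):
        if a != b:
            return offset
    if len(key_a) != len(key_b):
        return min(len(key_a), len(key_b))
    raise ValueError("keys are identical")
```

  • Comparing A12355 with A12355B940201333333 gives offset 6, where the value is B. Comparing the key of record 812 with a hypothetical key A12355B940211333333 for record 820 gives offset 11, with values 0 and 1 respectively.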
  • Fig. 13D illustrates the PAIF 800, where different types of records are subordinated to record 806.
  • Data record of type 'D' (824), being subordinated to data record of type 'A', is linked from node 814 by link 823 having the value D.
  • the PAIF already represents data record designated B where the latter is subordinated to the data record designated A.
  • Fig. 13E there is shown another embodiment of the PAIF of Fig. 13D implemented slightly differently.
  • the subordinated data records 812, 820 and 824 are represented and maintained in the data file without their key prefix that is the designator key of the record 806 (i.e. the prefixed key A12355 is omitted).
  • data record 812 the information available from the meta-data according to the designator B allows extracting the following information: (i) identify that part of the key is missing,
  • The implementation described above obviates the necessity to duplicate the representation of the designated key of data record 806 in respect of each subordinated data record (in the particular example of Fig. 13D, the specified prefix A12355 is duplicated three times for records 812, 820 and 824).
  • Replacing the key prefix with a link can save space (if the size of the prefix is larger than the representation of the link) and allows accessing the record that the subordination relates to without necessitating a separate search.
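  • Storing a subordinated record without its key prefix and recovering the full key through the link to its parent can be sketched as follows. The `StoredRecord` class and its field names are assumptions for illustration, not the layout of the data file described in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredRecord:
    # Hypothetical stored form: the record keeps only its own key portion.
    stored_key: str
    parent: Optional["StoredRecord"] = None  # link replacing the omitted prefix


def full_key(record: StoredRecord) -> str:
    """Rebuild the composite key by walking the parent links up to the top record."""
    parts = []
    current: Optional[StoredRecord] = record
    while current is not None:
        parts.append(current.stored_key)
        current = current.parent
    return "".join(reversed(parts))
```

  • With record 806 stored as A12355 and record 812 stored as B940201333333 plus a link to 806, the full composite key A12355B940201333333 is rebuilt on demand, and the parent is reached through the link rather than through a separate search.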
  • Figs. 13D, 13E illustrate that the subordination relationship characteristics of the invention are not limited to any specific realization.
  • each of the subordinated records 812, 820, 824 can have records subordinated to it.
  • Moreover, there are some other advantages that are brought about using the proposed technique of the invention, e.g. maintaining data integrity.
  • an insert transaction that is applied to the PAIF 800 of Fig. 13E, of data record designated B with a composite key A12355B930101123456 subordinated to data record 806 (having designated key A12355).
  • The search leads to node 822.
  • the value at key offset 11 of the inserted data record is 0, thus record 812 is accessed.
  • the search key of record 812 needs to be constructed (by accessing record 806 via link 826) and the insertion of the new data record can be completed. It should be noted that the link to record 806 obviates the need to conduct a separate search for record 806 by its key in order to confirm its existence. Thus the maintenance of data integrity is more efficient.
  • Performing the same data integrity check using the specified B-tree index implies considerable overhead since it requires a two-phase operation.
  • a search is applied to the index of data records of type 'A' in order to find the data record whose key is 12355. Only upon finding it can a record of type B be inserted (and a separate index file is normally updated).
  • the data structure of Fig. 20E exemplifies other advantages resulting from the fact that subordinated data records are linked to their "parent" record. For example, if a record of type A is a customer and a record of type B is an invoice, it is usually needed to access the invoice details with the customer details. The link from the invoice to the customer obviates a separate search for the customer details.
  • the move from node 814 to node 812 can be by the split link. If the split link does not exist, for example in Fig. 7F, one needs to use the link 421 of node B' (422) when it is needed to advance by link 400 from node B (423) to node E (424).
  • there is shown a schematic illustration of a designated index according to one embodiment of the invention.
  • the index contains two search paths to one designated data record ("DEPOSIT" data
  • record such that the deposit can be accessed by each of the two composite keys: a designated key that includes the key fields account number, date and client number, and a second designated key that includes the key fields client number, date and account number.
  • the account data record has a designated key 'A133333' (1201)
  • Updating a deposit for the account can be implemented by means of designated record 203 subordinated to designated record 201.
  • the PAIF would allow accessing records 201, 203 from node 207 by link 206.
  • data record 204 represents a deposit of a client.
  • the key of record 202 is B133333. Updating a deposit 204 to a client 202 can be implemented by the index 200 and node 209 linked (208) to data record 204.
  • the key of data record 203 is 'A133333C01019811346' (k₁).
  • the key of record 204 is B11346D010198133333 (k₂)
  • This drawback may be overcome by representing a single DEPOSIT record as a multi-dimension record 210.
  • Data record 210 (Fig. 14) is a multi-dimension record that is updated and accessed by the designated index 200 according to the designator key k₁ (designator C) and according to the designator key k₂ (designator D). (Note that when a data record is a multi-dimension record, the designator of the record depends on the key that is being used.) The path in the index by k₁ leads to node 207 and from that node to the designator C of record 210.
  • the information in the meta-data according to the designator C allows constructing the relevant structure.
  • designator D of record 210. The information in the meta-data according to the designator D allows constructing the relevant structure, for example constructing a data structure that includes the key k₂.
  • the search path defined by the search keys of record 203 leads to the first field 212 having a value 'C' (which is the designator according to search key k₁).
  • the third field points to data record 201.
  • The second field 215 (having a value 'D', which is the designator according to search key k₂) of the same data structure 210 is accessible by the search path that is defined by the search key of record 204.
  • the fourth field has a link to the actual data record 202.
  • data record 210 can include other fields.
  • the invention is by no means bound to a given realization and accordingly the manner of realizing data record 210 as depicted in Fig. 14 is only one out of many possible variants. The number of search paths is not limited. As has been explained above with reference also to Fig. 13E, if the sought data record is Axxxx (i.e.
  • the specified description which provides two (and in the general case at least two) search paths to one physical occurrence of data records constitutes the multi-dimensional data structure, which is a designated index that contains at least two search paths to one data record (called a multi-dimension record).
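  • The idea of two search paths leading to one physical record can be sketched as follows. A plain dictionary stands in for the designated index here, which is an assumption made purely for brevity; the function name and the record contents are likewise illustrative.

```python
def insert_multi_dimension(index: dict[str, dict], record: dict, keys: list[str]) -> None:
    """Register one physical record under several designated keys.

    Every key maps to the same object, so no data is duplicated.
    """
    for key in keys:
        if key in index:
            raise KeyError(f"key already present: {key}")
    for key in keys:
        index[key] = record


K1 = "A133333C01019811346"   # account, date, client (designator C)
K2 = "B11346D010198133333"   # client, date, account (designator D)

index: dict[str, dict] = {}
deposit = {"amount": 100}     # hypothetical DEPOSIT record contents
insert_multi_dimension(index, deposit, [K1, K2])
```

  • An update made through either key is visible through the other, because both paths end at the same record object.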
  • Relation among data elements - Fig. 15 illustrates another feature of
  • data record A (a book data record) has C, F, J, K and L data records subordinated thereto.
  • L one-to-one
  • one-to-many relations may easily be realized.
  • Consider, for example, that a book has many categories (L), i.e. one-to-many; however, it has only one abstract (K), i.e. one-to-one.
  • a one-to-one data relationship is implemented by a designated (composite) key of two components: the first is the designated key of its subordinating record and the second is the designator of the subordinated record (since it is a one-to-one relation there is no need to use the key field of the subordinated record).
  • Whereas a one-to-many relationship is implemented by a designator (composite) key whose first component is the designator key of the subordinating record, and whose second component consists of the designator and key of the subordinated record.
  • the one-to-one relation between a book and its abstract is maintained by defining the key of L to be AxxxL, where Axxx is the designated key of A, L is the designator of the key of record L.
  • the one-to-many relation between a book and a category is maintained by defining the key of L to be AxxxLyyy, where Axxx is the designated key of A, L is the designator of the key and yyy are the key field(s) of record L.
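  • The two composite-key layouts can be sketched as follows. The function names are illustrative; the sketch assumes string keys built by simple concatenation, as in the Axxx examples above.

```python
def one_to_one_key(parent_key: str, designator: str) -> str:
    """Composite key for a one-to-one relation: parent key plus the child's designator.

    No key field of the child is needed, since at most one such child exists.
    """
    return parent_key + designator


def one_to_many_key(parent_key: str, designator: str, child_key: str) -> str:
    """Composite key for a one-to-many relation: parent key, designator and child key."""
    if not child_key:
        raise ValueError("a one-to-many child needs its own key field(s)")
    return parent_key + designator + child_key
```

  • Because every child key starts with the parent's designated key, all records subordinated to one parent share a common prefix and lie under the same part of the index.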
  • the relational model considers all data as consisting of tables. Each table consists of records of the same structure, called tuples. Suppose the
  • tuples consist of fields F1, F2 and F3. Each such field is a key. If key F2 is subordinate to key F1, and key F3 is subordinate to key F2, we can easily construct the table: to retrieve its tuples, follow the designator of key F1, and from there, for each value of F1, follow the designator of F2, and in the same manner continue to F3. Each such triple defines a tuple of the table.
  • Performing the projection of (F2, F3) might be expensive, since it requires searching all values of F1 first. However, if this operation is common, the designated index should also maintain the search path (F2, F3, F1).
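  • Why a second search path helps the projection can be sketched with sorted key lists standing in for the index. The sorted-list representation, the function names and the sample field values are assumptions for illustration only.

```python
from bisect import bisect_left


def build_path(tuples: list[tuple[str, ...]], order: tuple[int, ...]) -> list[tuple[str, ...]]:
    """Sorted keys with the fields rearranged in `order`, e.g. (1, 2, 0) for (F2, F3, F1)."""
    return sorted(tuple(row[i] for i in order) for row in tuples)


def prefix_scan(sorted_keys: list[tuple[str, ...]], prefix: tuple[str, ...]) -> list[tuple[str, ...]]:
    """Return all keys that start with `prefix`, reading a contiguous run of the sorted list."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if key[:len(prefix)] != prefix:
            break
        matches.append(key)
    return matches
```

  • On the (F1, F2, F3) path a lookup by F1 reads one contiguous run, whereas a lookup by F2 would have to visit every F1 value; on the additional (F2, F3, F1) path the same lookup by F2 is again a single prefix scan.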
  • the designated index enables representing additional data models, including a relational database, an object oriented system, and a hierarchical database, where substantially no data is duplicated.
  • The object oriented approach considers all data as objects. Every object belongs to a class, which determines its structure and which methods (functions) can be applied to it. The classes are organized in a hierarchy, from which structure and methods may be inherited. The object-oriented approach is ephemeral: an object exists only while the program that created it is active. Objects that need to be supported for a longer period of time are defined as persistent. These objects are stored on the disk and are available to
  • the multi-model designated index can easily support such objects. Since their structure is uniformly encoded with the aid of designators, later incarnations of the program as well as other programs can access these persistent objects. Note that at the same time a persistent object can also be part of a relational table. There is no need to duplicate data.
  • the relational approach considers all data as tables.
  • the object-relational approach provides an interface to convert tables to objects.
  • the interface requires the user to specify the relationship between the objects and the table attributes. If some attributes themselves are tables, we need to allow relational algebra operations on these tables too. These conversions are performed by the application program.
  • The database is unable to optimize the queries.
  • the application program's queries are
  • a claim can be efficiently accessed both from the customer object and the policy object and being from a type structured as for example in Fig. 16 (structure 210).
  • the object-oriented approach allows users to add user-defined types (UDT) and user-defined functions (UDF).
  • the relation b ⁇ tween the photo data to the claim is handled in the same manner as with built in classes and relations.
  • the new UDT can be bas ⁇ d on or b ⁇ related (by subordination) to any other data type.
  • th ⁇ application can navigate to the new UDT from the defin ⁇ d classes from which the new UDT can inherent m ⁇ thods and other properties.
  • wh ⁇ n navigating in the index one would navigate to a claim from which on ⁇ could reach the photo as well as any other part of the claim's data.
  • the network and hierarchical models have been replaced by the relational model. However, even though these models are obsolete, they have some advantages (as well as many disadvantages) over the table-oriented implementation. Once a record is retrieved the addresses of related records are readily available.
  • the B-tree implementation requires us to maintain two trees: one of the customers and home addresses, and the second of loans and customers.
  • the second of loans and customers. For having retrieved the data of a loan, the names of the customers that
  • the proposed multi-model designated index (such as for example in Fig. 16), once reaching the node representing the loan, one can continue to a designator that identifies the customers that took that loan (for example records of type B). Normally, at most one disk access is required for each customer.
  • the proposed multi-dimensional designated index has the advantages of the network model, without its disadvantages. While the network model treated each node separately, and was susceptible to long search paths, the multi-model designated index treats all data uniformly and the length of the search paths is probably logarithmic such that the base of the logarithm is the block size. Thus, in practice, the search requires a single disk access.
  • The client-server model enables efficient implementations of the relational model.
  • the server central computer
  • clients other computers
  • an application needs data it formulates an SQL query, which is sent by the client to the server.
  • The server evaluates the query and returns the resulting table to the client.
  • the interface between the client and the server is via SQL queries; the server is unaware of the internal data structures and code of the application.
  • The designated index allows to apply the client-server approach for the object-oriented and object-relational models.
  • the application program sends the path of keys and link designators leading to the desired node to the server. Based on this data the server can fulfill the request without any knowledge of the data structure of the application program.
  • the client and the server should agree on the names of the fields and their designators.
  • The server need not be aware of the type of data of each such field, and its semantic content.
  • One of the most important features of a trie based data structure is the modest size of its representation.
  • the PAIF for example maintains an even smaller size than a conventional trie because of its compressed representation.
  • the last level of the PAIF index contains a trie with links that point to other trie nodes in the same block, and links that point to records.
  • The index contains exactly N pointers to these records. If each pointer requires 4 bytes, the size needed for the pointers is 4N bytes. In addition, each pointer has a direction (1 byte), thus the total is 5N bytes.
  • $n \le N - 1$ trie nodes. Let $d$ denote the average number of children of a trie node; then $n \approx N/(d - 1)$. Since in practice $d \gg 2$, $n \ll N$. Each trie node has a level number (1 byte). Since each trie node has at most one incoming trie link, there are at most $n - 1$ trie links; each trie link has a label, which is a single character, and an intra-block pointer (1 byte), thus a total of $3n$ bytes. Thus in the worst case $3n + 4N \le 7N$ bytes are needed, and between $4N$ and $6N$ bytes in practice.
  • Performing the same analysis but from another angle: consider two pointers $p_1$ and $p_2$ that emanate from node $v$ of level $k$. Let $x_1$ be a key reachable from $p_1$ and $x_2$ a key reachable from $p_2$. Then $x_1$ and $x_2$ share the first $k - 1$ characters. In a PAIF structure, each one of these characters is represented at most once. In the B-tree representation it is needed to explicitly represent the first $k$ characters of each key.
  • first two records reside in the same block, then it is possible to keep a single full sized pointer for the first pointer to a block, and instead of keeping a pointer for each of the remaining outgoing links to that block, computing their displacement, i.e., if the first two records reside in block number 2000 and the third record in block 7000 it is possible to maintain the structure 2000(e,f) 7000(h). The savings would be much more substantial if a larger number of outgoing links point all to the same block. If k such links point to
  • Fig. 17A shows a node 2000 of a trie with the links 2010, 2011, 2012 (values 5, 9, A respectively) that address 3 data records (2002, 2004, 2006) at disk addresses 3000, 5000, 7000 respectively.
  • the size needed to represent the link values (1 byte for each link) and the pointers (4 bytes) to the data is 15 bytes.
  • represent the link is the address to block 2020 (4 bytes) and the link values to the data records 2002, 2004, 2006 that reside in the block (1 byte for each link value).
  • the size needed to represent the pointer to the data block and the value of the links is only 7 bytes - (3000:5,9,A).
  • node 2000 can include links to other data records or data blocks (such as link 2024 to data block 2022 accommodating data record 2008).
  • the database may be located in a central location, or distributed among two or more remote locations.
  • Figs. 18A-D there are shown four benchmark graphs demonstrating the enhanced performance, in terms of response time and file size, of a database utilizing a file management system that employs a system of the invention vs. a commercially available Ctree based database.
  • the inserts are realized through a Uniface application running in the Windows (for workgroup) operating system.
  • The benchmark of Fig. 18A concerns measuring the time in minutes for inserting an ever increasing number of a priori sorted data records to a file (0-1,000,000).
  • the larger the number of inserts the greater is the improvement in terms of response time of the database file management system of the invention.
  • inserting 1 million records takes about 669 minutes in the Ctree based database as compared to only 65 minutes in the system of the invention.
  • Moreover, the response time in the file management system of the invention increases by only a small extent as the number of records increases, as opposed to a significant increase in the response time in the counterpart system according to the prior art.
  • the benchmark of Fig. 18B illustrates the file size in megabytes as a function of the number of data records in the file (0-1,000,000). As shown in Fig. 18B, the larger the number of records the greater is the improvement in terms of file size in the database file management system of the invention. Thus for 1 million records the file size of the Ctree based file is about 151 megabytes as compared to only 22 megabytes in the database file management system of the invention.
  • Graphs 18C and 18D are similar to those shown in Figs. 18A and 18B apart from the fact that in the former (18C and 18D) the data records are inserted randomly whereas in the latter (18A and 18B) the data records are a
  • the system of the invention is more efficient in terms of both response time and file size.
  • Figs. 19A-D illustrate benchmark graphs of a system of the invention (operating under the DOS operating system) vs. a commercially available Btree based database system. The results are as before, i.e. the system of the invention is more efficient in terms of both response time and file size.
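
The pointer-grouping arithmetic quoted in the Fig. 17A excerpts above (15 bytes versus 7 bytes) can be checked with a small sketch. This is only an illustration under stated assumptions: the function names `plain_size` and `grouped_size` are hypothetical, and the 4-byte pointer and 1-byte link value are the sizes quoted in the excerpts, not a description of the actual on-disk format.

```python
POINTER_BYTES = 4   # size of a full block or record pointer, as quoted in the text
LABEL_BYTES = 1     # size of one link value, as quoted in the text


def plain_size(links):
    """One full pointer plus one link value per outgoing link."""
    return len(links) * (POINTER_BYTES + LABEL_BYTES)


def grouped_size(links):
    """One full pointer per distinct target block plus one link value per link."""
    labels_by_block = {}
    for label, block_address in links:
        labels_by_block.setdefault(block_address, []).append(label)
    return sum(POINTER_BYTES + LABEL_BYTES * len(labels)
               for labels in labels_by_block.values())


# Fig. 17A excerpt: three links (values 5, 9, A) whose records share one block,
# written in the text as (3000:5,9,A).
same_block = [("5", 3000), ("9", 3000), ("A", 3000)]
print(plain_size(same_block), grouped_size(same_block))   # 15 7

# The 2000(e,f) 7000(h) excerpt: two records in block 2000, one in block 7000.
two_blocks = [("e", 2000), ("f", 2000), ("h", 7000)]
print(plain_size(two_blocks), grouped_size(two_blocks))   # 15 11
```

The saving grows with the number of links that share a target block, which matches the remark in the excerpt about larger groups of outgoing links.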

Abstract

A database file management system for accessing data records is being executed on data processing system, the data records are linked to a trie index that is arranged in blocks (402, 405, 406 and 407) and being stored in a storage medium. The trie index (A, B and I, element 402) enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks. There is provided a method for constructing a layered index arranged in blocks, which includes the steps of providing the trie index and constructing a representative index over the representative keys of the trie index. The layered index enables accessing or updating the data records by key or keys and it constitutes a balanced structure of blocks.

Description

DATABASE APPARATUS
FIELD OF THE INVENTION
This invention relates to databases and database management systems.
BACKGROUND OF THE INVENTION
As is well known, a database system is a collection of interrelated data files, indexes and a set of programs that allow one or more users to add, retrieve and modify the data stored in these files. The fundamental concept of a database system is to provide users with a so called "abstract" and simplified view of the data (referred to also as data model or conceptual structure) which exempts a conventional user from dealing with details such as how the data is physically organized and accessed.
Some of the well known data models (i.e. the "Hierarchical model", "Network model", "Relational model" and "Object Relational Model") will now be briefly reviewed. A more detailed discussion can be found for example in: Henry F. Korth, Abraham Silberschatz, "Database System Concepts", McGraw-Hill International Editions, 1986 (or the 3rd edition (1997)), Chapters 3-5, pp. 45-172.
Generally speaking, all the models to be discussed below have a common property in that they represent each "entity" as a "record" having one or more "fields" each being indicative of a given attribute of the entity (e.g. a record of a given book may have the following fields "BOOK ID", "BOOK NAME", "TITLE"). Normally one or more attributes constitute a "key" i.e. it identifies the record. In the latter example "BOOK-ID" serves as a key. The various models are distinguished one from the other, inter alia, in the way that these records are organized into a more complex structure:
Relational Model - The relational model, introduced by Codd, is a landmark in the history of database development. In relational databases an abstract concept has been introduced, according to which the data is represented by tables (referred to as "relations") in which the columns represent the fields and rows represent the records.
The association between tables is only conceptual. It is not a part of the database definition. Two tables can be implicitly associated by the fact that they have one or more columns whose values are taken from the same set of values (called "domain").
Other concepts introduced by the relational model are high level operators that operate on tables (i.e., both their parameters and results are tables) and comprehensive data languages (now called 4th generation languages) in which one specifies what are the required results rather than how these results are to be produced. Such non-procedural languages (SQL - Structured Query Language) have become an industry standard. Furthermore, the relational model suggests a very high level of data independence. There should not be any effect on the programs written in these languages due to changes in the manner data are organized, stored, indexed and ordered. The relational model has become a de-facto standard for data analysts.
Network Model - In the relational model, data (and relationship between data) are regarded as a collection of tables. In distinction therefrom in the network model data are represented as a collection of records whereas relationship between the records (data) are represented as links.
A record in the network model is similar to an "entity" in the sense that it is a collection of fields each holding one type of data. The links may be effectively viewed preferably (but not necessarily) as pointers. A collection of records and the relation therebetween constitutes a collection of graphs.
Hierarchical Model - The Hierarchical Model resembles the network model in the manner that data and relations between data are treated, i.e. as records and links. However, in distinction from the network model, the records and the relations between them constitute a collection of trees rather than of arbitrary graphs. The structure of the Hierarchical Model is simple and straightforward particularly in the case that the data that needs to be organized in a database are of inherent hierarchical nature. The hierarchical model has some inherent shortcomings, e.g. in many real life scenarios data cannot be easily arranged in hierarchical manner. Moreover, even if data may be organized in hierarchical manner, it may require larger volumes as compared to other database models.
Consider for example a basic entity "Employee" with the following subordinated attributes "Employee_Salary" and "Employee_Attendance". The latter may also have subordinated attributes e.g. "Employee_Entries" and "Employee_Exits". In this scenario the data is of inherent hierarchical nature and therefore should preferably be organized in the hierarchical model. Consider, for example, a scenario where "Employee" is assigned to several "Projects" and the time he/she spends ("Time_Spent") in each project is an attribute that is included in both the "Employee" and "Projects" entities. Such arrangement of data cannot be easily organized in the hierarchical model and one possible solution is to duplicate the item "Time_Spent" and hold it separately in the hierarchies of "Employee" and "Project". This approach is cumbersome and error prone in the sense that it is now required to assure that the two instances of "Time_Spent" are kept identical at all times.
Object Oriented Model - A comprehensive explanation can be found in "Object Oriented Modeling and Design", James Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy and William Lorensen.
The object-oriented approach views all entities as objects. Each object belongs to a class; with each class there are associated methods and fields.
To enable encapsulation some of the fields are private, accessible only to methods of the class, while others are public, accessible to all. Thus "Joe Smith" belongs to the class of persons. For that class, the private field age can be defined. Applying the class method update_age() to the object Joe will change his age. The methodology allows to define sub-classes which inherit all the methods and fields of the super-class. Thus, for example, the employee class can be defined as a subclass of the person class. In addition one may define additional fields and methods to the subclass. Thus, the employee class could support a salary field, and the get_raise() method.
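The person/employee example above can be sketched in Python as follows. This is a minimal illustration, not part of the described system: the constructor arguments, the `age()` accessor and the `amount` parameter of `get_raise` are assumptions added so the example runs; only the `age` field, `update_age()`, the `salary` field and `get_raise()` come from the text.

```python
class Person:
    """A person whose age is private to the class."""

    def __init__(self, name, age):
        self.name = name      # public field, accessible to all
        self.__age = age      # private field (name-mangled by Python)

    def update_age(self, new_age):
        """Class method in the text's sense: changes the private age field."""
        self.__age = new_age

    def age(self):
        """Hypothetical read accessor, added only so the result can be inspected."""
        return self.__age


class Employee(Person):
    """Subclass: inherits the fields and methods of Person and adds its own."""

    def __init__(self, name, age, salary):
        super().__init__(name, age)
        self.salary = salary  # additional field defined on the subclass

    def get_raise(self, amount):
        """Additional method defined on the subclass."""
        self.salary += amount


joe = Employee("Joe Smith", 40, 1000)
joe.update_age(41)    # inherited from Person
joe.get_raise(100)    # defined on Employee
print(joe.age(), joe.salary)   # 41 1100
```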
Object Relational Model allows an object view on relational-organized data. Thus, one is able to operate on the data as if it is organized as objects and at the same time, support the relational approach.
As mentioned in the foregoing, data models deal with the conceptual or logical level of data representation and "hide" details such as how the data are physically arranged and accessed. The latter characteristics are normally dealt with by a so-called database file management system.
The database file management system maps the logical structure (in terms of database model) to a data structure, pertinent operations and possibly other data. The data structure includes index and data records. The index enables accessing or updating the data records by a key. In the context of search, the term search key is used. Database file management system should preferably operate on the data records so as to accomplish enhanced performance in terms of time (i.e. from the user's standpoint fast response time of the database), and space (i.e. to minimize the storage volume that is allocated for the database files). As is well known in the art, normally, there is a trade off between the time and space requirements. The performance of the database depends on the efficiency of the data structures that are used to represent the data and how efficiently the system can operate on these data. A detailed discussion on conventional file and management systems is given for example in Chapters 7 (file system structure) and 8 (indexing) in "Database System Concepts", ibid.
Known database file management systems typically utilize the following indexing schemes, which fall into the following main categories that include: Multi-way trees indexes and others.
Multi-way trees indexes - These techniques can be used to create one or more access paths (referred to also as search paths) to the same data record. The search paths form a multi-way tree. Its main disadvantages are that it requires space (usually all the keys to the records plus some pointers) and maintenance (addition and/or deletion of keys whenever an update transaction (see definition below) occurs, i.e. a record is added and/or deleted). Normally, the nature of the indexing scheme as well as the volume of the data held in the files determine the number of accesses that are required to find or update (update encompasses insert, delete or modify) a given data record. In the case that the storage medium under consideration is an external memory, the number of accesses is effectively the number of I/O accesses. As will be explained below, in each access to the storage medium a block of data is loaded into the memory.
Various types of tree indexing schemes have been developed but, normally, an indexing implementation is more costly than the specified direct access indexing techniques. On the other hand, tree indexing allows sequential and sub-range processing. One of the most widely used indexing schemes is the B-tree (under various commercial product names and implementation variants such as B+ tree) in which the keys are kept in a balanced tree structure and the lowest level points at the data itself. Detailed explanation of the B-tree indexing scheme is found in "Database System Concepts" ibid. pp. 275-282. The number of I/O accesses obeys the algorithmic expression $\log_K N + 1$ where K is an implementation dependent constant and N is the total number of records. This means that the performance slows down logarithmically as the number of records increases.
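To make the access-count expression concrete, the following sketch evaluates it for a few file sizes. It is an illustration only: the function name `btree_accesses` is hypothetical, the value K = 100 is an assumed constant chosen for the example, and rounding the logarithm up to a whole number of levels is an assumption about how the expression would be applied in practice.

```python
def btree_accesses(num_records, fanout):
    """Estimate I/O accesses as (number of index levels) + 1.

    The number of levels is the smallest L with fanout**L >= num_records,
    i.e. the logarithm base K of N rounded up, computed with integers
    to avoid floating-point rounding at exact powers.
    """
    levels = 0
    reach = 1
    while reach < num_records:
        reach *= fanout
        levels += 1
    return levels + 1


# With an assumed K = 100, a hundredfold growth in N adds one access.
for n in (10_000, 1_000_000, 100_000_000):
    print(n, btree_accesses(n, 100))
```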
It is possible, of course, to use a combination of the above or other techniques, e.g. an indexing scheme which is implemented in accordance with two or more of the above techniques.
One of the significant drawbacks of the aforementioned popular B-tree indexing scheme is that the keys are not only held as part of the data records, but also as part of the index. This results, of course, in the undesired inflation of the index size and the latter drawback is further aggravated when indexes of large size are utilized (i.e. when a relatively large number of bits is required for representing the key).
One possible approach to cope with this problem is to exploit the Trie indexing scheme. An example of the latter is the trie discussed in G. Wiederhold, "File organization for Database design"; Mcgraw-Hill, 1987, pp. 272, 273, or in D.E. Knuth, "The Art of Computer Programming"; Addison- Wesley Publishing Company, 1973, pp. 481-505, 681-687.
Generally speaking, the trie indexing scheme enables a rapid search whilst avoiding the duplication of keys as manifested for example by the B-tree technique. The trie indexing scheme has the general structure of a tree wherein the search is based on partitioning the search according to search key portions (e.g. search key digit or bit). Thus, for example each node in the trie indexing file represents an offset of the search key and the link to any one of its children represents the character's value at said offset. The trie structure affords efficient data structure in terms of the memory space that is allocated therefor, since, as specified before, the search-key is not held, as a whole, in internal nodes and hence the duplication that is exhibited for example in the B-tree indexing technique is avoided.
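The node-holds-an-offset, link-holds-a-character arrangement just described can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the index of the invention: the names `TrieNode` and `search`, the dictionary used for links and the record layout (`{"key": ...}`) are all hypothetical. Because internal nodes hold only offsets, the full key is compared once, against the record that is finally reached.

```python
class TrieNode:
    """Internal node: stores only the key offset it examines."""

    def __init__(self, offset):
        self.offset = offset   # position in the search key tested at this node
        self.links = {}        # character value at that offset -> child node or record


def search(root, key):
    """Follow one link per node; return the matching record or None."""
    current = root
    while isinstance(current, TrieNode):
        if current.offset >= len(key):
            return None
        current = current.links.get(key[current.offset])
    if current is None:
        return None
    # Internal nodes never stored the whole key, so verify it on the record.
    return current if current["key"] == key else None


# Keys "car", "cat" and "dog": offset 0 separates c from d, offset 2 separates r from t.
inner = TrieNode(2)
inner.links = {"r": {"key": "car"}, "t": {"key": "cat"}}
root = TrieNode(0)
root.links = {"c": inner, "d": {"key": "dog"}}

print(search(root, "cat"))   # {'key': 'cat'}
print(search(root, "dig"))   # None: reaches the "dog" record, full-key check fails
```

Note that offset 1 is never examined for the keys under "c"; skipping positions on which all reachable keys agree is what keeps such a structure compact.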
In a specific variant of the trie such as the trie described in "File organization for Database design" ibid., in order to achieve enhanced performance in terms of response time, a trie indexing file should be built by selecting the digits (or bits) from the search key such that the best possible partition of the search space is obtained, or in other words so as to accomplish a tree which is as balanced as possible. This, however, requires a priori knowledge of the data records of the trie and is accomplished at the penalty of obtaining unsorted data, which in many real-life scenarios is inapplicable. It is noteworthy that if sorted data is mandatory, a balanced structure cannot be guaranteed even if there is sufficient a priori knowledge of the data records of the trie. It should be noted that the specified trie does not support sequential sub-range processing.
When considering a large amount of data, it is of particular importance to maintain a so-called balanced structure of the tree index in order to avoid long paths for accessing a given data record from the root node to the leaf node that is associated with the sought data record. The specified B-tree indexing scheme, constitutes an inherent balanced tree structure, even after the tree has been subject to update transactions. The inherent balanced (or essentially balanced) structure is accomplished, however, and as explained above, at the penalty of inflating the contents of the blocks in the tree and, consequently, unduly increasing the file size that holds the index, particularly insofar as large trees which hold multitude of data records are concerned. The large volume of the files adversely affects the performance of the data management system in terms of number of accesses (and consequently in terms of accessing time) to the storage medium in order to reach a sought data record, which is obviously undesired.
Turning now to the "others" category of index schemes, it includes for example the so called skip list index. A skip list is a randomized data structure. It consists of levels; the lowermost level, level 0, consists of a list of all records in non-decreasing order. Each node of level i (i = 0,...,h) chooses, with probability p, whether to be a representative of level i + 1. The representatives of level i constitute the nodes of level i + 1. These representatives, too, are organized as an ordered list. Level h+1 is the first empty level.
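The level-by-level promotion described above can be sketched as follows. This is a simplified illustration, assuming plain Python lists for each level and a seeded random generator so the run is repeatable; the function name `build_skip_levels` and its parameters are hypothetical, and the links between levels that a real skip list needs for searching are omitted.

```python
import random


def build_skip_levels(sorted_keys, p=0.5, seed=0):
    """Return the list of levels; the last entry is the first empty level.

    Level 0 holds every key in order. Each key of level i is promoted to
    level i + 1 with probability p (p must be below 1 for the loop to end).
    """
    rng = random.Random(seed)
    levels = [list(sorted_keys)]
    while levels[-1]:
        # Promotion preserves order, so every level stays an ordered list.
        promoted = [key for key in levels[-1] if rng.random() < p]
        levels.append(promoted)
    return levels


levels = build_skip_levels([3, 8, 15, 21, 34, 55, 89, 144])
for number, level in enumerate(levels):
    print(number, level)
```

With p = 1/2 each level is expected to hold about half the keys of the level below it, so the expected number of non-empty levels grows logarithmically with the number of records.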
Having discussed the major drawbacks of hitherto known index schemes i.e. inflated data volumes (e.g. B-tree and variants thereof) and susceptibility to unbalanced structure (e.g. trie), there follows a discussion in another aspect which pertain to various characteristics including subordination of data records and multi-dimensional characteristics .
Thus, consider for example, two types of data records represented as two entities (tables), i.e. Books and Borrowers, each being associated with a respective unique key, e.g. a borrower is identified by Borrower_Id and a book is identified by Book_Id. In a real life scenario, such as in a public library, one is interested to view for example all books borrowed by a given borrower. The latter transaction exemplifies subordination of data records, where "books" are subordinated to "borrower". In order to resolve this query, one should apply two queries: one for the borrower information and another for the books borrowed by him (according to the composite key - book borrower).
Insofar as B-tree indexing scheme is concerned, in order to support the subordination of data in the manner specified, several separate index files are requires, as follows:
• Books index file, accessible via book-Id key;
• Borrowers index file, accessible via borrower-Id key;
• Transactions via borrowers, accessible via the composite key (borrower-Id, book-Id).
Accordingly, the index scheme includes here three index files. This obviously poses undesired overhead insofar as data volumes and additional integrity maintenance and checking are concerned. Thus, for example, removal of a given book from the book file requires a preliminary test to inquire whether it exists in the borrower-book index file.
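The three-index arrangement and its integrity check can be sketched as follows. This is a toy model under stated assumptions: dictionaries and a sorted list stand in for the three B-tree index files, and all identifiers and sample values (`books`, `borrowers`, `loan_index`, `books_of`, `remove_book`, "U1", "B1" and so on) are hypothetical.

```python
import bisect

# Stand-ins for the three index files named above.
books = {"B1": "Trees", "B2": "Graphs", "B3": "Tries", "B4": "Heaps"}   # by book-Id
borrowers = {"U1": "Alice", "U2": "Bob"}                                  # by borrower-Id
loan_index = sorted([("U1", "B1"), ("U1", "B3"), ("U2", "B2")])          # composite key


def books_of(borrower_id):
    """Range scan over the composite index for one borrower-Id prefix."""
    start = bisect.bisect_left(loan_index, (borrower_id,))
    result = []
    for borrower, book in loan_index[start:]:
        if borrower != borrower_id:
            break
        result.append(book)
    return result


def remove_book(book_id):
    """Removal needs the preliminary test against the borrower-book index."""
    if any(book == book_id for _, book in loan_index):
        raise ValueError("book is still referenced in the borrower-book index")
    del books[book_id]


print(books_of("U1"))   # ['B1', 'B3']
remove_book("B4")       # not on loan, so the removal succeeds
```

Because the composite index is ordered by borrower first, the preliminary test in `remove_book` has to scan every entry; this is one face of the maintenance overhead the text points to.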
Having discussed the drawbacks of hitherto known techniques insofar as subordination of data records is concerned, the cumbersome representation and manner of operation thereof becomes even worse when considering implementations of the so called multi-dimensional data records.
Reverting now to the latter example, the tables Books and Borrowers are now regarded as multi-dimensional tables, which can be reached from several views. Thus, in addition to the above mentioned borrower->book view (books borrowed by borrower(s)), which is implemented by an index over the borrower-book composite key, the database should support the alternative view of borrowers that borrowed a given book(s), which requires, of course, to utilize the alternative composite key (book-borrower).
In the B-tree representation, it is accordingly required to add another index file accessible via the composite key (book-Id, borrower-Id), giving rise to a total of four index files.
The pertinent drawbacks are self explanatory and become even worse for n dimensional tables (n > 2).
There is accordingly a need in the art to reduce the drawbacks of data processing systems that exploit hitherto known database file management systems. Specifically, there is a need in the art to provide for a data processing system that exhibits enhanced database performance by utilizing an efficient database file management system.
There is yet further need in the art to provide for a database file management system that utilizes an index which is inherently not susceptible to unbalanced structure in the manner specified above.
There is still further need in the art to provide for an index which inherently supports representation of multiple types of data, subordination of data records and/or multi-dimensions.
GLOSSARY OF TERMS:
For clarity of explanation, there follows a glossary of additional terms used frequently throughout the description and the appended claims. Some of the terms are conventional and others have been coined:
Block - a storage unit which can be accessed by a single I/O operation. A block may contain data arranged in any desired manner, e.g. nodes arranged as a tree and possibly also links to actual data records. A block may reside in main (referred to also as internal) or secondary (referred to also as external) storage.
Tree - a data structure which is either empty or consists of a root node linked by means of d {d ≥ ) pointers (or links) to d disjoint trees called subtrees of the root. The roots of the subtrees are referred to as children nodes of the root node of the tree, and nodes of the subtrees are descendent nodes of the root. A node all the subtrees of which are empty is called a leaf node. The nodes in the tree that are not leaves are designated as internal nodes.
In the context of the invention, leaf nodes are also nodes that are associated with data records.
Nodes and trees should be construed in a broad sense. Thus, the definition of tree encompasses also a tree of blocks wherein each node constitutes a block. In the same manner, descendent blocks of a said block are all the blocks that can be accessed from the block. For a detailed definition of "tree", refer also to the book by Cormen, Leiserson and Rivest, or Lewis and Denenberg, "Data structures and their algorithms".
It should be noted that the association (e.g. link) between leaf node and data record encompasses any realization which enables access to data records from leaf nodes. Thus, by way of example, a data record may be accessed directly (i.e. through a pointer) from the leaf node. By another non-limiting example, the leaf node points to a data structure (e.g. a table) which, in turn, enables access to data records. Other variants are, of course, also feasible.
Depth of an index - is defined as the maximum number of blocks from a root block to a block associated with a data record.
Balanced Index - An index is balanced if there exists a constant c such that the number of accesses needed to reach any data record is at most $c \log n$, where n is the number of records in the structure.
Obtaining a balanced tree encompasses applying a balancing technique post factum (on an unbalanced structure), bringing about a balanced structure, or, if desired, applying the balancing technique on the fly, so as to maintain a balanced structure.
Accessing in an index would be considered as a process of moving from a node to another node within a block or to another block usually, although not necessarily, in order to reach sought data records.
Navigating is considered as accessing data records, usually (although not necessarily), in order to collect them in an ordered manner by their key.
Search scheme: meaning the algorithm that is associated with an index that is used for accessing a given data record by key; intra-block search scheme meaning the algorithm that is used inside the block for accessing a given data record or another block. The data record is not necessarily accommodated within said block.
Common key of a block - The common key of a block is the longest prefix common to all keys of the data records that can be accessed from the block by the relevant search scheme. If desired, part or all of the common key may be held explicitly in the block.
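As a small illustration of this definition, the common key of a block can be computed as the longest common prefix of the keys reachable from it. The sketch below assumes the keys are available as Python strings, which is a simplification; the function name `common_key` and the sample keys are hypothetical.

```python
import os.path


def common_key(keys):
    """Longest prefix shared by every key reachable from a block."""
    # os.path.commonprefix compares character by character, so it works on
    # arbitrary strings, not only on file-system paths.
    return os.path.commonprefix(list(keys))


print(common_key(["12345", "12399", "12310"]))   # 123
```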
Update transactions - a transaction consisting of either inserting a new data record, or deleting an existing data record, or modifying an existing data record or portion thereof.
Vertical orientated trie structure - conventional orientation of digital tree from root to leaves. As will be exemplified below, it is not always obligatory to maintain all the links between nodes and/or blocks in the vertical trie. As will be explained in greater detail below, in an index of the invention, the trie that is susceptible to an unbalanced structure constitutes a vertical tree. As will be described below, in some specific embodiments, the construction of index over the keys of the data records of trie constitute vertical oriented tries.
Horizontal oriented trie structure - having h levels of vertical orientated trie structures with the first level standing for the uppermost level and the h th level standing for the lowermost level (constituting the trie that is susceptible to an unbalanced structure) which is normally associated with data records, and allows to move from a block in the i th level to a block in the i + 1 st level according to a common key value of the block. In various embodiments of the invention, and as will be explained in greater detail below, the h upper levels constitute a representative index over the common keys of the blocks of the lowermost level tree.
Storage medium - Any medium that may be used to store data, including either or both of internal and external memory. External memory may be one or more of the following: magnetic tape, magnetic disk, optical disk, or any other physical medium used for storing data. Internal memory includes any known main memory including cache memory as well as any other physical storage medium that serves as internal memory.
Short link - (refεrrεd to also as near link) a link labεlεd k bεtween a node a having the value r to node b in the same block such that the keys of the data records that include node b on their access path havε thε value k at key position r.
Long link - (referred to also as far link) a link betwεεn a nodε v in block B of level i to block W of level i - 1 or to a data record. If v has value r and the label of the link is k, then thε valuε of thε common kεy of block B' or thε kεy of the data record is k at position r.
The label of a short link or a far link is also referred as the value or direction of the link.
Split link - If a block overflows and a split process is performed such that node a is linked to node b and, after the split, node b and its descendant nodes are accommodated in a different block, block B, then the link between node a and node b is a split link. After the split, the split link is the link between node a and block B (which accommodates node b). A split link is a labeled link.
In several implementations such as PAIF, maintaining the split link from node a to the block B where node b resides is optional, since one can access block B through the layered index.
Direct link - a link between node v in block B of level i and block B' of level i - 1, where B' includes a node v' such that nodes v and v' have the same value. If a search path to a data record with a key k includes node v but does not include any of its near and far links, then it should contain the direct link to block B'. A direct link has no label.
There follows a description that pertains to the terms duplicated node and copied node that are utilized in the block split procedure.
Thus, if a node v' has value k, then all the keys of data records accessible from v' and its labeled links agree on positions 0,...,k-1.
If a node v is created such that it has a value equal to the value of node v', and all data records accessible from v and its labeled links are accessible from node v' and its labeled links, v is considered a duplicated node of v'. A duplicated node maintains a direct link to the block that includes node v'. (A duplicated node is also referred to as a copied node.)
GENERAL DESCRIPTION OF THE INVENTION
There follows a discussion of various additional terms and procedures that are used in the description and the claims in the context of the present invention.
Data records consist as a rule of several fields, some of which are designated as keys. Sometimes the records are ordered by one of the keys, called the primary key. An index (or index scheme) over the keys of data records or over representative keys (for the definition of the latter see below) is a data structure that facilitates search by one or more of the keys. Examples of an index are any of the specified multi-way tree index schemes. An index according to the invention may be constituted by using more than one index scheme.
The index may be stored in a file or files that reside partially or entirely in the internal memory or external memory.
In accordance with the invention there is provided an index that includes a partitioned index, i.e. a dynamic data structure that allows search by key and is partitioned into blocks, each of which contains a representative key. The representative keys should be sufficient to find the block associated with a record whose key equals the search key (if one exists). Having located the block, the data record may easily be retrieved. The representative keys are not necessarily stored physically in the block.
Examples of partitioned index are:
1. The sequence of blocks of a file ordered by increasing key value of the primary key. The index leads the search to the block containing the key. To allow searches by a key that is not the primary key, a partitioned index is constructed such that for each record the partitioned index contains its key and its link. These pairs are ordered by non-decreasing value of the key. The index leads to the block containing the address of the desired record.
2. A trie arranged in blocks.
3. Other types of index schemes that meet the provision of partitioned index.
A partitioned index over the keys of data records is called a basic partitioned index and is denoted index layer I_0.
This partitioned index might become non-balanced, thus giving rise to some long search paths.
To search the partitioned index efficiently, an additional index layer (an index layer is also denoted, in short, an index) I_1 is constructed over the representative keys of I_0. If I_1 is also a partitioned index, then an additional index I_2 may be constructed over the representative keys of the blocks of I_1. This process may be repeated until creating an index I_h (hereinafter root index) which preferably is fully contained within a single block. The root index I_h is not necessarily a partitioned index. The layered index (which also constitutes an index) is the collection of I_0,...,I_h.
I_1,...,I_h constitute a so-called representative index.
To search a record by key k, the latter is searched in I_h (and in some cases in I_{h-1} to I_1 and data record(s)) in order to find the block B of I_{h-1} leading to k. This process is repeated until reaching the block of I_0 that is associated with the record with key k (if one exists).
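The layered search just described can be sketched as follows. This is an illustration under stated assumptions rather than the claimed structure: I_0 is taken to be a sequence of fixed-size blocks over sorted keys, the representative key of a block is assumed to be its smallest key, and the names `build_layers` and `search` are hypothetical.

```python
from bisect import bisect_right


def chunk(items, size):
    """Cut a list into consecutive blocks of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_layers(sorted_keys, block_size):
    """Return [I_0, I_1, ..., I_h] for a non-empty sorted key list."""
    layers = [chunk(sorted_keys, block_size)]        # I_0 holds the record keys
    while len(layers[-1]) > 1:                       # stop once a layer fits in one block
        reps = [block[0] for block in layers[-1]]    # assumed representative: smallest key
        layers.append(chunk(reps, block_size))
    return layers


def search(layers, key, block_size):
    """Return (found, number of blocks accessed), descending from I_h to I_0."""
    block_no, accesses = 0, 0                        # the root layer has a single block
    for level in range(len(layers) - 1, 0, -1):
        block = layers[level][block_no]
        accesses += 1
        pos = max(bisect_right(block, key) - 1, 0)   # last representative <= key
        block_no = block_no * block_size + pos       # block to visit one layer down
    accesses += 1
    return key in layers[0][block_no], accesses


keys = [f"{i:04d}" for i in range(1000)]
layers = build_layers(keys, block_size=10)
print(len(layers), search(layers, "0427", 10))       # 3 (True, 3)
```

Every lookup touches exactly one block per layer, which is the sense in which the layered index is balanced regardless of the shape of I_0.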
To insert a new record r with key k, a search is performed as above to find the block B. Having found B in I_0, r is added to B.
If B (in I_0) overflows, it is split into two (or more) blocks and the representative of B in I_1 is replaced by the representatives of the new blocks.
The overflow of block B_1 in I_1 entails a splitting of B_1, and the representative of B_1 in I_2 is replaced by the representatives of the new blocks, etc. If the block of I_h overflows, an additional layer I_{h+1} is created and added to the layered index. It should be noted that an "overflow" state may be determined according to the particular application, and is not necessarily triggered when a block is rendered full. Thus, for example, by one embodiment overflow occurs when a block is at least half full.
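A minimal sketch of the insertion and split propagation described above, assuming blocks of sorted keys, an overflow threshold of `CAPACITY` entries, and the smallest key of a block as its representative. The class and function names are illustrative only, and the structure shown is closer to a B-tree-like stand-in than to the trie-based embodiments.

```python
from bisect import bisect_right, insort

CAPACITY = 4  # assumed maximum number of entries per block


class Block:
    def __init__(self, keys, children=None):
        self.keys = keys          # record keys (I_0) or representative keys (upper layers)
        self.children = children  # None in I_0, otherwise one child block per key


def _insert(block, key):
    """Insert below `block`; return (left, right) if `block` had to be split."""
    if block.children is None:                        # a block of I_0
        insort(block.keys, key)
    else:
        pos = max(bisect_right(block.keys, key) - 1, 0)
        split = _insert(block.children[pos], key)
        if split is None:
            return None
        left, right = split
        # Replace the representative of the split block by those of the new blocks.
        block.keys[pos:pos + 1] = [left.keys[0], right.keys[0]]
        block.children[pos:pos + 1] = [left, right]
    if len(block.keys) <= CAPACITY:
        return None
    mid = len(block.keys) // 2
    kids = block.children
    return (Block(block.keys[:mid], kids[:mid] if kids else None),
            Block(block.keys[mid:], kids[mid:] if kids else None))


def insert(root, key):
    """Insert `key`; return the root block, which changes when a layer is added."""
    split = _insert(root, key)
    if split is None:
        return root
    left, right = split                               # root overflow: add layer I_{h+1}
    return Block([left.keys[0], right.keys[0]], [left, right])


def contains(root, key):
    """Return (found, number of blocks accessed)."""
    block, accesses = root, 1
    while block.children is not None:
        pos = max(bisect_right(block.keys, key) - 1, 0)
        block = block.children[pos]
        accesses += 1
    return key in block.keys, accesses


root = Block([])
for k in range(100):
    root = insert(root, f"{k:03d}")
```

Because a new layer is only ever added above the root, every record key ends up at the same depth, so the number of block accesses per lookup is the same for all keys.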
Deletion is similar to insertion, and might involve merging, the reverse process of splitting. The update or the split need not necessarily be performed on the fly, but may be delayed (i.e. performed post factum).
It should be noted that the construction of the layered index preferably retains a balanced index.
It should be noted that in some embodiments the balanced index is sufficient, and in some cases where the layered index (without I_0) is of relatively small volume (e.g. may be accommodated mostly or entirely in the internal memory) the "balanced structure" requirement may be exempted.
In accordance with a first aspect of the invention, it has been found that the inherent limitations of a basic partitioned index (e.g. trie) that is susceptible to an unbalanced structure may be coped with by providing an index and, more specifically, a layered index in the manner specified.
Focusing, for example, on the layered index as compared to the basic partitioned index (e.g. trie), it readily arises that accessing selected data records through the layered index is substantially more efficient than accessing the same data records through said trie.
In the context of the invention, "more efficient" means that the number of accesses to the storage medium through the layered index in order to perform an update transaction (e.g. insert, delete or modify) on a data record, or to access a data record, is smaller compared to the number of accesses to the storage medium through the basic partitioned index.
Number of accesses should be construed such that in each access a block is handled (e.g. loaded or processed) from the storage medium.
There may be exceptional scenarios where the latter "more efficient" provision does not apply, e.g. in the case of a very small file having only a few blocks, where accessing a data record through the basic partitioned index may require the same or even fewer operations than through said layered index.
In order to implement a partitioned index as a trie, the construction of a layered index from a basic partitioned index which is a trie requires some further considerations.
Thus, each key is regarded as a character or bit string. Moreover, if the trie cannot be accommodated in a single block, it is partitioned into blocks, such that each block contains a single subtree of the trie. The representative key of the block is the string associated with the root node of the trie in the block, i.e., the sequence of labels of the path from the root of the trie of I_i to the root of the trie of the block. As in the general layered index scheme, the representative keys of I_i are the keys of I_{i+1}. To search a key k in I_i, one searches for the longest prefix of k in the blocks of I_{i+1} and from there moves to the appropriate block of I_i.
The insertion of a record entails adding its key to I_0, i.e., adding a value to the trie of I_0. If as a result a block overflows, the block is split: it is partitioned into typically two (in some implementations more) blocks, such that each block contains a (connected) trie. To accomplish this, a link between a node w and its child v is severed, and the subtree rooted at v is moved to another block. The representative key of the new block is added to I_1. As in the general layered index scheme, this process is continued through the higher index layers.
If the basic partitioned index is a compressed trie like Patricia or PAIF, only part of the keys are saved, which saves index space. However, these savings affect the manner in which the search is performed. In such compressed tries usually only nodes of degree greater than or equal to two are maintained. If the search key k does not belong to the compressed trie, the search might terminate at some record r, and we have to check whether k is equal to the key of r. If the keys are different then the trie does not contain a record with key k.
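The need for the final key comparison can be seen in a small sketch of a compressed trie. The representation below (a node as a pair of key position and labeled links, a leaf as the record key itself) and the `$` terminator are assumptions chosen for brevity; this is not the Patricia or PAIF layout.

```python
def build(keys):
    """Build a compressed trie over distinct keys, none a prefix of another (assumed)."""
    if len(keys) == 1:
        return keys[0]                           # leaf: the record (here just its key)
    pos = 0
    while len({k[pos] for k in keys}) == 1:      # skip positions on which all keys agree
        pos += 1
    groups = {}
    for k in keys:
        groups.setdefault(k[pos], []).append(k)
    # Only nodes of degree >= 2 are created; each stores the position it tests.
    return (pos, {label: build(group) for label, group in groups.items()})


def search(node, key):
    """Return the matching record key, or None if the trie does not hold `key`."""
    while isinstance(node, tuple):
        pos, links = node
        if pos >= len(key) or key[pos] not in links:
            return None                          # no link in the required direction
        node = links[key[pos]]
    # Skipped positions were never compared, so the candidate record must be verified.
    return node if node == key else None


trie = build(["FIAT$", "FORD$", "FORK$"])
print(search(trie, "FORD$"))   # FORD$
print(search(trie, "XORD$"))   # None: the path ends at FORD$, but the keys differ
```

The lookup of `XORD$` follows the same links as `FORD$` because position 0 is never tested; only the comparison with the record key reveals the mismatch.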
The effect of this strategy on the layered index scheme is that the prefix of k might not be represented in the index. To enable search in such cases, direct links from nodes of blocks of I_i to blocks of I_{i-1} are introduced.
These links do not have a direction, and are taken when the appropriate position of the search key does not agree with any one of the directions of the node.
Suppose the search reaches block B_{i-1} of I_{i-1}, whose representative key k_{i-1} is not a prefix of k. (If k_{i-1} is not recorded explicitly in B_{i-1}, we can reach any data record r accessible from B_{i-1}, and from r's key determine k_{i-1}.) To continue the search, we compare k and k_{i-1} to find the position j of the first character where they differ, and search up the trie of block B_i until finding a node v with a direct link and value less than or equal to j. The search continues from the block of I_{i-1} pointed at by that direct link. (If no such node exists, we go to the first block of the index I_{i-1}.) Thus, in the worst case, each layer might require one extra access. This notwithstanding, and as will be explained below, 3 layers are sufficient to address billions of records and usually 2 layers can be maintained in the internal memory of a computer. Thus it is possible to have no more than two I/O accesses to the external storage medium in order to access the block associated with a data record.
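The remark that three layers address billions of records can be checked with a short calculation. The fan-out of 1000 entries per block is purely an assumed figure for illustration; the text does not state a block capacity here.

```python
def layers_needed(records, fanout):
    """Number of index layers needed so that one root block reaches `records` records."""
    layers, reach = 0, 1
    while reach < records:
        reach *= fanout        # each extra layer multiplies the reach by the fan-out
        layers += 1
    return layers


print(layers_needed(10**9, 1000))  # 3
print(layers_needed(10**6, 1000))  # 2
```

Under the same assumption, the top two layers comprise about 1 + 1000 blocks, which is why they can plausibly be kept in internal memory while only the lowest layer is read from external storage.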
The split process also has to accommodate direct links. Suppose that the access path to block B_{i-1} of I_{i-1} consists of block B_i of layer I_i, and that B_{i-1} overflows and is split into blocks B_{i-1} and B_y. Block B_i now has to contain links to all its descendant blocks in I_{i-1}. This can be accomplished by the following non-limiting technique:
Let k_y be the representative key of B_y. This key is inserted into T_i, the compressed trie of B_i, so that the search for the keys of descendants of B_y reaches B_y, and the search for the descendants of B_{i-1} reaches B_{i-1}.
A non-limiting method of accomplishing the split process is as follows:
1. At least one short link among the short links of a node (hereon split node) in the block is deleted (hereon split link) in a way that at least two tries exist in the block.
2. Each of the sub-trees is moved to a separate block.
3. If the block B_i does not exist, B_i is created and a copied node of the split node is created in B_i.
4. If the block B_i exists and a copied node of the split node does not exist in B_i, then a copied node of the split node is created in B_i and connected to the trie of B_i such that B_y (at the end of the split process) is accessible in a search path that includes the root node in B_i and the copied node and its labeled links according to the representative key of
5. If the copied node has no direct link, add a direct link from the copied node to the block B_{i-1}.
6. Add a far link from the copied node to the block B_y or, if the copied node has a short link to a child node in the direction of the far link, the far link can be replaced by a direct link from the child node to block B_y.
In the above implementation, a split of a block in I_k, k>0, is performed such that the split links (of I_k) are links between copied nodes of split nodes that reside in different blocks.
Accordingly, in accordance with one aspect the invention provides for, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: a layered index arranged in blocks; the layered index includes a basic partitioned index that is associated with data records; the basic partitioned index enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said layered index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
The invention further provides for, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index arranged in blocks and being constructed over the keys of data records; the index includes a basic partitioned index that is associated with the data records; the basic partitioned index enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
Still further, the invention provides for, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index arranged in blocks and being constructed over the keys of data records; the index includes a trie that is associated with the data records; the trie enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
Still further, the invention provides for, in a database file management system for accessing data records and being executed on a data processing system; the data records are associated with a basic partitioned index arranged in blocks and being stored in a storage medium; the basic partitioned index enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks; a method for constructing a layered index arranged in blocks, comprising the steps of:
(a) providing said basic partitioned index;
(b) constructing a representative index over the representative keys of said basic partitioned index; said layered index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
The invention further provides for, in a database file management system for accessing data records and being executed on a data processing system; the data records are associated with a basic partitioned index arranged in blocks and being stored in a storage medium; the basic partitioned index enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks; a method for constructing an index over the keys of the data records, the index being arranged in blocks, comprising the steps of:
(a) providing said basic partitioned index;
(b) constructing an index over the representative keys of said basic partitioned index; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
In accordance with the invention there is further provided, in a database file management system for accessing data records and being executed on a data processing system; the data records are associated with a trie arranged in blocks and being stored in a storage medium; the trie enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks; a method for constructing an index over the keys of the data records, the index being arranged in blocks, comprising the steps of:
(a) providing a trie;
(b) constructing an index over the representative keys of said trie; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
The index according to the invention is preferably, although not necessarily, constructed by one or more of the indexing schemes selected from the specified index schemes. A typical, yet not exclusive, example of multi-way tree indexes is the B-tree indexing scheme.
By one embodiment said basic partitioned search scheme is a trie that is constituted by a digital tree of the kind disclosed in U.S. patent no. 5,495,609.
By another embodiment said trie is constituted by a so-called Probabilistic Access Indexing File (PAIF).
Thus, by a specific embodiment there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes at least one probabilistic access indexing file (PAIF) having a plurality of nodes and links; the leaf nodes of said PAIF are associated each with at least one data record accessible to said user application program and wherein at least a portion of said data record constitutes at least one search key; selected nodes in said PAIF represent, each, a given offset of a search key portion within said at least one search key; link(s) originating from each given node from among said selected nodes represent, each, a unique value of said search key portion; the PAIF having at least two sub-PAIFs being arranged, each, in a block; said database file management system is further capable of arranging said blocks as a balanced structure of blocks.
In the context of PAIF, it should be noted that whilst said selected nodes preferably include only a given offset, this is not always necessarily the case. Thus, one or more of said nodes may include other information, such as portions of the keys and/or other information, all as required and appropriate.
According to a modified embodiment, the trie being of the PAIF type, the indexing scheme is constituted by a search scheme substantially identical to that of the PAIF trie.
Before proceeding any further it should be noted that, for convenience of description only, the invention is described mainly with reference to a trie as a basic partitioned index. Those versed in the art will readily appreciate that the invention is by no means bound by trie and accordingly any basic partitioned index is applicable.
Thus, a database file management system that employs a layered index of the invention is advantageous, in terms of enhanced performance as compared to hitherto known techniques, inter alia owing to the following characteristics:
• The data are held inherently in sorted form according to search key.
Namely, one can navigate in the tree by the order of the keys of the data records. The layered index inherently supports sequential operations like "get next" and "get previous". In this respect, the proposed layered index constitutes an advantage over e.g. hashing schemes and some implementations of digital trees.
• There is no requirement for advance knowledge of the contents of the database in order to maintain a balanced index.
• A balanced layered index is retained and the depth of the layered index is relatively small, thereby minimizing the number of accesses (normally slow I/O operations) that are required to perform an update transaction or access a data record. According to one embodiment, practically one I/O operation (and no more than two I/O operations, constituting one or two accesses) is required in order to access a given data record from among billions of data records.
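The sequential operations mentioned in the first characteristic above follow directly from keeping keys in sorted order. A minimal sketch, with a sorted Python list standing in for the index (an assumption made for brevity) and hypothetical function names:

```python
from bisect import bisect_left, bisect_right


def get_next(sorted_keys, key):
    """Smallest stored key strictly greater than `key`, or None."""
    i = bisect_right(sorted_keys, key)
    return sorted_keys[i] if i < len(sorted_keys) else None


def get_previous(sorted_keys, key):
    """Largest stored key strictly smaller than `key`, or None."""
    i = bisect_left(sorted_keys, key)
    return sorted_keys[i - 1] if i > 0 else None


keys = ["AFIAT", "AFORD", "B127", "B500"]
print(get_next(keys, "AFORD"))      # B127
print(get_previous(keys, "AFIAT"))  # None
```

A hashing scheme has no comparable operation, since hashed keys carry no ordering.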
The invention thus further provides for, in a computer system having a storage medium of at least an internal memory that ranges between 10 to 20 Mbyte or more, and an external memory; a data structure that includes an index over the keys of the data records; the index is arranged in blocks; such that for one billion data records substantially no more than two accesses to said external memory are required in order to access a block that is associated with any one of said billion data records, irrespective of the size of the key of said data records.
Still further, the invention provides for, in a computer system having a storage medium of at least an internal memory that ranges between 10 to 20 Mbyte or more, and an external memory; a data structure that includes an index over the keys of the data records; the index is arranged in blocks; such that for one million data records substantially all the blocks of the index are accommodated in said internal memory regardless of the size of the key of said data records.
The invention further provides for, in a computer system having a storage medium, a data structure that includes an index over the keys of data records; the index is arranged in a balanced structure of blocks and enables sequential operations to be performed on said data records; the index size is essentially not affected by the size of said keys.
It should be noted that the data records may reside in the blocks of the layered index, or may reside in separate data files (one or more). In the latter embodiment the data records should be associated, of course, with the corresponding layered index. As will further be clarified with reference to the description of specific embodiments below, a given data record may accommodate more than one search key.
The index according to the invention is preferably, although not necessarily, constructed by one or more of the indexing schemes selected from the specified index schemes. A typical, yet not exclusive, example of multi-way tree indexes is the B-tree indexing scheme.
There follows now a discussion that pertains to the second aspect of the invention.
Thus, normally data consists of records of several types (e.g. in the example above books and borrowers). The type of the record determines its fields (attributes) and its keys. In a conventional system, e.g. of the kind employing a B-tree index, the type of each key is not kept with the record and is not considered part of the key. The program "knows" the type of the record, and therefrom the fields of the data records and their structure.
According to the second aspect of the invention there is proposed a different approach. Each type of key is assigned a designator: a string of bits, e.g. a series of one or more characters, which normally, but not necessarily, is added as a prefix to all keys of this type. A designated key is a key with its designator. The designator is treated as part of the key (for search or update purposes), and therefore is part of the index scheme.
The designator enables the properties of the data record to be obtained as a function of the type. Thus, by looking at the designator of the key, one can deduce the type of the record; one need not know the record type a priori. Data records in which the keys are designated are called designated data records. A designated index is an index that enables search on designated data records.
There follows a description which exemplifies the use of designators in accordance with the invention. Thus, consider a class C, such that all data records of this class have a key field (or fields) k_1, and possibly several other non-key fields. Let R be a data record of class C, where R.k_1 = FIAT. Let the designator of k_1 be A. By adding the designator one gets the key AFIAT. To access a record with R.k_1 = FIAT, the designated index is searched for the key AFIAT.
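The AFIAT example can be written out as a short sketch. A dictionary stands in for the designated index, designators are assumed to be single characters, and the `META` table with its field names is a hypothetical stand-in for the meta-data.

```python
META = {  # hypothetical meta-data: designator -> description of the record type
    "A": {"type": "car", "key_field": "model"},
}


def designate(designator, key):
    """Prefix the key with the designator of its type."""
    return designator + key


def record_type(designated_key):
    """Deduce the record type from the leading designator alone."""
    return META[designated_key[0]]["type"]


# A dict stands in for the designated index: designated key -> record.
index = {designate("A", "FIAT"): {"model": "FIAT"}}

print(designate("A", "FIAT"))      # AFIAT
print(record_type("AFIAT"))        # car
print(index["AFIAT"]["model"])     # FIAT
```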
Having described the designator feature, there follows a description of another feature according to the second aspect: subordination of data records. Consider a record R1 with a key K1, and a record R2 with a composite key consisting of the ordered pair of keys K1, K2. In this case, the designated key of R2 is the composite key K1',K2', where K2' consists of the key K2 prefixed by a designator D2. (D2 is considered the designator of R2.) In a designated index, one can select R1 by searching the key K1' (the key K1 with its designator D1), and select R2 by searching the same index by the key K1'K2' (the concatenation of K1' and K2', where K2' is the key K2 with its designator D2). In this case K2 is subordinated to K1.
The subordination relationship is extended also to records. If K2 is subordinated to K1, the designator of K2' is D2 and the designator of R2 is also D2 (or D1, D2). If R2 is subordinated to R1, the key of R2 is composed by concatenating K2' to K1'. Note that in K2', D2 is prefixed to K2.
In the ERD model, the type of record R1 and the type of record R2 may stand in a one-to-many relationship, meaning that several records of type R2 may be related to a single record of type R1. Such a relation can be implemented by the subordination relation: several records of type R2 will be subordinated to a single record of type R1 (e.g., several books can be borrowed by the same borrower). In particular, if this relationship is one-to-one (e.g. the relationship where only one book can be borrowed by each borrower) then the key K1'D2, where D2 is the designator of R2, is sufficient to locate R2. In a designated index the search path to K1'K2' includes the search path to K1'. (This does not preclude the possibility of reaching the record R2 via another path.)
The latter characteristic exhibits another important feature according to the second aspect, i.e. inherent maintenance of data integrity. Thus, the insertion of a record whose key is K1'K2' (or K1'D2) can only be performed if the record whose key is K1' exists. In the example above, an insertion of a transaction of a borrower (Borrower_Id = 111111) who borrowed a book (book_Id = 2222) should result in insertion of a record R2 whose designated key is A111111B2222 (hereon borrower-book record) only if the specified borrower (data record R1 with K1 = 111111) exists (in the above example, the designator of the borrower is A and the designator of the subordinated borrower-book data record is B). Data integrity is accomplished with just small overhead since the path in the index to the borrower-book record includes sufficient information to determine whether the borrower exists. If the borrower does not exist, the path to the composite key will not pass through the borrower. This will be automatically detected in the insertion process. In contrast, according to the prior art, records of different types were associated with different index files. Before inserting a new data record (with a composite key) in the Borrower-Book index file, a separate check must be performed in the Borrower index file in order to ascertain whether the specified borrower (record R1, key K1) exists, thus posing undue overhead.
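The borrower-book example can be sketched as follows. A sorted list stands in for the designated index, so the existence check on the borrower is an explicit lookup here, whereas in the trie described above it would fall out of the search path itself. The class and method names are hypothetical, and fixed-length borrower identifiers are assumed.

```python
from bisect import bisect_left, insort


class DesignatedIndex:
    def __init__(self):
        self.keys = []                      # sorted designated keys

    def contains(self, dkey):
        i = bisect_left(self.keys, dkey)
        return i < len(self.keys) and self.keys[i] == dkey

    def insert(self, dkey, parent=None):
        """Insert `dkey`; a subordinated key requires its parent key to be present."""
        if parent is not None:
            if not dkey.startswith(parent) or not self.contains(parent):
                raise KeyError(f"parent record {parent!r} does not exist")
        if not self.contains(dkey):
            insort(self.keys, dkey)

    def subordinates(self, parent, designator):
        """All keys of the form parent + designator + ..., found by one range scan."""
        prefix = parent + designator
        i = bisect_left(self.keys, prefix)
        found = []
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            found.append(self.keys[i])
            i += 1
        return found


idx = DesignatedIndex()
idx.insert("A111111")                              # borrower record R1
idx.insert("A111111B2222", parent="A111111")       # borrower-book record R2
print(idx.subordinates("A111111", "B"))            # ['A111111B2222']
```

Because subordinated keys share the parent key as a prefix, all books of one borrower are adjacent in key order and can be retrieved with a single range scan.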
Note that the subordination relation is not limited to just two levels; the subordinated record can itself have a record subordinated to it, and accordingly n levels of subordination may be accomplished. For example, consider a banking database, where the account records are subordinated to the branch records, and deposit records are subordinated to accounts.
Turning now to the multi-dimension feature according to the second aspect of the invention, let R be a record that is identified by either of two keys K1 and K2. Then, the designated index should contain two search paths to R, one by the designated key K1' and one by the designated key K2'. Accordingly, R constitutes a multi-dimensional record. A multi-dimensional index includes the designated index and the multi-dimensional data record(s).
Consider a first embodiment where the multi-dimensional index does not apply to subordinated data records. Thus, for example, consider a class C, such that all data records of this class have two key fields, k_1 (the car model) and k_2 (its license plate number), and possibly several non-key fields. Let R be a data record of class C, where R.k_1 = FIAT and R.k_2 = 127. Let the designator of k_1 be A and that of k_2 be B. By adding the designators one gets the keys AFIAT and B127. These extended keys are inserted into a single designated index. To access a record with R.k_1 = FIAT, the designated index is searched for the key AFIAT, and to select a record with R.k_2 = 127, the same designated index is searched for B127.
The above discussion and example considered a multi-dimensional index where the data records do not necessarily exhibit a subordination relationship. The multi-dimensional index may optionally be applied also to subordinated data records. For example, consider a banking database, where the deposits are subordinated to both accounts and depositors. A single designated index provides access to accounts (by the designated key k_1', account-number), to depositors (by the designated key k_2', depositor-name) and to deposits by both k_1'k_2' and k_2'k_1'. (It is possible, of course, to use different designators for k_1 when it is subordinated to k_2 and for k_2 when it is subordinated to k_1.)
The designator of a multi-dimensional record depends on the designator of the key used to search or update the record. Thus, the designator of a car record (FIAT, 127) is A when searching or updating the record by the key AFIAT, and is B when accessing it via the license plate number B127.
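The car example can be sketched with a single index holding two search paths to the same record. The dictionary-backed class below is an illustrative stand-in for the designated index, its names are hypothetical, and it assumes each designated key identifies one record.

```python
class MultiDimIndex:
    DESIGNATORS = {"model": "A", "plate": "B"}   # designator per key field (assumed)

    def __init__(self):
        self.paths = {}                          # designated key -> record

    def insert(self, record):
        # One entry per key field, all held in the same index.
        for field, designator in self.DESIGNATORS.items():
            self.paths[designator + record[field]] = record

    def find(self, field, value):
        return self.paths.get(self.DESIGNATORS[field] + value)


idx = MultiDimIndex()
car = {"model": "FIAT", "plate": "127", "colour": "red"}
idx.insert(car)

print(sorted(idx.paths))                                   # ['AFIAT', 'B127']
print(idx.find("model", "FIAT") is idx.find("plate", "127"))  # True
```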
In addition to the data records, meta-data needs to be maintained. The meta-data includes information on the different records as a function of their type. Thus, once the designator is identified, the information on the record becomes available, for example a description of the various fields, keys, subordination, record size etc. The search scheme in the designated index is oblivious to the meta-data. It locates the record, identifies the designator (for example, the designator can be prefixed to the record) and constructs the (composite) designated key.
There is thus provided in accordance with a second aspect of the invention, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index over the keys of data records; the data records being of at least two types where data records of the second type are subordinated to the data records of the first type.
Still further in accordance with the second aspect there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: a designated index over designated keys of data records; the data records, constituting designated data records, being of at least two types where designated data records of the second type are subordinated to the designated data records of the first type.
According to the second aspect various advantages are accomplished including:
□ The data structure that includes designated index and designated data can maintain the relations between different data items.
□ The data structure that includes designated index and designated data can link logically related items.
□ The data structure that includes designated index and designated data can support several data models simultaneously and efficiently.
□ The data structure that includes designated index and designated data allows high efficiency in maintaining data integrity.
□ The data structure that includes designated index and designated data allows high efficiency in retrieving related data.
A detailed discussion as regards the various advantages offered by the database file management system of the invention is given below with reference to specific embodiments.
It should be noted that the data records may constitute part of the PAIF, or may reside in one or more separate data files. In the latter embodiment the data records should be linked, of course, to the corresponding PAIF. As will further be clarified with reference to the description of specific embodiments below, a given data record may accommodate more than one search key.
It will also be shown how complex data structures and data relations can be supported by a new, uniform and simple technology.
It will also be shown how an index structure can be of a minimal size, not depending on the size of the keys.
All of the above mentioned advantages are supported inherently by the invention without any preliminary considerations on the data (i.e. key range is unknown, number of records is unknown, random physical location of data records is assumed and so on).
By still another aspect the invention provides, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index being stored in the storage medium and constructed over the keys of said data records that are stored in blocks; the index being arranged in blocks with the leaf blocks being linked to data records by means of links; said index is characterized in that at least one of said links is shared by at least two data records stored in the same block.
By one embodiment, the index is constituted by a trie.
Still further, the invention provides for, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index being stored in a storage medium and constructed over the keys of said data records that are stored in blocks; the index being arranged in blocks with the leaf blocks being linked to data records by means of links; said index is characterized in that at least one of said links is shared by at least two data records stored in the same block; said index constituting a layered index according to claim 1, and blocks of said basic partitioned index are linked to said data records.
BRIEF DESCRIPTION OF THE DRAWINGS:
In order to understand the invention and to see how it may be carried out in practice, a preferred embodiment will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:
Fig. 1 shows a generalized block diagram of a system employing a database file management system;
Fig. 2 shows a sample database structure represented as an Entity Relationship Diagram (ERD), and serving for illustrative purposes;
Fig. 3 shows the database of Fig. 2, represented as tables in accordance with the relational data model, with each table holding a few data occurrences;
Fig. 4 shows the "CLIENT" table of Fig. 3, in accordance with a file management system employing a conventional B+ tree index scheme;
Fig. 5 shows the "CLIENT" table of Fig. 3, in accordance with a file management system employing a conventional trie index scheme;
Figs. 6A-6C show the "CLIENT" table of Fig. 3, in accordance with a file management system employing a PAIF index scheme;
Figs. 7A-7H show schematic illustrations exemplifying construction of a layered index, according to one embodiment of the invention;
Figs. 8A-B show schematic illustrations exemplifying construction of a layered index, according to yet another embodiment of the invention;
Figs. 9A-G show schematic illustrations exemplifying construction of a layered index, according to yet another embodiment of the invention;
Figs. 10A-B show schematic illustrations exemplifying construction of a layered index, according to another embodiment of the invention;
Fig. 11 shows a schematic illustration exemplifying construction of a layered index, according to still yet another embodiment of the invention;
Fig. 12 shows a schematic illustration for exemplifying use of designators in a designated index in accordance with one embodiment of the invention;
Figs. 13A-E show five schematic illustrations for exemplifying the feature of subordination of data records in a designated index in accordance with one embodiment of the invention;
Fig. 14 shows a schematic illustration of a designated index exemplifying a multi-dimension record according to an embodiment of the invention;
Fig. 15 shows a schematic illustration of a designated index according to another embodiment of the invention;
Fig. 16 shows a schematic illustration for exemplifying the feature of relations among data records provided in accordance with one embodiment of the invention;
Figs. 17A-B show a schematic illustration of compressed representation of links to data records in accordance with one embodiment of the invention;
Figs. 18A-D show four benchmark graphs demonstrating the enhanced performance, in terms of response time and file size, of a database utilizing a file management system of the invention vs. a commercially available Ctree based database; and
Figs. 19A-D show four benchmark graphs demonstrating the enhanced performance, in terms of response time and file size, of a database utilizing a file management system of the invention vs. a commercially available Btree based database.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT
Attention is first directed to Fig. 1 showing a generalized block diagram of a system employing a database file management system of the invention. Thus, a general purpose computer 1, e.g. a personal computer (P.C.) employing a Pentium microprocessor 3 commercially available from Intel Corp., U.S.A., has an operating system module 5, e.g. Windows NT® commercially available from Microsoft Inc. U.S.A., which communicates with processor 3 and controls the overall operation of computer 1.
P.C. 1 further accommodates a plurality of user application programs of which only three, 7, 9 and 11, respectively, are shown. The user application programs are executed by processor 3 under the control of operating system 5, in a known per se manner, and are responsive to user input fed through keyboard 13 by the intermediary of I/O port 15 and the operating system 5. The user application programs further communicate with monitor 16 for displaying data, by the intermediary of I/O port 17 and operating system 5. The user application programs can access data stored in a database by means of database management system module 20. The generalized database management system, as depicted generally in Fig. 1, includes a high level management system 22 which views, as a rule, the underlying data in a "logical" manner and is responsive to the user application program by means known per se such as, e.g., SQL Data Definition and Data Manipulation language (DDL and DML). The database management system typically exploits, in a known per se manner, a data dictionary 24 that includes meta-data which maintains information on the underlying data.
The underlying structure of the data is governed by database file management system 26 which is associated with the index scheme and actual data records 28. The "high-level" logical instructions (e.g. SQL commands) received and processed by the high-level management system 22 are converted into "lower level" commands that access or update the data records that are stored in the database file(s), and to this end the database file management system considers the actual structure and organization of the data records. The "high level" and "low level" portions of the database file management system can communicate through a known per se Application Programmers Interface (API), e.g. the Microsoft open database connectivity (ODBC) interface commercially available from Microsoft. The utilization of the ODBC enables "high level" modules of the database file management system or application program to transparently communicate with different "database file management systems" that support the ODBC standard. The terms access or update of data records used herein encompass all kinds of data manipulation including "find", "insert", "delete" and "modify" data record(s), and the pertinent DDL commands which afford the construction, modification and deletion of the database. Fig. 1 further shows, schematically, a storage medium in the form of internal memory module 29 (e.g. 16 megabytes and possibly employing a cache memory sub-module) and an external memory module 29' (e.g. 1 gigabyte). Typically, external memory 29' is accessed through an external, relatively slow communication bus (not shown), whereas the internal memory is normally accessed by means of a faster internal bus (not shown). Normally, by virtue of the relatively small size of the internal memory, only those applications (or portions thereof) that are currently executed are loaded from the external memory into the internal memory.
By the same token, for large databases that cannot be accommodated in their entirety in the internal memory, a major portion thereof is stored in the external memory. Thus, in response to an application generated query that seeks one or more data records in the database, the database management system utilizes operating system services (i.e. an I/O operation) in order to load, through the external communication bus, one or more blocks of data from the external to the internal memory. If the sought data records are not found in the loaded blocks, additional I/O operations are required until the sought data records are targeted.
It should be noted that for simplicity of presentation, the internal and external memory modules 29, 29' are separated from the various modules 5, 7, 9, 11, 20. Clearly, albeit not shown, the various modules (operating system, DBMS, and user application programs) are normally stored in the external memory and their currently executed portions are loaded to the internal memory.
Computer 1 may serve as a workstation forming part of a Local Area Network (LAN) (not shown) which employs a server having also essentially the same structure of Fig. 1. To the extent that the workstations and the server employ client-server based protocols, a predominant portion of said modules (including the database records themselves 28) resides in the server.
Those versed in the art will readily appreciate that the foregoing embodiments described with reference to Fig. 1 are only two out of many possible variants. Thus, by way of non-limiting example, the database may be an on-line database residing in an Internet Web site. The invention is, of course, not limited to the specified partition of small internal memory and large external memory. Thus, for example, by a modified embodiment large internal and external memories are employed and by yet another modified embodiment only internal memory is employed.
It should be further noted that for clarity of explanation system 1 is illustrated in a simplified and generalized manner. A more detailed discussion of database file management systems and in particular of the various components that are normally accommodated in database file management systems can be found, e.g. in Chapter 7 of "Database System Concepts" ibid.
Having described the general structure of a system of the invention, attention is now directed to Fig. 2 showing a sample database structure represented as an Entity Relationship Diagram (ERD), and serving for illustrative purposes. Thus, the ERD 30 of Fig. 2 consists of the entities "CLIENT" 32 and "ACCOUNT" 34 as well as an "n to m" "DEPOSIT" 36 relationship indicating that a given client may have more than one account and by the same token a given account may be owned by more than one client.
As shown, the entity "CLIENT" has the following attributes (fields): "Client_Id" 38 being a key attribute that uniquely identifies each client, "Name" 39 standing for the client's name and "Address" 40 standing for the client's address. The entity "ACCOUNT" has the following attributes (fields): "Acc_No" 42 being a key attribute that uniquely identifies each account, and "Balance" 43 holding the balance of the account. The relationship "DEPOSIT" consists of pairs of keys of the "CLIENT" and "ACCOUNT" entities, such that each pair is indicative of a particular account owned by a specific client.
Turning now to Fig. 3, there is shown the database of Fig. 2, represented as three tables 50, 51 and 52 which, in accordance with the relational data model, correspond to 32, 34 and 36, respectively, with each table holding a few data occurrences for illustrative purposes. It should be noted that the length of the key field ("Client_ID") of the "CLIENT" table is 5 digits, whereas the length of the key field ("Acc_ID") of the "ACCOUNT" table is 6 digits. The client table holds 5 data occurrences 55-59, the account table holds 2 data occurrences 65, 66 and the deposit table holds 3 data occurrences 70-72.
In accordance with prior art techniques, for each table there is, as a rule, a different index file by the primary key. Thus, Fig. 4 illustrates an underlying indexing file of the "CLIENT" table of Fig. 3, in accordance with a file management system employing the conventional B+ tree indexing scheme. As shown, the indexing file 80 consists of three blocks 80a-c, standing for a root block and two leaf blocks respectively. The data records are organized randomly in a separate file 81 holding the five data records 83-87. Each block consists of a succession of pairs of fields (e.g. 82a-b and 83a-b in block 80a). In each pair the first field stands for a search key value and the second field stands for a link, such as a number that identifies the next block to search or, in the case of a leaf block, a link to the data record such as a number identifying the data record. The latter realization forms a non-limiting embodiment of associating a data record to a block. In the specific embodiment of Fig. 4, a search for records whose key is 12355 or smaller is directed from root block 80a to block 80b.
Thus, a search for a record whose key is 12355 (82a) starts in root block 80a and is directed by the link 82b to block 80b. In block 80b, the search key 12355 (86a) is associated with link 86b indicating the address of the data record identified by this search key in the data file 81. Put differently, the data record that is identified by search key "12355" (57 in Fig. 3) is the fourth in order in data file 81.
The tables "ACCOUNT" and "DEPOSIT" are likewise arranged in two separate B+ tree indexing files, respectively.
The B+ tree indexing file of Fig. 4 exhibits one of the significant shortcomings of this approach in that the keys (i.e. search keys) are duplicated, i.e. they are held both in the internal blocks (i.e. in the index scheme) and in the data records associated with the B+ tree index. Thus, for example, the search key of data record 57 (in Fig. 3) is not only held as an integral part of the data record 86 in file 81 but also in block 80b (search key 86a) and sometimes in parent blocks such as 80a (search key 82a).
This being the case, one readily notices that for large files (which is the case in many real-life scenarios) the duplication of the search keys (particularly for long keys) results in an inflated index which necessitates a large storage volume and also adversely affects performance.
Fig. 5 illustrates a different indexing scheme of the "CLIENT" table of Fig. 3, in accordance with a file management system employing a known trie indexing scheme. Thus, trie indexing file 90 includes a plurality of nodes and links wherein each node stands for an offset position and each link stands for a value at this offset. Table 91 has four columns. The first column indicates which digit position is to be used, the second column the value of that digit. A digit value partitions the keys into two subsets. Columns three and four direct the search procedure to the next step.
In order to locate a given search key, e.g. 12355, the digit at the position indicated by the root (position "5" indicated by node 90a, being also the first column in the first line of table 91) is compared to the value specified at the second column of the same line (value "5" indicated also by link 90b in the trie index). Since the digit at position 5 of the sought search key 12355 is indeed 5, control is transferred to line 2 (as indicated by the third column of line 1 of table 91). Next, the digit at position 3 of the sought search key (90c in the tree, being also the value of the first column of the second line in table 91) is compared to the value 3 (link 90d, being also the second column in the second line of table 91). Since a match occurs, control is transferred to line 3 in the table. In this step the digit at position 4 of the sought search key does not match the value specified at the second column of line three (i.e. "5" vs. "4") and accordingly, as indicated in the fourth column of table 91 ("not equal"), a link to the sought data record 57 (86 in Fig. 4) is obtained.
The tables "ACCOUNT" and "DEPOSIT" are likewise arranged in two separate trie indexing files, respectively. In contrast to the B+ tree indexing file of Fig. 4, the one shown in Fig. 5 does not necessitate duplication of the search key. Put differently, only the offsets and the link values, and not the entire keys, are held in the trie (90). In this sense it constitutes an advantage over the B+ tree technique.
However, and as specified, the above trie is associated with some shortcomings: it retains an even distribution of the data at the cost of knowing a priori the contents of the database and consequently partitioning the keys so as to obtain a balanced structure. Knowing a priori the contents of the database is obviously undesired as it poses an undue constraint, since databases of the kind described in Fig. 2 are of a dynamic nature, e.g. for the specific database of Fig. 2, new clients open accounts, senior clients close accounts, new clients are registered as co-owners of existing accounts etc.
Another drawback of the above tree is that it does not support sequential processing. Navigating in the tree would result in accessing the data in the following order - 83, 86, 87, 84, 85 (Fig. 4) - and not in the order of the key.
Having shown a known trie index scheme (with reference to Fig. 5), there follows a description of various embodiments of an index of the invention which includes a basic partitioned index and which copes with the drawbacks described above in connection with hitherto known techniques. Specifically there will be shown a preferred embodiment of the index in the form of a layered index, and a preferred embodiment of the basic partitioned index in the form of a trie. These examples are by no means binding.
Before turning to the explanation of the various embodiments there is described, with reference also to Figs. 6A-C, a new trie index scheme designated PAIF. As will be shown below, the PAIF is not confined to a tree structure. On the basis of the PAIF, various embodiments of a layered index are described, with reference to Figs. 7-9, which include a representative index constructed over the representative keys of the PAIF. By the embodiments of Figs. 7 to 9, the index scheme of the representative index and that of the basic partitioned index are substantially the same PAIF.
In Fig. 10 there is described yet another embodiment of the layered index, with a different trie. As will be shown, in the embodiment of Fig. 10, the representative index and the trie are also substantially the same. This, however, is not obligatory, as is exemplified, e.g., with reference to Fig. 11, where the trie and the representative index are different.
Turning now to Figs. 6A-C, there is shown a succession of schematic illustrations of the "CLIENT" table of Fig. 3, in accordance with the file management system employing the PAIF. The terms "transaction" and "operation" are used interchangeably.
In the description below the basic commands which enable data manipulation in the PAIF will be reviewed, i.e. insert a new data record into a PAIF, find a data record in a PAIF, and delete an existing data record. Those versed in the art will no doubt appreciate that on the basis of these basic primitives more compound data manipulation operations (e.g. "Join") may be realized.
Turning at the onset to Fig. 6A, there is shown the Client's data record 103 (56 in table Client of Fig. 3) having search key "12345" (i.e. a 5-byte-long search key). The PAIF of Fig. 6A (100) is, of course, trivial and consists of a single node 101 (standing for both the root node and the leaf node) linked by means of a long link 102 to data record 103.
The node 101 represents an offset 0 in said search key and the link 102 represents a value "1" of the search key portion (being by this particular embodiment 1-byte-long) at the specified offset.
As clearly shown in Fig. 6A, the data record 103 is associated with a search path being a unit that consists of a node 101 and a link 102 which define an offset and a pertinent search key portion value that conforms to the corresponding search key portion value at that particular offset within the search key of the specified data record. More specifically, the value of the one-byte search-key-portion at offset 0 within search key "12345" is indeed "1".
Turning now to Fig. 6B-1 there is shown a PAIF 108 after the termination of a successive transaction in which the data record having Client_Id_No "12445" 107 has been inserted (data occurrence 58 in table Client of Fig. 3). The search keys of data records 103 and 107 are distinguished only in the third byte (offset 2), i.e. "3" and "4" respectively.
The unit defined by root node 101 and the link 102 is not sufficient to distinguish between data records 103 and 107, since the value of the 1-byte search key portion at offset 0 for both data records is "1". Hence, node 104 indicates the lowest offset which distinguishes between the two records, and links 106 and 105 indicate the respective 1-byte search key portions "3" and "4" at offset 2. It should be noted that the realization of the PAIF is not bound by the specific examples illustrated in the drawings and various implementations thereof may apply, depending upon the particular application. Thus, for example, Figs. 6B-2 and 6B-3 illustrate two other options of realizing the PAIF of Fig. 6B-1, where in Fig. 6B-2 the full key is represented in the PAIF (e.g. all the digits of the record 12445 are specified in the links commencing from the root node and ending at the data record). The latter realization is more explicit and less efficient in terms of space, as compared to the sparse realization of Fig. 6B-3 where only the nodes which are absolutely necessary appear in the tree. Other variants are, of course, applicable.
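The PAIF of Fig. 6B-1 can be written down as a small in-memory structure. The following is only an illustrative sketch: the `Node` class, the use of a dictionary for the links and the representation of each data record by its search key string are assumptions of the sketch, not the storage layout of the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    offset: int                                # key position examined at this node
    links: dict = field(default_factory=dict)  # key-portion value -> Node or record


# Fig. 6B-1: root node 101 (offset 0) leads, via link 102 ("1"), to node 104
# (offset 2), whose links "3" and "4" lead to records 103 and 107.
node_104 = Node(offset=2, links={"3": "12345", "4": "12445"})
node_101 = Node(offset=0, links={"1": node_104})


def search_paths(node, prefix=()):
    """Yield (record, path) pairs; a path is a tuple of (offset, value) units."""
    for value, child in node.links.items():
        path = prefix + ((node.offset, value),)
        if isinstance(child, Node):
            yield from search_paths(child, path)
        else:
            yield child, path


paths = dict(search_paths(node_101))

# Every (offset, value) unit on a record's search path conforms to that
# record's own search key at the given offset.
for record, path in paths.items():
    assert all(record[offset] == value for offset, value in path)
```

Only offsets 0 and 2 appear in the structure; the digits at offsets 1, 3 and 4 are never stored in the index, which is the space saving the text contrasts with the fully explicit realization.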
Before moving on to describe a procedure of inserting a new data record into an existing database it should be borne in mind that the higher the node in the trie PAIF the smaller is the offset indicated thereby (e.g. in the PAIF of Fig. 6B, node 101 is higher than node 104 and accordingly it is assigned a smaller offset - "0" vs. "2").
Generally speaking, the preferred procedure for inserting a new data record into an existing PAIF includes the execution of the following steps:
i. advancing along a reference path commencing from the root node and ending at a data record associated with a leaf node (referred to as the "reference data record"); in each node in the reference path, advancing along a link originating from said node if the value represented by the link equals the value of the 1-byte-long key portion at the offset specified by said node; in the case that the offset specified in the node is beyond any corresponding key portion in the key, or if there is no link with said value, advancing along an arbitrary path to any reference data record;
ii. comparing the search key of the reference data record to that of the new data record for determining the smallest offset of the search key portion that discerns the two (hereinafter "discerning offset");
iii. proceeding to one of the following steps (iii.0-iii.3) depending upon the value of the discerning offset:
iii.0 if the data records are equal then terminate; or
iii.1 if the discerning offset matches the offset indicated by one of the nodes in the reference path, add another link originating from said one node and assign to said link the value of the search key portion at the discerning offset taken from the search key of the new data record; or
iii.2 if the discerning offset is larger than that indicated by the leaf node that is linked, by means of a link, to the reference data record:
iii.2.1 disconnect the link from the reference data record (i.e. it remains temporarily "loose") and move the link to a new node; the new node is assigned the value of the discerning offset;
iii.2.2 connect the reference data record and the new node (which now becomes a leaf node) and assign to the link (long link) the value of the search-key-portion at the discerning offset taken from the search key of the reference data record;
iii.2.3 connect by means of a link the new data record and the new node and assign to the link (long link) the value of the search-key-portion at the discerning offset taken from the search key of the new data record; or
iii.3 if conditions iii.0, iii.1 and iii.2 are not met, there exists, in the reference search path, a father node and a child node thereof such that the discerning offset is, at the same time, larger than the offset assigned to the father node and smaller than the offset assigned to the child node (considered case A), or all the nodes in the reference search path have a value greater than the discerning offset (considered case B); accordingly, apply the following sub-steps:
iii.3.1 for cases A and B, create a new node and assign the node the value of said discerning offset; for case A only, disconnect the link from the father node to the child node and shift the link to the new internal node (i.e. the child node remains temporarily "loose");
iii.3.2 for cases A and B, connect by means of a link (long link) the new data record and said new internal node; the value assigned to the link is that of the search-key-portion at the discerning offset, as taken from the search key of the new data record;
iii.3.3 for cases A and B, connect by means of a new link the new node and, for case A, the child node or, for case B, the root node (i.e. the new node becomes, for case A, a new father node and, for case B, a new root node); the value assigned to said link is the search-key-portion at the offset indicated by the new node, taken from the search key of the reference data record.
It should be noted that for a different reference path a different PAIF may be obtained.
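The insert steps above can be sketched in a few lines. This is a hedged illustration rather than the implementation of the invention: it assumes fixed-length keys whose portions are single characters, represents each data record by its search key string, keeps the whole PAIF in memory (no blocks), and raises `ValueError` for a duplicate key. The `Node` class and the `insert` function are hypothetical names. Steps iii.2 and iii.3 are folded into a single "splice a new node into the reference path" operation, which is a simplification made by the sketch; the resulting structures are the same as those the separate steps produce.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    offset: int                                # key position examined at this node
    links: dict = field(default_factory=dict)  # key-portion value -> Node or record


def insert(root, new):
    """Insert search key `new` into the PAIF rooted at `root`; return the root."""
    # Step i: follow the reference path, remembering each node and the link taken.
    path, cur = [], root
    while isinstance(cur, Node):
        value = new[cur.offset] if cur.offset < len(new) else None
        if value not in cur.links:
            value = next(iter(cur.links))      # no matching link: arbitrary path
        path.append((cur, value))
        cur = cur.links[value]
    ref = cur                                  # the reference data record

    # Step ii: smallest offset at which the two search keys differ.
    d = next((i for i, (a, b) in enumerate(zip(ref, new)) if a != b), None)
    if d is None:                              # step iii.0: record already present
        raise ValueError("record already exists: " + new)

    # Position of the first node on the path whose offset is >= d.
    i = next((k for k, (n, _) in enumerate(path) if n.offset >= d), len(path))

    # Step iii.1: a node with exactly the discerning offset is on the path.
    if i < len(path) and path[i][0].offset == d:
        path[i][0].links[new[d]] = new
        return root

    # Steps iii.2 / iii.3: splice a new node with offset d into the path.
    # `below` is the reference record (iii.2), the child node (iii.3, case A)
    # or the old root (iii.3, case B).
    below = path[i][0] if i < len(path) else ref
    node = Node(d, {ref[d]: below, new[d]: new})
    if i == 0:
        return node                            # case B: the new node is the root
    parent, value = path[i - 1]
    parent.links[value] = node                 # redirect the disconnected link
    return root
```

Starting from the trivial PAIF of Fig. 6A, `insert(Node(0, {"1": "12345"}), "12445")` yields the structure of Fig. 6B-1, with a node of offset 2 carrying the links "3" and "4".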
For a better understanding, the aforementioned "insert data record" operation will be successively applied to the specific PAIF of Fig. 6B, each time with a different data record so as to exemplify the three distinct scenarios stipulated in steps iii.1 - iii.3 above, thereby resulting in the three PAIFs illustrated in Figs. 6C-1 to 6C-3, respectively.
In the first example the CLIENT data record having Client_Id (or search key) "12546" (59 in table Client of Fig. 3) is inserted into the PAIF of Fig. 6B. As stipulated in step (i), a move is made along the reference path commencing from the root 101 and ending, for example, at data record 103 which stands for the "reference data record". This is implemented by advancing from node 101 along link 102 (where at offset '0' of the inserted data record the value of the 1-digit-long key portion is '1') and thereafter, since at offset '2' (as specified by node 104) none of the values of links 105 and 106 (4 and 3 respectively) matches the value of the inserted key at offset 2 ('5'), advance is made along an arbitrary path (by this particular embodiment through link 106) to the reference data record 103.
The comparison operation stipulated in step (ii) results in that the search key of the new data record is distinguished from the search key of the reference data record (103) at offsets 2 ("5" vs. "3") and 4 ("6" vs. "5"). The smallest offset ("discerning offset") is therefore 2.
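The comparison of step (ii) amounts to a scan for the first differing position. A minimal sketch, assuming equal-length keys compared one character at a time (the function name `discerning_offset` is hypothetical):

```python
def discerning_offset(ref_key, new_key):
    """Return the smallest offset at which the keys differ, or None if equal."""
    for offset, (a, b) in enumerate(zip(ref_key, new_key)):
        if a != b:
            return offset
    return None


# All offsets at which "12546" differs from the reference key "12345".
differing = [i for i, (a, b) in enumerate(zip("12345", "12546")) if a != b]
first = discerning_offset("12345", "12546")
```

For the three insert examples of Figs. 6C-1 to 6C-3 the reference key is "12345" in each case, and the discerning offsets for "12546", "12355" and "11346" are 2, 3 and 1 respectively.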
Turning now to step (iii), the condition of step iii.1 is met since the discerning offset is equal to that assigned to node 104. Accordingly, and as is shown in Fig. 6C-1, new link 111 connects node 104 to the new data record 112. The value assigned to link 111 is 5, being the byte value at offset 2 in the search key of the new data record 112. PAIF 110 of Fig. 6C-1 is therefore the result of inserting the data record 112 into the PAIF 108 of Fig. 6B-1.
Moving now to the second example, the CLIENT data record having Client_Id (or search key) "12355" (57 in table Client of Fig. 3) is inserted into the PAIF of Fig. 6B-1. Steps i and ii, stipulated above, result in a reference path starting at node 101 and ending at data record 103.
Turning now to step (iii), the condition of step iii.2 is met since the discerning offset 3 is larger than the offset 2 of leaf node 104 in the reference search path. Accordingly, in compliance with step iii.2.1 and as is shown in the resulting PAIF 120 of Fig. 6C-2, the link 106 is disconnected from reference data record 103 and is connected to a new node 121. The new node is assigned the discerning offset 3. Next, in compliance with step iii.2.2, the reference data record 103 and the new node 121 are connected by means of new link 122. The new link is assigned the value 4 (being the digit value at the discerning offset 3 taken from the search key "12345" of the reference data record 103); and finally, as stipulated in step iii.2.3, the new data record 123 is connected to node 121 by means of link 124 which is assigned the value "5" (being the digit at the discerning offset 3 taken from the search key "12355" of the new data record 123). PAIF 120 of Fig. 6C-2 is, therefore, the result of inserting the data record 123 into the PAIF 108 of Fig. 6B-1.
The third and last example concerns inserting the CLIENT data record having Client_Id (or search key) "11346" (55 in table Client of Fig. 3) into the PAIF of Fig. 6B-1. Applying the aforementioned steps i and ii results in advancing from node 101 to data record 103 (in Fig. 6B) and establishing that the discerning offset is 1.
Thus in step iii, the condition of step iii.3 is met. Accordingly, in compliance with step iii.3.1 and as is shown in the resulting PAIF 130 of Fig. 6C-3, the link 102 is shifted to a new internal node 131. The new internal node 131 is assigned the value 1 (being the discerning offset). As stipulated in step iii.3.2, the new data record 132 and node 131 are directly connected by means of new link 133. The value assigned to link 133 is 1 (being the digit at the discerning offset 1 taken from the search key "11346" of the new data record 132), and finally, in compliance with step iii.3.3, the new internal node 131 is linked to node 104 by means of link 134 assigned the value 2 (being the digit at the discerning offset (1) taken from the search key "12345" of the reference data record 103).
Although the PAIF described above with reference to Figs. 6A-6C may be accommodated within one block, it is nevertheless preferable to separate between "nodes" and "data records" such that data records are grouped in a distinct file or files. Applying this approach to the PAIF of Fig. 6C-3 results in the generation of the data record file holding the records 132, 103, 107. Links 133, 106 and 105 become, of course, long links.
Obviously, if an insert procedure results in finding that the data record to be inserted already exists in the PAIF, an appropriate error message is returned to the procedure that invoked the Insert command.
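By way of non-limiting illustration only, the three insert examples above may be sketched in Python as follows. The dictionary-based node layout, the function names `insert`, `any_record` and `fig_6b1`, and the restriction to fixed-length digit-string keys are assumptions made for this sketch; it is one possible reading of steps i to iii and is not asserted to be the implementation of the described apparatus.

```python
# Hypothetical model: a node is {"offset": int, "links": {digit: child}};
# a data record is represented by its search key (a fixed-length str).

def is_node(x):
    return isinstance(x, dict)

def any_record(x):
    """Descend along arbitrary links until some data record is reached."""
    while is_node(x):
        x = next(iter(x["links"].values()))
    return x

def insert(root, key):
    """Insert `key` into the PAIF rooted at `root`; return the (possibly new) root."""
    # Steps i and ii: reach a reference record, then find the discerning offset.
    cur = root
    while is_node(cur):
        cur = cur["links"].get(key[cur["offset"]]) or any_record(cur)
    ref = cur
    if ref == key:
        raise KeyError("data record already exists")
    d = next(i for i, (a, b) in enumerate(zip(ref, key)) if a != b)
    # Step iii: walk down again to the place where offset d belongs.
    parent, child = None, root
    while is_node(child) and child["offset"] < d:
        parent, child = child, child["links"][key[child["offset"]]]
    if is_node(child) and child["offset"] == d:
        child["links"][key[d]] = key          # a node for offset d exists: add a link
        return root
    # Otherwise place a new node with value d between parent and child.
    new_node = {"offset": d, "links": {ref[d]: child, key[d]: key}}
    if parent is None:
        return new_node
    parent["links"][key[parent["offset"]]] = new_node
    return root

def fig_6b1():
    """PAIF of Fig. 6B-1 as described in the text (nodes 101 and 104)."""
    node_104 = {"offset": 2, "links": {"4": "12445", "3": "12345"}}
    return {"offset": 0, "links": {"1": node_104}}

paif_6c2 = insert(fig_6b1(), "12355")   # new node with value 3 below node 104
paif_6c3 = insert(fig_6b1(), "11346")   # new node with value 1 above node 104
```

In this sketch the already-exists case is signalled by raising `KeyError`, standing in for the error message returned to the invoking procedure.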
It should be noted that in the latter examples it is assumed that the entire PAIF resides in a single block. Obviously, when additional data records are inserted by following the foregoing "insert procedure", a block overflow may occur, which necessitates (as will be explained in greater detail below) invoking a "split block" procedure, after which one advances to the sought block and performs the insert procedure in the manner specified above.
Having described a typical "Insert" transaction, a "Find (or Retrieve) data record" transaction will now be described. Thus, for finding a data record by a given search key (hereinafter the sought data record) in an existing PAIF, the following steps should be executed:
i. advance along a search path commencing from the root node and ending at a data record linked to a leaf node, and for each node in the search path (hereinafter "current node") perform the following sub-steps:
i.1 for each link originating from the current node: compare the search-key-portion of the sought data record at the offset defined by the current node value to the value assigned to said link; in case of a match, advance along said link and return to step i.1;
i.2 if none of the links originating from the current node matches the search-key-portion of the sought data record, return "NOT FOUND" and terminate the find procedure;
i.3 if a data record is reached (hereinafter "reference data record"), compare the search key of the sought data record, as a whole, to the key of the reference data record;
i.3.1 in case of a match, return "FOUND" (and in case of "Retrieve", return also the entire data record) and terminate the find procedure; or
i.3.2 in the case of mismatch, return "NOT FOUND" and terminate the find procedure.
For a better understanding, the "find" procedure will be applied, twice, to the specific PAIF of Fig. 6C-3, giving rise to "found" and "not found" results, respectively.
Thus, consider finding a data record by search key "12445" (hereinafter sought data record). According to step i.1, the value of the digit "1" at the offset assigned to the root node (offset 0) of the sought data record is compared to the one assigned to link 102 (being the sole link originating from node 101). Since a match is found, control is shifted to node 131. Again according to step i.1, the value of the digit ("2") at the offset assigned to node 131 (offset 1) of the sought data record is compared to the one assigned to link 134. Here also a match is found, so control is shifted to node 104. Next, according to step i.1, the value of the digit "4" at the offset assigned to node 104 (offset 2) of the sought data record is compared for each link originating from node 104. The comparison results in a match for link 105 and accordingly control is shifted to data record 107.
According to step i.3, the search key of the sought data record and that of data record 107 are compared, and since a match is found a "FOUND" result is returned (step i.3.1).
Turning now to a second example, consider the case when the sought data record has a search key "12463". The procedure described with reference to the previous example is repeated; however, at step i.3 the comparison between the sought data record and data record 107 results in a mismatch, and according to step i.3.2 a "NOT FOUND" result is returned.
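The find steps i.1 to i.3.2 and the two examples above may be sketched, purely for illustration, as follows. The dictionary-based node layout and the names `find`, `node_101`, `node_131` and `node_104` are assumptions of this sketch (the node numbers merely mirror Fig. 6C-3), and each search-key-portion is taken to be one decimal digit.

```python
# Hypothetical model: a node is {"offset": int, "links": {digit: child}};
# a data record is represented by its search key (a str).

def find(root, key):
    """Return "FOUND" or "NOT FOUND" for `key`, following steps i.1 to i.3.2."""
    current = root
    while isinstance(current, dict):             # still at a node
        offset = current["offset"]
        portion = key[offset] if offset < len(key) else None
        if portion not in current["links"]:      # step i.2: no link matches
            return "NOT FOUND"
        current = current["links"][portion]      # step i.1: advance along the link
    # Step i.3: a data record was reached; compare the key as a whole.
    return "FOUND" if current == key else "NOT FOUND"

# PAIF of Fig. 6C-3 as described in the text.
node_104 = {"offset": 2, "links": {"4": "12445", "3": "12345"}}
node_131 = {"offset": 1, "links": {"1": "11346", "2": node_104}}
node_101 = {"offset": 0, "links": {"1": node_131}}

first_result = find(node_101, "12445")    # reaches data record 107 and matches
second_result = find(node_101, "12463")   # reaches data record 107 and mismatches
```

The whole-key comparison of step i.3 is what turns the second search into "NOT FOUND": only the digits at offsets 0, 1 and 2 are inspected on the way down, so the path alone cannot distinguish "12463" from "12445".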
A general "Delete Data Record" transaction will now be described. Thus, as a first stage, a "Find data record" transaction is applied to the PAIF. In case of "NOT FOUND", an appropriate error message is returned to the procedure that invoked the "Delete" command. Alternatively, the sought data record is found. For clarity of explanation of the "Delete" procedure, the following nomenclature is introduced:
The leaf node that is linked to the sought data record is referred to as the "target node". The father of the target node is referred to as the "predecessor target node". The link that connects the predecessor target node to the target node is referred to as the "predecessor link", and the link that connects the target node to a child node thereof (or to a data record other than the sought data record) is referred to as the "target link". Bearing this nomenclature in mind, the following steps are executed:
i. delete the sought data record and the link that links the target node thereto;
ii. if the number of links that remain in the target node is larger than or equal to 2, then the deletion procedure terminates;
iii. if, on the other hand, the number of links that remain in the target node is exactly one (i.e. one target link), then:
iii.1 "bypass" the target node by connecting the predecessor link from the predecessor node to said child node (or to a data record); and
iii.2 delete the target node and the target link, terminating the deletion procedure.
It should be noted that the current step is more of a "prudent memory management" step, in order to release the space occupied by the target node and link so as to enable allocation thereof to other nodes and links in the block. It should be further noted that said step (iii) is optional.
For a better understanding, the foregoing "delete data record" procedure will be applied to the specific PAIF of Fig. 6C-3.
Thus, responsive to a command "delete record having search key = "11346"", the latter record is searched in the PAIF according to the procedure described above. Having found the data record 132, and in compliance with step i above, the data record as well as the link 133 leading thereto are both deleted. Since after the latter deleting step the target node 131 remains only with the sole target link 134, steps iii and iii.1 apply, and accordingly the predecessor link 102 bypasses target node 131 and is directly linked to the child node thereof, 104. Next, in compliance with step iii.2, target node 131 and the target link 134 are deleted, thereby obtaining the PAIF shown in Fig. 6B-1.
Another example is given with reference to the PAIF of Fig. 6C-1. Thus, responsive to a command "delete record having search key = "12546"", the latter record is searched in the PAIF according to the procedure described above. Having found the data record 112, and in compliance with step i above, the data record as well as the link (111) leading thereto are both deleted. Since, as stipulated in step ii, the number of links that remain in the target node 104 is two (i.e. links 105 and 106), the deletion procedure terminates. The resulting PAIF is again the one shown in Fig. 6B-1.
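The deletion steps i to iii.2 and the two examples above may be sketched, for illustration only, as follows. The dictionary-based node layout and the names `delete`, `fig_6c3` and `fig_6c1` are assumptions of this sketch. The sketch further assumes fixed-length digit-string keys and, in line with Fig. 6B-1 (where root node 101 keeps a single link), never bypasses the root node.

```python
# Hypothetical model: a node is {"offset": int, "links": {digit: child}};
# a data record is represented by its search key (a fixed-length str).

def delete(root, key):
    """Delete the record `key` in place; raise KeyError if it is not found."""
    path = []                                   # (node, link value followed) pairs
    current = root
    while isinstance(current, dict):
        value = key[current["offset"]]
        if value not in current["links"]:
            raise KeyError(key)                 # "NOT FOUND"
        path.append((current, value))
        current = current["links"][value]
    if current != key or not path:
        raise KeyError(key)                     # "NOT FOUND"
    target, value = path[-1]
    del target["links"][value]                  # step i
    if len(target["links"]) != 1 or len(path) == 1:
        return root                             # step ii (or target is the root)
    (child,) = target["links"].values()         # the sole remaining target link
    predecessor, predecessor_value = path[-2]
    predecessor["links"][predecessor_value] = child   # steps iii.1 and iii.2
    return root

def fig_6c3():
    node_104 = {"offset": 2, "links": {"4": "12445", "3": "12345"}}
    node_131 = {"offset": 1, "links": {"1": "11346", "2": node_104}}
    return {"offset": 0, "links": {"1": node_131}}

def fig_6c1():
    node_104 = {"offset": 2, "links": {"4": "12445", "3": "12345", "5": "12546"}}
    return {"offset": 0, "links": {"1": node_104}}

after_first = delete(fig_6c3(), "11346")    # node 131 is bypassed and dropped
after_second = delete(fig_6c1(), "12546")   # node 104 keeps two links
```

Both calls leave the same structure, corresponding to the PAIF of Fig. 6B-1. Because step iii is optional, omitting the bypass would still leave a searchable structure, at the cost of retaining a node with a single link.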
Another common primitive is the "Modify existing data record", e.g. change the home address of an existing client. The "Modify" primitive is normally realized by selectively utilizing the aforementioned primitives. For executing a "Modify" command one should distinguish between the following cases:
1. The "modify" applies to fields other than the search key (e.g. modify the address of a client having Client_Id_No ="xxxxx") - in this case the modify procedure simply involves a "Find" operation (data record having Client_Id_No ="xxxxx"). Having found the sought data record, the old address is replaced by a new one.
2. The "modify" applies to a search key field (e.g. change an account no. from "xxxxxx" to "yyyyyy"). This command is realized as a sequence of two other primitives, i.e. delete data record having "Account_No" ="xxxxxx" and thereafter insert data record having "Account_No" ="yyyyyy", or vice versa. Obviously, a Modify transaction may consist of both cases.
In the previous examples each search key is represented as a series of bytes and accordingly the search procedure is performed by partitioning the search key into search-key portions each consisting of at least one byte.
Those versed in the art will readily appreciate that bytes are not the only possible representation of a search key. Thus, for example, a search key can be represented in binary form, i.e. a series of 1's and 0's, and accordingly the search procedure is performed by partitioning the search key into search-key portions each consisting of one bit (i.e. l=1) or more, e.g. one byte (i.e. l=8 bits) and others. In certain scenarios, it may well be the case that the l value is not identical for all the nodes in the PAIF.
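As a minimal sketch of such partitioning, assuming a single l value for the whole key and a key held as a Python `bytes` object (the function name `key_portions` is hypothetical), a search key may be cut into l-bit portions as follows:

```python
def key_portions(key: bytes, l: int):
    """Split `key` into consecutive l-bit portions, most significant bits first."""
    total_bits = 8 * len(key)
    if l <= 0 or total_bits % l != 0:
        raise ValueError("l must be positive and divide the key length in bits")
    bits = int.from_bytes(key, "big")
    mask = (1 << l) - 1
    return [(bits >> (total_bits - l * (i + 1))) & mask
            for i in range(total_bits // l)]

half_bytes = key_portions(b"A", 4)   # the character A is 0x41
single_bits = key_portions(b"A", 1)
whole_byte = key_portions(b"A", 8)
```

With l=4 the character A yields the two half-byte portions 0x4 and 0x1; with l=1 it yields eight one-bit portions; with l=8 it yields the single byte 0x41.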
It should be further noted that different links in a given PAIF may be assigned search-key-portions of different length, as long as the respective search-key-portion is known to the corresponding node.
As is clearly evident from the various PAIFs of Figs. 6A-6C, the data records are held in a sorted form according to search key. Navigating, for example, in the PAIF of Fig. 6C-3 (from right to left) brings about the ordered series "11346", "12345" and "12445". This characteristic constitutes yet another advantage, which eases data manipulation as compared to the tree of Fig. 5 where the data records are not sorted. As specified before, a node in the PAIF is not necessarily classified uniquely. Thus, for example, in the PAIF 120 of Fig. 6C-2, node 104 is at the same time a leaf node (linked by means of a long link 105 to data record 107) and an internal node (linked by means of a short link 106 to node 121).
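The sorted-order property may be illustrated by the following sketch, which visits the links of each node in ascending order of link value. The dictionary-based node layout and the name `records_in_order` are assumptions of this sketch; the direction in which a figure is read (right to left) is immaterial to it.

```python
# Hypothetical model: a node is {"offset": int, "links": {digit: child}};
# a data record is represented by its search key (a str).

def records_in_order(node):
    """Yield search keys in ascending order by following links in ascending value."""
    if not isinstance(node, dict):      # a data record was reached
        yield node
        return
    for value in sorted(node["links"]):
        yield from records_in_order(node["links"][value])

# PAIF of Fig. 6C-3 as described in the text.
node_104 = {"offset": 2, "links": {"4": "12445", "3": "12345"}}
node_131 = {"offset": 1, "links": {"1": "11346", "2": node_104}}
node_101 = {"offset": 0, "links": {"1": node_131}}

ordered = list(records_in_order(node_101))
```

No key comparison is performed during the traversal; the order follows from the link values alone.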
Those versed in the art will readily understand that the "Insert", "Delete", "Find" and "Modify" procedures described herein are only one out of many possible variants for realizing these procedures and they may be modified, all as required and appropriate, depending upon the particular implementation.
The specified insert, delete and find transactions apply to a so-called intra-block transaction. As will be explained in greater detail below, applying the latter transactions in an inter-block context necessitates addressing a few scenarios which are irrelevant in the intra-block operation.
Having explained the structure of the PAIF trie, there follows a description of various embodiments according to the invention, where there is shown a layered index based on a PAIF index scheme that includes a PAIF tree (as basic partitioned index).
Turning now to Figs. 7A-H, there are shown schematic illustrations of a layered index constructed in response to a succession of split block operations, according to one embodiment of the invention. Consider, for example, a block 140 in Fig. 7A (in the basic partitioned index) which overflows in terms of memory space. This being the case, a "split block" procedure is invoked which results in a layered index 142 of Fig. 7B consisting of root block 144 and a duplicated node A' (155) linked to leaf block 146 by means of direct link 145 and by means of long link 147 to a leaf block 148.
By this specific example, the split point was selected to be link 149 (Fig. 7A) (hereinafter "split link"), thereby shifting nodes A, B, E, D and F to new block 146 and nodes C, G, I, J, K, L and H to a new block 148. The split link is preferably selected in order to accomplish an essentially even distribution of nodes and links between the new blocks (e.g. the size of the sub-PAIFs that reside in blocks 148 and 146 is essentially the same). In the case that a father block does not exist, a father block 144 (constituting I1) is created with a duplicated node A' (155) of the split node A (156). In the case that a duplicated node of the split node from which the split link originates does not already reside in the father block 144, the node is copied to the latter block (marked A') and the connection between node A' (155) and the block in which A resides (146) is implemented by means of said direct link 145. The split link 149 (being originally a short link between A and C) is replaced by long link 147 between A' and the block in which C resides. Optionally, nodes A and C (156, 153 respectively) may also be linked by means of the split link, marked as dashed line 150.
The net effect is that in Fig. 7B there is provided a layered index constituted by block 144, and the blocks of the trie are 146 and 148. Those versed in the art will readily appreciate that it is now possible to access or update data records not through the trie (i.e. commencing from node A 156), but rather through the layered index (i.e. commencing from node A' 155). In this connection it should be noted that link 147 has the same value as link 150, which in turn has the value of original link 149 of Fig. 7A.
Considering now that block 148 overflows, it undergoes a similar block split procedure resulting in layered index 151 in Fig. 7C. By this example the split link is short link 152 of Fig. 7B and accordingly nodes C and H reside in block 148A of Fig. 7C whereas nodes G, I, K, L and J reside in block 148B. The node from which the split link originates (node C 153 of Fig. 7B) is duplicated (yielding a duplicated node 153a of Fig. 7C) and placed in block 140, marked C'. As before, direct link 154 connects the copied node C' 153a to the block 148A of the original split node 153, whilst the link 155 is a far link to the split block 148B, and the value of the link is the same as the original value of link 152 between nodes C and G before (and after) the split.
In Fig. 7C, the layered index 151 is constituted by the trie that includes blocks 141, 148A and 148B, and by block 140, which forms a representative index over the common keys of the trie.
It should be noted that in Fig. 7C node A in block 141 and node C in block 148A are optionally disconnected, and likewise node C of 148A and node G of 148B are optionally disconnected. As is clearly shown, nodes A' and C' are connected in block 140 to form a (connected) trie, and it is accordingly possible to access block 141 through node A' and direct link 156; block 148A through nodes A', C' and direct link 154; and block 148B through nodes A', C' and direct link 155. It is noteworthy that the value of the link between nodes A' and C' (in block 140) is identical to the original value between nodes A and C (see link 149 in Fig. 7A).
As is clearly seen in Fig. 7C, the resulting layered index constitutes a balanced structure of blocks, thereby keeping the index depth to a minimum and consequently minimizing the number of accesses (normally, although not necessarily, I/O operations) that are required in order to find, insert or delete a given data record. Considering that the number of accesses required to reach a data record through the layered index is substantially a logarithmic function of the number of records, the layered index is more efficient, in terms of the number of I/O operations required to access a given data record, than access through the trie. Thus, for example, for accessing the data record that is associated with node J through the layered index, it is required at first to access block 140, thereafter block 148B and thereafter the sought data record (i.e. three I/O operations). In contrast, accessing the same data record through the trie brings about 4 I/O accesses, namely block 141, block 148A, block 148B and data record 159. As shown, there are a few particular instances in which the trie is more efficient (e.g. accessing the data record associated with node A); however, the larger the trie (i.e. the more blocks it is constituted by), the more efficient is the access through the index of the layered index.
By the particular embodiment of Fig. 7, the representative index and the trie (being one embodiment of basic partitioned index) comply with substantially the same index scheme, i.e. the PAIF. By "substantially" the same scheme it is meant that there are some differences, as will be explained with reference to Fig. 9G below.
The considerations in connection with duplicating nodes to higher layers Ij in the layered index are further illustrated with reference to additional examples depicted in Figs. 7D to 7H. Thus, consider the layered index of Fig. 7D where a block split is performed in link 400. The resulting layered index is illustrated in Fig. 7E, where block 402 is created, node 401 is copied to higher-level block 402 (forming part of the layered index scheme) and the original link between nodes B and E is optionally retained (through dashed link 403). Through node B it is now possible to access the two blocks of the trie (405 and 406), by means of links 407 and 408, respectively.
Next, should it now be required to split block 405 at, say, link 409, the resulting structure appears in block 402 of Fig. 7F, where nodes A and I of block 405 are duplicated to A' and I' (410 and 411) in block 402. Node I' is obviously a duplicated node of the split node I in block 405. However, node A is also copied, considering that both nodes B (whose counterpart B' already resides in block 402) and I (whose counterpart I' is now duplicated to block 402) are descendant nodes of A, node A being the lowest ancestor node of nodes B and I, and thus a (connected) trie is formed in block 402. The value associated with short link 414 (between nodes A' and B' in block 402) is the same as that of link 412 (between A and B in block 405). The value of link 415 (between nodes A' and I') in block 402 is the same as that of link 413, which originates from node A in the direction needed to access node B. The internal structure of block 402 is such that it allows a search to the representatives of blocks 405, 406 and 407.
The direct links 416, 417 of nodes 422 and 411 are optionally retained, since it is possible to move along direct link 418 to block 405, seeing that node 410 is maintained in the access path to both nodes 422 and 411.
Fig. 7G shows the resulting layered index after splitting block 407 of Fig. 7F (in link 420), and Fig. 7H shows the resulting layered index after splitting block 402 (in the link between nodes I' and N'). The resulting layered index in Fig. 7H has, as shown, three layers, the first consisting of block 430, the second consisting of blocks 402 and 408, and the trie consisting of blocks 405, 407, 426 and 406.
Those versed in the art will readily appreciate that the manner of realizing split block is, of course, not limited to the examples of Figs. 7D to 7H.
Having described an embodiment of constructing a layered index by split processes resulting from a succession of insert transactions (with reference to Fig. 7), it will be appreciated that the opposite procedure, i.e. "Delete block", is activated when a data record is deleted leaving only one node in a block having no data records associated therewith.
Those versed in the art will readily understand that the layered index described with reference to Fig. 7 is only one out of many possible variants for realizing the layered index, where the representative index and the basic partitioned index are substantially the same.
The utilization of a PAIF in the manner specified constitutes an advantage over some of the hitherto known tries in the sense that the so accomplished layered index has a balanced structure of blocks despite the fact that the trie per se may possibly be unbalanced.
Attention is now directed to Figs. 8A-8B showing two respective illustrations exemplifying the application of the technique of the invention according to another embodiment of the invention.
Thus, Fig. 8A illustrates a given trie structure having vertical orientation (i.e. constituting a vertical tree) which, as shown, is unbalanced, i.e. three blocks deep (260, 261 and 262) vs. two blocks deep (260 and 264). The description below does not aim at explaining the search scheme of the specified vertical tree but emphasizes only those aspects which are required to obtain a balanced layered index. It should nevertheless be noted that the nodes in trie structure 260 signify offsets, in half-byte units, into the data records (a-k) that are shown in Fig. 8A (the node values are presented in hexadecimal representation).
It should be noted that an extra I/O operation, i.e. accessing three blocks (or three I/O operations) in order to access data record k as compared to one block (or one I/O operation) to access data record b as depicted in Fig. 8A, may be regarded as balanced. In some real-life scenarios this does not necessarily require applying the technique of the invention in order to bring about exactly the same number of I/O operations. Of course, further insertions of data records may generate a higher degree of "unbalance", which, if not handled by the technique of the invention, will give rise to degraded performance (due to the unbalanced structure) as discussed in detail above (with reference to prior art techniques).
Fig. 8B illustrates one possible embodiment of the invention. As shown, a representative index that consists of one block 270 (forming I1) is constructed, with the result that a horizontal balanced tree is obtained having a root block 270 from which all the blocks of the lower-level vertical tree (the latter constitutes the unbalanced trie) are accessed through one I/O operation.
As shown, the actual access to the blocks in the first vertical tree (being the trie) is achieved by means of the common key value of each block. Before proceeding any further, the term common key will be exemplified with reference to Fig. 8.
The common key of block 260 (in hexadecimal representation of half-byte units) is 0x4, 0x1 and 0x3, where 0x4 stands for the most significant bits of the byte of the character A, 0x1 stands for the least significant bits of the character A, and 0x3 stands for the most significant bits of the characters which reside in offset 2 of the data records.
It should be noted that all data records that can be accessed through block 260 share the common key prefix specified above.
In the same manner, the following table summarizes the common key of each block:
BLOCK NO.   COMMON KEY
260         0x4, 0x1, 0x3
261         0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3
269         0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3
264         0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x4, 0x3
It should be noted that block 261 can accommodate a root node with value 8; thus, the common key of the block, hereafter k, is changed to be 0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, i.e. it consists of 8 units. In this case, the representative of block 261 in I1 should be changed accordingly. In a different implementation, the representative of 261 is k, even if the root node with the value 8 does not exist.
The index over the common keys is accomplished in the representative index (consisting of block 270) such that it constructs a trie that addresses the common keys of the first vertical tree. Now, for example, in order to find data record g, one follows node 290 and link 291 to node 292. Then, one advances with the direct link 293 to block 261, which is associated with data record g. The resulting layered index is balanced.
As specified above, for the specific case of a trie, the representative key of a block is a common key. Generally speaking, the common key of a block is the longest prefix of all keys of the data records that can be accessed from the block by the relevant index scheme. For the PAIF, the specified prefix size (calculated in l-bit-long units) equals the value of the root node in the block (which, as recalled, holds an offset value). If the prefix size is expressed as a number of bits, then the prefix size is calculated as the offset value multiplied by the l value.
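The definition of the common key as the longest shared prefix, and the conversion of the prefix size into bits, may be sketched as follows. The names `common_key` and `prefix_size_in_bits` are hypothetical. The two sample keys are the bit-string records C and D of the Fig. 9 example given later in the text, and the half-byte case assumes l=4 as in Fig. 8.

```python
def common_key(keys):
    """Longest prefix (in key units) shared by all the given keys."""
    keys = list(keys)
    shortest = min(keys, key=len)
    for i, unit in enumerate(shortest):
        if any(k[i] != unit for k in keys):
            return shortest[:i]
    return shortest

def prefix_size_in_bits(root_offset, l):
    """Prefix size in bits: root-node offset value multiplied by the l value."""
    return root_offset * l

record_c, record_d = "011011111", "011011011"   # one-bit units (l=1)
shared = common_key([record_c, record_d])
bits_for_l1 = prefix_size_in_bits(len(shared), 1)
bits_for_half_bytes = prefix_size_in_bits(8, 4)  # root node value 8, half-byte units
```

For records C and D the shared prefix has six units, which agrees with a block whose root node holds offset value 6; a root node of value 8 in half-byte units corresponds to a 32-bit prefix.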
There follows now a description of yet another embodiment of constructing a layered index of the invention with reference to Figs. 9A-9G.
Accordingly, attention is now directed to Figs. 9A-9G showing a succession of modify (insert) transactions on a PAIF tree (constituting a trie that is susceptible to an unbalanced structure) and the so obtained layered index. For convenience of presentation, the data records are shown as forming part of the trie. As specified above, the actual manner in which the data records are associated with the trie may vary depending upon the particular application.
In the following figures, a layered index is constructed by successively inserting the following unsorted data records A-F (which for convenience of presentation form part of the blocks). The data string is presented as a series of bits, where each l-bit portion is one bit long (l=1):
A=001000011
B=110011100
C=011011111
D=011011011
E=101010101
F=111111111
In the first step (Fig. 9A), record A is inserted, whereafter Block 300 includes node 301 having offset 0, being associated with first record A through link 302, having the value 0. At this stage, the tree consists of Block 300 having only one node. The index scheme dictates that the search path to data record A is determined according to value 0 at offset 0, as depicted on link 302 and node 301, respectively.
Thereafter (Fig. 9B), data record B is inserted, in which, as can be clearly seen and distinguished from data record A, the key value in offset zero is 1 and, accordingly, link 302 leads to data record B and is assigned the value 1.
Thereafter (Fig. 9C), data record C is inserted, and the value thereof in offset 1 serves for distinguishing it from record A. Links 303 and 304 connect node 305 (standing for offset 1) to the specified data records C and A respectively. Since Block 300 accommodates nodes 301 and 305, it is not required, as yet, to split the block.
Next, data record D is inserted, and the structure of the block following the insert operation is shown in Fig. 9D. Since, however, the data block cannot accommodate more than two nodes (overflow occurs), it is now required to split Block 300. Fig. 9E illustrates the tree structure after splitting. Thus, link 306 is the split link, with the motivation that approximately the contents of a half block will be retained in Block 300, and the contents of the remaining half block will be moved to another block 310. Of course, other links could likewise be selected to be the split link.
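The node values that arise in Figs. 9A-9E (offsets 0, 1 and 6) follow from the first offset at which each newly inserted record differs from the record it is distinguished from. A minimal sketch, assuming one-bit key portions and the hypothetical name `discerning_offset`:

```python
def discerning_offset(key_a, key_b):
    """First offset at which two keys differ, or None if no difference is found."""
    for offset, (unit_a, unit_b) in enumerate(zip(key_a, key_b)):
        if unit_a != unit_b:
            return offset
    return None

A = "001000011"
B = "110011100"
C = "011011111"
D = "011011011"

offset_a_b = discerning_offset(A, B)   # B is distinguished from A at the root
offset_a_c = discerning_offset(A, C)   # C is distinguished from A one level down
offset_c_d = discerning_offset(C, D)   # D is distinguished from C much later
```

Records C and D share their first six bits, which is why the node separating them carries the value 6 and why the skipped offsets 2 to 5 need no nodes of their own.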
As a first stage, block 300 in I0 is replaced with two blocks 300 and 310. The nodes 0, 1 (designated as 311 and 313, respectively) and the data records A and B are retained in the splitting block 300, whereas node 6 and data records D and C (standing in this particular embodiment for the remaining nodes) are moved to block 310. Accordingly, the basic partitioned index of Fig. 9E consists now of two blocks 300 and 310 (which in fact constitute the unbalanced trie).
Thereafter, since the block of I1 does not exist, it is created, and, accordingly, block 312 is provided. The split node (313) is copied to the block (312) to thereby constitute a duplicated node (314). Next, the duplicated node (314) is connected by means of direct link 316 to block 300, and the duplicated node 314 is linked by means of a far link 318 to the block 310. This far link replaces the original split link 306 that is marked in Fig. 9E in a dashed line. The value of the far link 318 is the same as the value of the split link. Thus, the representative index (constituted by block 312) allows searching according to the common keys of the basic partitioned index.
It should be noted that there are no constraints as to whether the split link should be deleted or retained. As shown, the so obtained horizontal tree that constitutes the layered index (consisting here of blocks 312, 300 and 310, of which 312 belongs to the representative index) is balanced.
Next, data record E is inserted. In this case, advancing in the horizontal tree (being one form of the layered index) from the first node 314 of block 312 (having a value 1) is not possible by means of the far link 318, since it represents direction 1 from node 314 (having a value 1), and a link in direction 0 is required. Therefore, one advances by means of the direct link 316 to block 300. Thus, the block that needs to be associated with the new data record is found. In the same way data record F is inserted, resulting in the tree structure shown in Fig. 9F.
Next, if a split between node 320 and node 321 of block 300 is performed, node 320 is copied to block 312 (designated 323 in Fig. 9G), and since it cannot be linked to node 314 of block 312 (since it would not retain the correct intra-block links of the nodes), node 311 of block 300 is also copied to block 312 (designated 322 in Fig. 9G) in order to create a (connected) trie that enables searching by the search scheme to blocks 300, 326, 310 according to the common keys of the blocks.
It should also be noted that instead of having direct links from all copied nodes 314, 322, 323 of block 312 in Fig. 9G, it would be sufficient to have one such direct link from the copied node (322) to block 300. A far link 324 from node 323 is set to block 326 in the direction of the link before the split (the direction of link 315 of Fig. 9F). Obviously, if another split is performed in block 326, it would be represented in block 312 by a node connected from node 323 by a link in direction 1, having a direct link to the B;-!, and a far link to block Bj-r-
Figs. 9A-G and 8A-B illustrate two of many possible manners of realizing the split block mechanism that maintains the balanced structure of the invention by constructing a layered index. The flexibility in adopting another non-limiting variant is shown e.g. in Fig. 8B, where the near link 271 and direct link 272 are represented by far link 273 (marked in dashed line) with the direction of link 271, thus rendering node 276 redundant.
Insofar as many embodiments are concerned, the balance technique of the invention confers on the so obtained balanced horizontal oriented digital tree (being one form of the layered index structure) a so-called "probabilistic access" characteristic. This means that a search in connection with an input data record (e.g. search for a data record A) may lead to a different data record, or to a node where there is no link in the direction prescribed by the index scheme, and may require applying a "correction" in order to eventually access the sought data record.
For a better understanding of the foregoing consider, for example, Fig. 9E. Consider, for example, that a search transaction is applied to the layered index of Fig. 9E with the sought data record L=111011110. The search path will follow node 314 and link 318 (offset 1, value 1, respectively) and then, at offset '6' (root node of block 310), through link 319 (value '1') to data record C. The latter example exemplifies the probabilistic search characteristics of the so obtained layered index.
In order to resolve the specified failure, the size of the common prefix of the key of the sought data record and the key of the actual data record is calculated. The common key of the block (310) is the prefix portion of the key of the actual data record C. Thus, the size of the common prefix is zero. Next, climb up the tree to the node in the access path that has a direct link and a value equal to or less than the common prefix size. If the latter requirement is not met, i.e. all the nodes have a value greater than the calculated prefix size, then proceed from the first node in the access path that has a direct link (which should point to the first block of the index I(i-1)). Now, from the node 311, move by means of direct link 316 to the lower-level vertical oriented tree (i.e. to layer I(i-1)) and therefrom continue the search path as prescribed by the index scheme.
According to another scenario, should the index scheme prescribe going in a given direction where there is no link in the desired direction, the search path follows the direct link from the node with the largest value on the search path (that maintains a direct link). When advancing from block to block, a comparison to the common key (if available) or to data records associated with nodes (if available) can lead to a decision as to whether to advance by the index scheme or to return to a node with a direct link. It should be noted that the common key is not necessarily physically attached to the data records.
Reverting to the previous example (sought data record L) and associated data record C of Fig. 9E, if the common key of block 310 (being 011011) is maintained in the block, it is not necessary to access data record C. Thus, since the size of the common prefix of the key of L and the common key of the block is 0, one can return to node 314 and link 316 without accessing record C. Avoiding the need to access the data record in the manner specified has, of course, the advantage of improving performance. The criterion to know that the sought data record does not reside in the tree is that the size of the common key prefix of the sought data record and the common key of the block is greater than the value of the split node.
In the latter example, the value of the split node is 1 (of node 313), thus block 310 is not the block that accommodates record L (if such record exists). Therefore, the search for record L is continued from node 314 and link 316. This procedure applies to all modify transactions.
Insofar as an insert transaction is concerned, block 300 is found in the manner specified above and is associated with the new data record L.
The latter example referred to a specific example of a layered index. Those versed in the art will readily appreciate that the latter probabilistic access characteristics apply mutatis mutandis to other types of layered index that utilize a basic partitioned index.
The probabilistic search characteristics which lead to "errors" stem from the fact that the complete common key of a block in layer I_{h-1} is not necessarily known from the values of the nodes that reside on the search path up to the block in I_{h-1}. Thus, it is necessary to know the common key of the block in I_{h-1} in order to verify whether the search path to the specified block matches the search path according to the key of the sought data record. If the common key is not maintained in the block, it might be necessary to advance in the index to a data record in order to know the common key value.
The inherent error-prone characteristics of the layered index and the manner of handling them have been exemplified with reference to Fig. 9 above, and may be described more generally as follows: to search a record by key k, the latter is searched in I_h (and in some cases in I_{h-1} to I_1 or to data record(s)) in order to find the block B of I_{h-1} leading to k. This process is repeated until reaching the block of I_0 that is associated with the data record with key k (if one exists).
The description in Figs. 7 to 9 exemplified a layered index utilizing a PAIF based indexing scheme as the basic partitioned index and the representative index. Those versed in the art will readily appreciate that the layered index of the invention is not bound only to PAIF. Thus, for example, U.S. 5,495,609 illustrates a different trie. Consider, for example, the trie of Fig. 10A in accordance with the specified '609 patent, and assume that the trie consists of a block that accommodates nodes 11, 12, 13 and 14. Should it now be required to split the block subsequent to the insertion of new nodes to the tree, a possible approach to splitting the block in accordance with prior art techniques would be, for example, to break the link between nodes 12 and 14, to thereby obtain two blocks, one accommodating nodes 11, 12 and 13, and the other accommodating node 14 (hereinafter the new block). Assuming that the first block resides in the internal memory, if it is now required to reach record 26, only one I/O operation is required. If, on the other hand, record 20 is of interest, a first I/O operation is required in order to access the new block (i.e. the one accommodating node 14), and therefrom another (i.e. second) I/O operation is required in order to access record 20. It is accordingly appreciated that the split block gave rise to an unbalanced tree. Subsequent insert transactions may adversely affect the unbalanced characteristic of the tree, i.e. necessitate multiple I/O accesses, which is obviously undesired.
Applying the technique of the invention will cope with the shortcomings of an unbalanced tree, and the resulting layered index is illustrated in Fig. 10B, where the representative index is constituted by block 159A over the representative keys of the trie (constituted by blocks 159B and 159C). Here also, the link between nodes 12 and 14 is considered a split link, and the new node 159D (being a replication of node 12) is copied into a new block designated as 159A. Now, in order to access record 20 and record 26, the same number of I/O operations is required, in this particular case 2. As the size of the trie grows, access using the layered index becomes more efficient.
The layered index of Fig. 10B brings about, thus, a balanced tree of blocks, assuring that essentially the same number of I/O operations is required to reach each and every data record in the tree. Those versed in the art will readily appreciate that preferably the number of I/O operations is a logarithmic function depending upon the number of data records and the number of links originating from a block. Thus, for example, if 1000 far links originate from a block, a layered index with 3 levels allows access to 1,000,000,000 data records.
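The logarithmic relation stated above may be illustrated by the following minimal sketch. It assumes, for simplicity, that every block has the same number of far links; the function names are hypothetical.

```python
def reachable_records(far_links_per_block: int, levels: int) -> int:
    """Upper bound on the data records reachable through `levels` balanced layers."""
    return far_links_per_block ** levels


def levels_needed(records: int, far_links_per_block: int) -> int:
    """Smallest number of layers whose capacity covers `records` (integer arithmetic)."""
    levels, capacity = 1, far_links_per_block
    while capacity < records:
        capacity *= far_links_per_block
        levels += 1
    return levels


print(reachable_records(1000, 3))        # capacity of 3 levels with 1000 far links
print(levels_needed(1_000_000, 1000))    # layers needed for a million records
```

The loop is used in place of a floating-point logarithm so that exact powers of the fan-out are not affected by rounding.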
For a better understanding of the foregoing, there follows a numerical example. Assume that every block has 1000 far links. Assuming that the size of each far link is 4 bytes, it readily arises that the size needed for representing the far links is 4000 bytes. Assuming further that the nodes and the near links within a block occupy another 4000 bytes, the resulting block size is less than 10,000 bytes. For the sake of discussion, assume that each block size is 20,000 bytes.
Considering now a layered index that consists of one block (e.g. block 144 in Fig. 7B) as index layer I_1 and assuming that it is linked to a thousand blocks in the layer I_0 (of which only two blocks 146 and 148 are shown in Fig. 7B), the layered index amounts to a total of 1001 blocks, each having a size of 20,000 bytes. Accordingly, the total space that should be allocated for holding the blocks of the layered index is about 20 megabytes. This order of size can be easily accommodated in the internal memory of, say, a personal computer. Assuming now that each block in I_0 is associated with one thousand data records, the net effect is that by utilizing a layered index of the invention (according to the latter embodiment) which is wholly accommodated in the internal memory, a million data records can be accessed without index I/O.
By the same token, accessing billions of records may require practically one more index layer, which may require one additional I/O operation.
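The arithmetic of the foregoing numerical example may be restated as the short sketch below. All figures are taken from the example; the variable names are assumptions made for illustration.

```python
FAR_LINKS_PER_BLOCK = 1_000
FAR_LINK_BYTES = 4
NODES_AND_NEAR_LINKS_BYTES = 4_000
ASSUMED_BLOCK_BYTES = 20_000           # deliberately generous block size
RECORDS_PER_LOWEST_BLOCK = 1_000

far_link_bytes = FAR_LINKS_PER_BLOCK * FAR_LINK_BYTES             # 4000 bytes
minimum_block_bytes = far_link_bytes + NODES_AND_NEAR_LINKS_BYTES  # below 10,000

blocks = 1 + FAR_LINKS_PER_BLOCK       # one upper-layer block plus 1000 lowest-layer blocks
index_bytes = blocks * ASSUMED_BLOCK_BYTES
records = FAR_LINKS_PER_BLOCK * RECORDS_PER_LOWEST_BLOCK

print(minimum_block_bytes, blocks, index_bytes, records)
```

The index of 1001 blocks occupies 20,020,000 bytes, i.e. about 20 megabytes, and addresses one million data records.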
For a better understanding of the foregoing, consider for example the implementation of the layered index in Figs. 6B-1 or 6B-3 (PAIF index scheme). Had the keys of data records 103 and 107 been longer in size (for example 100 bytes long), this would not have changed the size of the PAIF. Another non-limiting example can be shown in Fig. 8B: the size and the structure of the layered index would not change if the key of data records a-k addressed by the index were 200 bytes long. As can be seen, it is also possible to navigate in the index and to retrieve the data a-k according to the order of the key. This exemplifies one form of sequential operation.
As shown, the resulting layered index of Fig. 10B includes two trees having vertical orientation, i.e. the first tree structure consisting of blocks 159B and 159C (being one form of the basic partitioned index I_0) and the second tree having one block 159A (being one form of the basic partitioned index I_1).
The so accomplished horizontal tree of blocks (being one form of the layered index) is balanced, i.e. root block 159A, through one I/O, enables access to all the links to the data records. Further insertions of data records which lead to additional splits in the blocks of I_0 will require, of course, updating the layer index I_1. When the number of nodes in block 159A of I_1 exceeds a given number, block 159A is split according to the split mechanism.
The trie index with which the technique of the invention is of concern is not confined to the search tree disclosed in the '609 patent, and it may encompass other types of trees as explained above.
It should be noted that the intra-block structure is not necessarily balanced, i.e. nodes inside a block are not necessarily arranged in a balanced structure. Whilst this fact is seemingly a drawback, those versed in the art will readily appreciate that its implications on the overall database performance are virtually insignificant. This stems from the fact that the intra-block search scheme is normally performed in the fast internal memory of the computer system. As opposed to the intra-block search scheme, the arrangement of blocks within a layered index is retained in a balanced structure, whereby the number of blocks in a search path is a logarithmic function depending on the number of data records and reflects therefore the number of I/O accesses to the external memory (an operation which is inherently slow) in order to load a desired block to the internal memory.
In this connection those versed in the art will readily appreciate that the present invention is by no means bound to a given physical realization. Thus, for example, insofar as the search scheme is of concern, whilst the intra-block retains the search scheme after applying the technique of the invention, this applies to the logical concept of e.g. advancing in the layered index according to offsets and values of offset. The latter general concept may be realized in many manners, all of which are encompassed by the technique of the invention. Thus, for example, the offset size (in terms of number of bits) that is accommodated within each node may be altered, as may the manner of realizing empty pointers (i.e. pointers that point to null, having no children), and others. The latter physical realization flexibility applies also to the inter-block portion.
The layered indexes described with reference to Figs. 7 to 10 all retain essentially the same index scheme for both the trie and the representative index (except for the error handling which may be encountered when accessing data records through the index, as explained in detail with reference to Fig 10G above).
The retention of the index scheme for both the trie and the representative index is not obligatory, as will be exemplified with reference to Fig. 11.
Fig. 11 illustrates another approach to balancing the unbalanced tree of Fig. 8A (i.e. constructing a layered index) using a conventional B-tree as a representative index over the representative keys of the unbalanced trie. The so obtained horizontal oriented balanced tree (layered index) includes blocks 272 at the upper level (index layer I_2), 270 and 271 at a lower level (index layer I_1) and the original blocks of the unbalanced vertical oriented tree of Fig. 8A at the lowest level (blocks 260, 261, 262, 264), being index layer I_0. Fig. 4 demonstrates thus that the index scheme of the representative index is not necessarily the same as that of the original unbalanced trie. If desired, the B-tree in its entirety (forming a representative index) may be regarded as an index layer I.
The database file management system of the invention not only copes with the drawbacks of the conventional trie indexing file but also offers other benefits which facilitate and improve data access by user application programs.
Thus, the fact that a balanced structure of blocks is retained assures that, on the average, the number of slow I/O operations is retained essentially optimal, i.e. a more efficient result is obtained, particularly when large files consisting of a multitude of blocks are concerned.
Those versed in the art will readily appreciate that whilst preferably the construction of the layered index applies to slow I/O operations, e.g. for minimizing the number of accesses to a slow external storage medium, the invention is by no means bound to the specified storage medium. Thus, for example, the storage medium to which the present invention is applicable may also be an internal memory. This is of particular relevance considering the ever increasing volumes of internal memories which, although faster than external memory, may also require efficient access control, which is realized according to the invention.
There follows a description of the second aspect of the invention.
For convenience of explanation, the second aspect of the invention will be described with reference to the PAIF index (constituting a designated index). The invention is by no means bound by this specific example.
As stated before, the database file management system of the invention makes it possible to address different types of data records using a single index.
In order to better distinguish between data records of different types that are addressed by the same PAIF index, each data record belonging to a given type is associated with a given designator. The latter forms part of the key of the data record, constituting a designator key. The designator is unique for every type of data. Thus, for example, the key of data records that belong to the entity "Borrower" is prefixed with the designator 'A', whereas all the keys of data records that belong to the entity "Book" are prefixed with the designator 'B'. The new key of the data records that belong to Borrower becomes a designated key that consists now of the concatenation of 'A' and the original key of Borrower, and by the same token, the new designated key of the data records that belong to Book consists now of the concatenation of 'B' and the original key of Book.
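The concatenation just described may be sketched, purely for illustration, as follows. The mapping name, the function name and the Borrower key value are hypothetical; the Book key values follow the example keys B11111 and B22222 used in connection with Fig. 12.

```python
# Designators per entity, as in the Borrower/Book example.
DESIGNATORS = {"Borrower": "A", "Book": "B"}


def designated_key(entity: str, original_key: str) -> str:
    """Prefix the original key with the designator of its entity."""
    return DESIGNATORS[entity] + original_key


keys = sorted([
    designated_key("Book", "22222"),
    designated_key("Borrower", "12345"),   # hypothetical Borrower key
    designated_key("Book", "11111"),
])
print(keys)
```

Because the designator is the leading component of the key, keys of one type are contiguous in key order, so a single index can hold both types without their keys interleaving.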
Having discussed the so called "designator" feature of the second aspect of the invention, there follows a description of the so called meta-data.
According to an aspect of the invention, a data dictionary maintains meta-data information, which provides information on the data records as a function of the type of the records. Thus, in addition to the data records it is necessary to maintain a designator, to be able to identify the designator and, by using the meta-data information, to be able to identify or construct the designated key as well as other information such as the record size. The search scheme of the index is oblivious to the meta-data. It locates the record from the designator (or composite) key without using the meta-data. The meta-data is required to construct the (composite) designator key and, once the record is retrieved, to determine the properties of the record. Thus, for example, having retrieved the data record of a book, the designator 'B' is identified, and information on the record designated B is available from the meta-data, for example the size of the book record, its fields and the fields that are the key fields.
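A data dictionary of the kind described may be sketched as a mapping from designator to meta-data. The sketch assumes that the designator is stored as the first character of the retrieved record; the class name, field names and record sizes are hypothetical placeholders and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordMeta:
    entity: str
    fields: tuple        # all fields of the record type
    key_fields: tuple    # the fields that make up the key
    record_size: int     # size in bytes (hypothetical values below)


# Hypothetical dictionary content for the Borrower/Book example.
DATA_DICTIONARY = {
    "A": RecordMeta("Borrower", ("borrower_id", "name"), ("borrower_id",), 64),
    "B": RecordMeta("Book", ("book_id", "title"), ("book_id",), 128),
}


def describe(record: str) -> RecordMeta:
    """Identify the designator of a retrieved record and look up its meta-data."""
    return DATA_DICTIONARY[record[0]]


meta = describe("B11111")
print(meta.entity, meta.key_fields, meta.record_size)
```

The lookup takes place only after the record has been retrieved, in keeping with the statement that the search scheme itself does not use the meta-data.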
The use of designated data records is not bound to only one type; rather, (preferably) more than one type may be treated by the designated index, also with a subordination relationship, as will be explained below.
Thus, whilst according to hitherto known solutions data of different types are typically held in several files (and are addressed by several index files), according to a database management system utilizing a designated index of the invention, data records of different types may be addressed from the same index. It should be noted that the keys of data records that belong to different types (and are addressed by the same designated index) do not necessarily have the same length. Thus, for example, consider a layered index which is also a designated index based on a trie as its basic partitioned layered index of the kind depicted in Fig. 8A. The size of the key of the records that belong to the "Borrower" entity is 6 bytes, whereas the size of the key of the records that belong to the "Book" entity is 5 bytes. Inserting books to the designated index of Fig. 8A with the designator keys B11111 and B22222 results in the data structure of Fig. 12, which includes a designated index that addresses 2 types of data records: data records a-k, which are assigned the designator A, and data records w-x, which are assigned the designator B. In the description below, the terms record of type X or record designated X are used to describe a record having a designated key whose designator is X.
Whilst the latter example illustrated one manner of realizing designated data (i.e. pre-pending as a prefix a character, string or any number of bits to the key of the data record), those versed in the art will readily appreciate that this is only one out of many possible variants. In fact, the proposed designator may be realized in any known manner provided that the designator distinguishes between different data records, is treated as part of the key, and therefore forms part of the search.
The latter statement applies regardless of whether the designator: (i) forms part of the data record (or key portion), (ii) is stored elsewhere (e.g. in a different data structure), or (iii) is defined elsewhere, or even defined otherwise. An example of the latter is a trie structure that is associated with data records all of the same type (for example, all are designated with a character A). Obviously, by this example, it is not required to physically attach the designator to the instances of the data records, seeing that the designator is common to all records. However, if a data record is accessed, it is necessary to identify the designator and add it to the key. Another possible solution is to prefix the designator to the data record such that when the data record is accessed the designator is available.
For example, consider Fig. 12, where data record d is accessed from node 266 by link 270. The first character of data record d is A, the designator.
For a better understanding of the subordination relationship, attention is directed to Figs. 13A-13E. Fig. 13A illustrates a designated index 800 (in the form of PAIF) with four data records 802, 804, 806 and 808 (of which only the designator keys are shown) associated thereto. The data records are all of the same type, as readily arises from the designator 'A' that is prepended to each of the data records.
Turning now to Fig. 13B, there is shown the PAIF 800 with a new data record (812) with a composite key A12355B940201333333 (the designator of record 812 is B). The new data record is subordinated to data record 806, whose key is A12355. According to the PAIF index, node 814 indicates that the discerning offset is 6 and that the value B links to data record 812 (having the value B at offset 6). Seeing that record 806 has no value at offset 6, it is assigned a virtual value (say null) at this offset in order to determine the discerning offset vis-a-vis the other record, and accordingly link 818 is set with direction marked null.
Fig. 13C illustrates the PAIF 800 in which another data record 820 is inserted. Data record 820, which represents another instance of a B type data record that is subordinated to the A type data record (806), is inserted to the PAIF. The discerning offset is 11 (the value of the new node 822) and the link values thereof are '0' and '1' to data records 812 and 820, respectively.
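The discerning offsets 6 and 11 referred to in Figs. 13B and 13C may be reproduced with the following sketch, in which offsets are counted from zero and a key that is too short yields a virtual null value. The function names are assumptions; the key of record 820 is not given in the text, so the value used below is a hypothetical key chosen to have the value '1' at offset 11.

```python
def value_at(key: str, offset: int):
    """Character at `offset`, or None (the virtual null value) past the end of the key."""
    return key[offset] if offset < len(key) else None


def discerning_offset(a: str, b: str):
    """First offset at which two keys differ, or None if the keys are identical."""
    offset = 0
    while value_at(a, offset) == value_at(b, offset):
        if value_at(a, offset) is None:
            return None
        offset += 1
    return offset


key_806 = "A12355"
key_812 = "A12355B940201333333"
key_820 = "A12355B940211333333"   # hypothetical key for record 820

print(discerning_offset(key_806, key_812), value_at(key_806, 6), value_at(key_812, 6))
print(discerning_offset(key_812, key_820), value_at(key_812, 11), value_at(key_820, 11))
```

For records 806 and 812 the sketch yields offset 6 with the values null and 'B', corresponding to links 818 and the link to record 812; for records 812 and 820 it yields offset 11 with the values '0' and '1'.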
Fig. 13D illustrates the PAIF 800, where different types of records are subordinated to record 806. A data record of type 'D' (824), being subordinated to the data record of type 'A', is linked from node 814 by link 823 having the value D. As recalled, the PAIF already represents data records designated B, where the latter are subordinated to the data record designated A. An example of the 'B' type subordinated to the 'A' type is items ('B') stored by a supplier ('A'), and of the 'D' type subordinated to the 'A' type is clients ('D') served by the supplier ('A').
Turning now to Fig. 13E, there is shown another embodiment of the PAIF of Fig. 13D implemented slightly differently. In particular, the subordinated data records 812, 820 and 824 are represented and maintained in the data file without their key prefix, that is, the designator key of the record 806 (i.e. the prefixed key A12355 is omitted). When accessing, for example, data record 812, the information available from the meta-data according to the designator B allows the following information to be extracted:
(i) that part of the key is missing,
(ii) that record 812 is subordinated to a record designated A that can be accessed from the node with value 6 (814) and by a link with value null (818).
Thus it is possible to access data record 806 and construct the complete key of record 812. If the PAIF 800 is a layered index, it might be that nodes 814 and 822 reside in different blocks and the access path to the block associated with record 812 does not include node 814. In that case, a link from the subordinated records (links 826, 828 and 830) to record 806 allows data record 806 to be accessed and the key to be constructed. The implementation described above obviates the necessity to duplicate the representation of the designated key of data record 806 in respect of each subordinated data record (by the particular example of Fig. 13D, the specified prefix A12355 is duplicated three times for records 812, 820 and 824). Replacing the key prefix with a link can save space (if the size of the prefix is larger than the representation of the link) and allows access to the record to which the subordination relates without necessitating a separate search.
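The replacement of the duplicated key prefix by a link, as in Fig. 13E, may be sketched as follows. The class and attribute names are hypothetical; only the keys A12355 and A12355B940201333333 are taken from the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    stored_key: str                      # key portion physically kept with the record
    parent: Optional["Record"] = None    # link to the subordinating record, if any

    def full_key(self) -> str:
        """Rebuild the complete designated key by following the parent links."""
        prefix = self.parent.full_key() if self.parent else ""
        return prefix + self.stored_key


record_806 = Record("A12355")
record_812 = Record("B940201333333", parent=record_806)   # prefix A12355 omitted

print(record_812.full_key())
```

The record stores only its own key component, and the complete key is recovered through the link without a separate search for the subordinating record; as noted in the text, space is saved only where the prefix is larger than the representation of the link.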
Figs. 13D and 13E illustrate that the subordination relationship characteristic of the invention is not limited to any specific realization.
The subordination relationship of the invention thus renders the low level implementation of data more efficient as compared to hitherto known techniques, in the sense that one index can be associated with various data types and subordination relationships, as compared to separate index files according to the prior art. This notwithstanding, there may of course be applications according to the invention where more than one index file is utilized.
Obviously, each of the subordinated records 812, 820, 824 can have records subordinated to it.
Moreover, there are some other advantages that are brought about using the proposed technique of the invention, e.g. maintaining data integrity. Consider, for example, an insert transaction that is applied to the PAIF 800 of Fig. 13E, of a data record designated B with a composite key A12355B930101123456 subordinated to data record 806 (having designated key A12355). The search leads to node 822. The value at key offset 11 of the inserted data record is 0, thus record 812 is accessed. The search key of record 812 needs to be constructed (by accessing record 806 via link 826) and the insertion of the new data record can be completed. It should be noted that the link to record 806 obviates the need to conduct a separate search for record 806 by its key in order to confirm its existence. Thus the maintenance of data integrity is more efficient.
Performing the same data integrity check using the specified B-tree index implies considerable overhead since it requires a two-phase operation. At first, a search is applied to the index of data records of type 'A' in order to find the data record whose key is 12355. Only upon finding it can a record of type B be inserted (and a separate index file is normally updated).
When searching data, the data structure of Fig 20E exemplifies other advantages resulting from the fact that subordinated data records are linked to their "parent" record. For example, if a record of type A is a customer and a record of type B is an invoice, it is usually necessary to access the invoice details together with the customer details. The link from the invoice to the customer obviates a separate search for the customer details.
The so obtained designated index of the invention brings about another important advantage, namely navigation in the index for accomplishing sequential operations.
Consider, for example, the PAIF of Fig. 13E, where it is required to "retrieve" all data records in ascending order. It is possible to navigate in the PAIF (known also as a sequential operation), and data records 802, 804, 806, 812, 820, 824 and 808 are retrieved according to the order of the designator key. If only records of a certain type are needed, for example the records of type A, one would navigate in the index in the same manner whilst avoiding access to nodes and records that are not relevant. Accordingly, from node 814 data record 806 is accessed, and it can be predicted that the data records that can be accessed from node 814 by its links and descendant nodes are subordinated to record 806, thereby avoiding links 833, 823. In this example only records 802, 804, 806 and 808 are retrieved. In the same manner, one would avoid moving along link 823 if only records of type A and B are needed, since it can be predicted that a link with a value D from a node with a value 6 addressing record 806 is a link to a subordinated data record designated D.
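The sequential operation described above may be approximated by the following sketch, which orders the records by designated key and optionally restricts the result to certain types. Only the keys A12355 and A12355B940201333333 are taken from the text; all other keys are hypothetical placeholders chosen so that the retrieval order matches the one stated. The sketch filters a sorted list and does not model the pruning of links inside the index.

```python
# Designated keys in the spirit of Fig. 13E, mapped to record numbers.
records = {
    "A12111": 802,                 # hypothetical key
    "A12222": 804,                 # hypothetical key
    "A12355": 806,
    "A12355B940201333333": 812,
    "A12355B940211333333": 820,    # hypothetical key
    "A12355D777": 824,             # hypothetical key
    "A12444": 808,                 # hypothetical key
}


def record_type(key: str) -> str:
    """Designator of the last key component.

    Assumption: designators are letters and all other key characters are digits.
    """
    return [ch for ch in key if ch.isalpha()][-1]


def sequential(wanted_types=None):
    """Record numbers in ascending designated-key order, optionally by type."""
    return [
        records[key]
        for key in sorted(records)
        if wanted_types is None or record_type(key) in wanted_types
    ]


print(sequential())
print(sequential({"A"}))
print(sequential({"A", "B"}))
```

A record with a shorter key sorts before its subordinated records, which corresponds to the null direction preceding the values B and D at node 814.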
If the PAIF index is a layered index and assuming that node 814 resides in a different block than node 822, the move from node 814 to node 812 can be by the split link. If the split link does not exist, as for example in Fig. 7F, one needs to use the link 421 of node B' (422) when it is necessary to advance by link 400 from node B (423) to node E (424).
Having exemplified the subordination relationship with reference to the specific embodiment of Fig. 13, there follows a description that pertains to the multi-dimensional characteristic according to the second aspect of the invention.
Turning now to Fig. 14, there is shown a schematic illustration of a designated index according to one embodiment of the invention. The index contains two search paths to one designated data record ("DEPOSIT" data record) such that the deposit can be accessed by each of the two composite keys: a designated key that includes the key fields account number, date and client number, and a second designated key that includes the key fields client number, date and account number. Reverting to the above example, the account data record has a designated key 'A133333' (201). Updating a deposit for the account (deposit subordinated to account) can be implemented by means of designated record 203 subordinated to designated record 201. The PAIF would allow access to records 201, 203 from node 207 by link 206. By the same token, data record 204 represents a deposit of a client. The key of record 202 is B133333. Updating a deposit 204 to a client 202 can be implemented by the index 200 and node 209 linked (208) to data record 204. The key of data record 203 is 'A133333C01019811346' (k_1). The key of record 204 is B11346D010198133333 (k_2).
As shown, the fields of Client and Account are duplicated in records 203, 204 (as well as additional information such as the date and the sum), which is an obvious drawback that results in an unduly inflated file.
This drawback may be overcome by representing a single DEPOSIT record as a multi-dimension record 210.
Data record 210 (Fig. 14) is a multi-dimension record that is updated and accessed by the designated index 200 according to the designator key k_1 (designator C) and according to the designator key k_2 (designator D). (Note that when a data record is a multi-dimension record, the designator of the record depends on the key that is being used.) The path in the index by k_1 leads to node 207 and from that node to the designator C of record 210. The information in the meta-data according to the designator C allows the relevant structure to be constructed, for example a data structure that includes the key k_1. By links 213, 214 records 201 and 202 are accessed, and thus, with the date field of record 210, all the key fields are constructed. The path in the index by k_2 leads to node 209 and from that node to the designator D of record 210. The information in the meta-data according to the designator D allows the relevant structure to be constructed, for example a data structure that includes the key k_2. As shown, the search path defined by the search key of record 203 leads to the first field 212 having a value 'C' (which is the designator according to search key k_1). The third field points to data record 201. The second field 215 (having a value 'D', which is the designator according to search key k_2) of the same data structure 210 is accessible by the search path that is defined by the search key of record 204. The fourth field has a link to the actual data record 202. In this manner the record DEPOSIT represents subordination to both account and client, whilst avoiding duplication of the fields account, client, date and sum. It should be noted that the data elements account and client are accessed by means of links to the original data records (201 and 202) and the rest of the data (date and sum) exists only once within data element 210. Obviously, data record 210 can include other fields. The invention is by no means bound to a given realization, and accordingly the manner of realizing data record 210 as depicted in Fig 14 is only one out of many possible variants. The number of search paths is not limited. As has been explained above with reference also to Fig. 13E, if the sought data record is Axxxx (i.e. the account record 201 per se), then one simply moves in the index with a search key of 'Axxxx' to any of its subordinated records and accesses the record of type A by the link from the subordinated record to the record of type A, such as, for example, link 213 of Fig. 14. Other implementations are of course feasible (e.g. maintaining a link in the index to record A), all as required and appropriate.
The specified description, which provides two (and in the general case at least two) search paths to one physical occurrence of a data record, constitutes the multi-dimensional data structure, which is a designated index that contains at least two search paths to one data record (called a multi-dimension record).
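A multi-dimension record of this kind may be sketched as a single object that is reachable under two composite keys. In the sketch a plain dictionary stands in for the designated index, and the class names, the attribute names and the amount are hypothetical. The client key B11346 is inferred from the composite key k_2 given in the text.

```python
from dataclasses import dataclass


@dataclass
class Party:
    key: str          # designated key of an account ('A...') or a client ('B...')


@dataclass
class Deposit:
    account: Party    # link to the account record (cf. link 213)
    client: Party     # link to the client record (cf. link 214)
    date: str
    amount: int       # hypothetical value; the sum is stored only once

    def key_by_account(self) -> str:
        """Key k_1: account key, designator C, date, client number."""
        return self.account.key + "C" + self.date + self.client.key[1:]

    def key_by_client(self) -> str:
        """Key k_2: client key, designator D, date, account number."""
        return self.client.key + "D" + self.date + self.account.key[1:]


deposit = Deposit(Party("A133333"), Party("B11346"), "010198", 500)

index = {}            # stand-in for the designated index
index[deposit.key_by_account()] = deposit
index[deposit.key_by_client()] = deposit

k1, k2 = "A133333C01019811346", "B11346D010198133333"
print(index[k1] is index[k2])
```

Both search paths end at the same physical record, and the account and client fields are reached through links rather than being duplicated.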
Relation among data elements - Fig. 15 illustrates another feature of the invention, i.e. the data relationship feature. Thus, data record A (a book data record) has C, F, J, K and L data records subordinated thereto. The realization of this hierarchy was illustrated above. According to the present relationship feature, one-to-one and one-to-many relations may easily be realized. Consider, for example, that a book has many categories (L), i.e. one-to-many; however, it has only one abstract (K), i.e. one-to-one.
According to the proposed feature, a one-to-one data relationship is implemented by a designated (composite) key of two components: the first is the designated key of its subordinating record and the second is the designator of the subordinated record (since it is a one-to-one relation there is no need to use the key field of the subordinated record). A one-to-many relationship, by contrast, is implemented by a designator (composite) key whose first component is the designator key of the subordinating record, and whose second component consists of the designator and key of the subordinated record.
In this example, the one-to-one relation between a book and its abstract is maintained by defining the key of L to be AxxxL, where Axxx is the designated key of A and L is the designator of the key of record L. The one-to-many relation between a book and a category is maintained by defining the key of L to be AxxxLyyy, where Axxx is the designated key of A, L is the designator of the key and yyy are the key field(s) of record L.
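The two key layouts may be contrasted by the following sketch, which uses the symbolic forms Axxx, L and yyy of the example. The function names and the additional key value zzz are assumptions made for illustration.

```python
def one_to_one_key(parent_key: str, designator: str) -> str:
    """Parent designated key followed by the designator only."""
    return parent_key + designator


def one_to_many_key(parent_key: str, designator: str, own_key: str) -> str:
    """Parent designated key followed by the designator and the subordinate's own key."""
    return parent_key + designator + own_key


single = {one_to_one_key("Axxx", "L"), one_to_one_key("Axxx", "L")}
many = {one_to_many_key("Axxx", "L", "yyy"), one_to_many_key("Axxx", "L", "zzz")}

print(sorted(single), sorted(many))
```

Since a one-to-one key carries no key field of its own, at most one such key exists per subordinating record, whereas the one-to-many layout admits one key per distinct value of the subordinate key field(s).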
There follows now a description that pertains to another feature according to the second aspect of the invention, namely multi-model representation. In accordance with this feature, and as will be explained in greater detail below, one or more of the following (and possibly other) models may be represented by the specified designated index.
Representing relational tables by a multi-model designated index -
The relational model considers all data as consisting of tables. Each table consists of records of the same structure, called tuples. Suppose the tuples consist of fields F1, F2 and F3. Each such field is a key. If key F2 is subordinate to key F1, and key F3 is subordinate to key F2, we can easily construct the table: to retrieve its tuples, follow the designator of key F1, and from there, for each value of F1, follow the designator of F2, and in the same manner continue to F3. Each such triple defines a tuple of the table. Some projections are even easier: to find all the pairs of values of F1 and F2 for which there exists a value of F3 in the table, we terminate the search after processing (F1, F2). Performing the projection of (F2, F3) might be expensive, since it requires searching all values of F1 first. However, if this operation is common, the designated index should also maintain the search path (F2, F3, F1). I.e., we construct a new designated composite key F2'F3'F1' with new designators, and insert the additional paths into the designated index. Thus each record can be reached via both paths and constitutes a multi-dimension record.
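A small sketch of the two search paths follows. It models each path as a sorted list of composite keys rather than a trie, which is an assumption made for brevity; the sample rows and the helper name `project` are hypothetical.

```python
def project(path, width):
    """Return the distinct key prefixes of length `width`, in index order.

    Because the path is sorted, equal prefixes are adjacent, so one pass
    that compares each prefix with the previous one is enough.
    """
    out = []
    for key in path:
        prefix = key[:width]
        if not out or out[-1] != prefix:
            out.append(prefix)
    return out


# Hypothetical table with fields (F1, F2, F3).
rows = [("a", "x", "1"), ("a", "y", "2"), ("b", "x", "1")]

# Primary search path ordered by (F1, F2, F3).
path_f1_f2_f3 = sorted(rows)

# Additional search path ordered by (F2, F3, F1); it reaches the same records.
path_f2_f3_f1 = sorted((f2, f3, f1) for f1, f2, f3 in rows)

pairs_f1_f2 = project(path_f1_f2_f3, 2)  # cheap on the primary path
pairs_f2_f3 = project(path_f2_f3_f1, 2)  # cheap only because the second path exists
```

Without the second path, the (F2, F3) projection would have to visit every F1 value first, which is the cost the paragraph above warns about.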
Additional models on the multi-model designated index -
The designated index makes it possible to represent additional data models, including a relational database, an object-oriented system, and a hierarchical database, where substantially no data is duplicated.
Implementing object-oriented (persistent data structures) by multi-model designated index -
The object-oriented approach considers all data as objects. Every object belongs to a class, which determines its structure and which methods (functions) can be applied to it. The classes are organized in a hierarchy, from which structure and methods may be inherited. The object-oriented approach is ephemeral: an object exists only while the program that created it is active. Objects that need to be supported for a longer period of time are defined as persistent. These objects are stored on the disk and are available to other (authorized) programs. The multi-model designated index can easily support such objects. Since their structure is uniformly encoded with the aid of designators, later incarnations of the program as well as other programs can access these persistent objects. Note that at the same time a persistent object can also be part of a relational table. There is no need to duplicate data.
Consider, for example, the data structure 220 of fig. 16. Data records 223, 224, 225, and 226 are subordinated to data record 221 and together with record 221 are considered as an object. It is possible to search efficiently in the index for all data records with a key prefix equal to the designated key of record 221 (partial key search) and retrieve the entire object. If only part of the object's data is needed, such as the A type record and the subordinated B type records, again a partial key search is done for data records with a key prefix that is equal to the designated key of the A type record (for example 221) followed by the designator B as the next key field.
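The partial key search can be illustrated on a sorted list of designated keys using binary search. This is a sketch under assumptions: a sorted list stands in for the index, the key strings are invented to resemble the records of fig. 16, and keys are assumed never to contain the sentinel character U+FFFF.

```python
from bisect import bisect_left


def partial_key_search(sorted_keys, prefix):
    """Return every key that starts with `prefix`.

    All keys sharing a prefix are contiguous in sorted order, so two
    binary searches delimit the range. The upper bound appends U+FFFF,
    assumed to sort after any character used in real keys.
    """
    lo = bisect_left(sorted_keys, prefix)
    hi = bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]


# Hypothetical designated keys: A-type records with subordinated B and C records.
keys = sorted([
    "A221", "A221B223", "A221B224", "A221C225", "A221C226",
    "A222", "A222B230",
])

whole_object = partial_key_search(keys, "A221")    # record 221 and everything under it
only_b_records = partial_key_search(keys, "A221B")  # just its B-type records
```

The same routine serves both retrievals described above; only the prefix changes.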
Implementing object-relational by multi-model designated index -
As opposed to the object-oriented approach, the relational approach considers all data as tables. Thus it is difficult to integrate SQL queries in an object-oriented programming language (C++ or Java). The object-relational approach provides an interface to convert tables to objects. The interface requires the user to specify the relationship between the objects and the table attributes. If some attributes themselves are tables, we need to allow relational algebra operations on these tables too. These conversions are performed by the application program. Thus the database is unable to optimize the queries. The designated index treats data in a uniform manner, thus providing an ideal interface between the object-oriented application program and the data structures. The application program's queries are formulated in terms of designated keys, so the database can optimize the query strategy. The database returns designated keys, which the object-oriented application program can readily process by the object-oriented methodology. The sequence of designators of the search path to the object determines its class, and the designators to various fields allow the object-oriented program to resolve polymorphism of the method calls.
The designated index addresses all related data. For example, assume that fig. 16 describes a data structure of an insurance company where records of type A are customers, records of type B are customers' claims and records of type C are customers' payments. As is clearly shown, all the data records are addressed by a single index structure.
Now, one is able to efficiently access all the object instances, since the index allows navigating from a customer to its related data (claims and payments). At the same time one is able to navigate on the index structure efficiently and produce the customer table (the collection of records of type A), the customer claims table (the collection of records of type A and B) and the customer payments table (the collection of records of type A and C). Since the data structure does not impose physical clustering of the data, if data is shared among different objects, it can be efficiently accessed by the different object views, and thus such a data record is a multi-dimension record. In this example, a claim can be efficiently accessed both from the customer object and from the policy object, being of a type structured as, for example, in fig. 16 (structure 210).
The object-oriented approach allows users to add user-defined types (UDT) and user-defined functions (UDF). For example, one could add the photos of accidents to the insurance company database. In the example, a new designated data record subordinated to the A type data record is defined. When a claim's details are searched, the photo of the accident is accessed and sent to the photo printout application. With a designated index, the relation between the photo data and the claim is handled in the same manner as with built-in classes and relations. The new UDT can be based on or be related (by subordination) to any other data type. Now, with the designated index, the application can navigate to the new UDT from the defined classes, from which the new UDT can inherit methods and other properties. In the example, when navigating in the index, one would navigate to a claim from which one could reach the photo as well as any other part of the claim's data.
Network and Hierarchical Models:
Implementing network and hierarchical models by multi-model designated index -
The network and hierarchical models have been replaced by the relational model. However, even though these models are obsolete, they have some advantages (as well as many disadvantages) over the table-oriented implementation. Once a record is retrieved, the addresses of related records are readily available.
Consider, for example, a bank with customers and loans. Each customer has an address and several loans, while each loan is taken by one or more customers. In the network model, each customer is represented by a node containing a link to the customer and links to nodes representing the loans taken by the customer. A node representing a loan is likewise linked to the nodes of the customers that took that loan. Thus, given a loan, one can easily access the customers that took the loan and get their home addresses.
The B-tree implementation requires us to maintain two trees: one of the customers and home addresses, and the second of loans and customers. Thus, having retrieved the data of a loan, the names of the customers that took the loan are available. To find their addresses, an independent B-tree search is required for each customer.
In the proposed multi-model designated index (such as for example in fig. 16), once reaching the node representing the loan, one can continue to a designator that identifies the customers that took that loan (for example records of type B). Normally, at most one disk access is required for each customer. The proposed multi-dimensional designated index has the advantages of the network model, without its disadvantages. While the network model treated each node separately, and was susceptible to long search paths, the multi-model designated index treats all data uniformly and the length of the search paths is probably logarithmic, with the base of the logarithm being the block size. Thus, in practice, the search requires a single disk access.
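To give a feel for what a logarithm with the block size as its base means, the sketch below counts how many block levels are needed to address a given number of records. The fan-out values are hypothetical, and the function is only a counting aid, not part of the described index.

```python
def block_levels(num_records: int, block_fanout: int) -> int:
    """Smallest number of levels L such that block_fanout ** L >= num_records.

    Integer arithmetic is used instead of math.log to avoid rounding
    surprises at exact powers.
    """
    if num_records < 1 or block_fanout < 2:
        raise ValueError("need at least one record and a fan-out of at least 2")
    levels, capacity = 0, 1
    while capacity < num_records:
        capacity *= block_fanout
        levels += 1
    return levels


# Hypothetical figures: one million records.
levels_binary = block_levels(1_000_000, 2)    # one key comparison per level
levels_blocked = block_levels(1_000_000, 100)  # 100 entries resolved per block
```

With a large fan-out only a handful of levels exist; if the upper levels are assumed to stay in internal memory, only the last step touches the disk, which is one way to read the single-disk-access remark above.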
Implementing server-client model with object oriented based on a designated index -
The client-server model enables efficient implementations of the relational model. According to this model, all the data resides at a central computer (called the server), and the application programs run at other computers (called clients). When an application needs data, it formulates an SQL query, which is sent by the client to the server. The server evaluates the query and returns the resulting table to the client.
Thus, the interface between the client and the server is via SQL queries: the server is unaware of the internal data structures and code of the application. The client and the server have just to agree on the names of the tables and their attributes.
In the object-oriented approach this model breaks down. Since each data item is an object, the server must be aware of its internal structure. This problem is aggravated in the presence of polymorphic methods. The server must be aware of the structure and the details of the entire class hierarchy.
The designated index allows the client-server approach to be applied to the object-oriented and object-relational models. For example, to reach an attribute, the application program sends the server the path of keys and link designators leading to the desired node. Based on this data the server can fulfill the request without any knowledge of the data structure of the application program.
The client and the server should agree on the names of the fields and their designators. The server need not be aware of the type of data of each such field, nor of its semantic content.
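The request style described above can be sketched with a nested dictionary standing in for the server's index. Everything here is an assumption for illustration: the dictionary layout, the designators "A" and "B", and the key values are hypothetical.

```python
def resolve(index, path):
    """Server side: follow a path of designators and keys, one step at a time.

    The server only walks the structure; it needs no knowledge of what
    the client's classes or field types mean.
    """
    node = index
    for step in path:
        node = node[step]
    return node


# Hypothetical server-side index: customers (A) with subordinated claims (B).
server_index = {
    "A": {
        "cust-17": {
            "B": {
                "claim-3": {"amount": 1200},
            },
        },
    },
}

# Client side: the request is just the agreed-upon path to the attribute.
request_path = ["A", "cust-17", "B", "claim-3", "amount"]
amount = resolve(server_index, request_path)
```

The only contract between the two sides is the vocabulary of designators and field names, matching the agreement described in the text.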
According to yet another aspect of the invention it is proposed to further compress the representation of the index, thereby rendering it more efficient. There follows an estimation of the space required by a trie and methods to reduce the space requirements.
If the trie is a layered index, the analysis of the trie index structure will concentrate on the last layer (I0):
Storage requirements for primary key index of a trie -
One of the most important features of a trie based data structure is the modest size of its representation. The PAIF, for example, maintains an even smaller size than a conventional trie because of its compressed representation.
The last level of the PAIF index contains a trie with links that point to other trie nodes in the same block, and links that point to records. Let N be the number of records in the database. The index contains exactly N pointers to these records. If each pointer requires 4 bytes, the size needed for the pointers is 4N bytes. In addition, each pointer has a direction (1 byte), thus the total is 5N bytes.
Now consider the space required for a PAIF trie. Since N pointers emanate from the index and each trie node has at least 2 children, there are at most n ≤ N - 1 trie nodes. Let d denote the average number of children of a trie node; then n < N/(d - 1). Since in practice d ≫ 2, n ≪ N. Each trie node has a level number (1 byte). Since each trie node has at most one incoming trie link, there are at most n - 1 trie links. Each trie link has a label, which is a single character, and an intra-block pointer (1 byte), thus a total of 3n bytes. Thus 3n + 4N ≤ 7N bytes are needed in the worst case, and between 4N and 6N bytes in practice.
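The estimate can be checked numerically. The sketch below simply evaluates the expression 3n + 4N from the paragraph above; the record count and the average fan-out d = 16 are hypothetical inputs chosen for illustration.

```python
def worst_case_bytes(num_records: int) -> int:
    """3n + 4N with the largest possible node count, n = N - 1."""
    n = num_records - 1
    return 3 * n + 4 * num_records


def estimated_bytes(num_records: int, avg_children: int) -> int:
    """3n + 4N with n approximated by N / (d - 1), rounded down."""
    n = num_records // (avg_children - 1)
    return 3 * n + 4 * num_records


N = 1_000_000            # hypothetical number of records
worst = worst_case_bytes(N)
typical = estimated_bytes(N, avg_children=16)  # hypothetical average fan-out
```

For these inputs the worst case stays just under 7N, and the estimate with a fan-out of 16 lands only slightly above 4N, which agrees with the stated practical range of 4N to 6N.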
Performing the same analysis but from another angle: consider two pointers p1 and p2 that emanate from node v of level k. Let x1 be a key reachable from p1 and x2 a key reachable from p2. Then x1 and x2 share the first k - 1 characters. In a PAIF structure, each one of these characters is represented at most once. In the B-tree representation it is necessary to explicitly represent the first k characters of each key.
The savings in the PAIF are twofold: first, every character is stored at most once on each level, and second, not all characters need be represented.
Further index compression -
In the above discussion, most of the space is required for the pointers to records. A method that saves pointer space will now be presented. The method is based on allowing several links to records to share the same pointer. Suppose, first, that the records have fixed size. If the first two records reside in the same block, then it is possible to keep a single full-sized pointer for the first pointer to a block and, instead of keeping a pointer for each of the remaining outgoing links to that block, to compute their displacement. I.e., if the first two records reside in block number 2000 and the third record in block 7000, it is possible to maintain the structure 2000(e,f) 7000(h). The savings would be much more substantial if a larger number of outgoing links all point to the same block. If k such links point to a block, then the 4 bytes of the pointer are divided among all k records, thus the space for addressing each record is reduced to 4/k bytes plus the space for the direction (1 byte). For k > 4 this means that each record requires at most 2 bytes in the index.
For variable sized records it is possible to maintain the displacement within the block, for example: 2000(e:d_e, f:d_f) 7000(h:d_h), where d_x denotes the displacement of record x within its block. Instead of maintaining a full pointer, a displacement that could fit into a single byte is maintained. Thus, each record needs 1 byte for its share in the pointer, 1 byte for the direction, and 1 byte for the displacement; a total of 3 bytes per record.
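The per-record cost under pointer sharing is simple arithmetic, sketched below. The function name and the constants' names are hypothetical; the byte counts (4-byte pointer, 1-byte direction, 1-byte displacement) are the ones used in the text.

```python
POINTER_BYTES = 4       # full block pointer, shared by k records
DIRECTION_BYTES = 1     # per record
DISPLACEMENT_BYTES = 1  # per record, only for variable sized records


def bytes_per_record(k: int, variable_sized: bool = False) -> float:
    """Index bytes spent on addressing one record when k links share a pointer."""
    if k < 1:
        raise ValueError("at least one link must point to the block")
    cost = POINTER_BYTES / k + DIRECTION_BYTES
    if variable_sized:
        cost += DISPLACEMENT_BYTES
    return cost


unshared = bytes_per_record(1)                         # no sharing at all
fixed_k4 = bytes_per_record(4)                         # four fixed size records per block
fixed_k8 = bytes_per_record(8)
variable_k4 = bytes_per_record(4, variable_sized=True)
```

With k = 4 the pointer share is exactly one byte, which reproduces the 2-byte figure for fixed sized records and the 3-byte figure for variable sized records.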
Looking at the example of fig. 17, fig. 17A shows a node 2000 of a trie with the links 2010, 2011, 2012 (values 5, 9, A respectively) that address 3 data records (2002, 2004, 2006) at disk addresses 3000, 5000, 7000 respectively. The size needed to represent the link values (1 byte for each link) and the pointers to the data (4 bytes each) is 15 bytes.
Turning now to fig. 17B, node 2000 maintains a shared link (2010) to three data records (2002, 2004, 2006). The information that represents the link is the address of block 2020 (4 bytes) and the link values to the data records 2002, 2004, 2006 that reside in the block (1 byte for each link value). The size needed to represent the pointer to the data block and the values of the links is only 7 bytes: (3000:5,9,A).
Now, in order to access data record 2004, one can calculate its address as the address of the data block plus the displacement, which depends on the record size, assuming that the records in the data block are all of equal size. As has been explained, node 2000 can include links to other data records or data blocks (such as link 2024 to data block 2022 accommodating data record 2008).
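The byte counts of figs. 17A and 17B and the address calculation can be reproduced as follows. The record size of 100 bytes and the zero-based slot numbering are assumptions made only for this illustration.

```python
POINTER_BYTES = 4
LINK_VALUE_BYTES = 1


def separate_links_size(n_records: int) -> int:
    """Fig. 17A layout: every link carries its own value and its own pointer."""
    return n_records * (LINK_VALUE_BYTES + POINTER_BYTES)


def shared_link_size(n_records: int) -> int:
    """Fig. 17B layout: one block pointer plus one value byte per record."""
    return POINTER_BYTES + n_records * LINK_VALUE_BYTES


def record_address(block_address: int, slot: int, record_size: int) -> int:
    """Address of the record in position `slot` (0-based) of a block of equal sized records."""
    return block_address + slot * record_size


size_17a = separate_links_size(3)
size_17b = shared_link_size(3)

# Record 2004 is taken to be the second record (slot 1) of the block at address 3000;
# the 100-byte record size is hypothetical.
address_2004 = record_address(3000, slot=1, record_size=100)
```

The first record of the block coincides with the block address itself, so only one full pointer has to be stored for all three records.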
Preferably, the database file management system of the invention should be associated with known per se concurrency and/or distributed capabilities so as to enable a plurality of users to access the database virtually simultaneously. The database may be located in a central location, or distributed among two or more remote locations.
Turning now to Figs. 18A-D, there are shown four benchmark graphs demonstrating the enhanced performance, in terms of response time and file size, of a database utilizing a file management system of the invention vs. a commercially available Ctree based database. The inserts are realized through a Uniface application running in the Windows (for Workgroups) operating system.
The benchmark of Fig. 18A concerns measuring the time in minutes for inserting an ever increasing number of a priori sorted data records into a file (0-1,000,000). As shown in Fig. 18A, the larger the number of inserts, the greater is the improvement in terms of response time of the database file management system of the invention. Thus inserting 1 million records takes about 669 minutes in the Ctree based database as compared to only 65 minutes in the system of the invention. Moreover, the response time in the file management system of the invention increases by only a small extent as the number of records increases, as opposed to a significant increase in the response time in the counterpart system according to the prior art.
The benchmark of Fig. 18B illustrates the file size in megabytes as a function of the number of data records in the file (0-1,000,000). As shown in Fig. 18B, the larger the number of records, the greater is the improvement in terms of file size in the database file management system of the invention. Thus for 1 million records the file size of the Ctree based file is about 151 megabytes as compared to only 22 megabytes in the database file management system of the invention.
Graphs 18C and 18D are similar to those shown in Figs. 18A and 18B, apart from the fact that in the former (18C and 18D) the data records are inserted randomly whereas in the latter (18A and 18B) the data records are a priori sorted according to search key. As shown, the results are as before, i.e. the system of the invention is more efficient in terms of both response time and file size.
Figs. 19A-D illustrate benchmark graphs of a system of the invention (operating under the DOS operating system) vs. a commercially available Btree based database system. The results are as before, i.e. the system of the invention is more efficient in terms of both response time and file size.
Those versed in the art will appreciate that alphabetic and roman characters designating claim steps are used for convenience of explanation only and should by no means be construed as imposing an order of steps, or how many times each step is executed vis-a-vis other steps of the method.
The present invention has been described with a certain degree of particularity, but those versed in the art will appreciate that various modifications and alterations may be implemented without departing from the scope and spirit of the following claims:

Claims

1. In a storage medium used by a database file management system executed on data processing system, a data structure that includes: a layered index arranged in blocks; the layered index includes a basic partitioned index that is associated with data records; the basic partitioned index enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said layered index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
2. The layered index of Claim 1, wherein said basic partitioned index being a trie.
3. In a storage medium used by a database file management system executed on data processing system, a data structure that includes: an index arranged in blocks and being constructed over the keys of data records; the index includes a basic partitioned index that is associated with the data records; the basic partitioned index enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
4. In a storage medium used by a database file management system executed on data processing system, a data structure that includes: an index arranged in blocks and being constructed over the keys of data records; the index includes a trie that is associated with the data records; the trie enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
5. The layered index of Claim 1, wherein said storage medium being an external memory.
6. The layered index of Claim 5, wherein said storage medium being further an internal memory.
7. The layered index of Claim 1, wherein said storage medium being an internal memory.
8. The layered index of Claim 2, wherein said trie being a PAIF trie.
9. The layered index of Claim 1, wherein the basic partitioned index and the representative index of said layered index being substantially the same index schemes.
10. The layered index of Claim 1, wherein the basic partitioned index and the representative index of said layered index being different index schemes.
11. The layered index according to Claim 8, wherein the representative index of said layered index being the Btree index scheme.
12. The layered index according to Claim 10, wherein the representative index being the Btree index scheme.
13. The layered index according to Claim 8, wherein the representative index of said layered index being substantially the PAIF index scheme.
14. The layered index according to Claim 9, wherein the representative index being substantially the PAIF index scheme.
15. The layered index according to Claim 1, capable of supporting the ODBC standard.
16. The layered index I0,...,Ih according to Claim 1, comprising: a representative index I1,...,Ih constructed such that any Ij is constructed over the representative keys of I(j-1).
17. The layered index I0,...,Ih according to Claim 16, wherein Ih is fully contained in one block.
18. The layered index of Claim 3, wherein said storage medium being an external memory.
19. The layered index of Claim 18, wherein said storage medium being further an internal memory.
20. The layered index of Claim 3, wherein said storage medium being an internal memory.
21. The layered index according to Claim 3, capable of supporting the ODBC standard.
22. The layered index of Claim 4, wherein said storage medium being an external memory.
23. The layered index of Claim 22, wherein said storage medium being further an internal memory.
24. The layered index of Claim 4, wherein said storage medium being an internal memory.
25. The layered index according to Claim 4, capable of supporting the ODBC standard.
26. In a database file management system for accessing data records and being executed on data processing system; the data records are associated with a basic partitioned index arranged in blocks and being stored in a storage medium; the basic partitioned index enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks; a method for constructing a layered index arranged in blocks, comprising the steps of:
(a) providing said basic partitioned index;
(b) constructing a representative index over the representative keys of said basic partitioned index; said layered index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
27. The layered index of Claim 26, wherein said basic partitioned index being a trie.
28. In a database file management system for accessing data records and being executed on data processing system; the data records are associated with a basic partitioned index arranged in blocks and being stored in a storage medium; the basic partitioned index enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks; a method for constructing an index over the keys of the data records, the index being arranged in blocks, comprising the steps of:
(a) providing said basic partitioned index;
(b) constructing an index over the representative keys of said basic partitioned index; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
29. In a database file management system for accessing data records and being executed on data processing system; the data records are associated with a trie arranged in blocks and being stored in a storage medium; the trie enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks; a method for constructing an index over the keys of the data records, the index being arranged in blocks, comprising the steps of:
(a) providing a trie;
(b) constructing an index over the representative keys of said trie; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
30. The method of Claim 26, wherein said storage medium being an external memory.
31. The method of Claim 30, wherein said storage medium being further an internal memory.
32. The method of Claim 26, wherein said storage medium being an internal memory.
33. The method of Claim 27, wherein said trie being a PAIF trie.
34. The method of Claim 26, wherein the basic partitioned index and the representative index being substantially the same index schemes.
35. The method of Claim 26, wherein the basic partitioned index and the representative index being different index schemes.
36. The method of Claim 33, wherein the representative index being the Btree index scheme.
37. The method of Claim 35, wherein the representative index being the Btree index scheme.
38. The layered index according to Claim 33, wherein the representative index being the PAIF index scheme.
39. The layered index according to Claim 34, wherein the representative index being the PAIF index scheme.
40. The method of Claim 26, capable of supporting the ODBC standard.
41. The method of Claim 28, wherein said storage medium being an external memory.
42. The method of Claim 41, wherein said storage medium being further an internal memory.
43. The method of Claim 28, wherein said storage medium being an internal memory.
44. The method of Claim 28, capable of supporting the ODBC standard.
45. The method of Claim 26, wherein said index supports sequential operations.
46. The method of Claim 28, wherein said index supports sequential operations.
47. The method of Claim 29, wherein said index supports sequential operations.
48. The method for accessing a sought data record r by key k in the layered index of Claim 1, comprising:
(a) searching k in Ih to Ik where h ≥ k ≥ 0 and in the case it is not found in the key of a data record in order to find the block of I(h-1) leading to k;
(b) repeating step (a) until reaching the block of I0 that is associated with the data record with key k, if exists.
49. The method for inserting a data record r by key k in the layered index of Claim 1, comprising:
(a) searching k in Ih to Ik where h ≥ k ≥ 0 and in the case it is not found in the key of a data record in order to find the block of I(h-1) leading to k;
(b) repeating step (a) until reaching the block B of I0 that is associated with the data record with key k, if exists;
(c) associating r to B.
50. The method for deleting a data record r by key k in the layered index of Claim 1, comprising:
(a) searching k in Ih to Ik where h ≥ k ≥ 0 and in the case it is not found in the key of a data record in order to find the block of I(h-1) leading to k;
(b) repeating step (a) until reaching the block B of I0 that is associated with the data record with key k, if exists;
(c) disconnecting r from B.
51. The method for accessing a sought data record r by key k in the layered index of Claim 3, comprising:
(a) searching k in Ih to Ik where h ≥ k ≥ 0 and in the case it is not found in the key of a data record in order to find the block of I(h-1) leading to k;
(b) repeating step (a) until reaching the block of I0 that is associated with the data record with key k, if exists.
52. The method for inserting a data record r by key k in the layered index of Claim 3, comprising:
(a) searching k in Ih to Ik where h ≥ k ≥ 0 and in the case it is not found in the key of a data record in order to find the block of I(h-1) leading to k;
(b) repeating step (a) until reaching the block B of I0 that is associated with the data record with key k, if exists;
(c) associating r to B.
53. The method for deleting a data record r by key k in the layered index of Claim 3, comprising:
(a) searching k in Ih to Ik where h ≥ k ≥ 0 and in the case it is not found in the key of a data record in order to find the block of I(h-1) leading to k;
(b) repeating step (a) until reaching the block B of I0 that is associated with the data record with key k, if exists;
(c) disconnecting r from B.
54. The method of Claim 26, wherein said construction step (b) includes:
(a) If B (in I(h-1)) overflows, it is split into two (or more) blocks and the representative of B in Ih is replaced by the representatives of the new blocks.
(b) If the block of Ih overflows an additional layer I(h+1) is created and added to the layered index.
55. The method according to Claim 54, performed on the fly.
56. The method according to claim 54, performed post factum.
57. The method of Claim 28, wherein said construction step (b) includes:
(a) If B (in I(h-1)) overflows, it is split into two (or more) blocks and the representative of B in Ih is replaced by the representatives of the new blocks.
(b) If the block of Ih overflows an additional layer I(h+1) is created and added to the layered index.
58. The method according to Claim 57, performed on the fly.
59. The method according to claim 57, performed post factum.
60. The method according to claim 26, wherein the construction step (b) includes:
(a) at least one short link among the short links of a node (hereon split node) in the block (of B(i-1)) is deleted (hereon split link) in a way that at least two tries exist in the block;
(b) each of the sub-trees is moved to a separate block;
(c) if the block of Bi does not exist, Bi is created and a copied node of the split node is created in Bi;
(d) if the block of Bi exists and a copied node of the split node does not exist in Bi, then a copied node of the split node is created in Bi and connected to the trie of Bi such that B(i-1)' (at the end of the split process) is accessible in a search path that includes the root node in Bi and the copied node and its labeled links according to the representative key of B(i-1)';
(e) if the copied node has no direct link, a direct link is added from the copied node to the block B(i-1);
(f) a far link added from the copied node to the block B(i-1)' or if the copied node has a short link to a child node in the direction of the far link, the far link is replaced by a direct link from the child node to block B(i-1)'.
61. In a storage medium used by a database file management system executed on data processing system, a data structure that includes at least one probabilistic access indexing file (PAIF) having a plurality of nodes and links; the leaf nodes of said PAIF are associated each with at least one data record accessible to said user application program and wherein at least portion of said data record constitutes at least one search-key; selected nodes in said PAIF represent, each, a given offset of a search key portion within said search key; link(s) originated from each given node from among said selected nodes, represent, each, a unique value of said search key portion; the PAIF having at least two sub-PAIF's being arranged, each, in a block; said data base file management system is further capable of arranging said blocks as a balanced structure of blocks.
62 . The data processing system according to Claim 61, wherein at least some data records that ar╬╡ associated to said leaf nodes are held in at least one separate file.
63 . Th╬╡ data processing system according to Claim 61, wherein at least one leaf is associated with more than one data record.
64. A method for inserting a new data record into an existing PAIF according to Claim 61 including the execution of the following steps:
i. advancing along a reference path commencing from the root node and ending at a data record associated to a leaf node (referred to as "reference data record"); in each node in the reference path, advancing along a link originated from said node if the value represented by the link equals the value of the 1-bit-long key portion at the offset specified by said node; in the case that the offset specified in the node is beyond any corresponding key portion in the key, or if there is no link with said value, advancing along an arbitrary path to any reference data record;
ii. comparing the search key of the reference data record to that of the new data record for determining the smallest offset of the search key portion that discerns the two (hereinafter discerning offset);
iii. proceed to one of the following steps (iii.0-iii.3) depending upon the value of the discerning offset:
iii.0 if the data records are equal then terminate; or
iii.1 if the discerning offset matches the offset indicated by one of the nodes in the reference path, add another link originating from said one node and assign to said link the value of the search key portion at the discerning offset taken from the search key of the new data record; or
iii.2 if the discerning offset is larger than that indicated by the leaf node that is linked, by means of a link, to the reference data record:
iii.2.1 disconnect the link from the reference data record (i.e. it remains temporarily "loose") and move the link to a new node; the new node is assigned with a value of the discerning offset;
iii.2.2 connect the reference data record and the new node (which now becomes a leaf node) and assign to the link (long link) a value of the search-key-portion at the discerning offset taken from the search key of the reference data record;
iii.2.3 connect by means of a link the new data record and the new node and assign to the link (long link) a value of the search-key-portion at the discerning offset taken from the search key of the new data record; or
iii.3 if conditions iii.0, iii.1 and iii.2 are not met, there exists, in the reference search path, a father node and a child node thereof such that the discerning offset is, at the same time, larger than the offset assigned to the father node and smaller than the offset assigned to the child node (considered case A), or all the nodes in the reference search path have a value greater than the discerning offset (considered case B); accordingly, apply the following sub-steps:
iii.3.1 for case A and B, create a new node and assign the node with the value of said discerning offset, for case A only - disconnect the link from the father node to the child node and shift the link to a new internal node (i.e. the child node remains temporarily "loose");
iii.3.2 for case A and B, connect by means of a link (long link) the new data record and said new internal node; the value assigned to the link is that of the search-key-portion at the discerning offset, as taken from the search key of the new data record;
iii.3.3 for case A and B, connect by means of a new link the new node and for case A - the child node, for case B - the root node (i.e. the new node becomes for case A - a new father node, for case B - a new root node), and the value assigned to said link is the search-key-portion at the offset indicated by the new node, taken from the search key of the reference data record.
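The insertion steps of Claim 64 can be illustrated with a minimal Python sketch. It is not the patented implementation and makes several assumptions for readability: key portions are single characters rather than 1-bit portions, a data record is represented by its key string, an offset past the end of a key yields the empty string as its portion, and an index holding a single record is just that record (so the first insertion falls under case B with an empty reference path). The names `Node`, `portion`, `discerning_offset`, `insert` and `search` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    offset: int                                # offset of the key portion tested here
    links: dict = field(default_factory=dict)  # portion value -> Node or record (str)


def portion(key, offset):
    # Assumption: a position past the end of the key yields "".
    return key[offset] if offset < len(key) else ""


def discerning_offset(a, b):
    # Step ii: smallest offset at which the two keys differ (None if equal).
    for i in range(max(len(a), len(b))):
        if portion(a, i) != portion(b, i):
            return i
    return None


def insert(root, new_key):
    """Insert new_key and return the (possibly new) root."""
    # Step i: walk to a reference record, taking an arbitrary link on a miss.
    path, cur = [], root
    while isinstance(cur, Node):
        label = portion(new_key, cur.offset)
        if label not in cur.links:
            label = next(iter(cur.links))
        path.append((cur, label))
        cur = cur.links[label]
    reference = cur
    d = discerning_offset(reference, new_key)
    if d is None:                              # iii.0: record already present
        return root
    new_label, ref_label = portion(new_key, d), portion(reference, d)
    for node, _ in path:                       # iii.1: a path node tests offset d
        if node.offset == d:
            node.links[new_label] = new_key
            return root
    if path and d > path[-1][0].offset:        # iii.2: new node below the leaf node
        leaf, label = path[-1]
        leaf.links[label] = Node(d, {ref_label: reference, new_label: new_key})
        return root
    new_node = Node(d, {new_label: new_key})   # iii.3.1 and iii.3.2
    father = None
    for node, label in path:
        if node.offset < d:
            father = (node, label)
    if father is None:                         # case B: new node becomes the root
        new_node.links[ref_label] = root
        return new_node
    node, label = father                       # case A: splice between father and child
    new_node.links[ref_label] = node.links[label]
    node.links[label] = new_node
    return root


def search(root, key):
    cur = root
    while isinstance(cur, Node):
        label = portion(key, cur.offset)
        if label not in cur.links:
            return False
        cur = cur.links[label]
    return cur == key                          # final comparison against the record
```

Because intermediate offsets are skipped, a search only confirms a match by comparing the full key with the record reached at the end of the path, which is why `search` ends with an equality check.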
65. A method for obtaining a balanced PAIF index; the PAIF including blocks each accommodating a plurality of nodes and links originated from said nodes; leaf nodes from among said nodes are associated with data records; the method comprising executing the following steps as many times as required:
(i) replacing a block, constituting a replaced block, with at least two split blocks such that few from among the nodes of said split block are accommodated within one of said split blocks and the remaining nodes from among the nodes of said split block are accommodated within other split blocks;
(ii) copying at least one node from among the nodes of said replaced block into a block such that said at least two split blocks being children blocks thereof.
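Steps (i) and (ii) of Claim 65 can be sketched in Python under strong simplifying assumptions: a block is modeled as a flat list of node identifiers rather than a trie, the split point is simply the middle of the list, and the copied node is the first node of the second split block. The names `Block` and `split_block` are hypothetical and the sketch only covers blocks without children of their own.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare blocks by identity, not by content
class Block:
    nodes: list
    children: list = field(default_factory=list)


def split_block(block, parent=None):
    """Replace `block` by two split blocks and copy one node into the parent."""
    mid = len(block.nodes) // 2
    left, right = Block(block.nodes[:mid]), Block(block.nodes[mid:])  # step (i)
    if parent is None:
        parent = Block([])                    # no parent yet: create one
    else:
        parent.children = [c for c in parent.children if c is not block]
    parent.nodes.append(block.nodes[mid])     # step (ii): copied node
    parent.children += [left, right]          # split blocks become children
    return parent
```

Repeating the split whenever a block overflows keeps every block within its capacity while the parent block grows by one copied node per split, which is how the structure of blocks stays balanced in this simplified model.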
66. In a computer system having a storage medium of at least an internal memory that ranges between 10 to 20 M byte or more, and an external memory; a data structure that includes an index over the keys of the data records; the index is arranged in blocks; such that for one billion data records substantially no more than two accesses to said external memory are required in order to access a block that is associated with any one of said billion data records, irrespective of the size of the key of said data records.
67. In a computer system having a storage medium of at least an internal memory that ranges between 10 to 20 M byte or more, and an external memory; a data structure that includes an index over the keys of the data records; the index is arranged in blocks; such that for one million data records substantially all the blocks of the index are accommodated in said internal memory regardless of the size of the key of said data records.
68. In a computer system having a storage medium, a data structure that includes an index over the keys of data records; the index is arranged in a balanced structure of blocks and enables to perform sequential operations on said data records; the index size is essentially not affected from the size of said keys.
69. In a storage medium used by a database file management system executed on data processing system, a data structure that includes: an index over the keys of data records; the data records being of at least two types where data records of the second type are subordinated to the data records of the first type.
70. In a storage medium used by a database file management system executed on data processing system, a data structure that includes: a designated index over designated keys of data records; the data records, constituting designated data records, being of at least two types where designated data records of the second type are subordinated to the designated data records of the first type.
71. The storage medium of Claim 69, wherein said index constitutes a layered index.
72. The storage medium of Claim 70, wherein said designated index constitutes a layered index.
73. The storage medium according to Claim 70, wherein said designated index constituting a multi-dimensional index.
74. The storage medium according to Claim 72, wherein said designated index constituting a multi-dimensional index.
75. The storage medium according to Claim 70, wherein said designated index constituting a multi-model index.
76. The storage medium according to Claim 72, wherein said designated index constituting a multi-model index.
77. The storage medium according to Claim 74, wherein said designated index constituting a multi-model index.
78. The storage medium according to Claim 69, wherein data record of the first type and subordinated data record of the second type constitute one to one relationship.
79. The storage medium according to Claim 70, wherein data record of the first type and subordinated data record of the second type constitute one to many relationship.
80. The storage medium according to Claim 71, wherein data record of the first type and subordinated data record of the second type constitute one to one relationship.
81. The storage medium according to Claim 73, wherein data record of the first type and subordinated data record of the second type constitute one to many relationship.
82. The storage medium of Claim 69, wherein said index includes trie.
83. The storage medium of Claim 70, wherein said index includes trie.
84. The storage medium of Claim 71, wherein the basic partitioned index of said layered index being a trie.
85. The storage medium of Claim 69, wherein for accessing or updating transaction in respect of subordinated data record having composite key K1..Kn, there exists in the index a subordinated search path that leads to the subordinated data record according to the composite key K1..Kn; the subordinated search path includes a search path to a data record having key K1..Kn-1.
86. The storage medium of Claim 70, wherein for accessing or updating transaction in respect of subordinated data record having composite key K1..Kn, there exists in the index a subordinated search path that leads to the subordinated data record according to the composite key K1..Kn; the subordinated search path includes a search path to a data record having key K1..Kn-1.
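The subordinated search path of Claims 85 and 86 can be illustrated with a small Python sketch. It assumes a nested-dictionary trie keyed by the components of the composite key, which is an illustrative stand-in and not the index structure of the claims. The names `put` and `search_path` and the sample customer/order records are hypothetical.

```python
def put(index, composite_key, record):
    """Store `record` under a composite key given as a tuple (K1, ..., Kn)."""
    node = index
    for part in composite_key:
        node = node.setdefault("children", {}).setdefault(part, {})
    node["record"] = record


def search_path(index, composite_key):
    """Return every record met while following the composite key, parent first."""
    node, met = index, []
    for part in composite_key:
        node = node["children"][part]   # raises KeyError if the path does not exist
        if "record" in node:
            met.append(node["record"])
    return met


index = {}
put(index, ("C1",), "customer C1")         # record of the first type, key K1
put(index, ("C1", "O7"), "order O7")       # subordinated record, key K1..K2
print(search_path(index, ("C1", "O7")))    # the parent record is met on the way
```

In this model the path to the subordinated record with key K1..Kn necessarily passes through the position of the record with key K1..Kn-1, so both records can be reached in one traversal.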
87. The storage medium according to Claim 75, wherein said multi-model includes relational model.
88. The storage medium according to Claim 75, wherein said multi-model includes object oriented model.
89. The storage medium according to Claim 75, wherein said multi-model includes object relational model.
90. The storage medium according to Claim 75, wherein said multi-model complies with a client server model.
91. The storage medium according to Claim 76, wherein said multi-model includes relational model.
92. The storage medium according to Claim 76, wherein said multi-model includes object oriented model.
93. The storage medium according to Claim 76, wherein said multi-model includes object relational model.
94. The storage medium according to Claim 76, wherein said multi-model complies with a client server model.
95. In a storage medium used by a database file management system executed on data processing system, a data structure that includes: an index being stored in the storage medium and constructed over the keys of said data records that are stored in blocks; the index being arranged in blocks with the leaf blocks being linked to data records by means of links; said index is characterized in that at least one of said links is shared by at least two data records stored in the same block.
96. The storage medium of claim 95, wherein said index being constituted by a trie.
97. In a storage medium used by a database file management system executed on data processing system, a data structure that includes: an index being stored in a storage medium and constructed over the keys of said data records that are stored in blocks; the index being arranged in blocks with the leaf blocks being linked to data records by means of links; said index is characterized in that at least one of said links is shared by at least two data records stored in the same block; said index constituting a layered index according to claim 1, and blocks of said basic partitioned index are linked to said data records.
98. The storage medium of claim 97, wherein said basic partitioned index being constituted by a trie.
PCT/IL1999/000038 1998-01-22 1999-01-22 Database apparatus WO1999038094A1 (en)

Priority Applications (10)

Application Number Priority Date Filing Date Title
CA002319177A CA2319177A1 (en) 1998-01-22 1999-01-22 Database apparatus
HU0101298A HUP0101298A3 (en) 1998-01-22 1999-01-22 Data structure, as well as method for constructing a layered index arranged in blocks, method for constructing an index over the keys of the data records, and method for obtaining a balanced index
NZ505767A NZ505767A (en) 1998-01-22 1999-01-22 Database management system with layered index arranged in blocks
BR9907227-0A BR9907227A (en) 1998-01-22 1999-01-22 Data structure on a storage medium, layered index, process to build a layered index distributed in blocks in a database file management system, processes to build an index on the data record keys, to access a data record searched by the key k in the layered index, to insert a data record r by the key k in the layered index, and to delete a data record r by the key k in the layered index, data processing system , processes for inserting a new data record into an existing paif file, and for obtaining a balanced paif index, and storage support used by a database file management system
EP99901096A EP1049990A4 (en) 1998-01-22 1999-01-22 Database apparatus
JP2000528930A JP2002501256A (en) 1998-01-22 1999-01-22 Database device
AU20719/99A AU759360B2 (en) 1998-01-22 1999-01-22 Database apparatus
IL13734799A IL137347A0 (en) 1998-01-22 1999-01-22 Database apparatus
IL137347A IL137347A (en) 1998-01-22 2000-07-18 Database apparatus
NO20003759A NO20003759L (en) 1998-01-22 2000-07-21 Database means

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IL9800029 1998-01-22
ILPCT/IL98/00029 1998-01-22

Publications (1)

Publication Number Publication Date
WO1999038094A1 true WO1999038094A1 (en) 1999-07-29

Family

ID=11062302

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL1999/000038 WO1999038094A1 (en) 1998-01-22 1999-01-22 Database apparatus

Country Status (12)

Country Link
EP (1) EP1049990A4 (en)
JP (1) JP2002501256A (en)
CN (1) CN1292901A (en)
AU (1) AU759360B2 (en)
BR (1) BR9907227A (en)
CA (1) CA2319177A1 (en)
HU (1) HUP0101298A3 (en)
NO (1) NO20003759L (en)
NZ (1) NZ505767A (en)
RU (1) RU2000122092A (en)
TR (1) TR200002119T2 (en)
WO (1) WO1999038094A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1437662A1 (en) * 2003-01-10 2004-07-14 Deutsche Thomson-Brandt Gmbh Method and device for accessing a database
US8478798B2 (en) * 2008-11-10 2013-07-02 Google Inc. Filesystem access for web applications and native code modules
US20130013605A1 (en) * 2011-07-08 2013-01-10 Stanfill Craig W Managing Storage of Data for Range-Based Searching
CN110807028B (en) * 2018-08-03 2023-07-18 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for managing a storage system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5404510A (en) * 1992-05-21 1995-04-04 Oracle Corporation Database index design based upon request importance and the reuse and modification of similar existing indexes
US5551027A (en) * 1993-01-07 1996-08-27 International Business Machines Corporation Multi-tiered indexing method for partitioned data
US5651099A (en) * 1995-01-26 1997-07-22 Hewlett-Packard Company Use of a genetic algorithm to optimize memory space
US5701467A (en) * 1993-07-07 1997-12-23 European Computer-Industry Research Centre Gmbh Computer data storage management system and methods of indexing a dataspace and searching a computer memory
US5717921A (en) * 1991-06-25 1998-02-10 Digital Equipment Corporation Concurrency and recovery for index trees with nodal updates using multiple atomic actions
US5765168A (en) * 1996-08-09 1998-06-09 Digital Equipment Corporation Method for maintaining an index

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1049990A4 *

Cited By (89)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6208993B1 (en) 1996-07-26 2001-03-27 Ori Software Development Ltd. Method for organizing directories
US6175835B1 (en) 1996-07-26 2001-01-16 Ori Software Development, Ltd. Layered index with a basic unbalanced partitioned index that allows a balanced structure of blocks
US6675173B1 (en) 1998-01-22 2004-01-06 Ori Software Development Ltd. Database apparatus
WO2001008045A1 (en) * 1999-07-22 2001-02-01 Ori Software Development Ltd. Method for organizing directories
US6952521B2 (en) 2000-03-31 2005-10-04 Koninklijke Philips Electronics N.V. Methods and apparatus for editing digital video recordings, and recordings made by such methods
US7574102B2 (en) 2000-03-31 2009-08-11 Koninklijke Philips Electronics N.V. Methods and apparatus for editing digital video recordings, and recordings made by such methods
US6879983B2 (en) 2000-10-12 2005-04-12 Qas Limited Method and apparatus for retrieving data representing a postal address from a plurality of postal addresses
US7366726B2 (en) 2000-10-12 2008-04-29 Qas Limited Method and apparatus for retrieving data representing a postal address from a plurality of postal addresses
GB2367917A (en) * 2000-10-12 2002-04-17 Qas Systems Ltd Retrieving data representing a postal address from a database of postal addresses using a trie structure
US8224829B2 (en) 2000-11-30 2012-07-17 Bernard Consulting Limited Database
WO2002044940A2 (en) 2000-11-30 2002-06-06 Coppereye Limited Method of organising, interrogating and navigating a database
EP2009559A1 (en) * 2000-11-30 2008-12-31 Coppereye Limited Database
GB2369695B (en) * 2000-11-30 2005-03-16 Indigo One Technologies Ltd Database
GB2406680A (en) * 2000-11-30 2005-04-06 Coppereye Ltd A database having a hierarchical index structure
GB2406680B (en) * 2000-11-30 2005-05-18 Coppereye Ltd Database
AU2002222096B2 (en) * 2000-11-30 2008-08-28 Bernard Consulting Limited Method of organising, interrogating and navigating a database
GB2369695A (en) * 2000-11-30 2002-06-05 Indigo One Technologies Ltd Index tree structure and key composition for a database
WO2002069188A2 (en) * 2001-02-26 2002-09-06 Ori Software Development Ltd. Encoding semi-structured data for efficient search and browsing
US8489597B2 (en) 2001-02-26 2013-07-16 Ori Software Development Ltd. Encoding semi-structured data for efficient search and browsing
GB2389690B (en) * 2001-02-26 2005-09-07 Ori Software Dev Ltd Encoding semi-structured data for efficient search and browsing
US6804677B2 (en) 2001-02-26 2004-10-12 Ori Software Development Ltd. Encoding semi-structured data for efficient search and browsing
WO2002069188A3 (en) * 2001-02-26 2004-06-10 Ori Software Dev Ltd Encoding semi-structured data for efficient search and browsing
US8065308B2 (en) 2001-02-26 2011-11-22 Ori Software Development Ltd. Encoding semi-structured data for efficient search and browsing
US7287033B2 (en) 2002-03-06 2007-10-23 Ori Software Development, Ltd. Efficient traversals over hierarchical data and indexing semistructured data
US7366725B2 (en) 2003-08-11 2008-04-29 Descisys Limited Method and apparatus for data validation in multidimensional database
US7734661B2 (en) 2003-08-11 2010-06-08 Descisys Limited Method and apparatus for accessing multidimensional data
CN100334587C (en) * 2003-09-19 2007-08-29 台湾积体电路制造股份有限公司 Method and system of data management and database
US7908242B1 (en) 2005-04-11 2011-03-15 Experian Information Solutions, Inc. Systems and methods for optimizing database queries
US8065264B1 (en) 2005-04-11 2011-11-22 Experian Information Solutions, Inc. Systems and methods for optimizing database queries
US11257126B2 (en) 2006-08-17 2022-02-22 Experian Information Solutions, Inc. System and method for providing a score for a used vehicle
US7912865B2 (en) 2006-09-26 2011-03-22 Experian Marketing Solutions, Inc. System and method for linking multiple entities in a business database
US9563916B1 (en) 2006-10-05 2017-02-07 Experian Information Solutions, Inc. System and method for generating a finance attribute from tradeline data
US11631129B1 (en) 2006-10-05 2023-04-18 Experian Information Solutions, Inc System and method for generating a finance attribute from tradeline data
US10121194B1 (en) 2006-10-05 2018-11-06 Experian Information Solutions, Inc. System and method for generating a finance attribute from tradeline data
US10963961B1 (en) 2006-10-05 2021-03-30 Experian Information Solutions, Inc. System and method for generating a finance attribute from tradeline data
US11954731B2 (en) 2006-10-05 2024-04-09 Experian Information Solutions, Inc. System and method for generating a finance attribute from tradeline data
US10078868B1 (en) 2007-01-31 2018-09-18 Experian Information Solutions, Inc. System and method for providing an aggregation tool
US10891691B2 (en) 2007-01-31 2021-01-12 Experian Information Solutions, Inc. System and method for providing an aggregation tool
US11443373B2 (en) 2007-01-31 2022-09-13 Experian Information Solutions, Inc. System and method for providing an aggregation tool
US9619579B1 (en) 2007-01-31 2017-04-11 Experian Information Solutions, Inc. System and method for providing an aggregation tool
US10402901B2 (en) 2007-01-31 2019-09-03 Experian Information Solutions, Inc. System and method for providing an aggregation tool
US10650449B2 (en) 2007-01-31 2020-05-12 Experian Information Solutions, Inc. System and method for providing an aggregation tool
US11908005B2 (en) 2007-01-31 2024-02-20 Experian Information Solutions, Inc. System and method for providing an aggregation tool
US8204856B2 (en) 2007-03-15 2012-06-19 Google Inc. Database replication
US11308170B2 (en) 2007-03-30 2022-04-19 Consumerinfo.Com, Inc. Systems and methods for data verification
US10437895B2 (en) 2007-03-30 2019-10-08 Consumerinfo.Com, Inc. Systems and methods for data verification
US9342783B1 (en) 2007-03-30 2016-05-17 Consumerinfo.Com, Inc. Systems and methods for data verification
US9251541B2 (en) 2007-05-25 2016-02-02 Experian Information Solutions, Inc. System and method for automated detection of never-pay data sets
US9690820B1 (en) 2007-09-27 2017-06-27 Experian Information Solutions, Inc. Database system for triggering event notifications based on updates to database records
US10528545B1 (en) 2007-09-27 2020-01-07 Experian Information Solutions, Inc. Database system for triggering event notifications based on updates to database records
US11347715B2 (en) 2007-09-27 2022-05-31 Experian Information Solutions, Inc. Database system for triggering event notifications based on updates to database records
US11954089B2 (en) 2007-09-27 2024-04-09 Experian Information Solutions, Inc. Database system for triggering event notifications based on updates to database records
US8954459B1 (en) 2008-06-26 2015-02-10 Experian Marketing Solutions, Inc. Systems and methods for providing an integrated identifier
US11769112B2 (en) 2008-06-26 2023-09-26 Experian Marketing Solutions, Llc Systems and methods for providing an integrated identifier
US10075446B2 (en) 2008-06-26 2018-09-11 Experian Marketing Solutions, Inc. Systems and methods for providing an integrated identifier
US11157872B2 (en) 2008-06-26 2021-10-26 Experian Marketing Solutions, Llc Systems and methods for providing an integrated identifier
US9152727B1 (en) 2010-08-23 2015-10-06 Experian Marketing Solutions, Inc. Systems and methods for processing consumer information for targeted marketing applications
US9147042B1 (en) 2010-11-22 2015-09-29 Experian Information Solutions, Inc. Systems and methods for data verification
US9684905B1 (en) 2010-11-22 2017-06-20 Experian Information Solutions, Inc. Systems and methods for data verification
US11665253B1 (en) 2011-07-08 2023-05-30 Consumerinfo.Com, Inc. LifeScore
US10798197B2 (en) 2011-07-08 2020-10-06 Consumerinfo.Com, Inc. Lifescore
US10176233B1 (en) 2011-07-08 2019-01-08 Consumerinfo.Com, Inc. Lifescore
US9483606B1 (en) 2011-07-08 2016-11-01 Consumerinfo.Com, Inc. Lifescore
US11356430B1 (en) 2012-05-07 2022-06-07 Consumerinfo.Com, Inc. Storage and maintenance of personal data
US9853959B1 (en) 2012-05-07 2017-12-26 Consumerinfo.Com, Inc. Storage and maintenance of personal data
US9697263B1 (en) 2013-03-04 2017-07-04 Experian Information Solutions, Inc. Consumer data request fulfillment system
US11526773B1 (en) 2013-05-30 2022-12-13 Google Llc Predicting accuracy of submitted data
US10223637B1 (en) 2013-05-30 2019-03-05 Google Llc Predicting accuracy of submitted data
US10580025B2 (en) 2013-11-15 2020-03-03 Experian Information Solutions, Inc. Micro-geographic aggregation system
US10102536B1 (en) 2013-11-15 2018-10-16 Experian Information Solutions, Inc. Micro-geographic aggregation system
US9529851B1 (en) 2013-12-02 2016-12-27 Experian Information Solutions, Inc. Server architecture for electronic data quality processing
US10262362B1 (en) 2014-02-14 2019-04-16 Experian Information Solutions, Inc. Automatic generation of code for attributes
US11107158B1 (en) 2014-02-14 2021-08-31 Experian Information Solutions, Inc. Automatic generation of code for attributes
US11847693B1 (en) 2014-02-14 2023-12-19 Experian Information Solutions, Inc. Automatic generation of code for attributes
US9576030B1 (en) 2014-05-07 2017-02-21 Consumerinfo.Com, Inc. Keeping up with the joneses
US10936629B2 (en) 2014-05-07 2021-03-02 Consumerinfo.Com, Inc. Keeping up with the joneses
US10019508B1 (en) 2014-05-07 2018-07-10 Consumerinfo.Com, Inc. Keeping up with the joneses
US11620314B1 (en) 2014-05-07 2023-04-04 Consumerinfo.Com, Inc. User rating based on comparing groups
US10445152B1 (en) 2014-12-19 2019-10-15 Experian Information Solutions, Inc. Systems and methods for dynamic report generation based on automatic modeling of complex data structures
US11010345B1 (en) 2014-12-19 2021-05-18 Experian Information Solutions, Inc. User behavior segmentation using latent topic detection
US10242019B1 (en) 2014-12-19 2019-03-26 Experian Information Solutions, Inc. User behavior segmentation using latent topic detection
US11550886B2 (en) 2016-08-24 2023-01-10 Experian Information Solutions, Inc. Disambiguation and authentication of device users
US10678894B2 (en) 2016-08-24 2020-06-09 Experian Information Solutions, Inc. Disambiguation and authentication of device users
US11681733B2 (en) 2017-01-31 2023-06-20 Experian Information Solutions, Inc. Massive scale heterogeneous data ingestion and user resolution
US11227001B2 (en) 2017-01-31 2022-01-18 Experian Information Solutions, Inc. Massive scale heterogeneous data ingestion and user resolution
US11734234B1 (en) 2018-09-07 2023-08-22 Experian Information Solutions, Inc. Data architecture for supporting multiple search models
US10963434B1 (en) 2018-09-07 2021-03-30 Experian Information Solutions, Inc. Data architecture for supporting multiple search models
US11941065B1 (en) 2019-09-13 2024-03-26 Experian Information Solutions, Inc. Single identifier platform for storing entity data
US11880377B1 (en) 2021-03-26 2024-01-23 Experian Information Solutions, Inc. Systems and methods for entity resolution

Also Published As

Publication number Publication date
CN1292901A (en) 2001-04-25
TR200002119T2 (en) 2000-12-21
NO20003759L (en) 2000-09-20
NZ505767A (en) 2003-09-26
JP2002501256A (en) 2002-01-15
NO20003759D0 (en) 2000-07-21
AU2071999A (en) 1999-08-09
BR9907227A (en) 2001-09-04
HUP0101298A2 (en) 2001-08-28
EP1049990A4 (en) 2004-09-08
HUP0101298A3 (en) 2003-07-28
RU2000122092A (en) 2002-07-27
CA2319177A1 (en) 1999-07-29
AU759360B2 (en) 2003-04-10
EP1049990A1 (en) 2000-11-08

Similar Documents

Publication Publication Date Title
AU759360B2 (en) Database apparatus
US6175835B1 (en) Layered index with a basic unbalanced partitioned index that allows a balanced structure of blocks
US6208993B1 (en) Method for organizing directories
US6240418B1 (en) Database apparatus
US20230006144A9 (en) Trie-Based Indices for Databases
US9870382B2 (en) Data encoding and corresponding data structure
US20180150494A1 (en) Value-id-based sorting in column-store databases
US6675173B1 (en) Database apparatus
EP2788896B1 (en) Fuzzy full text search
EP3362916A1 (en) Signature-based cache optimization for data preparation
US7363284B1 (en) System and method for building a balanced B-tree
EP2788897B1 (en) Optimally ranked nearest neighbor fuzzy full text search
US8312050B2 (en) Avoiding database related joins with specialized index structures
US10599614B1 (en) Intersection-based dynamic blocking
WO2017065888A1 (en) Step editor for data preparation
US20070094313A1 (en) Architecture and method for efficient bulk loading of a PATRICIA trie
EP1208479A1 (en) Method for organizing directories
WO2013097065A1 (en) Index data processing method and device
US8812453B2 (en) Database archiving using clusters
IL137347A (en) Database apparatus
US20090319541A1 (en) Efficient Identification of Entire Row Uniqueness in Relational Databases
Roumelis et al. Bulk Insertions into xBR-trees
CA2262593C (en) Database apparatus
US20240054122A1 (en) Method of building and appending data structures in a multi-host environment
US20230177034A1 (en) Method for grafting a scion onto an understock data structure in a multi-host environment

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 99803698.6

Country of ref document: CN

AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase

Ref document number: 505767

Country of ref document: NZ

WWE WIPO information: entry into national phase

Ref document number: PA/a/2000/007026

Country of ref document: MX

Ref document number: 137347

Country of ref document: IL

WWE WIPO information: entry into national phase

Ref document number: IN/PCT/2000/147/KOL

Country of ref document: IN

ENP Entry into the national phase

Ref document number: 2319177

Country of ref document: CA

Kind code of ref document: A

WWE WIPO information: entry into national phase

Ref document number: 2000/02119

Country of ref document: TR

NENP Non-entry into the national phase

Ref country code: KR

WWE WIPO information: entry into national phase

Ref document number: 20719/99

Country of ref document: AU

WWE WIPO information: entry into national phase

Ref document number: 1999901096

Country of ref document: EP

WWP WIPO information: published in national office

Ref document number: 1999901096

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWG WIPO information: grant in national office

Ref document number: 20719/99

Country of ref document: AU