WO1999038094A1 - Database apparatus

Info

Publication number: WO1999038094A1
Application number: PCT/IL1999/000038
Authority: WO
Inventors: Moshe Shadmon
Original assignee: Ori Software Development Ltd.
Priority date: 1998-01-22
Filing date: 1999-01-22
Publication date: 1999-07-29
Also published as: CN1292901A; TR200002119T2; NO20003759L; NZ505767A; JP2002501256A; NO20003759D0; AU2071999A; BR9907227A; HUP0101298A2; EP1049990A4; HUP0101298A3; RU2000122092A; CA2319177A1; AU759360B2; EP1049990A1

Abstract

A database file management system for accessing data records is executed on a data processing system. The data records are linked to a trie index that is arranged in blocks (402, 405, 406 and 407) and stored in a storage medium. The trie index (A, B and I, element 402) enables accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks. There is provided a method for constructing a layered index arranged in blocks, which includes the steps of providing the trie index and constructing a representative index over the representative keys of the trie index. The layered index enables accessing or updating the data records by key or keys, and it constitutes a balanced structure of blocks.

Description

DATABASE APPARATUS

FIELD OF THE INVENTION

This invention relates to databases and database management systems.

BACKGROUND OF THE INVENTION

As is well known, a database system is a collection of interrelated data files, indexes and a set of programs that allow one or more users to add, retrieve and modify the data stored in these files. The fundamental concept of a database system is to provide users with a so called "abstract" and simplified view of the data (referred to also as data model or conceptual structure) which exempts a conventional user from dealing with details such as how the data is physically organized and accessed.

Some of the well known data models (i.e. the "Hierarchical model", "Network model", "Relational model" and "Object Relational Model") will now be briefly reviewed. A more detailed discussion can be found for example in: Henry F. Korth, Abraham Silberschatz, "Database System Concepts", McGraw-Hill International Editions, 1986 (or the 3rd edition (1997)), Chapters 3-5, pp. 45-172.

Generally speaking, all the models to be discussed below have a common property in that they represent each "entity" as a "record" having one or more "fields", each being indicative of a given attribute of the entity (e.g. a record of a given book may have the following fields: "BOOK ID", "BOOK NAME", "TITLE"). Normally one or more attributes constitute a "key", i.e. it identifies the record. In the latter example "BOOK ID" serves as a key. The various models are distinguished one from the other, inter alia, in the way that these records are organized into a more complex structure:

Relational Model - The relational model, introduced by Codd, is a landmark in the history of database development. In relational databases an abstract concept has been introduced, according to which the data is represented by tables (referred to as "relations") in which the columns represent the fields and rows represent the records.

The association between tables is only conceptual. It is not a part of the database definition. Two tables can be implicitly associated by the fact that they have one or more columns whose values are taken from the same set of values (called "domain").

Other concepts introduced by the relational model are high level operators that operate on tables (i.e., both their parameters and results are tables) and comprehensive data languages (now called 4th generation languages) in which one specifies what are the required results rather than how these results are to be produced. Such non-procedural languages (SQL - Structured Query Language) have become an industry standard. Furthermore, the relational model suggests a very high level of data independence. There should not be any effect on the programs written in these languages due to changes in the manner data are organized, stored, indexed and ordered. The relational model has become a de-facto standard for data analysts.

Network Model - In the relational model, data (and relationship between data) are regarded as a collection of tables. In distinction therefrom in the network model data are represented as a collection of records whereas relationship between the records (data) are represented as links.

A record in the network model is similar to an "entity" in the sense that it is a collection of fields each holding one type of data. The links may be effectively viewed preferably (but not necessarily) as pointers. A collection of records and the relation therebetween constitutes a collection of graphs.

Hierarchical Model - The Hierarchical Model resembles the network model in the manner that data and relations between data are treated, i.e. as records and links. However, in distinction from the network model, the records and the relations between them constitute a collection of trees rather than of arbitrary graphs. The structure of the Hierarchical Model is simple and straightforward, particularly in the case that the data that needs to be organized in a database are of inherent hierarchical nature. The hierarchical model has some inherent shortcomings, e.g. in many real life scenarios data cannot be easily arranged in hierarchical manner. Moreover, even if data may be organized in hierarchical manner, it may require larger volumes as compared to other database models.

Consider for example a basic entity "Employee" with the following subordinated attributes "Employee_Salary" and "Employee_Attendance". The latter may also have subordinated attributes, e.g. "Employee_Entries" and "Employee_Exits". In this scenario the data is of inherent hierarchical nature and therefore should preferably be organized in the hierarchical model. Consider, by contrast, a scenario where "Employee" is assigned to several "Projects" and the time he/she spends ("Time_Spent") in each project is an attribute that is included in both the "Employee" and "Projects" entities. Such an arrangement of data cannot be easily organized in the hierarchical model, and one possible solution is to duplicate the item "Time_Spent" and hold it separately in the hierarchies of "Employee" and "Project". This approach is cumbersome and error prone in the sense that it is now required to assure that the two instances of "Time_Spent" are kept identical at all times.

Object Oriented Model - A comprehensive explanation can be found in "Object Oriented Modeling and Design", James Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy and William Lorensen.

The object-oriented approach views all entities as objects. Each object belongs to a class, and with each class there are associated methods and fields. To enable encapsulation, some of the fields are private, accessible only to methods of the class, while others are public, accessible to all. Thus "Joe Smith" belongs to the class of persons. For that class, the private field age can be defined. Applying the class method update_age() to the object Joe will change his age. The methodology allows the definition of sub-classes which inherit all the methods and fields of the super-class. Thus, for example, the employee class can be defined as a subclass of the person class. In addition, one may define additional fields and methods for the subclass. Thus, the employee class could support a salary field and the get_raise() method.
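The person/employee example can be sketched in a few lines of Python. This is only an illustration of the concepts named above; the constructor arguments, the `age()` accessor and the behaviour of `get_raise()` are assumptions, since the text names the methods but does not define them.

```python
class Person:
    def __init__(self, name, age):
        self.name = name      # public field
        self._age = age       # "private" by Python convention (leading underscore)

    def update_age(self, new_age):
        # Class method that is allowed to touch the private field.
        self._age = new_age

    def age(self):
        # Hypothetical read accessor, added for illustration.
        return self._age


class Employee(Person):
    """Subclass: inherits all fields and methods of Person."""

    def __init__(self, name, age, salary):
        super().__init__(name, age)
        self._salary = salary  # additional field of the subclass

    def get_raise(self, amount):
        # Additional method of the subclass (assumed semantics: add and return).
        self._salary += amount
        return self._salary


joe = Employee("Joe Smith", 40, 50000)
joe.update_age(41)            # inherited from Person
new_salary = joe.get_raise(1000)
```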

Object Relational Model allows an object view on relational-organized data. Thus, one is able to operate on the data as if it is organized as objects and at the same time, support the relational approach.

As mentioned in the foregoing, data models deal with the conceptual or logical level of data representation and "hide" details such as how the data are physically arranged and accessed. The latter characteristics are normally dealt with by a so-called database file management system.

The database file management system maps the logical structure (in terms of database model) to a data structure, pertinent operations and possibly other data. The data structure includes index and data records. The index enables accessing or updating the data records by a key. In the context of search, the term search key is used. A database file management system should preferably operate on the data records so as to accomplish enhanced performance in terms of time (i.e. from the user's standpoint, fast response time of the database) and space (i.e. to minimize the storage volume that is allocated for the database files). As is well known in the art, normally, there is a trade off between the time and space requirements. The performance of the database depends on the efficiency of the data structures that are used to represent the data and how efficiently the system can operate on these data. A detailed discussion on conventional file and management systems is given for example in Chapters 7 (file system structure) and 8 (indexing) in "Database System Concepts", ibid.

Known database file management systems typically utilize indexing schemes that fall into the following main categories: multi-way tree indexes and others.

Multi-way tree indexes - These techniques can be used to create one or more access paths (referred to also as search paths) to the same data record. The search paths form a multi-way tree. Their main disadvantages are that they require space (usually all the keys to the records plus some pointers) and maintenance (addition and/or deletion of keys whenever an update transaction (see definition below) occurs, i.e. a record is added and/or deleted). Normally, the nature of the indexing scheme as well as the volume of the data held in the files determine the number of accesses that are required to find or update (update encompasses insert, delete or modify) a given data record. In the case that the storage medium under consideration is an external memory, the number of accesses is effectively the number of I/O accesses. As will be explained below, in each access to the storage medium a block of data is loaded into the memory.

Various types of tree indexing schemes have been developed but, normally, an indexing implementation is more costly than the specified direct access indexing techniques. On the other hand, tree indexing allows sequential and sub-range processing. One of the most widely used indexing schemes is the B-tree (under various commercial product names and implementation variants such as B+ tree) in which the keys are kept in a balanced tree structure and the lowest level points at the data itself. A detailed explanation of the B-tree indexing scheme is found in "Database System Concepts" ibid. pp. 275-282. The number of I/O accesses obeys the expression log_K N + 1, where K is an implementation dependent constant and N is the total number of records. This means that the number of accesses grows logarithmically as the number of records increases.
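To make the logarithmic growth concrete, the following sketch evaluates the expression log_K N + 1 with the logarithm rounded up to a whole number of levels. The function name and the sample values of K and N are illustrative assumptions, not figures from the cited reference.

```python
def btree_accesses(n_records, k):
    """Number of block accesses: ceil(log_k(n_records)) + 1.

    Computed with integer arithmetic to avoid floating point rounding.
    """
    levels = 0
    capacity = 1
    while capacity < n_records:
        capacity *= k
        levels += 1
    return levels + 1


small = btree_accesses(1_000_000, 100)     # one million records, K = 100
large = btree_accesses(100_000_000, 100)   # one hundred times more records
```

With these assumed values, multiplying the number of records by one hundred adds a single access.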

It is possible, of course, to use a combination of the above or other techniques, e.g. an indexing scheme which is implemented in accordance with two or more of the above techniques.

One of the significant drawbacks of the aforementioned popular B-tree indexing scheme is that the keys are not only held as part of the data records, but also as part of the index.

This results, of course, in the undesired inflation of the index size, and the latter drawback is further aggravated when indexes of large size are utilized (i.e. when a relatively large number of bits is required for representing the key).

One possible approach to cope with this problem is to exploit the trie indexing scheme. An example of the latter is the trie discussed in G. Wiederhold, "File organization for Database design"; McGraw-Hill, 1987, pp. 272, 273, or in D.E. Knuth, "The Art of Computer Programming"; Addison-Wesley Publishing Company, 1973, pp. 481-505, 681-687.

Generally speaking, the trie indexing scheme enables a rapid search whilst avoiding the duplication of keys as manifested for example by the B-tree technique. The trie indexing scheme has the general structure of a tree wherein the search is based on partitioning the search according to search key portions (e.g. search key digit or bit). Thus, for example, each node in the trie indexing file represents an offset of the search key and the link to any one of its children represents the character's value at said offset. The trie structure affords an efficient data structure in terms of the memory space that is allocated therefor, since, as specified before, the search key is not held, as a whole, in internal nodes and hence the duplication that is exhibited for example in the B-tree indexing technique is avoided.
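A minimal sketch of such a trie follows, assuming character keys. Each node stores only the offset it tests, and each link is labelled with the character value at that offset, so no internal node holds a whole key. The `END` sentinel label and the helper names are assumptions made for the illustration.

```python
END = ""  # hypothetical sentinel label marking "key ends here"


class Node:
    def __init__(self, offset):
        self.offset = offset  # position of the search key tested at this node
        self.links = {}       # character value at that offset -> child node


def insert(root, key, record):
    node = root
    for ch in key:
        if ch not in node.links:
            node.links[ch] = Node(node.offset + 1)
        node = node.links[ch]
    node.links[END] = record  # leaf link to the data record


def search(root, key):
    node = root
    while node.offset < len(key):
        node = node.links.get(key[node.offset])
        if node is None:
            return None       # no link labelled with this character value
    return node.links.get(END)


root = Node(0)
insert(root, "car", "record-1")
insert(root, "cat", "record-2")
insert(root, "dog", "record-3")
```

Note that "car" and "cat" share the nodes for offsets 0 and 1, which is where the space saving over holding whole keys in the index comes from.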

In a specific variant of the trie such as the trie described in "File organization for Database design" ibid., in order to achieve enhanced performance in terms of response time, a trie indexing file should be built by selecting the digits (or bits) from the search key such that the best possible partition of the search space is obtained, or in other words so as to accomplish a tree which is as balanced as possible. This, however, requires a priori knowledge of the data records of the trie and is accomplished at the penalty of obtaining unsorted data, which in many real-life scenarios is inapplicable. It is noteworthy that if sorted data is mandatory, a balanced structure cannot be guaranteed even if there is sufficient a priori knowledge of the data records of the trie. It should be noted that the specified trie does not support sequential sub-range processing.

When considering a large amount of data, it is of particular importance to maintain a so-called balanced structure of the tree index in order to avoid long paths for accessing a given data record from the root node to the leaf node that is associated with the sought data record. The specified B-tree indexing scheme, constitutes an inherent balanced tree structure, even after the tree has been subject to update transactions. The inherent balanced (or essentially balanced) structure is accomplished, however, and as explained above, at the penalty of inflating the contents of the blocks in the tree and, consequently, unduly increasing the file size that holds the index, particularly insofar as large trees which hold multitude of data records are concerned. The large volume of the files adversely affects the performance of the data management system in terms of number of accesses (and consequently in terms of accessing time) to the storage medium in order to reach a sought data record, which is obviously undesired.

Turning now to the "others" category of index schemes, it includes for example the so called skip list index. A skip list is a randomized data structure. It consists of levels; the lowermost level, level 0, consists of a list of all records in non-decreasing order. Each node of level i (i = 0,...,h) chooses, with probability p, whether to be a representative of level i + 1. The representatives of level i constitute the nodes of level i + 1. These representatives, too, are organized as an ordered list. Level h+1 is the first empty level.
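The level construction described above can be sketched as follows. The sketch only builds the ordered lists of each level (it omits the links between levels that a full skip list needs), and the function name, the default p = 0.5 and the fixed random seed are assumptions for the illustration.

```python
import random


def build_levels(sorted_keys, p=0.5, rng=None):
    """Return [level 0, level 1, ..., first empty level]."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    levels = [list(sorted_keys)]          # level 0: all records, in order
    while levels[-1]:
        # Each node of level i becomes a representative with probability p;
        # the representatives form level i + 1 and keep their order.
        promoted = [k for k in levels[-1] if rng.random() < p]
        levels.append(promoted)
    return levels


levels = build_levels([1, 3, 5, 7, 9, 11])
h = len(levels) - 2   # index of the last non-empty level
```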

Having discussed the major drawbacks of hitherto known index schemes, i.e. inflated data volumes (e.g. B-tree and variants thereof) and susceptibility to unbalanced structure (e.g. trie), there follows a discussion of another aspect, which pertains to various characteristics including subordination of data records and multi-dimensional characteristics.

Thus, consider for example two types of data records represented as two entities (tables), i.e. Books and Borrowers, each being associated with a respective unique key, e.g. a borrower is identified by Borrower_Id and a book is identified by Book_Id. In a real life scenario, such as in a public library, one is interested to view, for example, all books borrowed by a given borrower. The latter transaction exemplifies subordination of data records, where "books" are subordinated to "borrower". In order to resolve this query, one should apply two queries - one for the borrower information and another for the books borrowed by him (according to the composite key - book borrower).

Insofar as the B-tree indexing scheme is concerned, in order to support the subordination of data in the manner specified, several separate index files are required, as follows:

• Books index file, accessible via book-Id key;

• Borrowers index file, accessible via borrower-Id key;

• Transactions via borrowers, accessible via the composite key {borrower-Id, book-Id}.

Accordingly, the index scheme includes here three index files. This obviously poses undesired overhead insofar as data volumes and additional integrity maintenance and checking are concerned. Thus, for example, removal of a given book from the book file requires a preliminary test to inquire whether it exists in the borrower-book index file.

Having discussed the drawbacks of hitherto known techniques insofar as subordination of data records is concerned, the cumbersome representation and manner of operation thereof becomes even worse when considering implementations of the so called multi-dimensional data records.

Reverting now to the latter example, the tables Books and Borrowers are now regarded as multi-dimensional tables, which can be reached from several views. Thus, in addition to the above mentioned borrower -> book view (books borrowed by borrower(s)), which is implemented by an index over the borrower-book composite key, the database should support the alternative view of borrowers that borrowed a given book(s), which requires, of course, utilizing the alternative composite key (book-borrower).

In the B-tree representation, it is accordingly required to add another index file accessible via the composite key {book-Id, borrower-Id}, giving rise to a total of four index files.
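The four separate indexes of the library example can be sketched as below. Plain dictionaries and sorted lists stand in for index files here (an assumption made for brevity; they are not B-trees), and all identifiers and sample rows are hypothetical. The sketch is meant only to show that each view needs its own composite-key ordering and that removal of a book needs a preliminary integrity test against a loans index.

```python
import bisect

books = {"B1": "Dune", "B2": "Emma"}                     # index 1: by book-Id
borrowers = {"U1": "Ann", "U2": "Bob"}                   # index 2: by borrower-Id
loans = [("U1", "B1"), ("U1", "B2"), ("U2", "B1")]

by_borrower_book = sorted(loans)                         # index 3: {borrower-Id, book-Id}
by_book_borrower = sorted((b, u) for u, b in loans)      # index 4: {book-Id, borrower-Id}


def range_scan(index, first):
    """Return the second key components of all entries whose first component is `first`."""
    i = bisect.bisect_left(index, (first,))
    found = []
    while i < len(index) and index[i][0] == first:
        found.append(index[i][1])
        i += 1
    return found


def remove_book(book_id):
    # Preliminary integrity test: the book must not appear in any loan.
    # This sketch consults the book-first ordering, which makes the test a range scan.
    if range_scan(by_book_borrower, book_id):
        raise ValueError("book is still on loan")
    del books[book_id]


books_of_u1 = range_scan(by_borrower_book, "U1")
borrowers_of_b1 = range_scan(by_book_borrower, "B1")
```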

The pertinent drawbacks are self explanatory and become even worse for n dimensional tables (n > 2).

There is accordingly a need in the art to reduce the drawbacks of data processing systems that exploit hitherto known database file management systems. Specifically, there is a need in the art to provide for a data processing system that exhibits improved database performance by utilizing an efficient database file management system.

There is yet a further need in the art to provide for a database file management system that utilizes an index which is inherently not susceptible to unbalanced structure in the manner specified above.

There is a still further need in the art to provide for an index which inherently supports representation of multiple types of data, subordination of data records and/or multi-dimensions.

GLOSSARY OF TERMS:

For clarity of explanation, there follows a glossary of additional terms used frequently throughout the description and the appended claims. Some of the terms are conventional and others have been coined:

Block - a storage unit which can be accessed by a single I/O operation. A block may contain data arranged in any desired manner, e.g. nodes arranged as a tree and possibly also links to actual data records. A block may reside in main (referred to also as internal) or secondary (referred to also as external) storage.

Tree - a data structure which is either empty or consists of a root node linked by means of d (d ≥ 0) pointers (or links) to d disjoint trees called subtrees of the root. The roots of the subtrees are referred to as children nodes of the root node of the tree, and nodes of the subtrees are descendent nodes of the root. A node all the subtrees of which are empty is called a leaf node. The nodes in the tree that are not leaves are designated as internal nodes.

In the context of the invention, leaf nodes are also nodes that are associated with data records.

Nodes and trees should be construed in a broad sense. Thus, the definition of tree encompasses also a tree of blocks wherein each node constitutes a block. In the same manner, descendent blocks of a said block are all the blocks that can be accessed from the block. For a detailed definition of "tree", refer also to the book by Cormen, Leiserson and Rivest, or Lewis and Denenberg, "Data structures and their algorithms".

It should be noted that the association (e.g. link) between leaf node and data record encompasses any realization which enables access to data records from leaf nodes. Thus, by way of example, a data record may be accessed directly (i.e. through a pointer) from the leaf node. By another non-limiting example, the leaf node points to a data structure (e.g. a table) which, in turn, enables access to data records. Other variants are, of course, also feasible.

Depth of an index - is defined as the maximum number of blocks from a root block to a block associated with a data record.

Balanced Index - An index is balanced if there exists a constant c such that the number of accesses needed to reach any data record is at most c log n, where n is the number of records in the structure.

Obtaining a balanced tree encompasses applying a balancing technique post factum (on an unbalanced structure), bringing about a balanced structure, or, if desired, applying the balancing technique on the fly, so as to maintain a balanced structure.

Accessing in an index would be considered as a process of moving from a node to another node within a block or to another block, usually, although not necessarily, in order to reach sought data records.

Navigating is considered as accessing data records, usually (although not necessarily) in order to collect them in an ordered manner by their key.

Search scheme: meaning the algorithm that is associated with an index that is used for accessing a given data record by key; intra-block search scheme meaning the algorithm that is used inside the block for accessing a given data record or another block. The data record is not necessarily accommodated within said block.

Common key of a block - The common key of a block is the longest prefix of all keys of the data records that can be accessed from the block by the relevant search scheme. If desired, part or all of the common key may be held explicitly in the block.
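As a small illustration of this definition, the common key can be computed as the longest common prefix of the keys reachable from a block. The sketch assumes that keys are strings (here bit strings) and that the reachable keys are available as a list; the function name is hypothetical.

```python
def common_key(keys):
    """Longest prefix shared by all keys (empty string if there are no keys)."""
    if not keys:
        return ""
    # The prefix shared by the smallest and largest key (in lexicographic
    # order) is shared by every key in between.
    lo, hi = min(keys), max(keys)
    n = 0
    while n < len(lo) and n < len(hi) and lo[n] == hi[n]:
        n += 1
    return lo[:n]


block_keys = ["10110", "10111", "10100"]
prefix = common_key(block_keys)
```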

Update transactions - a transaction consisting of either inserting a new data record, or deleting an existing data record, or modifying an existing data record or portion thereof.

Vertical orientated trie structure - conventional orientation of a digital tree from root to leaves. As will be exemplified below, it is not always obligatory to maintain all the links between nodes and/or blocks in the vertical trie. As will be explained in greater detail below, in an index of the invention, the trie that is susceptible to an unbalanced structure constitutes a vertical tree. As will be described below, in some specific embodiments, the construction of index over the keys of the data records of the trie constitutes vertical oriented tries.

Horizontal oriented trie structure - having h levels of vertical orientated trie structures, with the first level standing for the uppermost level and the h-th level standing for the lowermost level (constituting the trie that is susceptible to an unbalanced structure), which is normally associated with data records, and allows moving from a block in the i-th level to a block in the (i+1)-st level according to a common key value of the block. In various embodiments of the invention, and as will be explained in greater detail below, the h upper levels constitute a representative index over the common keys of the blocks of the lowermost level tree.

Storage medium - Any medium that may be used to store data, including either or both of internal and external memory. External memory may be one or more of the following: magnetic tape, magnetic disk, optical disk, or any other physical medium used for storing data. Internal memory includes any known main memory including cache memory as well as any other physical storage medium that serves as internal memory.

Short link - (referred to also as near link) a link labeled k between a node a having the value r and a node b in the same block, such that the keys of the data records that include node b on their access path have the value k at key position r.

Long link - (referred to also as far link) a link between a node v in block B of level i and block B' of level i - 1 or a data record. If v has value r and the label of the link is k, then the value of the common key of block B' or the key of the data record is k at position r.

The label of a short link or a far link is also referred to as the value or direction of the link.

Split link - If a block overflows and a split process is performed such that node a is linked to node b and, after the split, node b and its descendent nodes are accommodated in a different block (block B), then the link between node a and node b is a split link. After the split, the split link is the link between node a and block B (that is accommodating node b). A split link is a labeled link.

In several implementations, such as PAIF, maintaining the split link from node a to the block B where node b resides is optional, since one can access block B through the layered index.

Direct link - a link between node v in block B of level i and block B' of level i - 1 that includes a node v' such that nodes v and v' have the same value. If a search path to a data record with a key k includes node v but does not include any of its near and far links, then it should contain the direct link to block B'. A direct link has no label.

There follows a description that pertains to the terms duplicated node and copied node that are utilized in the block split procedure.

Thus, if a node v' has value k then all the keys of data records accessible from v' and its labeled links agree on positions 0,...,k-1.

If a node v is created such that it has a value equal to the value of node v' and all data records accessible from v and its labeled links are accessible from node v' and its labeled links, v is considered a duplicated node of v'. A duplicated node maintains a direct link to the block that includes node v' (a duplicated node is also referred to as a copied node).

GENERAL DESCRIPTION OF THE INVENTION

There follows a discussion of various additional terms and procedures that are used in the description and the claims in the context of the present invention.

Data records consist as a rule of several fields, some of which are designated as keys. Sometimes the records are ordered by one of the keys, called the primary key. An index (or index scheme) over the keys of data records or over representative keys (for the definition of the latter see below) is a data structure that facilitates search by one or more of the keys. Examples of index are any of the specified multi-way tree index schemes. An index according to the invention may be constituted by using more than one index scheme.

The index may be stored in a file or files that reside partially or entirely in the internal memory or external memory.

In accordance with the invention there is provided an index that includes a partitioned index (a dynamic data structure) that allows search by key, and is partitioned into blocks, each of which contains a representative key. The representative keys should be sufficient to find the block associated with a record whose key equals the search key (if one exists). Having located the block, the data record may easily be retrieved. The representative keys are not necessarily stored physically in the block.

Examples of partitioned index are:

1. The sequence of blocks of a file ordered by increasing key value of the primary key. The index leads the search to the block containing the key. To allow searches by a key that is not the primary key, a partitioned index is constructed such that for each record the partitioned index contains its key and its link. These pairs are ordered by non-decreasing value of the key. The index leads to the block containing the address of the desired record.

2. A trie arranged in blocks.

3. Other types of index schemes that meet the provision of partitioned index.

A partitioned index over the keys of data records is called a basic partitioned index and is denoted index layer I_0.

This partitioned index might become non-balancεd, thus giving rise to some long search paths.

To search the partitioned index efficiently, an additional index layer (an index layer is denoted in short also index) I_1 is constructed over the representative keys of I_0. If I_1 is also a partitioned index then an additional index I_2 may be constructed over the representative keys of the blocks of I_1. This process may be repeated until creating an index I_h (hereinafter root index) which preferably is fully contained within a single block. The root index I_h is not necessarily a partitioned index. The layered index (which constitutes also an index) is the collection of I_0,...,I_h.

I_1,...,I_h constitute a so called representative index.

To search a record by key k, the latter is searched in I_h (and in some cases in I_{h-1} to I_1 and data record(s)) in order to find the block B of I_{h-1} leading to k. This process is repeated until reaching the block of I_0 that is associated with the record with key k (if one exists).
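The top-down search through the layers can be sketched as follows. This is a simplified model under stated assumptions, not the structure of the invention: each block is a sorted list of (key, value) pairs, the representative key of a block is taken to be its smallest key, an entry in layer I_i (i ≥ 1) holds the position of a block in layer I_{i-1}, and the function names are hypothetical.

```python
import bisect


def find_child(block, key):
    """Entry with the largest representative key <= key (or the first entry)."""
    rep_keys = [k for k, _ in block]
    i = bisect.bisect_right(rep_keys, key) - 1
    return block[max(i, 0)]


def search(layers, key):
    """layers[0] is I_0, layers[-1] is the root index I_h (a single block)."""
    block = layers[-1][0]
    for level in range(len(layers) - 1, 0, -1):
        _, child = find_child(block, key)   # one block access per layer
        block = layers[level - 1][child]
    for k, record in block:                 # block of I_0: look for the record
        if k == key:
            return record
    return None


I_0 = [[(1, "a"), (3, "b")], [(5, "c"), (7, "d")], [(9, "e"), (11, "f")]]
I_1 = [[(1, 0), (5, 1)], [(9, 2)]]          # over representative keys of I_0
I_2 = [[(1, 0), (9, 1)]]                    # root index, one block
layers = [I_0, I_1, I_2]
```

In this model every search touches exactly one block per layer, so the number of block accesses equals the number of layers regardless of which record is sought.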

To insert a new record r with key k, a search is performed as above to find the block B. Having found B in I_0, r is added to B.

If B (in I_0) overflows, it is split into two (or more) blocks and the representative of B in I_1 is replaced by the representatives of the new blocks. The overflow of block B_1 in I_1 entails a splitting of B_1, and the representative of B_1 in I_2 is replaced by the representatives of the new blocks, etc. If the block of I_h overflows, an additional layer I_{h+1} is created and added to the layered index. It should be noted that an "overflow" state may be determined according to the particular application, and is not necessarily triggered when a block is rendered full. Thus, for example, by one embodiment overflow occurs when a block is at least half full.
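The insertion and split propagation described above can be sketched as below. The sketch makes several assumptions that are not taken from the text: blocks hold sorted keys only (records are omitted), the representative key of a block is its smallest key, a block overflows when it holds more than `CAPACITY` keys, and a split always produces two halves. All names are hypothetical.

```python
import bisect

CAPACITY = 4  # hypothetical overflow threshold; the text leaves it to the application


class Block:
    def __init__(self, keys, children=None):
        self.keys = keys          # record keys (I_0) or representative keys (I_i, i >= 1)
        self.children = children  # None for blocks of I_0


def split(block):
    mid = len(block.keys) // 2
    right = Block(block.keys[mid:], block.children[mid:] if block.children else None)
    block.keys = block.keys[:mid]
    if block.children:
        block.children = block.children[:mid]
    return right


def insert(block, key):
    """Insert key below block; return the new sibling if block had to be split."""
    if block.children is None:
        bisect.insort(block.keys, key)
    else:
        i = max(bisect.bisect_right(block.keys, key) - 1, 0)
        sibling = insert(block.children[i], key)
        block.keys[i] = block.children[i].keys[0]      # refresh the representative
        if sibling is not None:                        # add the new block's representative
            block.keys.insert(i + 1, sibling.keys[0])
            block.children.insert(i + 1, sibling)
    return split(block) if len(block.keys) > CAPACITY else None


def insert_root(root, key):
    sibling = insert(root, key)
    if sibling is None:
        return root
    # The root block overflowed: create an additional layer on top.
    return Block([root.keys[0], sibling.keys[0]], [root, sibling])


def all_keys(block):
    if block.children is None:
        return list(block.keys)
    return [k for child in block.children for k in all_keys(child)]


def height(block):
    return 0 if block.children is None else 1 + height(block.children[0])


root = Block([])
for k in [5, 1, 9, 3, 7, 11, 2, 8, 4, 6, 10, 12, 0]:
    root = insert_root(root, k)
```

Because new layers are only ever added above the root, every block of I_0 stays at the same distance from the root, which is what keeps this model balanced.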

Deletion is similar to insertion, and might involve merging, the reverse process of splitting. The update or the split need not necessarily be performed on the fly, but may be delayed (i.e. performed post factum).

It should be noted that the construction of the layered index preferably retains a balanced index.

It should be noted that in some embodiments the balanced index is sufficient, and in some cases where the layered index (without I_0) is of relatively small volume (e.g. may be accommodated mostly or entirely in the internal memory) the "balanced structure" requirement may be waived.

In accordance with a first aspect of the invention, it has been found that the inherent limitations of a basic partitioned index (e.g. trie) that is susceptible to an unbalanced structure may be coped with by providing an index and, more specifically, a layered index in the manner specified.

Focusing, for example, on the layered index as compared to the basic partitioned index (e.g. trie), it readily arises that accessing selected data records through the layered index is substantially more efficient than accessing the same data records through said trie.

In the context of the invention, "more efficient" means that the number of accesses to the storage medium through the layered index in order to perform an update transaction (e.g. insert, delete or modify) on a data record or to access a data record is smaller compared to the number of accesses to the storage medium through the basic partitioned index.

Number of accesses should be construed such that in each access a block is handled (e.g. loaded or processed) from the storage medium.

There may be exceptional scenarios where the latter "more efficient" provision does not apply, e.g. in the case of a very small file having only a few blocks, where accessing a data record through the basic partitioned index may require the same or even fewer operations than through said layered index.

Implementing the partitioned index as a trie, i.e. constructing a layered index from a basic partitioned index which is a trie, requires some further considerations.

Thus, each key is regarded as a character or bit string. Moreover, if the trie cannot be accommodated in a single block, it is partitioned into blocks, such that each block contains a single subtree of the trie. The representative key of the block is the string associated with the root node of the trie in the block, i.e., the sequence of labels of the path from the root of the trie of I_i to the root of the trie of the block. As in the general layered index scheme, the representative keys of I_i are the keys of I_{i+1}. To search a key k in I_i, one searches for the longest prefix of k in the blocks of I_{i+1} and from there moves to the appropriate block of I_i.
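By way of non-limiting illustration only, the longest-prefix search over representative keys may be sketched in Python as follows. The sketch assumes two layers only, holds each block of I_0 as a plain set of full keys rather than as a trie, and uses the hypothetical names `longest_prefix`, `layer0`, `layer1` and `search`, none of which appear in the specification.

```python
def longest_prefix(rep_keys, k):
    """Return the longest representative key that is a prefix of k, or None."""
    best = None
    for rep in rep_keys:
        if k.startswith(rep) and (best is None or len(rep) > len(best)):
            best = rep
    return best


# Hypothetical layer I_0: each block is identified by its representative key
# (the path from the root of the trie to the root of the block's subtree).
layer0 = {
    "": {"apple", "bird"},      # block holding the root of the trie
    "ca": {"cab", "cat"},       # subtree rooted at "ca", minus the "car" subtree
    "car": {"card", "cart"},    # subtree rooted at "car"
}

# Layer I_1 is constructed over the representative keys of I_0.
layer1 = set(layer0)


def search(k):
    """Search I_1 for the longest prefix of k, then look in one block of I_0."""
    rep = longest_prefix(layer1, k)
    return rep is not None and k in layer0[rep]
```

In this sketch a lookup touches exactly one block of I_0, whichever block the key falls in, which is the property the layered arrangement is meant to secure.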

The insertion of a record entails adding its key to I_0, i.e., adding a value to the trie of I_0. If as a result a block overflows, the block is split: it is partitioned into typically two (in some implementations more) blocks, such that each block contains a (connected) trie. To accomplish this a link between a node w and its child v is severed, and the subtree rooted at v is moved to another block. The representative key of the new block is added to I_1. As in the general layered index scheme, this process is continued to the higher layers I_2, ..., I_h.
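The split just described may be illustrated, by way of a minimal sketch only, in Python. The sketch assumes an uncompressed character trie held in memory, with one object standing for one block; the names `Node`, `insert`, `keys` and `split` are hypothetical, and the choice of which link to sever is left to the caller.

```python
class Node:
    """One trie node; children are reached through labeled links."""
    def __init__(self):
        self.children = {}    # label (one character) -> Node
        self.is_key = False   # True if a key ends at this node


def insert(root, key):
    node = root
    for ch in key:
        node = node.children.setdefault(ch, Node())
    node.is_key = True


def keys(node, prefix=""):
    """List the keys stored under node, in sorted order."""
    out = [prefix] if node.is_key else []
    for label in sorted(node.children):
        out += keys(node.children[label], prefix + label)
    return out


def split(root, path):
    """Sever the link between w and its child v, where v is reached by `path`.

    Returns v, the root of the subtree that is moved to another block.
    `path` is then the representative key of the new block (relative to root).
    """
    w = root
    for ch in path[:-1]:
        w = w.children[ch]
    return w.children.pop(path[-1])


block = Node()
for k in ["cab", "cat", "card", "cart"]:
    insert(block, k)

new_block = split(block, "car")   # v is the node reached by "car"
rep_key = "car"                   # to be added to the next layer, I_1
```

After the split each block still contains a single connected trie, and the representative key of the new block is all that the next layer needs in order to route a search to it.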

If the basic partitioned index is a compressed trie like Patricia or PAIF, only part of the keys are saved, which saves index space. However, these savings affect the manner in which the search is performed. In such compressed tries usually only nodes of degree greater than or equal to two are maintained. If the search key k does not belong to the compressed trie, the search might terminate at some record r, and we have to check whether k is equal to the key of r. If the keys are different then the trie does not contain a record with key k.
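The need for the final comparison may be seen in the following minimal Python sketch of a search in a compressed trie. It assumes that each branching node stores only the offset of the character it tests and that each leaf stands for a record holding its full key; the names `Branch`, `Leaf` and `find` are hypothetical and the sketch is not the PAIF or Patricia layout itself.

```python
class Leaf:
    """Stands for a data record r; only here is the full key available."""
    def __init__(self, key):
        self.key = key


class Branch:
    """A node of degree >= 2 that tests the character at a given offset."""
    def __init__(self, offset, children):
        self.offset = offset
        self.children = children   # character -> Branch or Leaf


def find(node, k):
    """Follow the tested offsets only, then verify k against the record key."""
    while isinstance(node, Branch):
        if node.offset >= len(k) or k[node.offset] not in node.children:
            return None
        node = node.children[k[node.offset]]
    # Characters at untested offsets were skipped, so the search may have
    # terminated at a record whose key differs from k.
    return node.key if node.key == k else None


# Compressed trie over the keys "card", "cart" and "cat".
trie = Branch(2, {
    "r": Branch(3, {"d": Leaf("card"), "t": Leaf("cart")}),
    "t": Leaf("cat"),
})
```

Searching for "bart" follows the same links as "cart", because offsets 0 and 1 are never tested; only the comparison with the key of the record reached shows that "bart" is absent.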

The effect of this strategy on the layered index scheme is that the prefix of k might not be represented in the index. To enable search in such cases, direct links from nodes of blocks of I_i to blocks of I_{i-1} are introduced.

These links do not have a direction, and are taken when the appropriate position of the search key does not agree with any one of the directions of the node.

Suppose the search reaches block B_{i-1} of I_{i-1}, whose representative key k_{i-1} is not a prefix of k. (If k_{i-1} is not recorded explicitly in B_{i-1}, we can reach any data record r accessible from B_{i-1}, and from r's key determine k_{i-1}.) To continue the search, we compare k and k_{i-1} to find the position j of the first character where they differ, and search up the trie of block B_i until finding a node v with a direct link and value less than or equal to j. The search continues from the block of I_{i-1} pointed at by that direct link. (If no such node exists, we go to the first block of the index I_{i-1}.) Thus, in the worst case, each layer might require one extra access. This notwithstanding, and as will be explained below, 3 layers are sufficient to address billions of records and usually 2 layers can be maintained in the internal memory of a computer. Thus it is possible to have no more than two I/O accesses to the external storage medium in order to access the block associated with a data record.
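The recovery step may be sketched in Python as follows, as a rough illustration only. The sketch assumes zero-based character positions, and represents the nodes visited in block B_i as a list of (value, direct link) pairs ordered from the root downwards; the names `first_diff` and `fallback_block`, and the block labels used in the usage lines, are hypothetical.

```python
def first_diff(k, rep_key):
    """Position j of the first character where k and rep_key differ."""
    n = min(len(k), len(rep_key))
    for i in range(n):
        if k[i] != rep_key[i]:
            return i
    return n


def fallback_block(path, j, first_block):
    """Search up the visited nodes for a direct link whose node value is <= j.

    path: (value, direct_link) pairs from the root of B_i down to the last
          node visited; direct_link is None when the node has no direct link.
    """
    for value, direct_link in reversed(path):
        if direct_link is not None and value <= j:
            return direct_link
    return first_block   # no such node: go to the first block of I_{i-1}


# Usage with hypothetical block labels.
visited = [(0, "block-0"), (2, None), (4, "block-7")]
j = first_diff("carton", "card")            # characters differ at position 3
target = fallback_block(visited, j, "block-first")
```

Because the climb stays within the block B_i already loaded, the only added cost in this sketch is the single extra access to the block the direct link points at.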

The split process also has to accommodate direct links. Suppose that the access path to block B_{i-1} of I_{i-1} consists of block B_i of layer I_i, and that B_{i-1} overflows and is split into blocks B_{i-1} and By. Block B_i has now to contain links to all its descendant blocks in I_{i-1}. This can be accomplished by the following non-limiting technique:

Let ky be the representative key of By. This key is inserted into T_i, the compressed trie of B_i, so that the search for the keys of descendants of By reaches By, and the search for the descendants of B_{i-1} reaches B_{i-1}.

A non-limiting method of accomplishing the split process is as follows:

1. At least one short link among the short links of a node (hereon split node) in the block is deleted (hereon split link) in a way that at least two tries exist in the block.

2. Each of the sub-trees is moved to a separate block.

3. If the block of B_i does not exist, B_i is created and a copied node of the split node is created in B_i.

4. If the block of B_i exists and a copied node of the split node does not exist in B_i, then a copied node of the split node is created in B_i and connected to the trie of B_i such that By (at the end of the split process) is accessible in a search path that includes the root node in B_i and the copied node and its labeled links according to the representative key of

5. If the copied node has no direct link, add a direct link from the copied node to the block B_{i-1}.

6. Add a far link from the copied node to the block By or, if the copied node has a short link to a child node in the direction of the far link, the far link can be replaced by a direct link from the child node to block s .

In the above implementation, a split of a block in I_k, k>0 is performed such that the split links (of I_k) are links between copied nodes of split nodes that reside in different blocks.

Accordingly, in accordance with one aspect the invention provides for, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: a layered index arranged in blocks; the layered index includes a basic partitioned index that is associated with data records; the basic partitioned index enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said layered index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

The invention further provides for, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index arranged in blocks and being constructed over the keys of data records; the index includes a basic partitioned index that is associated with the data records; the basic partitioned index enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

Still further, the invention provides for, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index arranged in blocks and being constructed over the keys of data records; the index includes a trie that is associated with the data records; the trie enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

Still further, the invention provides for, in a database file management system for accessing data records and being executed on a data processing system; the data records are associated with a basic partitioned index arranged in blocks and being stored in a storage medium; the basic partitioned index enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks; a method for constructing a layered index arranged in blocks, comprising the steps of:

(a) providing said basic partitioned index;

(b) constructing a representative index over the representative keys of said basic partitioned index; said layered index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

The invention further provides for, in a database file management system for accessing data records and being executed on a data processing system; the data records are associated with a basic partitioned index arranged in blocks and being stored in a storage medium; the basic partitioned index enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks; a method for constructing an index over the keys of the data records, the index being arranged in blocks, comprising the steps of:

(a) providing said basic partitioned index;

(b) constructing an index over the representative keys of said basic partitioned index; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

In accordance with the invention there is further provided, in a database file management system for accessing data records and being executed on a data processing system; the data records are associated with a trie arranged in blocks and being stored in a storage medium; the trie enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks; a method for constructing an index over the keys of the data records, the index being arranged in blocks, comprising the steps of:

(a) providing a trie;

(b) constructing an index over the representative keys of said trie; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

The index, according to the invention, is preferably, although not necessarily, constructed by one or more of the indexing schemes selected from the specified index schemes. A typical, yet not exclusive, example of a multi-way tree index is the B-tree indexing scheme.

By one embodiment said basic partitioned search scheme being a trie that is constituted by a digital tree of the kind disclosed in U.S. patent no. 5,495,609.

By another embodiment said trie is constituted by a so called Probabilistic Access Indexing File (PAIF).

Thus, by a specific embodiment there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes at least one probabilistic access indexing file (PAIF) having a plurality of nodes and links; the leaf nodes of said PAIF are associated each with at least one data record accessible to said user application program and wherein at least a portion of said data record constitutes at least one search key; selected nodes in said PAIF represent, each, a given offset of a search key portion within said at least one search key; link(s) originating from each given node from among said selected nodes represent, each, a unique value of said search key portion; the PAIF having at least two sub-PAIFs being arranged, each, in a block; said database file management system is further capable of arranging said blocks as a balanced structure of blocks.

In the context of PAIF, it should be noted that said selected nodes, whilst preferably including only a given offset, this is not always necessarily the case. Thus, one or more of said nodes may include other information, such as portions of the keys and/or other information, all as required and appropriate.

According to a modified embodiment, the trie being of the PAIF type, the indexing scheme is constituted by a search scheme substantially identical to that of the PAIF trie.

Before proceeding any further it should be noted that for convenience of description only the invention is described mainly with reference to a trie as a basic partitioned index. Those versed in the art will readily appreciate that the invention is by no means bound by a trie and accordingly any basic partitioned index is applicable.

Thus, a database file management system that employs a layered index of the invention is advantageous, in terms of enhanced performance as compared to hitherto known techniques, inter alia owing to the following characteristics:

• The data are held inherently in sorted form according to search key. Namely, one can navigate in the tree by the order of the keys of the data records. The layered index inherently supports sequential operations like "get next" and "get previous". In this respect, the proposed layered index constitutes an advantage over e.g. hashing schemes and some implementations of digital trees.

• There is no requirement for advance knowledge of the contents of the database in order to maintain a balanced index.

• A balanced layered index is retained and the depth of the layered index is relatively small, thereby minimizing the number of accesses (normally slow I/O operations) that are required to perform an update transaction or access a data record. According to one embodiment, practically one I/O (and no more than two I/O) operation (constituting one or two accesses) is required in order to access a given data record from among billions of data records.
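The sequential operations named in the first characteristic above can be illustrated with a short Python sketch. The sketch deliberately swaps in a plain sorted list and the standard `bisect` module as a stand-in for the layered index; it shows only what "get next" and "get previous" return over keys held in sorted order, not how the index computes them. The names `get_next` and `get_previous` are hypothetical.

```python
import bisect

# Stand-in for keys held in sorted form according to search key.
sorted_keys = sorted(["cat", "cab", "cart", "card"])


def get_next(keys, k):
    """Smallest stored key strictly greater than k, or None."""
    i = bisect.bisect_right(keys, k)
    return keys[i] if i < len(keys) else None


def get_previous(keys, k):
    """Largest stored key strictly smaller than k, or None."""
    i = bisect.bisect_left(keys, k)
    return keys[i - 1] if i > 0 else None
```

A hashing scheme gives no comparable operation without a separate sorted structure, which is the contrast the characteristic draws.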

The invention thus further provides for, in a computer system having a storage medium of at least an internal memory that ranges between 10 to 20 M byte or more, and an external memory; a data structure that includes an index over the keys of the data records; the index is arranged in blocks; such that for one billion data records substantially no more than two accesses to said external memory are required in order to access a block that is associated with any one of said billion data records, irrespective of the size of the key of said data records.

Still further, the invention provides for, in a computer system having a storage medium of at least an internal memory that ranges between 10 to 20 M byte or more, and an external memory; a data structure that includes an index over the keys of the data records; the index is arranged in blocks; such that for one million data records substantially all the blocks of the index are accommodated in said internal memory regardless of the size of the key of said data records.

The invention further provides for, in a computer system having a storage medium, a data structure that includes an index over the keys of data records; the index is arranged in a balanced structure of blocks and enables sequential operations to be performed on said data records; the index size is essentially not affected by the size of said keys.

It should be noted that the data records may reside in the blocks of the layered index, or may reside in separate data files (one or more). In the latter embodiment the data records should be associated, of course, with the corresponding layered index. As will further be clarified with reference to the description of a specific embodiment below, a given data record may accommodate more than one search key.

The index, according to the invention, is preferably, although not necessarily, constructed by one or more of the indexing schemes selected from the specified index schemes. A typical, yet not exclusive, example of a multi-way tree index is the B-tree indexing scheme.

There follows now a discussion that pertains to the second aspect of the invention.

Thus, normally data consists of records of several types (e.g. in the example above books and borrowers). The type of the record determines its fields (attributes) and its keys. In a conventional system, e.g. of the kind employing a B-tree index, the type of each key is not kept with the record and not considered part of the key. The program "knows" the type of the record, and therefrom the fields of the data records and their structure.

According to the second aspect of the invention there is proposed a different approach. Each type of key is assigned a designator, i.e. a string of bits, e.g. a series of one or more characters, which normally but not necessarily is added as a prefix to all keys of this type. A designated key is a key with its designator. The designator is treated as part of the key (for search or update purposes), and therefore is part of the index scheme.

The designator makes it possible to obtain the properties of the data record as a function of the type. Thus by looking at the key, one obtains the designator and hence can deduce the type of the record; one need not know the record type a priori. Data records in which the keys are designated are called designated data records. A designated index is an index that enables search on designated data records.

There follows a description which exemplifies the use of designators in accordance with the invention. Thus, consider a class C, such that all data records of this class have a key field (or fields) k_1, and possibly several other non-key fields. Let R be a data record of class C, where R.k_1 = FIAT. Let the designator of k_1 be A. By adding the designator one gets the key AFIAT. To access a record with R.k_1 = FIAT, the designated index is searched for the key AFIAT.

Having described the designator feature, there follows a description of another feature according to the second aspect: subordination of data records. Consider a record R1 with a key K1, and a record R2 with a composite key consisting of the ordered pair of keys K1, K2. In this case, the designated key of R2 is the composite key K1',K2', where K2' consists of the key K2 prefixed by a designator D2. (D2 is considered the designator of R2.) In a designated index, one can select R1 by searching the key K1' (the key K1 with its designator D1), and select R2 by searching the same index by the key K1'K2' (the concatenation of K1' and K2', where K2' is the key K2 with its designator D2). In this case K2 is subordinated to K1.

The subordination relationship is extended also to records. If K2 is subordinated to K1, the designator of K2' is D2 and the designator of R2 is also D2 (or D1, D2). If R2 is subordinated to R1, the key of R2 is composed by concatenating K2' to K1'. Note that in K2', D2 is prefixed to K2.

In the ERD model, the type of record R1 and the type of record R2 may stand in a one-to-many relationship, meaning that several records of type R2 may be related to a single record of type R1. Such a relation can be implemented by the subordination relation: several records of type R2 will be subordinated to a single record of type R1 (e.g., several books can be borrowed by the same borrower). In particular, if this relationship is one-to-one (e.g. where only one book can be borrowed by each borrower) then the key K1'D2, where D2 is the designator of R2, is sufficient to locate R2. In a designated index the search path to K1'K2' includes the search path to K1'. (This does not preclude the possibility of reaching the record R2 via another path.) The latter characteristic exhibits another important feature according to the second aspect, i.e. inherent maintenance of data integrity. Thus, the insertion of a record whose key is K1'K2' (or K1'D2) can only be performed if the record whose key is K1' exists. In the example above, an insertion of a transaction of a borrower (Borrower_Id = 111111) who borrowed a book (book_Id = 2222) should result in insertion of a record R2 whose designated key is A111111B2222 (hereon borrower-book record) only if the specified borrower (data record R1 with K1=111111) exists (in the above example, the designator of the borrower is A and the designator of the subordinated borrower-book data record is B). Data integrity is accomplished with just small overhead since the path in the index to the borrower-book record includes sufficient information to determine whether the borrower exists. If the borrower does not exist, the path to the composite key will not pass through the borrower. This will be automatically detected in the insertion process. In contrast, according to the prior art, records of different types were associated with different index files. Before inserting a new data record (with a composite key) in the Borrower-Book index file, a separate check must be performed in the Borrower index file in order to ascertain whether the specified borrower (record R1, key K1) exists, thus posing undue overhead.
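The integrity rule for subordinated records may be illustrated by the following minimal Python sketch. It assumes a dictionary keyed by the concatenated designated key in place of the trie, so the test that the parent exists is an explicit lookup rather than a by-product of following the search path; the class name `DesignatedIndex` and its `insert` method are hypothetical.

```python
class DesignatedIndex:
    """Toy single index holding records of several types by designated key."""

    def __init__(self):
        self.records = {}

    def insert(self, key_parts, record):
        """Insert under the concatenation of key_parts, e.g. ["A111111", "B2222"].

        A subordinated record is accepted only if the record it is
        subordinated to (all parts but the last) is already present.
        """
        parent = "".join(key_parts[:-1])
        if parent and parent not in self.records:
            raise KeyError("no record with designated key " + parent)
        self.records["".join(key_parts)] = record


index = DesignatedIndex()
index.insert(["A111111"], {"borrower_id": "111111"})
index.insert(["A111111", "B2222"], {"book_id": "2222"})
```

In the trie of the specification the search path to A111111B2222 passes through A111111, so the equivalent of the explicit lookup above is obtained during the insertion itself and no second index file has to be consulted.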

Note that the subordination relation is not limited to just two levels: the subordinated record can itself have a record subordinated to it, and accordingly n levels of subordination may be accomplished. For example, consider a banking database, where the account records are subordinated to the branch records, and deposit records are subordinated to accounts.

Turning now to the multi-dimension feature according to the second aspect of the invention, let R be a record that is identified by either of two keys K1 and K2. Then, the designated index should contain two search paths to R, one by the designated key K1' and one by the designated key K2'. Accordingly, R constitutes a multi-dimensional record. A multi-dimensional index includes the designated index and the multi-dimensional data record(s).

Consider a first embodiment where the multi-dimensional index does not apply to subordinated data records. Thus, for example, consider a class C, such that all data records of this class have two key fields, k_1 (the car model) and k_2 (its license plate number), and possibly several non-key fields. Let R be a data record of class C, where R.k_1 = FIAT and R.k_2 = 127. Let the designator of k_1 be A and that of k_2 be B. By adding the designators one gets the keys AFIAT and B127. These extended keys are inserted into a single designated index. To access a record with R.k_1 = FIAT, the designated index is searched for the key AFIAT, and to select a record with R.k_2 = 127, the same designated index is searched for B127.
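The car example may be written out as a short Python sketch, offered as an illustration only. A dictionary stands in for the single designated index, and the field names `model` and `plate` as well as the helper `designated_key` are hypothetical.

```python
def designated_key(designator, key):
    """Prefix a key with the designator of its key type."""
    return designator + key


# One data record R of class C with two key fields.
car = {"model": "FIAT", "plate": "127", "colour": "red"}

# A single designated index holds both search paths to the same record.
index = {}
index[designated_key("A", car["model"])] = car   # k_1, designator A
index[designated_key("B", car["plate"])] = car   # k_2, designator B

by_model = index["AFIAT"]
by_plate = index["B127"]
```

Both lookups return the very same record object, which is what makes R a multi-dimensional record in the sense used above.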

The above discussion and example considered a multi-dimensional index where the data records do not necessarily exhibit a subordination relationship. The multi-dimensional index may optionally be applied also to subordinated data records. For example, consider a banking database, where the deposits are subordinated to both accounts and depositors. A single designated index provides access to accounts (by the designated key k_1', account-number), to depositors (by the designated key k_2', depositor-name) and to deposits by both k_1'k_2' and k_2'k_1'. (It is possible, of course, to use different designators for k_1 when it is subordinated to k_2 and for k_2 when it is subordinated to k_1.)

The designator of a multi-dimensional record depends on the designator of the key used to search or update the record. Thus, the designator of a car record (FIAT, 127) is A when searching or updating the record by the key AFIAT, and is B when accessing it via the license plate number B127.

In addition to the data records it is necessary to maintain meta-data. The meta-data includes information on the different records as a function of their type. Thus, once the designator is identified, the information on the record becomes available, for example a description of the various fields, keys, subordination, record size etc. The search scheme in the designated index is oblivious to the meta-data. It locates the record, identifies the designator (for example the designator can be prefixed to the record) and constructs the (composite) designated key.
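How a designator can lead to the meta-data of a record may be sketched in Python as follows. The sketch assumes one-character designators and fixed key lengths recorded in the meta-data, neither of which is stated in the specification; the table `META`, its fields and the function `parse` are hypothetical.

```python
# Hypothetical meta-data, looked up by designator.
META = {
    "A": {"type": "borrower", "key_length": 6},
    "B": {"type": "borrower-book", "key_length": 4},
}


def parse(designated_key):
    """Break a (composite) designated key into (record type, key) pairs."""
    parts = []
    pos = 0
    while pos < len(designated_key):
        designator = designated_key[pos]
        length = META[designator]["key_length"]
        key = designated_key[pos + 1:pos + 1 + length]
        parts.append((META[designator]["type"], key))
        pos += 1 + length
    return parts


parts = parse("A111111B2222")
```

The search itself never consults `META`; the table is needed only afterwards, to interpret what the located record contains.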

There is thus provided in accordance with a second aspect of the invention, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index over the keys of data records; the data records being of at least two types where data records of the second type are subordinated to the data records of the first type.

Still further in accordance with the second aspect there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: a designated index over designated keys of data records; the data records, constituting designated data records, being of at least two types where designated data records of the second type are subordinated to the designated data records of the first type.

According to the second aspect various advantages are accomplished including:

□ The data structure that includes designated index and designated data can maintain the relations between different data items.

□ The data structure that includes designated index and designated data can link logically related items.

□ The data structure that includes designated index and designated data can support several data models simultaneously and efficiently.

□ The data structure that includes designated index and designated data allows high efficiency in maintaining data integrity.

□ The data structure that includes designated index and designated data allows high efficiency in retrieving related data.

A detailed discussion as regards the various advantages offered by the database file management system of the invention is given below with reference to specific embodiments.

It should be noted that the data records may constitute part of the PAIF, or may reside in one or more separate data files. In the latter embodiment the data records should be linked, of course, to the corresponding PAIF. As will further be clarified with reference to the description of a specific embodiment below, a given data record may accommodate more than one search key.

It will also be shown how complex data structures and data relations can be supported by a new uniform and simple technology.

It will also be shown how an index structure can be of a minimal size, not depending on the size of the keys.

All of the above mentioned advantages are supported inherently by the invention without any preliminary considerations on the data (i.e. key range is unknown, number of records is unknown, random physical location of data records is assumed and so on).

By still another aspect the invention provides, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index being stored in the storage medium and constructed over the keys of said data records that are stored in blocks; the index being arranged in blocks with the leaf blocks being linked to data records by means of links; said index is characterized in that at least one of said links is shared by at least two data records stored in the same block.

By one embodiment, the index being constituted by a trie.

Still further, the invention provides for, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index being stored in a storage medium and constructed over the keys of said data records that are stored in blocks; the index being arranged in blocks with the leaf blocks being linked to data records by means of links; said index is characterized in that at least one of said links is shared by at least two data records stored in the same block; said index constituting a layered index according to claim 1, and blocks of said basic partitioned index are linked to said data records.

BRIEF DESCRIPTION OF THE DRAWINGS:

In order to understand the invention and to see how it may be carried out in practice, a preferred embodiment will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:

Fig. 1 shows a generalized block diagram of a system employing a database file management system;

Fig. 2 shows a sample database structure represented as an Entity Relationship Diagram (ERD), and serving for illustrative purposes;

Fig. 3 shows the database of Fig. 2, represented as tables in accordance with the relational data model, with each table holding few data occurrences;

Fig. 4 shows the "CLIENT" table of Fig. 3, in accordance with a file management system employing a conventional B⁺ tree index scheme;

Fig. 5 shows the "CLIENT" table of Fig. 3, in accordance with a file management system employing a conventional trie index scheme;

Figs. 6A-6C show the "CLIENT" table of Fig. 3, in accordance with a file management system employing a PAIF index scheme;

Figs. 7A-7H show schematic illustrations exemplifying construction of a layered index, according to one embodiment of the invention;

Figs. 8A-B show schematic illustrations exemplifying construction of a layered index, according to yet another embodiment of the invention;

Figs. 9A-G show schematic illustrations exemplifying construction of a layered index, according to yet another embodiment of the invention;

Figs. 10A-B show schematic illustrations exemplifying construction of a layered index, according to another embodiment of the invention;

Fig. 11 shows a schematic illustration exemplifying construction of a layered index, according to still yet another embodiment of the invention;

Fig. 12 shows a schematic illustration for exemplifying use of designators in a designated index in accordance with one embodiment of the invention;

Figs. 13A-E show five schematic illustrations for exemplifying the feature of subordination of data records in a designated index in accordance with one embodiment of the invention;

Fig. 14 shows a schematic illustration of a designated index exemplifying a multi-dimension record according to an embodiment of the invention;

Fig. 15 shows a schematic illustration of a designated index according to another embodiment of the invention;

Fig. 16 shows a schematic illustration for exemplifying the feature of relations among data records provided in accordance with one embodiment of the invention;

Figs. 17A-B show schematic illustrations of a compressed representation of links to data records in accordance with one embodiment of the invention;

Figs. 18A-D show four benchmark graphs demonstrating the enhanced performance, in terms of response time and file size, of a database utilizing a file management system of the invention vs. a commercially available Ctree based database; and

Figs. 19A-D show four benchmark graphs demonstrating the enhanced performance, in terms of response time and file size, of a database utilizing a file management system of the invention vs. a commercially available Btree based database.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

Attention is first directed to Fig. 1 showing a generalized block diagram of a system employing a database file management system of the invention. Thus, a general purpose computer 1, e.g. a personal computer (P.C.) employing a Pentium microprocessor 3 commercially available from Intel Corp. U.S.A, has an operating system module 5, e.g. Windows NT® commercially available from Microsoft Inc. U.S.A., which communicates with processor 3 and controls the overall operation of computer 1.

P.C. 1 further accommodates a plurality of user application programs of which only three, 7, 9 and 11 respectively, are shown. The user application programs are executed by processor 3 under the control of operating system 5, in a known per se manner, and are responsive to user input fed through keyboard 13 by the intermediary of I/O port 15 and the operating system 5. The user application programs further communicate with monitor 16 for displaying data, by the intermediary of I/O port 17 and operating system 5. The user application programs can access data stored in a database by means of database management system module 20. The generalized database management system, as depicted generally in Fig. 1, includes a high level management system 22 which views, as a rule, the underlying data in a "logical" manner and is responsive to the user application program by means known per se such as, e.g., SQL Data Definition and Data Manipulation language (DDL and DML). The database management system typically exploits, in a known per se manner, a data dictionary 24 that includes meta-data which maintains information on the underlying data.

The underlying structure of the data is governed by database file management system 26 which is associated with the index scheme and actual data records 28. The "high-level" logical instructions (e.g. SQL commands) received and processed by the high-level management system 22 are converted into "lower level" commands that access or update the data records that are stored in the database file(s), and to this end the database file management system considers the actual structure and organization of the data records. The "high level" and "low level" portions of the database file management system can communicate through a known per se Application Programmers Interface (API), e.g. the Microsoft open database connectivity (ODBC) interface commercially available from Microsoft. The utilization of the ODBC enables "high level" modules of the database file management system or application program to transparently communicate with different "database file management systems" that support the ODBC standard. The terms access or update of data records used herein encompass all kinds of data manipulation including "find", "insert", "delete" and "modify" data record(s), and the pertinent DDL commands which afford the construction, modification and deletion of the database.

Fig. 1 further shows, schematically, a storage medium in the form of internal memory module 29 (e.g. 16 megabytes and possibly employing a cache memory sub-module) and an external memory module 29' (e.g. 1 gigabyte). Typically, external memory 29' is accessed through an external, relatively slow communication bus (not shown), whereas the internal memory is normally accessed by means of a faster internal bus (not shown). Normally, by virtue of the relatively small size of the internal memory, only those applications (or portions thereof) that are currently executed are loaded from the external memory into the internal memory.

By the same token, for large databases that cannot be accommodated in their entirety in the internal memory, a major portion thereof is stored in the external memory. Thus, in response to an application generated query that seeks one or more data records in the database, the database management system utilizes operating system services (i.e. an I/O operation) in order to load, through the external communication bus, one or more blocks of data from the external to the internal memory. If the sought data records are not found in the loaded blocks, additional I/O operations are required until the sought data records are targeted.

It should be noted that for simplicity of presentation, the internal and external memory modules 29, 29' are separated from the various modules 5, 7, 9, 11, 20. Clearly, albeit not shown, the various modules (operating system, DBMS, and user application programs) are normally stored in the external memory and their currently executed portions are loaded to the internal memory.

Computer 1 may serve as a workstation forming part of a Local Area Network (LAN) (not shown) which employs a server having essentially the same structure as that of Fig. 1. To the extent that the workstations and the server employ client-server based protocols, a predominant portion of said modules (including the database records themselves 28) resides in the server.

Those versed in the art will readily appreciate that the foregoing embodiments described with reference to Fig. 1 are only two out of many possible variants. Thus, by way of non-limiting example, the database may be an on-line database residing in an Internet Web site. The invention is, of course, not limited to the specified partition of small internal memory and large external memory. Thus, for example, by a modified embodiment large internal and external memories are employed, and by yet another modified embodiment only internal memory is employed.

It should be further noted that for clarity of explanation system 1 is illustrated in a simplified and generalized manner. A more detailed discussion of database file management systems, and in particular of the various components that are normally accommodated in database file management systems, can be found, e.g., in Chapter 7 of "Database System Concepts", ibid.

Having described the general structure of a system of the invention, attention is now directed to Fig. 2 showing a sample database structure represented as an Entity Relationship Diagram (ERD), and serving for illustrative purposes. Thus, the ERD 30 of Fig. 2 consists of the entities "CLIENT" 32 and "ACCOUNT" 34 as well as an "n to m" "DEPOSIT" 36 relationship indicating that a given client may have more than one account and by the same token a given account may be owned by more than one client.

As shown, the entity "CLIENT" has the following attributes (fields): "Client_Id" 38 being a key attribute that uniquely identifies each client, "Name" 39 standing for the client's name and "Address" 40 standing for the client's address. The entity "ACCOUNT" has the following attributes (fields): "Acc_No" 42 being a key attribute that uniquely identifies each account, and "Balance" 43 holding the balance of the account. The relationship "DEPOSIT" consists of pairs of keys of the "CLIENT" and "ACCOUNT" entities, such that each pair is indicative of a particular account owned by a specific client.
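Purely for illustration, the entities and the relationship of Fig. 2 can be pictured as the following Python sketch. The attribute names follow the ERD; the class names, the Python types, the helper functions and the sample account numbers are assumptions introduced here and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    client_id: str   # "Client_Id" 38, key attribute
    name: str        # "Name" 39
    address: str     # "Address" 40


@dataclass(frozen=True)
class Account:
    acc_no: str      # "Acc_No" 42, key attribute
    balance: float   # "Balance" 43


@dataclass(frozen=True)
class Deposit:
    client_id: str   # key of the owning CLIENT
    acc_no: str      # key of the owned ACCOUNT


# Hypothetical occurrences: account 100001 is co-owned by two clients,
# and client 12345 owns two accounts (the "n to m" relationship).
deposits = [
    Deposit("12345", "100001"),
    Deposit("12445", "100001"),
    Deposit("12345", "100002"),
]


def accounts_of(client_id, deposits):
    """Account keys paired with the given client key."""
    return sorted(d.acc_no for d in deposits if d.client_id == client_id)


def owners_of(acc_no, deposits):
    """Client keys paired with the given account key."""
    return sorted(d.client_id for d in deposits if d.acc_no == acc_no)
```

Because "DEPOSIT" holds only pairs of keys, both directions of the relationship are answered by scanning the same set of pairs.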

Turning now to Fig. 3, there is shown the database of Fig. 2, represented, in accordance with the relational data model, as three tables 50, 51 and 52 corresponding to 32, 34 and 36, respectively, with each table holding a few data occurrences for illustrative purposes. It should be noted that the length of the key field ("Client_ID") of the "CLIENT" table is 5 digits, whereas the length of the key field ("Acc_ID") of the "ACCOUNT" table is 6 digits. The client table holds 5 data occurrences 55-59, the account table holds 2 data occurrences 65, 66 and the deposit table holds 3 data occurrences 70-72.

In accordance with prior art techniques, for each table there is, as a rule, a different index file by the primary key. Thus, Fig. 4 illustrates an underlying indexing file of the "CLIENT" table of Fig. 3, in accordance with a file management system employing the conventional B-tree indexing scheme. As shown, the indexing file 80 consists of three blocks 80a-c, standing for a root block and two leaf blocks respectively. The data records are organized randomly in a separate file 81 holding the five data records 83-87. Each block consists of a succession of pairs of fields (e.g. 82a-b and 83a-b in block 80a). In each pair the first field stands for a search key value and the second field stands for a link, such as a number that identifies the next block to search or, in the case of a leaf block, a link to the data record, such as a number identifying the data record. The latter realization forms a non-limiting embodiment of associating a data record to a block. In the specific embodiment of Fig. 4, a search for records with a key that equals 12355 or a smaller value is directed from root block 80a to block 80b.

Thus, a search for a record whose key is 12355 (82a) starts in root block 80a and is directed by the link 82b to block 80b. In block 80b, the search key 12355 (86a) is associated with link 86b indicating the address of the data record identified by this search key in the data file 81. Put differently, the data record that is identified by search key "12355" (57 in Fig. 3) is the fourth in order in data file 81.

The tables "ACCOUNT" and "DEPOSIT" are likewise arranged in two separate B-tree indexing files, respectively.

The B-tree indexing file of Fig. 4 exhibits one of the significant shortcomings of this approach in that the keys (i.e. search keys) are duplicated, i.e. they are held both in the internal blocks (i.e. in the index scheme) and in the data records associated with the B-tree index. Thus, for example, the search key of data record 57 (in Fig. 3) is not only held as an integral part of the data record 86 in file 81 but also in block 80b (search key 86a) and sometimes in parent blocks such as 80a (search key 82).

This being the case, one readily notices that for large files (which is the case in many real-life scenarios) the duplication of the search keys (and particularly of long keys) results in an inflated index which necessitates a large storage volume, which also adversely affects the performance.

Fig. 5 illustrates a different indexing scheme of the "CLIENT" table of Fig. 3, in accordance with a file management system employing a known trie indexing scheme. Thus, trie indexing file 90 includes a plurality of nodes and links wherein each node stands for an offset position and the link stands for a value at this offset. Table 91 has four columns. The first column indicates which digit position is to be used, the second column the value of that digit. A digit value partitions the key into two subsets. Columns three and four direct the search procedure to the next step.

In order to locate a given search key, e.g. 12355, a digit at the position indicated by the root (position "5" indicated by node 90a, being also the first column in the first line of table 91) is compared to the value specified at the second column of the same line (value "5" indicated also by link 90b in the trie index). Since the digit at position 5 of the sought search key 12355 is indeed 5, control is transferred to line 2 (as indicated by the third column of line 1 of table 91). Next, the digit at position 3 of the sought search key (90c in the tree, being also the value of the first column of the second line in table 91) is compared to the value 3 (link 90d, being also the second column in the second line of the table 91). Since a match occurs, control is transferred to line 3 in the table. In this step the digit at position 4 of the sought search key does not match the value specified at the second column of line three (i.e. "5" vs. "4") and accordingly, as indicated in the fourth column of table 91 ("not equal"), a link to the sought data record 57 (86 in Fig. 4) is obtained.

The tables "ACCOUNT" and "DEPOSIT" are likewise arranged in two separate trie indexing files, respectively. In contrast to the B-tree indexing file of Fig. 4, the one shown in Fig. 5 does not necessitate duplication of the search key. Put differently, only the offsets and the link values, and not the entire keys, are held in the trie (90). In this sense it constitutes an advantage over the B-tree technique.

However, and as specified, the above trie is associated with some shortcomings: it retains an even distribution of the data at the cost of knowing a priori the contents of the database and consequently partitioning the keys so as to obtain a balanced structure. Knowing a priori the contents of the database is obviously undesired as it poses an undue constraint, since databases of the kind described in Fig. 2 are of a dynamic nature, e.g. for the specific database of Fig. 2, new clients open accounts, senior clients close accounts, new clients are registered as co-owners of existing accounts etc.

Another drawback of the above tree is that it does not support sequential processing. Navigating in the tree would result in accessing the data in the following order - 83, 86, 87, 84, 85 (Fig. 4) - and not in the order of the key.

Having shown a known trie index scheme (with reference to Fig. 5), there follows a description of various embodiments of an index of the invention which include a basic partitioned index and which cope with the drawbacks described above in connection with hitherto known techniques. Specifically, there will be shown a preferred embodiment of the index in the form of a layered index, and a preferred embodiment of the basic partitioned index in the form of a trie. These examples are by no means binding.

Before turning to the explanation of the various embodiments there is described, with reference also to Figs. 6A-C, a new trie index scheme designated PAIF. As will be shown below, the PAIF is not confined to a tree structure. On the basis of the PAIF, various embodiments of a layered index are described, with reference to Figs. 7-9, which include a representative index constructed over the representative keys of the PAIF. In the embodiments of Figs. 7 to 9, the index scheme of the representative index and that of the basic partitioned index are substantially the same PAIF.

In Fig. 10 there is described yet another embodiment of the layered index, with a different trie. As will be shown, in the embodiment of Fig. 10, the representative index and the trie are also substantially the same. This, however, is not obligatory, as is exemplified, e.g., with reference to Fig. 11, where the trie and the representative index are different.

Turning now to Figs. 6A-C, there is shown a succession of schematic illustrations of the "CLIENT" table of Fig. 3, in accordance with the file management system employing the PAIF. The terms "transaction" and "operation" are used interchangeably.

In the description below the basic commands which enable data manipulation in the PAIF will be reviewed, i.e. insert a new data record into a PAIF, find a data record in a PAIF, and delete an existing data record. Those versed in the art will no doubt appreciate that on the basis of these basic primitives more compound data manipulation operations (e.g. "Join") may be realized.

Turning at the onset to Fig. 6A, there is shown the Client's data record 103 (56 in table Client of Fig. 3) having search key "12345" (i.e. a 5-byte-long search key). The PAIF of Fig. 6A (100) is, of course, trivial and consists of a single node 101 (standing for both the root node and the leaf node) linked by means of a long link 102 to data record 103.

The node 101 represents an offset 0 in said search key and the link 102 represents a value "1" of the search key portion (being by this particular embodiment 1-byte-long) at the specified offset.

As clearly shown in Fig. 6A, the data record 103 is associated with a search path being a unit that consists of a node 101 and a link 102 which defines an offset and a pertinent search key portion value that conforms to the corresponding search key portion value at that particular offset within the search key of the specified data record. More specifically, the value of the one-byte search-key-portion at offset 0 within search key "12345" is indeed "1".

Turning now to Fig. 6B-1 there is shown a PAIF 108 after the termination of a successive transaction in which the data record having Client_Id_No "12445" 107 has been inserted (data occurrence 58 in table Client of Fig. 3). The search keys of data records 103 and 107 are distinguished only in the third byte (offset 2), i.e. "3" and "4" respectively.

The unit defined by root node 101 and the link 102 is not sufficient to distinguish between data records 103 and 107, since the value of the 1-byte search key portion at offset 0 for both data records is "1". Hence, node 104 indicates the lowest offset which distinguishes between the two records, and links 106 and 105 indicate the respective 1-byte search key portions "3" and "4" at offset 2. It should be noted that the realization of the PAIF is not bound by the specific examples illustrated in the drawings and various implementations thereof may apply, depending upon the particular application. Thus, for example, Figs. 6B-2 and 6B-3 illustrate two other options of realizing the PAIF of Fig. 6B-1, where in Fig. 6B-2 the full key is represented in the PAIF (e.g. all the digits of the record 12445 are specified in the links commencing from the root node and ending at the data record). The latter realization is more explicit and less efficient in terms of space, as compared to the sparse realization of Fig. 6B-3 where only the nodes which are absolutely necessary appear in the tree. Other variants are, of course, applicable.

Before moving on to describe a procedure of inserting a new data record into an existing database it should be borne in mind that the higher the node in the trie PAIF, the smaller is the offset indicated thereby (e.g. in the PAIF of Fig. 6B, node 101 is higher than node 104 and accordingly it is assigned with a smaller offset - "0" vs. "2").

Generally speaking, the preferred procedure for inserting a new data record into an existing PAIF includes the execution of the following steps:
i. advancing along a reference path commencing from the root node and ending at a data record associated to a leaf node (referred to as "reference data record"); in each node in the reference path, advancing along a link originated from said node if the value represented by the link equals the value of the l-bit-long key portion at the offset specified by said node; in the case that the offset specified in the node is beyond any corresponding key portion in the key, or if there is no link with said value, advancing along an arbitrary path to any reference data record;
ii. comparing the search key of the reference data record to that of the new data record for determining the smallest offset of the search key portion that discerns the two (hereinafter "discerning offset");
iii. proceeding to one of the following steps (iii.0-iii.3) depending upon the value of the discerning offset:
   iii.0 if the data records are equal then terminate; or
   iii.1 if the discerning offset matches the offset indicated by one of the nodes in the reference path, add another link originating from said one node and assign to said link the value of the search key portion at the discerning offset taken from the search key of the new data record; or
   iii.2 if the discerning offset is larger than that indicated by the leaf node that is linked, by means of a link, to the reference data record:
      iii.2.1 disconnect the link from the reference data record (i.e. it remains temporarily "loose") and move the link to a new node; the new node is assigned with the value of the discerning offset;
      iii.2.2 connect the reference data record and the new node (which now becomes a leaf node) and assign to the link (long link) a value of the search-key-portion at the discerning offset taken from the search key of the reference data record;
      iii.2.3 connect by means of a link the new data record and the new node and assign to the link (long link) a value of the search-key-portion at the discerning offset taken from the search key of the new data record; or
   iii.3 if conditions iii.0, iii.1 and iii.2 are not met, there exists, in the reference search path, a father node and a child node thereof such that the discerning offset is, at the same time, larger than the offset assigned to the father node and smaller than the offset assigned to the child node (considered case A), or all the nodes in the reference search path have a value greater than the discerning offset (considered case B); accordingly, apply the following sub-steps:
      iii.3.1 for cases A and B, create a new node and assign the node with the value of said discerning offset; for case A only, disconnect the link from the father node to the child node and shift the link to the new internal node (i.e. the child node remains temporarily "loose");
      iii.3.2 for cases A and B, connect by means of a link (long link) the new data record and said new internal node; the value assigned to the link is that of the search-key-portion at the discerning offset, as taken from the search key of the new data record;
      iii.3.3 for cases A and B, connect by means of a new link the new node and, for case A, the child node or, for case B, the root node (i.e. the new node becomes, for case A, a new father node and, for case B, a new root node), and the value assigned to said link is the search-key-portion at the offset indicated by the new node, taken from the search key of the reference data record.
It should be noted that for a different reference path a different PAIF may be obtained.
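The insert steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: search keys are fixed-length strings, each key portion is one character, a data record is stood in for by its key string, the "arbitrary path" of step i is simply the first link of the node, and the names `Node` and `insert` are hypothetical.

```python
class Node:
    """PAIF node: an offset into the search key plus its outgoing links."""

    def __init__(self, offset):
        self.offset = offset
        self.links = {}   # key-portion value -> child Node, or a record's key (str)


def insert(root, key):
    """Insert `key` and return the root (which may be a new node)."""
    if root is None:                       # empty index: a single node, as in Fig. 6A
        root = Node(0)
        root.links[key[0]] = key
        return root

    # Step i: walk a reference path down to a reference data record.
    path, cur = [], root
    while isinstance(cur, Node):
        path.append(cur)
        value = key[cur.offset] if cur.offset < len(key) else None
        if value in cur.links:
            cur = cur.links[value]
        else:                              # no matching link: take an arbitrary one
            cur = next(iter(cur.links.values()))
    ref = cur

    # Steps ii and iii.0: find the discerning offset, or stop if the key exists.
    if ref == key:
        raise KeyError(f"{key} already exists")
    d = next(i for i, (a, b) in enumerate(zip(key, ref)) if a != b)

    # Step iii.1: a node on the reference path already tests offset d.
    for node in path:
        if node.offset == d:
            node.links[key[d]] = key
            return root

    # Steps iii.2 and iii.3: splice in a new node that tests offset d.
    new = Node(d)
    new.links[key[d]] = key                # link to the new data record
    idx = next((i for i, n in enumerate(path) if n.offset > d), len(path))
    # Below the new node hangs the reference record (iii.2) or a node (iii.3).
    new.links[ref[d]] = path[idx] if idx < len(path) else ref
    if idx == 0:                           # iii.3, case B: the new node is the root
        return new
    parent = path[idx - 1]                 # iii.2: the leaf node; iii.3, case A: the father
    parent.links[key[parent.offset]] = new # shift the existing link to the new node
    return root
```

Inserting "12345" and then "12445" into an empty index yields the two-node structure of Fig. 6B-1; inserting "12546", "12355" or "11346" into that structure exercises steps iii.1, iii.2 and iii.3 (case A), respectively.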

For a better understanding, the aforementioned "insert data record" operation will be successively applied to the specific PAIF of Fig. 6B, each time with a different data record so as to exemplify the three distinct scenarios stipulated in steps iii.1 - iii.3 above, thereby resulting in three PAIFs illustrated in Figs. 6C-1 to 6C-3, respectively.

In the first example the CLIENT data record having Client_Id (or search key) "12546" (59 in table Client of Fig. 3) is inserted into the PAIF of Fig. 6B. As stipulated in step (i), a move is made along the reference path commencing from the root 101 and ending, for example, at data record 103 which stands for the "reference data record". This is implemented by advancing from node 101 along link 102 (since at offset '0' of the inserted data record the value of the 1-byte-long digit is '1') and thereafter, since at offset '2' (as specified by node 104) none of the values of links 105 and 106 (4 and 3 respectively) matches the value of the inserted key at offset 2 ('5'), advance is made along an arbitrary path (by this particular embodiment through link 106) to the reference data record 103.

The comparison operation stipulated in step (ii) results in that the search key of the new data record is distinguished from the search key of the reference data record (103) at offsets 2 ("5" vs. "3") and 4 ("6" vs. "5"). The smallest offset ("discerning offset") is therefore 2.
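The comparison of step (ii) can be illustrated by the short Python sketch below. It assumes one-character key portions and keys of equal length; the function name `discerning_offset` is hypothetical.

```python
def discerning_offset(new_key, reference_key):
    """Smallest offset at which the two search keys differ, or None if equal."""
    for offset, (a, b) in enumerate(zip(new_key, reference_key)):
        if a != b:
            return offset
    return None


# The three insert examples, each compared with reference record "12345".
offsets = {key: discerning_offset(key, "12345")
           for key in ("12546", "12355", "11346")}
```

Only the smallest differing offset matters: "12546" also differs from "12345" at offset 4, but the scan stops at offset 2.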

Turning now to step (iii), the condition of step iii.1 is met since the discerning offset is equal to that assigned to node 104. Accordingly, and as is shown in Fig. 6C-1, new link 111 connects node 104 to the new data record 112. The value assigned to link 111 is 5, being the byte value at position 2 in the search key of the new data record 112. PAIF 110 of Fig. 6C-1 is therefore the result of inserting the data record 112 into the PAIF 108 of Fig. 6B-1.

Moving now to the second example, the CLIENT data record having Client_Id (or search key) "12355" (57 in table Client of Fig. 3) is inserted into the PAIF of Fig. 6B-1. Steps i and ii stipulated above result in a reference path starting at node 101 and ending at data record 103.

Turning now to step (iii), the condition of step iii.2 is met since the discerning offset 3 is larger than the offset 2 of leaf node 104 in the reference search path. Accordingly, in compliance with step iii.2.1 and as is shown in the resulting PAIF 120 of Fig. 6C-2, the link 106 is disconnected from reference data record 103 and is connected to a new node 121. The new node is assigned with the discerning offset 3. Next, in compliance with step iii.2.2, the reference data record 103 and the new node 121 are connected by means of new link 122. The new link is assigned with the value 4 (being the digit value at the discerning offset 3 taken from the search key "12345" of the reference data record 103); and finally, as stipulated in step iii.2.3, the new data record 123 is connected to node 121 by means of link 124 which is assigned with the value "5" (being the digit at the discerning offset 3 taken from the search key "12355" of the new data record 123). PAIF 120 of Fig. 6C-2 is, therefore, the result of inserting the data record 123 into the PAIF 108 of Fig. 6B-1.

The third and last example concerns inserting the CLIENT data record having Client_Id (or search key) "11346" (55 in table Client of Fig. 3) into the PAIF of Fig. 6B-1. Applying the aforementioned steps i and ii results in advancing from node 101 to data record 103 (in Fig. 6B) and establishing that the discerning offset is 1.

Thus in step iii, the condition of step iii.3 is met. Accordingly, in compliance with step iii.3.1 and as is shown in the resulting PAIF 130 of Fig. 6C-3, the link 102 is shifted to a new internal node 131. The new internal node 131 is assigned with the value 1 (being the discerning offset). As stipulated in step iii.3.2, the new data record 132 and node 131 are directly connected by means of new link 133. The value assigned to link 133 is 1 (being the digit at the discerning offset 1 taken from the search key "11346" of the new data record 132), and finally, in compliance with step iii.3.3, the new internal node 131 is linked to node 104 by means of link 134 assigned with the value 2 (being the digit at the discerning offset (1) taken from the search key "12345" of the reference data record 103).

Although the PAIF described above with reference to Figs. 6A-6C may be accommodated within one block, it is nevertheless preferable to separate between "nodes" and "data records" such that data records are grouped in a distinct file or files. Applying this approach to the PAIF of Fig. 6C-3 results in the generation of the data record file holding the records 132, 103, 107. Links 133, 106 and 105 become, of course, long links.

Obviously, if an insert procedure results in finding that the data record to be inserted already exists in the PAIF, an appropriate error message is returned to the procedure that invoked the Insert command.

It should be noted that in the latter examples it is assumed that the entire PAIF resides in a single block. Obviously, when additional data records are inserted by following the foregoing "insert procedure" a block overflow may occur, which necessitates (as will be explained in greater detail below) invoking a "split block" procedure, after which it is necessary to advance to the sought block and perform the insert procedure in the manner specified above.

Having described a typical "Insert" transaction, a "Find (or Retrieve) data record" transaction will be now described. Thus, for finding a data record by a given search key (hereinafter the sought data record) in an existing PAIF, the following steps should be executed:
i. advance along a search path commencing from the root node and ending at a data record linked to a leaf node, and for each node in the search path (hereinafter "current node") perform the following sub-steps:
   i.1 for each link originated from the current node: compare the search-key-portion of the sought data record at the offset defined by the current node value to a value assigned to said link; in case of a match advance along said link and return to step i.1;
   i.2 if none of the links originated from the current node matches the search-key-portion of the sought data record, return "NOT FOUND" and terminate the find procedure;
   i.3 if a data record is reached (hereinafter "reference data record"), compare the search key of the sought data record as a whole to the key of the reference data record;
      i.3.1 in case of a match, return "FOUND" (and in case of "Retrieve", return also the entire data record) and terminate the find procedure; or
      i.3.2 in the case of mismatch return "NOT FOUND" and terminate the find procedure.

For a better understanding the "find" procedure will be applied, twice, to the specific PAIF of Fig. 6C-3, giving rise to "found" and "not found" results, respectively.

Thus, consider a find data record by search key "12445" (hereinafter sought data record). According to step i.1 the value of the digit "1" at the offset assigned to the root node (offset 0) of the sought data record is compared to the one assigned to link 102 (being the sole link originated from node 101). Since a match is found, control is shifted to node 131. Again according to step i.1 the value of the digit ("2") at the offset assigned to node 131 (offset 1) of the sought data record is compared to the one assigned to link 134. Here also a match is found so control is shifted to node 104. Next, according to step i.1, the value of the digit "4" at the offset assigned to node 104 (offset 2) of the sought data record is compared to the value of each link originating from node 104. The comparison results in a match for link 105 and accordingly control is shifted to data record 107.

According to step i.3 the search key of the sought data record and that of data record 107 are compared and since a match is found a "FOUND" result is returned (step i.3.1).

Turning now to a second example, consider the case when the sought data record has a search key "12463". The procedure described with reference to the previous example is repeated; however, at step i.3 the comparison between the sought data record and data record 107 results in a mismatch, and according to step i.3.2 a "NOT FOUND" result is returned.
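The find steps and the two examples can be sketched in Python as follows. The sketch assumes one-character key portions, stands in for each data record by its key string, and returns `None` for "NOT FOUND"; the names `Node` and `find` are hypothetical.

```python
class Node:
    """PAIF node: an offset into the search key plus its outgoing links."""

    def __init__(self, offset, links):
        self.offset = offset
        self.links = links   # key-portion value -> child Node, or a record's key (str)


def find(root, key):
    """Return the record reached for `key`, or None for NOT FOUND."""
    cur = root
    while isinstance(cur, Node):
        value = key[cur.offset] if cur.offset < len(key) else None
        if value not in cur.links:         # step i.2: no link matches
            return None
        cur = cur.links[value]             # step i.1: advance along the matching link
    # Step i.3: only offsets on the path were tested, so compare the whole key.
    return cur if cur == key else None


# PAIF 130 of Fig. 6C-3.
node104 = Node(2, {"3": "12345", "4": "12445"})
node131 = Node(1, {"1": "11346", "2": node104})
node101 = Node(0, {"1": node131})

found = find(node101, "12445")       # reaches record 107 and matches
not_found = find(node101, "12463")   # reaches record 107 but the full keys differ
```

The final whole-key comparison is what rejects "12463": the index tests only offsets 0, 1 and 2, at which "12463" and "12445" agree.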

A general "Delete Data Record" transaction will now be described. Thus, as a first stage a "Find data record" transaction is applied to the PAIF. In case of "NOT FOUND", an appropriate error message is returned to the procedure that invoked the "Delete" command. Alternatively, the sought data record is found. For clarity of explanation of the "Delete" procedure, the following nomenclature is introduced:

The leaf node that is linked to the sought data record is referred to as the "target node". The father of the target node is referred to as the "predecessor target node". The link that connects the predecessor target node to the target node is referred to as the "predecessor link" and the link that connects the target node to a child node thereof (or to a data record other than the sought data record) is referred to as the "target link". Bearing this nomenclature in mind, the following steps are executed:
i. delete the sought data record and the link that links the target node thereto;
ii. if the number of links that remain in the target node is larger than or equal to 2, then the deletion procedure terminates;
iii. if, on the other hand, the number of links that remain in the target node is exactly one (i.e. one target link), then:
   iii.1 "bypass" the target node by connecting the predecessor link from the predecessor node to said child node (or to a data record); and
   iii.2 delete the target node and the target link, terminating the deletion procedure.
It should be noted that the current step is more of a "prudent memory management" step in order to release the space occupied by the target node and link, so as to enable allocation thereof to other nodes and links in the block. It should be further noted that said step (iii) is optional.

For a better understanding the foregoing "delete data record" procedure will be applied to the specific PAIF of Fig. 6C-3.

Thus, responsive to a command "delete record having search key = "11346"", the latter record is searched in the PAIF according to the procedure described above. Having found the data record 132 and in compliance with step i above, the data record as well as the link 133 leading thereto are both deleted. Since after the latter deleting step the target node 131 remains only with the sole target link 134, steps iii and iii.1 apply, and accordingly the predecessor link 102 bypasses target node 131 and is directly linked to the child node thereof 104. Next, in compliance with step iii.2, target node 131 and the target link 134 are deleted, thereby obtaining the PAIF shown in Fig. 6B-1.

Another example is given with reference to the PAIF of Fig. 6C-1. Thus, responsive to a command "delete record having search key = "12546"", the latter record is searched in the PAIF according to the procedure described above. Having found the data record 112 and in compliance with step i above, the data record as well as the link (111) leading thereto are both deleted. Since, as stipulated in step ii, the number of links that remain in the target node 104 is two (i.e. links 105 and 106), the deletion procedure terminates. The resulting PAIF is again the one shown in Fig. 6B-1.
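A minimal Python sketch of the delete steps, replayed on the two examples, is given below. It assumes one-character key portions, stands in for each data record by its key string, and leaves a target node that happens to be the root untouched (a case the steps above do not address); the names `Node` and `delete` are hypothetical.

```python
class Node:
    """PAIF node: an offset into the search key plus its outgoing links."""

    def __init__(self, offset, links):
        self.offset = offset
        self.links = links   # key-portion value -> child Node, or a record's key (str)


def delete(root, key):
    """Remove `key` from the PAIF rooted at `root`, modifying it in place."""
    # First stage: a find that remembers each (node, link value) on the path.
    trail, cur = [], root
    while isinstance(cur, Node):
        value = key[cur.offset] if cur.offset < len(key) else None
        if value not in cur.links:
            raise KeyError(key)                    # NOT FOUND
        trail.append((cur, value))
        cur = cur.links[value]
    if cur != key:
        raise KeyError(key)                        # NOT FOUND

    target, value = trail[-1]
    del target.links[value]                        # step i: drop record and its link
    if len(target.links) >= 2 or len(trail) < 2:   # step ii (or the target is the root)
        return
    if len(target.links) == 1:                     # step iii: one target link remains
        (remaining,) = target.links.values()
        predecessor, pred_value = trail[-2]
        predecessor.links[pred_value] = remaining  # iii.1: bypass the target node
        # iii.2: the target node is now unreferenced; Python reclaims it.


# First example: PAIF 130 of Fig. 6C-3, delete "11346".
node104 = Node(2, {"3": "12345", "4": "12445"})
node131 = Node(1, {"1": "11346", "2": node104})
root_c3 = Node(0, {"1": node131})
delete(root_c3, "11346")

# Second example: PAIF 110 of Fig. 6C-1, delete "12546".
node104_c1 = Node(2, {"3": "12345", "4": "12445", "5": "12546"})
root_c1 = Node(0, {"1": node104_c1})
delete(root_c1, "12546")
```

After both calls each root links directly to a node at offset 2 holding the links "3" and "4", which is the shape of Fig. 6B-1.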

.Anothεr common primitive is the "Modify existing data record", e.g. change the home address of an existing client. The "Modify" primitive is normally realizεd by sεlectively utilizing the aforementionεd primitives. For executing a "Modify" command one should distinguish bεtwεεn thε following cases:

1. The "modify" applies to fields other than the search key (e.g. modify the address of a client having Client_Id_No = "xxxxx") - in this case the modify procedure simply involves a "Find" operation (data record having Client_Id_No = "xxxxx"). Having found the sought data record, the old address is replaced by a new one.

2. The "modify" applies to a search key field (e.g. change an account no. from "xxxxxx" to "yyyyyy"). This command is realized as a sequence of two other primitives, i.e. delete data record having "Account_No" = "xxxxxx" and thereafter insert data record having "Account_No" = "yyyyyy", or vice versa.

Obviously a Modify transaction may consist of both cases.
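The two cases above may be sketched, by way of non-limiting illustration, as follows. The sketch assumes an ordinary Python dictionary as a stand-in for the PAIF; the function name `modify` and the field name passed as `key_field` are hypothetical and serve only to show how "Modify" decomposes into the "Find", "Delete" and "Insert" primitives.

```python
def modify(index, key, changes, key_field="Account_No"):
    """Apply `changes` (field -> new value) to the record stored under `key`.

    `index` is a plain dict standing in for the index (an assumption);
    returns False when no record with `key` exists.
    """
    record = index.get(key)                  # "Find" primitive
    if record is None:
        return False
    new_key = changes.get(key_field, key)
    if new_key != key and new_key in index:
        raise KeyError("a record with the new search key already exists")
    record.update(changes)                   # case 1: non-key fields replaced in place
    if new_key != key:                       # case 2: the search key itself changes
        del index[key]                       # "Delete" primitive
        index[new_key] = record              # "Insert" primitive
    return True


accounts = {"111111": {"Account_No": "111111", "Address": "Old St"}}
modify(accounts, "111111", {"Address": "New Rd"})        # case 1
modify(accounts, "111111", {"Account_No": "222222"})     # case 2
```

A single call may carry both a key change and non-key changes, which corresponds to a Modify transaction consisting of both cases.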

In the previous examples each search key is represented as a series of bytes and accordingly the search procedure is performed by partitioning the search key into search key portions each consisting of at least one byte.

Those versed in the art will readily appreciate that bytes are not the only possible representation of a search key. Thus, for example, a search key can be represented in binary form, i.e. a series of 1's and 0's, and accordingly the search procedure is performed by partitioning the search key into search key portions each consisting of one bit (i.e. l=1) or more, e.g. one byte (i.e. l=8 bits) and others. In certain scenarios, it may well be the case that the l value is not identical for all the nodes in the PAIF.

It should be further noted that different links in a given PAIF may be assigned search-key portions of different lengths, as long as the respective search-key portion is known to the corresponding node.
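The partitioning of a search key into l-bit portions may be illustrated by the following minimal sketch. The helper names `to_bits` and `portions` are hypothetical, and a single l value is assumed for the whole key, although, as noted above, l need not be identical for all nodes.

```python
def to_bits(key):
    """Render a bytes search key as a string of '0'/'1' characters."""
    return "".join(format(b, "08b") for b in key)


def portions(key, l):
    """Partition the key into consecutive search-key portions of l bits each."""
    bits = to_bits(key)
    return [bits[i:i + l] for i in range(0, len(bits), l)]


byte_portions = portions(b"A", 8)   # one portion of one byte
half_bytes = portions(b"A", 4)      # two half-byte portions
single_bits = portions(b"A", 1)     # eight one-bit portions
```

With l=4 the character "A" (0x41) yields the portions 0100 and 0001, i.e. the half-byte units 0x4 and 0x1.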

As is clearly evident from the various PAIFs of Figs. 6A-6C, the data records are held in a sorted form according to search key. Navigating, for example, in the PAIF of Fig. 6C-3 (from right to left) brings about the ordered series "11346", "12345" and "12445". This characteristic constitutes yet another advantage which eases data manipulation as compared to the tree of Fig. 5, where the data records are not sorted. As specified before, a node in the PAIF is not necessarily classified uniquely. Thus, for example, in the PAIF 120 of Fig. 6C-2, node 104 is at the same time a leaf node (linked by means of a long link 105 to data record 107) and an internal node (linked by means of a short link 106 to node 121).
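The sorted-order property may be illustrated with an ordinary character trie. The sketch below is not the PAIF itself: it assumes a plain nested-dictionary trie, and the names `END`, `insert` and `in_order` are hypothetical. It merely shows that visiting the links of each node in label order yields the keys in sorted sequence regardless of the order of insertion.

```python
END = ""  # hypothetical marker label under which a complete key is stored


def insert(trie, key):
    """Insert `key` into a nested-dict trie, one character per link."""
    node = trie
    for ch in key:
        node = node.setdefault(ch, {})
    node[END] = key


def in_order(trie):
    """Yield the stored keys by visiting the links of each node in label order."""
    for label in sorted(trie):
        if label == END:
            yield trie[label]
        else:
            yield from in_order(trie[label])


trie = {}
for key in ["12445", "11346", "12345"]:   # inserted unsorted
    insert(trie, key)
ordered = list(in_order(trie))
```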

Those versed in the art will readily understand that the "Insert", "Delete", "Find" and "Modify" procedures described herein are only one out of many possible variants for realizing these procedures and they may be modified, all as required and appropriate depending upon the particular implementation.

The specified insert, delete and find transactions apply to a so-called intra-block transaction. As will be explained in greater detail below, applying the latter transactions in an inter-block context necessitates addressing a few scenarios which are irrelevant in the intra-block operation.

Having explained the structure of the PAIF trie, there follows a description of various embodiments according to the invention, where there is shown a layered index based on a PAIF index scheme that includes a PAIF tree (as basic partitioned index).

Turning now to Figs. 7A-H, there are shown schematic illustrations of a layered index constructed in response to a succession of split block operations, according to one embodiment of the invention. Consider for example a block 140 in Fig. 7A (in the basic partitioned index) which overflows in terms of memory space. This being the case, a "split block" procedure is invoked which results in a layered index 142 of Fig. 7B consisting of root block 144 and a duplicated node A' (155) linked to leaf block 146 by means of direct link 145 and by means of long link 147 to a leaf block 148.

By this specific example, the split point was selected to be link 149 (Fig. 7A) (hereinafter "split link"), thereby shifting nodes A, B, E, D and F to new block 146 and nodes C, G, I, J, K, L and H to a new block 148. The split link is preferably selected in order to accomplish an essentially even distribution of nodes and links between the new blocks (e.g. the size of the sub-PAIFs that reside in blocks 148 and 146 is essentially the same). In the case that a father block does not exist, a father block 144 (constituting I_1) is created with a duplicated node A' (155) of the split node A (156). In the case that a duplicated node of the split node from which the split link originates does not already reside in the father block 144, the node is copied to the latter block (marked A') and the connection between node A' (155) and the block in which A resides (146) is implemented by means of said direct link 145. The split link 149 (being originally a short link between A and C) is replaced by long link 147 between A' and the block in which C resides. Optionally nodes A and C (156, 153 respectively) may also be linked by means of a split link marked as dashed line 150.

The net effect is that in Fig. 7B there is provided a layered index constituted by block 144, and the blocks of the trie are 146 and 148. Those versed in the art will readily appreciate that it is now possible to access or update data records not through the trie (i.e. commencing from node A 156), but rather through the layered index (i.e. commencing from node A' 155). In this connection it should be noted that link 147 has the same value as link 150, which in turn has the value of original link 149 of Fig. 7A.

Considering now that block 148 overflows, it undergoes a similar block split procedure resulting in layered index 151 in Fig. 7C. By this example the split link is short link 152 of Fig. 7B and accordingly nodes C and H reside in block 148A of Fig. 7C whereas nodes G, I, K, L and J reside in block 148B. The node from which the split link originates (node C - 153 of Fig. 7B) is duplicated (yielding a duplicated node 153a of Fig. 7C) and placed in block 140, marked C'. As before, direct link 154 connects the copied node C' 153a to the block 148A of the original split node 153, whilst the link 155 is a far link to the split block 148B, and the value of the link is the original value of link 152 between nodes C and G before (and after) the split.

In Fig. 7C, the layered index 151 is constituted by the trie that includes blocks 141, 148A and 148B, and by block 140 which forms a representative index over the common keys of the trie.

It should be noted that in Fig. 7C node A in block 141 and node C in block 148A are optionally disconnected, and likewise node C of 148A and node G of 148B are optionally disconnected. As is clearly shown, nodes A' and C' are connected in block 140 to form a (connected) trie and it is accordingly possible to access block 141 through node A' and direct link 156; block 148A through nodes A', C' and direct link 154; and block 148B through nodes A', C' and direct link 155. It is noteworthy that the value of the link between nodes A' and C' (in block 140) is identical to the original value between nodes A and C (see link 149 in Fig. 7A).

As is clearly seen in Fig. 7C, the resulting layered index constitutes a balanced structure of blocks, thereby keeping the index depth to a minimum and consequently minimizing the number of accesses (normally, although not necessarily, I/O operations) that are required in order to find, insert or delete a given data record. Considering that the number of accesses required to reach a data record through the layered index is substantially a logarithmic function of the number of records, the layered index is more efficient in terms of the number of I/O operations required for accessing a given data record as compared to the number of I/O operations required to access a data record through the trie. Thus, for example, for accessing the data record that is associated with node J through the layered index, it is required at first to access block 140, thereafter block 148B and thereafter the sought data record (i.e. three I/O operations). In contrast, accessing the same data record through the trie brings about four I/O accesses, namely block 141, block 148A, block 148B and data record 159. As shown, there are a few particular instances in which the trie is more efficient (e.g. accessing the data record associated with node A); however, the larger the trie (i.e. the more blocks it is constituted by), the more efficient is the access through the index of the layered index.

By the particular embodiment of Fig. 7, the representative index and the trie (being one embodiment of basic partitioned index) comply with substantially the same index scheme, i.e. the PAIF. By "substantially" the same scheme it is meant that there are some differences, as will be explained with reference to Fig. 9G below.

The considerations in connection with duplicating nodes to higher layers I_j in the layered index are further illustrated with reference to additional examples depicted in Figs. 7D to 7H. Thus, consider the layered index of Fig. 7D where block split is performed in link 400. The resulting layered index is illustrated in Fig. 7E, where block 402 is created, node 401 is copied to higher-level block 402 (forming part of the layered index scheme) and the original link between nodes B and E is optionally retained (through dashed link 403). Through node B it is now possible to access the two blocks of the trie (405 and 406), by means of links 407 and 408, respectively.

Next, should it now be required to split block 405 at, say, link 409, the resulting structure appears in block 402 of Fig. 7F, where nodes A and I of block 405 are duplicated to A' and I' (410 and 411) in block 402. Node I' is obviously a duplicated node of the split node I in block 405. However, node A is also copied, considering that both nodes B (whose counterpart B' is a priori residing in block 402) and I (whose counterpart I' is now duplicated to block 402) are descendant nodes of A, node A being the lowest ancestor node of nodes B and I; thus a (connected) trie is formed in block 402. The value associated with short link 414 (between nodes A' and B' in block 402) is the same as the value of link 412 (between A and B in block 405). The value of the link 415 (between nodes A' and I') in block 402 is the same as that of link 413, which originates from node A in the direction needed to access node B. The internal structure of block 402 is such that it allows a search to the representatives of blocks 405, 406 and 407.

The direct links 416, 417 of nodes 422 and 411 are optionally retained since it is possible to move along direct link 418 to block 405, seeing that node 410 is maintained in the access path to both nodes 422 and 411.

Fig. 7G shows the resulting layered index after splitting block 407 of Fig. 7F (in link 420) and Fig. 7H shows the resulting layered index after splitting block 402 (in the link between nodes I' and N'). The resulting layered index in Fig. 7H has, as shown, three layers, the first consisting of block 430, the second consisting of blocks 402 and 408 and the trie consisting of blocks 405, 407, 426 and 406.

Those versed in the art will readily appreciate that the manner of realizing split block is, of course, not limited to the examples of Figs. 7D to 7H.

Having described an embodiment of constructing a layered index by split processes resulting from a succession of insert transactions (with reference to Fig. 7), it will be appreciated that the opposite procedure, i.e. "Delete block", is activated when a data record is deleted leaving only one node in a block having no data records associated therewith.

Those versed in the art will readily understand that the layered index described with reference to Fig. 7 is only one out of many possible variants for realizing the layered index, where the representative index and the basic partitioned index are substantially the same.

The utilization of a PAIF in the manner specified constitutes an advantage over some of the hitherto known tries in the sense that the so accomplished layered index has a balanced structure of blocks despite the fact that the trie per se may possibly be unbalanced.

Attention is now directed to Figs. 8A-B showing two respective illustrations exemplifying the application of the technique of the invention according to another embodiment of the invention.

Thus, Fig. 8A illustrates a given trie structure having vertical orientation (i.e. constituting a vertical tree) which, as shown, is unbalanced, i.e. three blocks depth (260, 261 and 262) vs. two blocks depth (260 and 264). The description below does not aim at explaining the search scheme of the specified vertical tree but emphasizes only those aspects which are required to obtain a balanced layered index. It should nevertheless be noted that the nodes in trie structure 260 signify offsets, in a half byte size, of the data records (a-k) that are shown in Fig. 8A (the node values are presented in hexadecimal representation).

It should be noted that an extra I/O operation, i.e. accessing three blocks (or three I/O operations) in order to access data record k as compared to one block (or one I/O operation) to access data record b as depicted in Fig. 8A, may be regarded as balanced. In some real-life scenarios this does not necessarily require applying the technique of the invention in order to bring about exactly the same number of I/O operations. Of course, further insertions of data records may generate a higher degree of "unbalance" which, if not handled by the technique of the invention, will give rise to degraded performance (due to the unbalanced structure) as discussed in detail above (with reference to prior art techniques).

Fig. 8B illustrates one possible embodiment of the invention. As shown, a representative index that consists of one block 270 (forming I_1) is constructed, with the result that a horizontal balanced tree is obtained having a root block 270 from which all the blocks of the lower level vertical tree (the latter constitutes the unbalanced trie) are accessed through one I/O operation.

As shown, the actual access to the blocks in the first vertical tree (being the trie) is achieved by means of the common key value of each block. Before proceeding any further the term common key will be exemplified with reference to Fig. 8.

The common key of block 260 (in hexadecimal representation of half byte units) is 0x4, 0x1 and 0x3, where 0x4 stands for the most significant bits of the byte of the character A, 0x1 stands for the least significant bits of the character A, and 0x3 stands for the most significant bits of the characters which reside in offset 2 of the data records.

It should be noted that all data records that can be accessed through block 266 share the common key prefix specified above.

In the same manner, the following table summarizes the common key of each block:

BLOCK NO.   COMMON KEY
260         0x4, 0x1, 0x3
261         0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3
269         0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3
264         0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x4, 0x3

It should be noted that block 261 can accommodate a root node with value 8; thus, the common key of the block, hereafter k, is changed to be 0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, i.e. it consists of 8 units. In this case, the representative of block 261 in I_1 should be changed accordingly. In a different implementation, the representative of 261 is k, even if the root node with the value 8 does not exist.

The index over the common keys is accomplished in the representative index (consisting of block 270) such that it constructs a trie that addresses the common keys of the first vertical tree. Now, for example, in order to find data record g, one follows node 290 and link 291 to node 292. Then, one advances with the direct link 293 to block 261, which is associated with data record g. The resulting layered index is balanced.

As specified above, for the specific case of a trie, the representative key of a block is a common key. Generally speaking, the common key of a block is the longest prefix of all keys of the data records that can be accessed from the block by the relevant index scheme. For the PAIF, the specified prefix size (calculated in l-bit-long units) equals the value of the root node in the block (which, as recalled, holds an offset value). If the prefix size is expressed as a number of bits, then the prefix size is calculated as the offset value multiplied by the l value.
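The notion of a common key may be illustrated by the following minimal sketch, which assumes half-byte units (l=4) as in Fig. 8. The keys "A1" and "A2" are hypothetical and are not taken from the figures; the helper names `nibbles`, `common_key` and `prefix_size_bits` are likewise assumptions made for illustration only.

```python
def nibbles(key):
    """Split a bytes key into half-byte units, most significant half first."""
    units = []
    for b in key:
        units.append(b >> 4)      # most significant 4 bits
        units.append(b & 0x0F)    # least significant 4 bits
    return units


def common_key(keys):
    """Longest prefix, in half-byte units, shared by all the given keys."""
    unit_lists = [nibbles(k) for k in keys]
    prefix = []
    for column in zip(*unit_lists):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def prefix_size_bits(prefix, l=4):
    """Prefix size in bits: the number of l-bit units multiplied by l."""
    return len(prefix) * l


shared = common_key([b"A1", b"A2"])   # hypothetical keys sharing "A" and a digit's high half
size = prefix_size_bits(shared)
```

For these two hypothetical keys the shared prefix is 0x4, 0x1, 0x3, i.e. three half-byte units or 12 bits.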

There follows now a description of yet another embodiment of constructing a layered index of the invention with reference to Figs. 9A-9G.

Accordingly, attention is now directed to Figs. 9A-9G showing a succession of modify (insert) transactions on a PAIF tree (constituting a trie that is susceptible to an unbalanced structure) and the so obtained layered index. For convenience of presentation, the data records are shown as forming part of the trie. As specified above, the actual manner in which the data records are associated to the trie may vary depending upon the particular application.

In the following figures, a layered index is constructed by inserting successively the following unsorted data records A-F (which for convenience of presentation form part of the blocks). The data string is presented as a series of bits, where the l of the l-bit portion stands for 1:

A=001000011
B=110011100
C=011011111
D=011011011
E=101010101
F=111111111

In the first step (Fig. 9A), record A is inserted, whereafter Block 300 includes node 301 having offset 0, being associated to first record A through link 302, having the value 0. At this stage, the tree consists of Block 300 having only one node. The index scheme dictates that the search path to data record A is determined according to value 0 at offset 0, as depicted on link 302 and node 301, respectively.

Thereafter (Fig. 9B), data record B is inserted, in which, as can be clearly seen and distinguished from data record A, in offset zero the key value is 1 and, accordingly, link 302 leads to data record B and is assigned the value 1.

Thereafter (Fig. 9C), data record C is inserted, and the value thereof in offset 1 serves for distinguishing it from record A. Links 303 and 304 connect node 305 (standing for offset 1) to the specified data records C and A respectively. Since Block 300 accommodates nodes 301 and 305, it is not required, as yet, to split the block.

Next, data record D is inserted, and the structure of the block following the insert operation is shown in Fig. 9D. Since, however, the data block cannot accommodate more than two nodes (overflow occurs), it is now required to split Block 300. Fig. 9E illustrates the tree structure after splitting. Thus, link 306 is the split link, with the motivation that approximately the contents of a half block will be retained in Block 300, and the contents of the remaining half block will be moved to another block 310. Of course, other links could likewise be selected to be the split link.

As a first stage, block 300 in I_0 is replaced with two blocks 300 and 310. The nodes 0, 1 (designated as 311 and 313, respectively) and the data records A and B are retained in the splitting block 300, whereas node 6 and data records D and C (standing in this particular embodiment for the remaining nodes) are moved to block 310. Accordingly, the basic partitioned index of Fig. 9E consists now of two blocks 300 and 310 (which in fact constitute the unbalanced trie).
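The offsets 0, 1 and 6 held by the nodes can be checked against the bit strings of records A-D. The following minimal sketch (the helper name `first_diff` is hypothetical) computes the first offset at which two records differ, which is the offset a node must hold in order to distinguish between them; it does not reproduce the insert procedure itself.

```python
RECORDS = {
    "A": "001000011",
    "B": "110011100",
    "C": "011011111",
    "D": "011011011",
    "E": "101010101",
    "F": "111111111",
}


def first_diff(a, b):
    """Return the first offset at which bit strings a and b differ, or None."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    return None


offset_a_b = first_diff(RECORDS["A"], RECORDS["B"])   # node with offset 0
offset_a_c = first_diff(RECORDS["A"], RECORDS["C"])   # node with offset 1
offset_c_d = first_diff(RECORDS["C"], RECORDS["D"])   # node with offset 6
shared_c_d = RECORDS["C"][:offset_c_d]                # bits shared by C and D
```

Records C and D agree on their first six bits (011011) and first differ at offset 6, which is the value of the node moved to block 310.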

Thereafter, since the block of I_1 does not exist, it is created, and, accordingly, block 312 is provided. The split node (313) is copied to the block (312) to thereby constitute a duplicated node (314). Next, the duplicated node (314) is connected by means of direct link 316 to block 300, and the duplicated node 314 is linked by means of a far link 318 to the block 310. This far link replaces the original split link 306 that is marked in Fig. 9E in a dashed line. The value of the far link 318 is the same as the value of the split link. Thus, the representative index (constituted by block 312) allows searching according to the common keys of the basic partitioned index.

It should be noted that there are no constraints as to whether the split link should be deleted or retained. As shown, the so obtained horizontal tree that constitutes the layered index (consisting here of blocks 312, 300 and 310, of which 312 belongs to the representative index) is balanced.

Next, data record E is inserted. In this case advancing in the horizontal tree (being one form of the layered index) from the first node 314 of block 312 (having a value 1) is not possible by means of the far link 318, since it represents direction 1 from node 314 (having a value 1), and a link in direction 0 is required. Therefore, one advances by means of the direct link 316 to block 300. Thus, the block that needs to be associated with the new data record is found. In the same way data record F is inserted, resulting in the tree structure shown in Fig. 9F.

Next, if a split between node 320 and node 321 of block 300 is performed, node 320 is copied to block 312 (designated 323 in Fig. 9G) and, since it cannot be linked to node 314 of block 312 (since it will not retain the correct intra-block links of the nodes), node 311 of block 300 is also copied to block 312 (designated 322 in Fig. 9G) in order to create a (connected) trie that enables searching by the search scheme to blocks 300, 326 and 310 according to the common keys of the blocks.

It should also be noted that instead of having direct links from all copied nodes 314, 322, 323 of block 312 in Fig. 9G, it would be sufficient to have one such direct link from the copied node (322) to block 300. A far link 324 from node 323 is set to block 326 in the direction of the link before the split (the direction of link 315 of Fig. 9F). Obviously, if another split is performed in block 326, it would be represented in block 312 by a node connected from node 323 by a link in direction 1, having a direct link to the B;-!, and a far link to block Bj-r-

Figs. 9A-G and 8A-B illustrate two of many possible manners of realizing the split block mechanism that maintains the balanced structure of the invention by constructing a layered index. The flexibility in adopting another non-limiting variant is shown e.g. in Fig. 8B, where the near link 271 and direct link 272 are represented by far link 273 (marked in dashed line) with direction as of link 271, thus rendering node 276 redundant.

Insofar as many embodiments are concerned, the balance technique of the invention confers on the so obtained balanced horizontal oriented digital tree (being one form of the layered index structure) a so-called "probabilistic access" characteristic. This means that a search in connection with an input data record (e.g. search for a data record A) may lead to a different data record or to a node where there is no link in the direction prescribed by the index scheme, and may require applying a "correction" in order to eventually access the sought data record.

For a better understanding of the foregoing consider, for example, Fig. 9E. Consider that a search transaction is applied to the layered index of Fig. 9E with the sought data record L=111011110. The search path will follow node 314 and link 318 (offset 1, value 1, respectively) and then at offset '6' (root node of block 310) through link 319 (value '1') to data record C. The latter example exemplifies the probabilistic search characteristics of the so obtained layered index.

In order to resolve the specified failure, the size of the common prefix of the key of the sought data record and the key of the data record is calculated. The common key of the block (310) is the prefix portion of the key of the actual data record C. Thus, the size of the common prefix is zero. Next, climb up the tree to the node in the access path that has a value equal to or less than the common prefix size and that has a direct link. If the latter requirement is not met, i.e. all the nodes have a value greater than the calculated prefix size, then proceed from the first node in the access path that has a direct link (which should point to the first block of the index I_{i-1}). Now, from the node 311 move by means of direct link 316 to the lower level vertical oriented tree (i.e. to layer I_{i-1}) and therefrom continue the search path as prescribed by the index scheme.

According to another scenario, should the index scheme prescribe going in a given direction and there is no link in the desired direction, the search path follows the direct link from a node with the largest value on the search path (that maintains a direct link). When advancing from block to block, a comparison to the common key (if available) or to data records associated with nodes (if available) can lead to a decision as to whether or not to advance by the index scheme or to return to a node with a direct link. It should be noted that the common key is not necessarily physically attached to the data records.

Reverting to the previous example (sought data record L) and associated data record C of Fig. 9E, if the common key of block 310 (being 011011) is maintained in the block, it is not needed to access data record C. Thus, since the common prefix of the key of L and the common key of the block is 0, one can return to node 314 and link 316 without accessing record C. Avoiding the need to access the data record in the manner specified has, of course, the advantage of improving performance. The criterion to know that the sought data record does not reside in the tree is that the size of the common key prefix of the sought data record and the common key of the block is greater than the value of the split node.

In the latter example, the value of the split node is 1 (of node 313), thus block 310 is not the block that accommodates record L (if such record exists). Therefore, the search for record L is continued from node 314 and link 316. This procedure applies to all modify transactions.
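The correction step for record L may be illustrated by the following minimal sketch. It assumes that the access path is given as a list of (node, offset value, has direct link) entries ordered from the root downwards, and that climbing up the tree means scanning that list from its end; the function names and the identifier "root_310" are hypothetical. Only the prefix calculation and the choice of the node to return to are shown, not the complete search.

```python
def common_prefix_len(a, b):
    """Number of leading positions at which bit strings a and b agree."""
    size = 0
    for x, y in zip(a, b):
        if x != y:
            break
        size += 1
    return size


def node_to_resume_from(path, prefix_size):
    """Choose the node whose direct link is followed after a failed search.

    `path` holds (node_id, offset_value, has_direct_link) from the root down.
    """
    for node_id, value, has_direct in reversed(path):    # climb up the tree
        if has_direct and value <= prefix_size:
            return node_id
    for node_id, value, has_direct in path:              # otherwise: first node
        if has_direct:                                    # that has a direct link
            return node_id
    return None


sought = "111011110"           # record L
block_common_key = "011011"    # common key of block 310
path = [(314, 1, True), ("root_310", 6, False)]

prefix = common_prefix_len(sought, block_common_key)
resume = node_to_resume_from(path, prefix)
```

For record L the common prefix size is 0, no node on the path has a value of 0 or less, and the search therefore resumes from node 314, from which direct link 316 leads to block 300.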

Insofar as an insert transaction is concerned, block 300 is found in the manner specified above and is associated with the new data record L.

The latter example referred to a specific example of a layered index. Those versed in the art will readily appreciate that the latter probabilistic access characteristic applies mutatis mutandis to other types of layered index that utilize a basic partitioned index.

The probabilistic search characteristic which leads to "errors" stems from the fact that the complete common key of a block in layer I_{h-1} is not necessarily known from the values of the nodes that reside on the search path up to the block in I_{h-1}. Thus, it is necessary to know the common key of the block in I_{h-1} in order to verify whether the search path to the specified block matches the search path according to the key of the sought data record. If the common key is not maintained in the block, it might be needed to advance in the index to a data record in order to know the common key value.

The inherent error-prone characteristic of the layered index and the manner of handling it have been exemplified with reference to Fig. 9 above, and may be described more generally as follows: to search a record by key k, the latter is searched in I_h (and in some cases in I_{h-1} to I_1 or to data record(s)) in order to find the block B of I_{h-1} leading to k. This process is repeated until reaching the block of I_0 that is associated with the data record with key k (if one exists).

The description in Figs. 7 to 9 exemplified a layered index utilizing a PAIF based indexing scheme as the basic partitioned index and the representative index. Those versed in the art will readily appreciate that the layered index of the invention is not bound only to PAIF. Thus, for example, U.S. 5,495,609 illustrates a different trie. Consider, for example, the trie of Fig. 10A in accordance with the specified '609 patent, and assume that the trie consists of a block that accommodates nodes 11, 12, 13 and 14. Should it now be required to split the block subsequent to the insertion of new nodes to the tree, a possible approach of splitting the block in accordance with prior art techniques would be, for example, to break the link between nodes 12 and 14, to thereby obtain two blocks, one accommodating nodes 11, 12 and 13, whereas the other accommodates node 14 (hereinafter new block). Assuming that the first block resides in the internal memory, if it is now required to reach record 26, only one I/O operation is required. If, on the other hand, record 20 is of interest, a first I/O operation is required in order to access the new block (i.e. the one accommodating node 14), and therefrom another (i.e. second) I/O operation is required in order to access record 20. It is accordingly appreciated that the split block gave rise to an unbalanced tree. Subsequent insert transactions may adversely affect the unbalanced characteristic of the tree, i.e. necessitate multiple I/O accesses, which is obviously undesired.

Applying the technique of the invention will cope with the shortcomings of an unbalanced tree, and the resulting layered index is illustrated in Fig. 10B, where the representative index is constituted by block 159A over the representative keys of the trie (constituted by blocks 159B and 159C). Here also, the link between nodes 12 and 14 is considered a split link, and the new node, 159D (being a replication of node 12), is copied into a new block designated as 159A. Now, in order to access record 20 and record 26, the same number of I/O operations is required, in this particular case 2. As the size of the trie grows, the more efficient is the access using the layered index.

The layered index of Fig. 10B brings about, thus, a balanced tree of blocks, assuring that essentially the same number of I/O operations is required to reach each and every data record in the tree. Those versed in the art will readily appreciate that preferably the number of I/O operations is a logarithmic function depending upon the number of data records and the number of links originating from a block. Thus, for example, if 1000 far links originate from a block, a layered index with 3 levels allows access to 1,000,000,000 data records.
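The relation between the number of far links per block, the number of levels and the number of reachable data records may be sketched as follows. The function names are hypothetical, and the sketch assumes a uniform number of far links originating from every block, which is an idealization.

```python
def reachable_records(far_links_per_block, levels):
    """Records reachable when every block on every level has the same fan-out."""
    return far_links_per_block ** levels


def levels_needed(records, far_links_per_block):
    """Smallest number of levels whose capacity covers `records`."""
    levels, capacity = 0, 1
    while capacity < records:
        capacity *= far_links_per_block
        levels += 1
    return levels


capacity_3_levels = reachable_records(1000, 3)
levels_for_billion = levels_needed(1_000_000_000, 1000)
```

With 1000 far links per block, three levels cover 1000 x 1000 x 1000 = 1,000,000,000 data records, so that the number of levels grows only logarithmically with the number of records.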

For a better understanding of the foregoing, there follows a numerical example. Assume that every block has 1000 far links. Assuming that the size of each far link is 4 bytes, it readily arises that the size needed for representing the far links is 4000 bytes. Assuming further that the nodes and the near links within a block occupy another 4000 bytes, the resulting block size is less than 10,000 bytes. For the sake of discussion, assume that each block size is 20,000 bytes.

Considering now a layered index that consists of one block (e.g. block 144 in Fig. 7B) as index layer I_1 and assuming that it is linked to a thousand blocks in the layer I_0 (of which only two blocks 146 and 148 are shown in Fig. 7B), the layered index amounts to a total of 1001 blocks each having a size of 20,000 bytes. Accordingly, the total space that should be allocated for holding the blocks of the layered index is about 20 megabytes. This order of size can be easily accommodated in the internal memory of, say, a personal computer. Assuming now that each block in I_0 is associated with one thousand data records, the net effect is that by utilizing a layered index of the invention (according to the latter embodiment) which is wholly accommodated in the internal memory, a million data records can be accessed without any I/O operation on the index.
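The arithmetic of the foregoing numerical example may be reproduced by the following short sketch. All figures are those assumed in the text; the function name `footprint` and the dictionary keys are hypothetical.

```python
def footprint(far_links=1000, link_bytes=4, node_bytes=4000,
              block_bytes=20_000, records_per_block=1000):
    """Recompute the sizes used in the numerical example."""
    far_link_bytes = far_links * link_bytes          # 4000 bytes of far links
    used_bytes = far_link_bytes + node_bytes         # nodes and near links added
    total_blocks = 1 + far_links                     # one upper block plus its children
    return {
        "far_link_bytes": far_link_bytes,
        "used_bytes": used_bytes,
        "total_blocks": total_blocks,
        "index_bytes": total_blocks * block_bytes,
        "records": far_links * records_per_block,
    }


sizes = footprint()
```

The 1001 blocks of 20,000 bytes each occupy 20,020,000 bytes, i.e. about 20 megabytes, and lead to 1000 x 1000 = 1,000,000 data records.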

By the same token, accessing billions of records may require practically one more index layer, which may require one additional I/O operation.

For a better understanding of the foregoing, consider for example the implementation of the layered index in Figs. 6B-1 or 6B-3 (PAIF index scheme). Had the keys of data records 103 and 107 been longer in size (for example 100 bytes long), this would not have changed the size of the PAIF. Another non-limiting example can be shown in Fig. 8B - the size and the structure of the layered index would not be changed if the size of the key of data records a-k addressed by the index were 200 bytes long. As can be seen, it is also possible to navigate in the index and to retrieve the data a-k according to the order of the key. This exemplifies one form of sequential operation.

As shown, the resulting layered index of Fig. 10B includes two trees having vertical orientation, i.e. the first tree structure consisting of blocks 159B and 159C (being one form of the basic partitioned index I_0) and a second tree having one block 159A (being one form of the basic partitioned index I_1).

The so accomplished horizontal tree of blocks (being one form of the layered index) is balanced, i.e. root block 159A, through one I/O, enables access to all the links to the data records. Further insertions of data records which lead to additional splits in the blocks of I_0 will require, of course, updating the index layer I_1. When the number of nodes in block 159A of I_1 exceeds a given number, block 159A is split according to the split mechanism.

The trie index with which the technique of the invention is concerned is not confined to the search tree disclosed in the '609 patent, and it may encompass other types of trees as explained above.

It should be noted that the intra-block structure is not necessarily balanced, i.e. nodes inside a block are not necessarily arranged in a balanced structure. Whilst this fact is seemingly a drawback, those versed in the art will readily appreciate that its implications for the overall database performance are virtually insignificant. This stems from the fact that the intra-block search scheme is normally performed in the fast internal memory of the computer system. As opposed to the intra-block search scheme, the arrangement of blocks within a layered index is retained in a balanced structure, whereby the number of blocks in a search path is a logarithmic function of the number of data records and therefore reflects the number of I/O accesses to the external memory (an operation which is inherently slow) needed in order to load a desired block into the internal memory.

In this connection those versed in the art will readily appreciate that the present invention is by no means bound to a given physical realization. Thus, for example, insofar as the search scheme is concerned, whilst the intra-block retains the search scheme after applying the technique of the invention, this applies to the logical concept of e.g. advancing in the layered index according to offsets and values of offset. The latter general concept may be realized in many manners, all of which are encompassed by the technique of the invention. Thus, for example, the offset size (in terms of number of bits) that is accommodated within each node may be altered, as may the manner of realizing empty pointers (i.e. pointers that point to null, having no children), and others. The latter physical realization flexibility applies also to the inter-block portion.

The layered indexes described with reference to Figs. 7 to 10 all retain essentially the same index scheme for both the trie and the representative index (except for the error handling which may be encountered when accessing data records through the index, as explained in detail with reference to Fig. 10G above).

The retention of the index scheme for both the trie and the representative index is not obligatory, as will be exemplified with reference to Fig. 11.

Fig. 11 illustrates another approach to balancing the unbalanced tree of Fig. 8A (i.e. constructing a layered index), using a conventional B-tree as a representative index over the representative keys of the unbalanced trie. The so obtained horizontally oriented balanced tree (layered index) includes block 272 at the upper level (index layer I_2), blocks 270 and 271 at a lower level (index layer I_1) and the original blocks of the unbalanced vertically oriented tree of Fig. 8A at the lowest level (blocks 260, 261, 262, 264), being index layer I_0. Fig. 11 demonstrates thus that the index scheme of the representative index is not necessarily the same as that of the original unbalanced trie. If desired, the B-tree in its entirety (forming a representative index) may be regarded as an index layer I_1.

The database file management system of the invention not only copes with the drawbacks of the conventional trie indexing file but also offers other benefits which facilitate and improve data access by user application programs.

Thus, the fact that a balanced structure of blocks is retained assures that, on the average, the number of slow I/O operations is retained essentially optimal, i.e. a more efficient result is obtained, particularly when large files consisting of a multitude of blocks are concerned.

Those versed in the art will readily appreciate that whilst preferably the construction of the layered index applies to slow I/O operations, e.g. for minimizing the number of accesses to a slow external storage medium, the invention is by no means bound to the specified storage medium. Thus, for example, the storage medium to which the present invention is applicable may also be an internal memory. This is of particular relevance considering the ever increasing volumes of internal memories which, although being faster than external memory, may also require efficient access control, which is realized according to the invention.

There follows a description of the second aspect of the invention.

For convenience of explanation, the second aspect of the invention will be described with reference to the PAIF index (constituting a designated index). The invention is by no means bound by this specific example.

As stated before, the database file management system of the invention enables different types of data records to be addressed using a single index.

In order to better distinguish between data records of different types that are addressed by the same PAIF index, each data record belonging to a given type is associated with a given designator. The latter forms part of the key of the data record, constituting a designator key. The designator is unique for every type of data. Thus, for example, the key of data records that belong to the entity "Borrower" is prefixed with the designator 'A', whereas all the keys of data records that belong to the entity "Book" are prefixed with the designator 'B'. The new key of the data records that belong to Borrower becomes a designated key that consists now of the concatenation of 'A' and the original key of Borrower, and by the same token, the new designated key of the data records that belong to Book consists now of the concatenation of 'B' and the original key of Book.
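The concatenation just described may be illustrated by a minimal sketch. The function name and the sample key values are assumptions made for illustration; only the designators 'A' (Borrower) and 'B' (Book) are taken from the description.

```python
# Designators as given in the description: 'A' for Borrower, 'B' for Book.
DESIGNATORS = {"Borrower": "A", "Book": "B"}


def designated_key(entity, original_key):
    """Concatenate the designator of the entity with the original key."""
    return DESIGNATORS[entity] + original_key


# Hypothetical original keys, chosen only to show the effect.
keys = [
    designated_key("Book", "22222"),
    designated_key("Borrower", "123456"),
    designated_key("Book", "11111"),
]

# Ordered as one index would order them: all 'A' keys precede all 'B' keys.
ordered = sorted(keys)
print(ordered)
```

Because the designator is the leading part of the key, records of one type occupy one contiguous range of the key order, which is what allows a single index to hold both types.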

Having discussed the so called "designator" feature of the second aspect of the invention, there follows a description of the so called meta-data.

According to an aspect of the invention, a data dictionary maintains meta-data information, which provides information on the data records as a function of the type of the records. Thus, in addition to the data records, it is necessary to maintain a designator, to be able to identify the designator and, by using the meta-data information, to be able to identify or construct the designated key as well as other information such as the record size. The search scheme of the index is oblivious to the meta-data: it locates the record from the designator (or composite) key without using the meta-data. The meta-data is required to construct the (composite) designator key and, once the record is retrieved, to determine the properties of the record. Thus, for example, having retrieved the data record of a book, the designator 'B' is identified, and information on the record designated B is available from the meta-data, for example the size of the book record, its fields, and the fields that are the key fields.
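The division of labour between the meta-data and the search scheme may be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout, the field names and the use of an ordinary Python dict as a stand-in for the trie are all hypothetical.

```python
# Hypothetical data dictionary: one meta-data entry per designator.
META = {
    "A": {"entity": "Borrower", "key_fields": ["borrower_no"],
          "fields": ["borrower_no", "name"]},
    "B": {"entity": "Book", "key_fields": ["book_no"],
          "fields": ["book_no", "title"]},
}

index = {}  # stand-in for the trie: designated key -> stored record


def build_key(designator, values):
    """The meta-data tells which fields make up the designated key."""
    key_fields = META[designator]["key_fields"]
    return designator + "".join(values[f] for f in key_fields)


def insert(designator, values):
    key = build_key(designator, values)
    index[key] = {"designator": designator, **values}
    return key


def search(key):
    """The search itself never consults META."""
    return index.get(key)


def describe(record):
    """After retrieval, the designator selects the meta-data entry."""
    return META[record["designator"]]


book_key = insert("B", {"book_no": "11111", "title": "Example title"})
record = search(book_key)
info = describe(record)
print(book_key, info["entity"], info["key_fields"])
```

Note that `search` receives a finished key and returns a record; the meta-data is used only before the search (to build the key) and after it (to interpret the record).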

The use of designated data records is not bound to only one type; rather (and preferably) more than one type may be treated by the designated index, including, as will be explained below, types related by a subordination relationship.

Thus, whilst according to hitherto known solutions data of different types is typically held in several files (and is addressed by several index files), according to a database management system utilizing a designated index of the invention, data records of different types may be addressed from the same index. It should be noted that the keys of data records that belong to different types (and are addressed by the same designated index) do not necessarily have the same length. Thus, for example, consider a layered index which is also a designated index based on a trie as its basic partitioned layered index of the kind depicted in Fig. 8A. The size of the key of the records that belong to the "Borrower" entity is 6 bytes, whereas the size of the key of the records that belong to the "Book" entity is 5 bytes. Inserting books into the designated index of Fig. 8A with the designator keys B11111 and B22222 results in the data structure of Fig. 12, which includes a designated index that addresses 2 types of data records: data records a-k, which are assigned the designator A, and data records w-x, which are assigned the designator B. In the description below, the terms "record of type X" or "record designated X" are used to describe a record having a designated key whose designator is X.

Whilst the latter example illustrated one manner of realizing designated data (i.e. prepending as a prefix a character, string or any number of bits to the key of the data record), those versed in the art will readily appreciate that this is only one out of many possible variants. In fact, the proposed designator may be realized in any known manner provided that the designator distinguishes between different data records, is treated as part of the key, and therefore forms part of the search.

The latter statement applies regardless of whether the designator: (i) forms part of the data record (or key portion), (ii) is stored elsewhere (e.g. in a different data structure), or (iii) is defined elsewhere, or even defined otherwise. An example of the latter is a trie structure that is associated with data records all of the same type (for example, all are designated with a character A). Obviously, in this example, it is not required to physically attach the designator to the instances of the data records, seeing that the designator is common to all records. However, if a data record is accessed, it is necessary to identify the designator and add it to the key. Another possible solution is to prefix the designator to the data record such that when the data record is accessed the designator is available.

For example, consider Fig. 12: data record d is accessed from node 266 by link 270. The first character of data record d is A, the designator.

For a better understanding of the subordination relationship, attention is directed to Figs. 13A-13E. Fig. 13A illustrates a designated index 800 (in the form of a PAIF) with four data records 802, 804, 806 and 808 (of which only the designator keys are shown) associated thereto. The data records are all of the same type, as readily arises from the designator 'A' that is prepended to each of the data records.

Turning now to Fig. 13B, there is shown the PAIF 800 with a new data record (812) with a composite key A12355B940201333333 (the designator of record 812 is B). The new data record is subordinated to data record 806, whose key is A12355. According to the PAIF index, node 814 indicates that the discerning offset is 6 and that the value B links to data record 812 (having the value B at offset 6). Seeing that record 806 has no value at offset 6, it is assigned a virtual value (say null) at this offset in order to determine the discerning offset vis-a-vis the other record, and accordingly link 818 is set with the direction marked null.

Fig. 13C illustrates the PAIF 800 in which another data record 820 is inserted. Data record 820, which represents another instance of a B type data record that is subordinated to the A type data record (806), is inserted into the PAIF. The discerning offset is 11 (the value of the new node 822) and the link values thereof are '0' and '1' to data records 812 and 820, respectively.

Fig. 13D illustrates the PAIF 800, where different types of records are subordinated to record 806. A data record of type 'D' (824), being subordinated to the data record of type 'A', is linked from node 814 by link 823 having the value D. As recalled, the PAIF already represents data records designated B, where the latter are subordinated to the data record designated A. An example of the 'B' type subordinated to the 'A' type is items ('B') stored by a supplier ('A'), and an example of the 'D' type subordinated to the 'A' type is clients ('D') served by the supplier ('A').

Turning now to Fig. 13E, there is shown another embodiment of the PAIF of Fig. 13D, implemented slightly differently. In particular, the subordinated data records 812, 820 and 824 are represented and maintained in the data file without their key prefix, that is, the designator key of the record 806 (i.e. the prefixed key A12355 is omitted). When accessing, for example, data record 812, the information available from the meta-data according to the designator B allows the following information to be extracted:

(i) that part of the key is missing;

(ii) that record 812 is subordinated to a record designated A that can be accessed from the node with value 6 (814) by the link with value null (818).

Thus it is possible to access data record 806 and construct the complete key of record 812. If the PAIF 800 is a layered index, it might be that nodes 814 and 822 reside in different blocks and the access path to the block associated with record 812 does not include node 814. In that case, a link from the subordinated records (links 826, 828 and 830) to record 806 allows data record 806 to be accessed and the key to be constructed. The implementation described above obviates the necessity to duplicate the representation of the designated key of data record 806 in respect of each subordinated data record (in the particular example of Fig. 13D, the specified prefix A12355 is duplicated three times, for records 812, 820 and 824). Replacing the key prefix with a link can save space (if the size of the prefix is larger than the representation of the link) and allows the record that the subordination relates to to be accessed without necessitating a separate search.
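The reconstruction of a complete key from a stored suffix and a link to the subordinating record may be sketched as below. The class and attribute names are hypothetical; the key values are those of records 806 and 812 in the description.

```python
class Record:
    """A stored record that keeps only its own part of the composite key
    and, if subordinated, a link to the record it is subordinated to."""

    def __init__(self, stored_key, parent=None):
        self.stored_key = stored_key
        self.parent = parent

    def full_key(self):
        # Follow the link upward and prepend the parent's complete key.
        prefix = self.parent.full_key() if self.parent else ""
        return prefix + self.stored_key


rec_806 = Record("A12355")                          # record designated A
rec_812 = Record("B940201333333", parent=rec_806)   # stored without prefix

# Prefix bytes no longer duplicated for the three subordinated records
# (812, 820, 824), before subtracting the size of the three links.
prefix_bytes_not_duplicated = len(rec_806.full_key()) * 3

print(rec_812.full_key(), prefix_bytes_not_duplicated)
```

Whether space is actually saved depends, as noted above, on the link representation being smaller than the omitted prefix.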

Figs. 13D and 13E illustrate that the subordination relationship characteristic of the invention is not limited to any specific realization.

The subordination relationship of the invention thus renders the low level implementation of data more efficient as compared to hitherto known techniques, in the sense that one index can be associated with various data types and subordination relationships, as compared to separate index files according to the prior art. This notwithstanding, there may of course be applications according to the invention where more than one index file is utilized.

Obviously, each of the subordinated records 812, 820, 824 can have records subordinated to it.

Moreover, there are some other advantages that are brought about using the proposed technique of the invention, e.g. maintaining data integrity. Consider, for example, an insert transaction that is applied to the PAIF 800 of Fig. 13E, of a data record designated B with a composite key A12355B930101123456 subordinated to data record 806 (having designated key A12355). The search leads to node 822. The value at key offset 11 of the inserted data record is 0, thus record 812 is accessed. The search key of record 812 needs to be constructed (by accessing record 806 via link 826) and the insertion of the new data record can be completed. It should be noted that the link to record 806 obviates the need to conduct a separate search for record 806 by its key in order to confirm its existence. Thus the maintenance of data integrity is more efficient.

Performing the same data integrity check using the specified B-tree index implies considerable overhead, since a two-phase operation is required. At first, a search is applied to the index of data records of type 'A' in order to find the data record whose key is 12355. Only upon finding it can the record of type B be inserted (and a separate index file is normally updated).

When searching data, the data structure of Fig. 20E exemplifies other advantages resulting from the fact that subordinated data records are linked to their "parent" record. For example, if a record of type A is a customer and a record of type B is an invoice, it is usually necessary to access the invoice details together with the customer details. The link from the invoice to the customer obviates a separate search for the customer details.

The so obtained designated index of the invention brings about another important advantage in navigating the index for accomplishing sequential operations.

Consider, for example, the PAIF of Fig. 13E, where it is required to retrieve all data records in ascending order. It is possible to navigate in the PAIF (known also as a sequential operation), and data records 802, 804, 806, 812, 820, 824 and 808 are retrieved according to the order of the designator key. If only records of a certain type are needed, for example the records of type A, one would navigate in the index in the same manner whilst avoiding access to nodes and records that are not relevant. Accordingly, from node 814 data record 806 is accessed, and it can be predicted that the data records that can be accessed from node 814 by its links and descendant nodes are subordinated to record 806, thereby avoiding links 833, 823. In this example only records 802, 804, 806 and 808 are retrieved. In the same manner, one would avoid moving along link 823 if only records of type A and B are needed, since it can be predicted that a link with a value D from a node with a value 6 addressing record 806 is a link to a subordinated data record designated D.
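The type-selective sequential operation may be approximated by the following sketch. It does not reproduce the PAIF node structure: a sorted list of designated keys stands in for the index, and skipping a key range stands in for not following a link. The key-field lengths in `KEY_LEN` and all key values other than A12355 and A12355B940201333333 are hypothetical.

```python
from bisect import bisect_left

# Hypothetical key-field length per designator (would come from meta-data).
KEY_LEN = {"A": 5, "B": 12, "D": 4}


def last_designator(key):
    """Walk the composite key segment by segment; return the last designator."""
    i, designator = 0, None
    while i < len(key):
        designator = key[i]
        i += 1 + KEY_LEN[designator]
    return designator


def scan(sorted_keys, wanted):
    """Return keys in ascending order whose type is in `wanted`, skipping
    the whole range of subordinated records of an unwanted type."""
    result, i = [], 0
    while i < len(sorted_keys):
        key = sorted_keys[i]
        rec_type = last_designator(key)
        if rec_type in wanted:
            result.append(key)
            i += 1
        else:
            # Parent key plus unwanted designator: skip every key below it.
            prefix = key[: len(key) - KEY_LEN[rec_type]]
            i = bisect_left(sorted_keys, prefix + "\uffff", i)
    return result


keys = sorted([
    "A11111", "A12344", "A12355", "A20000",
    "A12355B940201333333", "A12355B940211444444",
    "A12355D0042",
])

only_a = scan(keys, {"A"})
a_and_b = scan(keys, {"A", "B"})
print(only_a)
print(a_and_b)
```

In the sketch, as in the description, the decision to skip is made from the designator alone, without reading the skipped records.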

If the PAIF index is a layered index, and assuming that node 814 resides in a different block than node 822, the move from node 814 to node 812 can be by the split link. If the split link does not exist, as for example in Fig. 7F, one needs to use the link 421 of node B' (422) when it is necessary to advance by link 400 from node B (423) to node E (424).

Having exemplified the subordination relationship with reference to the specific embodiment of Fig. 13, there follows a description that pertains to the multi-dimensional characteristic according to the second aspect of the invention.

Turning now to Fig. 14, there is shown a schematic illustration of a designated index according to one embodiment of the invention. The index contains two search paths to one designated data record (the "DEPOSIT" data record) such that the deposit can be accessed by each of two composite keys: a designated key that includes the key fields account number, date and client number, and a second designated key that includes the key fields client number, date and account number. Reverting to the above example, the account data record has a designated key 'A133333' (201). Updating a deposit for the account (deposit subordinated to account) can be implemented by means of designated record 203 subordinated to designated record 201. The PAIF would allow access to records 201 and 203 from node 207 by link 206. By the same token, data record 204 represents a deposit of a client. The key of record 202 is B133333. Updating a deposit 204 to a client 202 can be implemented by the index 200 and node 209 linked (208) to data record 204. The key of data record 203 is 'A133333C01019811346' (k_1). The key of record 204 is 'B11346D010198133333' (k_2).

As shown, the fields of Client and Account are duplicated in records 203 and 204 (as well as additional information such as the date and the sum), which is an obvious drawback that results in an unduly inflated file.

This drawback may be overcome by representing a single DEPOSIT record as a multi-dimension record 210.

Data record 210 (Fig. 14) is a multi-dimension record that is updated and accessed by the designated index 200 according to the designator key k_1 (designator C) and according to the designator key k_2 (designator D). (Note that when a data record is a multi-dimension record, the designator of the record depends on the key that is being used.) The path in the index by k_1 leads to node 207 and from that node to the designator C of record 210. The information in the meta-data according to the designator C allows the relevant structure to be constructed, for example a data structure that includes the key k_1: by links 213 and 214, records 201 and 202 are accessed, and thus, with the date field of record 210, all the key fields are constructed. The path in the index by k_2 leads to node 209 and from that node to the designator D of record 210. The information in the meta-data according to the designator D allows the relevant structure to be constructed, for example a data structure that includes the key k_2. As shown, the search path defined by the search key of record 203 leads to the first field 212, having a value 'C' (which is the designator according to search key k_1). The third field points to data record 201. The second field 215 (having a value 'D', which is the designator according to search key k_2) of the same data structure 210 is accessible by the search path that is defined by the search key of record 204. The fourth field has a link to the actual data record 202. In this manner the record DEPOSIT represents subordination to both account and client, whilst avoiding duplication of the fields account, client, date and sum. It should be noted that the data elements account and client are accessed by means of links to the original data records (201 and 202), and the rest of the data (date and sum) exists only once within data element 210. Obviously, data record 210 can include other fields. The invention is by no means bound to a given realization, and accordingly the manner of realizing data record 210 as depicted in Fig. 14 is only one out of many possible variants. The number of search paths is not limited. As has been explained above with reference also to Fig. 13E, if the sought data record is Axxxx (i.e. the account record 201 per se), then one simply moves in the index with a search key of 'Axxxx' to any of its subordinated records and accesses the record of type A by the link from the subordinated record to the record of type A, such as, for example, link 213 of Fig. 14. Other implementations are of course feasible (e.g. maintaining a link in the index to record A), all as required and appropriate.

The specified description, which provides two (and in the general case at least two) search paths to one physical occurrence of a data record, constitutes the multi-dimensional data structure, which is a designated index that contains at least two search paths to one data record (called a multi-dimension record).
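A minimal sketch of a multi-dimension record follows. A plain dict stands in for the designated index, and the record layout, field names and the deposit sum are hypothetical. The client key is taken here as B11346, an assumption made so that the second key matches the value printed for record 204.

```python
# Original records, each stored once.
account = {"key": "A133333", "number": "133333"}   # record 201
client = {"key": "B11346", "number": "11346"}      # record 202 (key assumed)

# Single physical DEPOSIT record: links to account and client, own data once.
deposit = {"date": "010198", "sum": 250, "account": account, "client": client}


def key_via_account(d):
    """Composite key: account key, designator C, date, client number."""
    return d["account"]["key"] + "C" + d["date"] + d["client"]["number"]


def key_via_client(d):
    """Composite key: client key, designator D, date, account number."""
    return d["client"]["key"] + "D" + d["date"] + d["account"]["number"]


# Two search paths, one record: both keys refer to the same object.
index = {key_via_account(deposit): deposit, key_via_client(deposit): deposit}

k1 = "A133333C01019811346"
k2 = "B11346D010198133333"
print(index[k1] is index[k2])
```

Both keys are rebuilt from the links and the date field, so neither the account number nor the client number is stored in the deposit record itself.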

Relation among data elements - Fig. 15 illustrates another feature of the invention, i.e. the data relationship feature. Thus, data record A (a book data record) has C, F, J, K and L data records subordinated thereto. The realization of this hierarchy was illustrated above. According to the present relationship feature, one-to-one and one-to-many relations may easily be realized. Consider, for example, that a book has many categories (L), i.e. one-to-many; however, it has only one abstract (K), i.e. one-to-one.

According to the proposed feature, a one-to-one data relationship is implemented by a designated (composite) key of two components: the first is the designated key of its subordinating record and the second is the designator of the subordinated record (since it is a one-to-one relation there is no need to use the key field of the subordinated record). A one-to-many relationship, by contrast, is implemented by a designator (composite) key whose first component is the designator key of the subordinating record, and whose second component consists of the designator and key of the subordinated record.

In this example, the one-to-one relation between a book and its abstract is maintained by defining the key of K to be AxxxK, where Axxx is the designated key of A and K is the designator of the key of record K. The one-to-many relation between a book and a category is maintained by defining the key of L to be AxxxLyyy, where Axxx is the designated key of A, L is the designator of the key and yyy are the key field(s) of record L.
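The two key shapes may be contrasted in a short sketch. The function names, the book key A0042 and the stored values are hypothetical, the designators K (abstract) and L (category) follow the book example of Fig. 15, and a dict stands in for the index.

```python
def one_to_one_key(parent_key, designator):
    """Parent's designated key plus the child's designator only."""
    return parent_key + designator


def one_to_many_key(parent_key, designator, child_key):
    """Parent's designated key plus the child's designator and own key."""
    return parent_key + designator + child_key


book = "A0042"   # hypothetical designated key of a book record
index = {}       # stand-in for the designated index

# One-to-one: a second abstract has the same key, so it replaces the first.
index[one_to_one_key(book, "K")] = "abstract text"
index[one_to_one_key(book, "K")] = "revised abstract"

# One-to-many: each category has its own key field, so both coexist.
index[one_to_many_key(book, "L", "001")] = "databases"
index[one_to_many_key(book, "L", "002")] = "indexing"

print(sorted(index))
```

The cardinality is thus enforced by the shape of the key rather than by a separate constraint.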

There follows now a description of another feature according to the second aspect of the invention, namely multi-model representation. In accordance with this feature, and as will be explained in greater detail below, one or more of the following (and possibly other) models may be represented by the specified designated index.

Representing relational tables by a multi-model designated index -

The relational model considers all data as consisting of tables. Each table consists of records of the same structure, called tuples. Suppose the tuples consist of fields F1, F2 and F3. Each such field is a key. If key F2 is subordinate to key F1, and key F3 is subordinate to key F2, we can easily construct the table: to retrieve its tuples, follow the designator of key F1, and from there, for each value of F1, follow the designator of F2, and in the same manner continue to F3. Each such triple defines a tuple of the table. Some projections are even easier: to find all the pairs of values of F1 and F2 for which there exists a value of F3 in the table, we terminate the search after processing (F1, F2). Performing the projection of (F2, F3) might be expensive, since it requires searching all values of F1 first. However, if this operation is common, the designated index should also maintain the search path (F2, F3, F1). I.e., we construct a new designator composite key F2'F3'F1' with new designators, and insert the additional paths into the designated index. Thus each record can be reached via both paths and constitutes a multi-dimension record.
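The retrieval of tuples and the two projections may be sketched with nested dictionaries standing in for the subordinated key paths. The helper names and the sample rows are hypothetical, and the nested dicts do not model designators or blocks.

```python
from collections import defaultdict


def make_path():
    """Three-level path: first key -> second key -> set of third keys."""
    return defaultdict(lambda: defaultdict(set))


primary = make_path()     # search path (F1, F2, F3)
secondary = make_path()   # additional search path (F2, F3, F1)

rows = [(1, "x", 10), (1, "x", 20), (1, "y", 10), (2, "x", 10)]
for f1, f2, f3 in rows:
    primary[f1][f2].add(f3)
    secondary[f2][f3].add(f1)


def tuples(path):
    """Follow all three levels to enumerate the full tuples."""
    return sorted((a, b, c) for a in path for b in path[a] for c in path[a][b])


def project_first_two(path):
    """Stop after the second level; the third level is never visited."""
    return sorted((a, b) for a in path for b in path[a])


table = tuples(primary)
f1_f2 = project_first_two(primary)      # cheap on the primary path
f2_f3 = project_first_two(secondary)    # cheap only on the second path
f2_f3_slow = sorted({(b, c) for a, b, c in tuples(primary)})
print(f1_f2, f2_f3)
```

The last line of the sketch computes the (F2, F3) projection the expensive way, over every value of F1, and yields the same pairs as the second search path.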

Additional models on the multi-model designated index -

The designated index enables the representation of additional data models, including a relational database, an object oriented system, and a hierarchical database, where substantially no data is duplicated.

Implementing object oriented (persistent data structures) by multi-model designated index -

The object oriented approach considers all data as objects. Every object belongs to a class, which determines its structure and which methods (functions) can be applied to it. The classes are organized in a hierarchy, from which structure and methods may be inherited. The object-oriented approach is ephemeral: an object exists only while the program that created it is active. Objects that need to be supported for a longer period of time are defined as persistent. These objects are stored on the disk and are available to other (authorized) programs. The multi-model designated index can easily support such objects. Since their structure is uniformly encoded with the aid of designators, later incarnations of the program as well as other programs can access these persistent objects. Note that at the same time a persistent object can also be part of a relational table. There is no need to duplicate data.

Consider, for example, the data structure 220 of Fig. 16. Data records 223, 224, 225 and 226 are subordinated to data record 221 and, together with record 221, are considered as an object. It is possible to search efficiently in the index for all data records with a key prefix equal to the designated key of record 221 (partial key search) and retrieve the entire object. If only part of the object's data is needed, such as the A type record and the subordinated B type records, again a partial key search is done for data records with a key prefix that is equal to the designated key of the record of type A (for example 221) with the designator B as the next key field.
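The partial key search may be sketched over a sorted list of designated keys, which stands in for the index. The function name and all key values are hypothetical, and the sketch assumes that no key contains the character U+FFFF, which is used as an upper bound for the prefix range.

```python
from bisect import bisect_left


def prefix_search(sorted_keys, prefix):
    """Return every key that starts with `prefix`, in key order."""
    lo = bisect_left(sorted_keys, prefix)
    hi = bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]


# Hypothetical keys: one A record with two B and two C subordinates,
# surrounded by unrelated A records.
keys = sorted([
    "A00100",
    "A00221", "A00221B0001", "A00221B0002", "A00221C0001", "A00221C0002",
    "A00300",
])

root = "A00221"
whole_object = prefix_search(keys, root)
a_with_b_only = [root] + prefix_search(keys, root + "B")
print(whole_object)
print(a_with_b_only)
```

Adding the designator B to the prefix narrows the range to the B type subordinates without touching the C type records.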

Implementing object-relational by multi-model designated index -

As opposed to the object-oriented approach, the relational approach considers all data as tables. Thus it is difficult to integrate SQL queries in an object-oriented programming language (C++ or Java). The object-relational approach provides an interface to convert tables to objects. The interface requires the user to specify the relationship between the objects and the table attributes. If some attributes themselves are tables, we need to allow relational algebra operations on these tables too. These conversions are performed by the application program. Thus the database is unable to optimize the queries. The designated index treats data in a uniform manner, thus providing an ideal interface between the object-oriented application program and the data structures. The application program's queries are formulated in terms of designated keys, so the database can optimize the query strategy. The database returns designated keys, which the object-oriented application program can readily process by the object-oriented methodology. The sequence of designators of the search path to the object determines its class, and the designators to various fields allow the object-oriented program to resolve polymorphism of the method calls.

The designated index addresses all related data. For example, assume that Fig. 16 describes a data structure of an insurance company where records of type A are customers, records of type B are customers' claims and records of type C are customers' payments. As is clearly shown, all the data records are addressed by a single index structure.

Now, one is able to efficiently access all the object instances, since the index allows navigation from a customer to its related data (claims and payments). At the same time one is able to navigate on the index structure efficiently and effect the customer table (the collection of records of type A), the customer claims table (the collection of records of type A and B) and the customer payments table (the collection of records of type A and C). Since the data structure does not impose physical clustering of the data, if data is shared among different objects, it can be efficiently accessed by the different object views, and thus such a data record is a multi-dimension record. In this example, a claim can be efficiently accessed both from the customer object and the policy object, being of a type structured as, for example, in Fig. 16 (structure 210).

The object-oriented approach allows users to add user-defined types (UDT) and user-defined functions (UDF). For example, one could add the photos of accidents to the insurance company database. In the example, a new designated data record subordinated to the A type data record is defined. When a claim's details are searched, the photo of the accident is accessed and sent to the photo printout application. With a designated index, the relation between the photo data and the claim is handled in the same manner as with built-in classes and relations. The new UDT can be based on or be related (by subordination) to any other data type. Now, with the designated index, the application can navigate to the new UDT from the defined classes, from which the new UDT can inherit methods and other properties. In the example, when navigating in the index, one would navigate to a claim from which one could reach the photo as well as any other part of the claim's data.

Network and Hierarchical Models:

Implementing network and hierarchical models by multi-model designated index -

The network and hierarchical models have been replaced by the relational model. However, even though these models are obsolete, they have some advantages (as well as many disadvantages) over the table-oriented implementation. Once a record is retrieved, the addresses of related records are readily available.

Consider, for example, a bank with customers and loans. Each customer has an address and several loans, while each loan is taken by one or more customers. In the network model, each customer is represented by a node containing a link to the customer and links to nodes representing the loans taken by the customer. A node representing a loan is likewise linked to the nodes of the customers that took that loan. Thus, given a loan, one can easily access the customers that took the loan and get their home addresses.

The B-tree implementation requires us to maintain two trees: one of the customers and home addresses, and the second of loans and customers. Thus, having retrieved the data of a loan, the names of the customers that took the loan are available. To find their addresses, an independent B-tree search is required for each customer.

In the proposed multi-model designated index (such as, for example, in Fig. 16), once the node representing the loan is reached, one can continue to a designator that identifies the customers that took that loan (for example, records of type B). Normally, at most one disk access is required for each customer. The proposed multi-dimensional designated index has the advantages of the network model without its disadvantages. While the network model treated each node separately and was susceptible to long search paths, the multi-model designated index treats all data uniformly, and the length of the search paths is probably logarithmic, with the block size as the base of the logarithm. Thus, in practice, the search requires a single disk access.
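The navigation from a loan to its customers through a designator can be sketched as follows. This is only an illustration under stated assumptions: a sorted list of composite keys stands in for the trie-based designated index, and the class name `DesignatedIndex`, the designator `"B"` and the sample keys are hypothetical, not taken from the specification.

```python
import bisect


class DesignatedIndex:
    """Toy stand-in for a designated index: composite keys kept in sorted order.

    A key is a tuple such as ("loan-17", "B", "cust-1"), read as
    parent key, designator (record type), subordinate key.
    """

    def __init__(self):
        self._keys = []      # sorted list of composite keys
        self._records = {}   # composite key -> record

    def put(self, path, record):
        key = tuple(path)
        if key not in self._records:
            bisect.insort(self._keys, key)
        self._records[key] = record

    def get(self, path):
        return self._records.get(tuple(path))

    def children(self, path, designator):
        """Return (key, record) pairs directly subordinated to `path`
        under the given designator, using one contiguous range scan."""
        prefix = tuple(path) + (designator,)
        start = bisect.bisect_left(self._keys, prefix)
        found = []
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break  # left the contiguous range that shares the prefix
            if len(key) == len(prefix) + 1:
                found.append((key[-1], self._records[key]))
        return found


index = DesignatedIndex()
index.put(("loan-17",), {"amount": 5000})
index.put(("loan-17", "B", "cust-1"), {"address": "1 Main St"})
index.put(("loan-17", "B", "cust-2"), {"address": "2 Oak Ave"})
index.put(("loan-18", "B", "cust-3"), {"address": "3 Elm Rd"})

# From the loan, continue through designator "B" to the customers.
customers = index.children(("loan-17",), "B")
```

Because all keys that share the loan's prefix are adjacent in key order, the customers of a loan are reached by continuing from the loan's position rather than by a separate search per customer.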

Implementing server-client model with object oriented based on a designated index -

The client-server model enables efficient implementations of the relational model. According to this model, all the data resides at a central computer (called the server), and the application programs run at other computers (called clients). When an application needs data, it formulates an SQL query, which is sent by the client to the server. The server evaluates the query and returns the resulting table to the client.

Thus, the interface between the client and the server is via SQL queries; the server is unaware of the internal data structures and code of the application. The client and the server have just to agree on the names of the tables and their attributes.

In the object-oriented approach this model breaks down. Since each data item is an object, the server must be aware of its internal structure. This problem is aggravated in the presence of polymorphic methods. The server must be aware of the structure and the details of the entire class hierarchy.

The designated index allows the client-server approach to be applied to the object-oriented and object-relational models. For example, to reach an attribute, the application program sends to the server the path of keys and link designators leading to the desired node. Based on this data, the server can fulfill the request without any knowledge of the data structure of the application program.
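A minimal sketch of such an exchange is given below. It assumes, purely for illustration, that the path of keys and designators is sent as a JSON list and that the server stores opaque byte strings; the names `IndexServer`, `make_request` and the sample path are hypothetical and do not come from the specification.

```python
import json


class IndexServer:
    """Resolves a path of keys and designators to an opaque payload.

    The server never interprets the payload, so it needs no knowledge
    of the application's classes or field types.
    """

    def __init__(self):
        self._nodes = {}  # path tuple -> payload bytes

    def store(self, path, payload):
        self._nodes[tuple(path)] = payload

    def handle(self, request):
        # The request carries only the path agreed on by client and server.
        path = tuple(json.loads(request.decode("utf-8"))["path"])
        return self._nodes.get(path, b"")


def make_request(path):
    """Client side: encode the path of keys and designators."""
    return json.dumps({"path": list(path)}).encode("utf-8")


server = IndexServer()
server.store(["cust-1", "B", "claim-9", "amount"], b"1500")

# The client knows the payload is a decimal amount; the server does not.
reply = server.handle(make_request(["cust-1", "B", "claim-9", "amount"]))
amount = int(reply)
```

Only the client attaches meaning to the returned bytes, which mirrors the point that the two sides need agree only on field names and designators.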

The client and the server should agree on the names of the fields and their designators. The server need not be aware of the type of data of each such field, or of its semantic content.

According to yet another aspect of the invention, it is proposed to further compress the representation of the index, thereby rendering it more efficient. What follows is an estimate of the space required by a trie, and methods to reduce the space requirements.

If the trie is a layered index, the analysis of the trie index structure will concentrate on the last layer (I₀):

Storage requirements for primary key index of a trie -

One of the most important features of a trie based data structure is the modest size of its representation. The PAIF, for example, maintains an even smaller size than a conventional trie because of its compressed representation.

The last level of the PAIF index contains a trie with links that point to other trie nodes in the same block, and links that point to records. Let N be the number of records in the database. The index contains exactly N pointers to these records. If each pointer requires 4 bytes, the size needed for the pointers is 4N bytes. In addition, each pointer has a direction (1 byte); thus the total is 5N bytes.

Now consider the space required for a PAIF trie. Since N pointers emanate from the index and each trie node has at least 2 children, there are at most n ≤ N − 1 trie nodes. Let d denote the average number of children of a trie node; then n < N/(d − 1). Since in practice d ≫ 2, n ≪ N. Each trie node has a level number (1 byte). Since each trie node has at most one incoming trie link, there are at most n − 1 trie links; each trie link has a label, which is a single character (1 byte), and an intra-block pointer (1 byte), giving a total of 3n bytes. Thus, in the worst case, 3n + 4N ≤ 7N bytes are needed, and between 4N and 6N bytes in practice.
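The estimate above can be checked numerically with a short sketch. The function names are hypothetical; the byte costs (3 bytes per trie node, 4 bytes per record pointer) are those stated in the text, and the node count uses the identity n·d = n + N − 1 for a tree with n internal nodes and N outgoing record pointers.

```python
import math


def trie_node_bound(num_records, avg_children):
    """Number of trie nodes n for N record pointers and average fan-out d.

    Every node except the root has one incoming link, so n*d = n + N - 1,
    which gives n = (N - 1) / (d - 1) < N / (d - 1).
    """
    if num_records < 1 or avg_children < 2:
        raise ValueError("need at least one record and fan-out >= 2")
    return math.ceil((num_records - 1) / (avg_children - 1))


def index_bytes(num_records, avg_children, node_bytes=3, pointer_bytes=4):
    """Estimated size 3n + 4N: level, label and intra-block pointer per
    node, plus one 4-byte pointer per record."""
    n = trie_node_bound(num_records, avg_children)
    return node_bytes * n + pointer_bytes * num_records


N = 1_000_000
worst_case = index_bytes(N, 2)    # binary fan-out: n = N - 1
typical = index_bytes(N, 10)      # assumed fan-out of 10 for illustration
```

With the minimum fan-out of 2 the estimate stays just under 7N bytes, and with the assumed fan-out of 10 it falls between 4N and 6N, in line with the figures in the text.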

Performing the same analysis from another angle: consider two pointers p₁ and p₂ that emanate from node v of level k. Let x₁ be a key reachable from p₁ and x₂ a key reachable from p₂. Then x₁ and x₂ share the first k − 1 characters. In a PAIF structure, each one of these characters is represented at most once. In the B-tree representation, it is necessary to explicitly represent the first k characters of each key.

The savings in the PAIF are twofold: first, every character is stored at most once on each level, and second, not all characters need be represented.

Further index compression -

In the above discussion, most of the space is required for the pointers to records. A method that saves pointer space will now be presented. The method is based on allowing several links to records to share the same pointer. Suppose, first, that the records have a fixed size. If the first two records reside in the same block, then it is possible to keep a single full-sized pointer for the first link to the block and, instead of keeping a pointer for each of the remaining outgoing links to that block, to compute their displacement. That is, if the first two records reside in block number 2000 and the third record in block 7000, it is possible to maintain the structure 2000(e,f) 7000(h). The savings are much more substantial if a larger number of outgoing links all point to the same block. If k such links point to a block, then the 4 bytes of the pointer are divided among all k records; thus the space for addressing each record is reduced to 4/k bytes plus the space for the direction (1 byte). For k > 4 this means that each record requires less than 2 bytes in the index.

For variable-sized records, it is possible to maintain the displacement within the block, for example: 2000(e:d_e, f:d_f) 7000(h:d_h). Instead of maintaining a full pointer, a displacement that fits into a single byte is maintained. Thus, each record needs 1 byte for its share in the pointer, 1 byte for the direction, and 1 byte for the displacement: a total of 3 bytes per record.
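The per-record cost of a shared pointer can be worked through with a small sketch. The function name and its parameters are hypothetical; the byte sizes are those given in the text (a 4-byte block pointer shared by k links, 1 byte for the direction, and 1 byte for the displacement when records vary in size).

```python
def bytes_per_record(links_sharing_pointer, variable_size=False,
                     pointer_bytes=4, direction_bytes=1,
                     displacement_bytes=1):
    """Index bytes needed to address one record when k links share
    a single block pointer."""
    if links_sharing_pointer < 1:
        raise ValueError("at least one link must use the pointer")
    cost = pointer_bytes / links_sharing_pointer + direction_bytes
    if variable_size:
        # Variable-sized records also store their displacement in the block.
        cost += displacement_bytes
    return cost


unshared = bytes_per_record(1)                            # 5.0 bytes
shared_by_four = bytes_per_record(4)                      # 2.0 bytes
shared_by_eight = bytes_per_record(8)                     # 1.5 bytes
variable_sized = bytes_per_record(4, variable_size=True)  # 3.0 bytes
```

With a single link per pointer the cost is the original 5 bytes per record; once four or more links share a pointer, the cost drops to 2 bytes or less for fixed-size records and to about 3 bytes for variable-sized ones.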

Looking at the example of Fig. 17, Fig. 17A shows a node 2000 of a trie with the links 2010, 2011, 2012 (values 5, 9, A respectively) that address 3 data records (2002, 2004, 2006) at disk addresses 3000, 5000, 7000 respectively. The size needed to represent the link values (1 byte for each link) and the pointers to the data (4 bytes each) is 15 bytes.

Turning now to Fig. 17B, node 2000 maintains a shared link (2010) to three data records (2002, 2004, 2006). The information that represents the link is the address of block 2020 (4 bytes) and the link values of the data records 2002, 2004, 2006 that reside in the block (1 byte for each link value). The size needed to represent the pointer to the data block and the values of the links is only 7 bytes: (3000:5,9,A).

Now, in order to access data record 2004, one can calculate its address as the address of the data block plus the displacement, which depends on the record size, assuming that the records in the data block are all of equal size. As explained above, node 2000 can include links to other data records or data blocks (such as link 2024 to data block 2022 accommodating data record 2008).
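The byte counts of Figs. 17A and 17B and the address calculation can be reproduced with a short sketch. Two points are assumptions made only for illustration: that records are laid out in the block in the order of their link values, and the record size of 100 bytes. The function names are hypothetical.

```python
POINTER_BYTES = 4      # full pointer to a record or to a data block
LINK_VALUE_BYTES = 1   # one character per link value


def separate_links_size(link_values):
    """Fig. 17A: every link carries its own value and its own pointer."""
    return len(link_values) * (LINK_VALUE_BYTES + POINTER_BYTES)


def shared_link_size(link_values):
    """Fig. 17B: one block pointer shared by all link values."""
    return POINTER_BYTES + len(link_values) * LINK_VALUE_BYTES


def record_address(block_address, link_values, link_value, record_size):
    """Address of a fixed-size record inside the shared block.

    Assumes records are stored in the same order as their link values.
    """
    slot = link_values.index(link_value)
    return block_address + slot * record_size


links = ["5", "9", "A"]                     # link values of node 2000
size_17a = separate_links_size(links)       # 3 * (1 + 4) bytes
size_17b = shared_link_size(links)          # 4 + 3 * 1 bytes
address_2004 = record_address(3000, links, "9", record_size=100)
```

The shared form (3000:5,9,A) stores the block address once, so adding a further record to the same block costs only one more link-value byte.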

Preferably, the database file management system of the invention should be associated with known per se concurrency and/or distributed capabilities so as to enable a plurality of users to access the database virtually simultaneously. The database may be located in a central location, or distributed among two or more remote locations.

Turning now to Figs. 18A-D, there are shown four benchmark graphs demonstrating the enhanced performance, in terms of response time and file size, of a database utilizing a file management system that employs a system of the invention vs. a commercially available Ctree based database. The inserts are realized through a Uniface application running in the Windows (for workgroup) operating system.

The benchmark of Fig. 18A concerns measuring the time in minutes for inserting an ever increasing number of a priori sorted data records to a file (0-1,000,000). As shown in Fig. 18A, the larger the number of inserts, the greater the improvement in terms of response time of the database file management system of the invention. Thus, inserting 1 million records takes about 669 minutes in the Ctree based database as compared to only 65 minutes in the system of the invention. Moreover, the response time in the file management system of the invention increases by only a small extent as the number of records increases, as opposed to a significant increase in the response time in the counterpart system according to the prior art.

The benchmark of Fig. 18B illustrates the file size in megabytes as a function of the number of data records in the file (0-1,000,000). As shown in Fig. 18B, the larger the number of records, the greater the improvement in terms of file size in the database file management system of the invention. Thus, for 1 million records the file size of the Ctree based file is about 151 megabytes as compared to only 22 megabytes in the database file management system of the invention.

Graphs 18C and 18D are similar to those shown in Figs. 18A and 18B, apart from the fact that in the former (18C and 18D) the data records are inserted randomly whereas in the latter (18A and 18B) the data records are a priori sorted according to search key. As shown, the results are as before, i.e. the system of the invention is more efficient in terms of both response time and file size.

Figs. 19A-D illustrate benchmark graphs of a system of the invention (operating under the DOS operating system) vs. a commercially available Btree based database system. The results are as before, i.e. the system of the invention is more efficient in terms of both response time and file size.

Those versed in the art will appreciate that alphabetic and roman characters designating claim steps are made for convenience of explanation only and should by no means be construed as imposing an order of steps, or how many times each step is executed vis-a-vis other steps of the method.

The present invention has been described with a certain degree of particularity, but those versed in the art will appreciate that various modifications and alterations may be implemented without departing from the scope and spirit of the following claims:

Claims

CLAIMS:

1. In a storage medium used by a database file management system executed on data processing system, a data structure that includes: a layered index arranged in blocks; the layered index includes a basic partitioned index that is associated with data records; the basic partitioned index enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said layered index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

2. The layered index of Claim 1, wherein said basic partitioned index being a trie.

3. In a storage medium used by a database file management system executed on data processing system, a data structure that includes: an index arranged in blocks and being constructed over the keys of data records; the index includes a basic partitioned index that is associated with the data records; the basic partitioned index enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

4. In a storage medium used by a database file management system executed on data processing system, a data structure that includes: an index arranged in blocks and being constructed over the keys of data records; the index includes a trie that is associated with the data records; the trie enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

5. The layered index of Claim 1, wherein said storage medium being an external memory.

6. The layered index of Claim 5, wherein said storage medium being further an internal memory.

7. The layered index of Claim 1, wherein said storage medium being an internal memory.

8. The layered index of Claim 2, wherein said trie being a PAIF trie.

9. The layered index of Claim 1, wherein the basic partitioned index and the representative index of said layered index being substantially the same index schemes.

10. The layered index of Claim 1, wherein the basic partitioned index and the representative index of said layered index being different index schemes.

11. The layered index according to Claim 8, wherein the representative index of said layered index being the Btree index scheme.

12. The layered index according to Claim 10, wherein the representative index being the Btree index scheme.

13. The layered index according to Claim 8, wherein the representative index of said layered index being substantially the PAIF index scheme.

14. The layered index according to Claim 9, wherein the representative index being substantially the PAIF index scheme.

15. The layered index according to Claim 1, capable of supporting the ODBC standard.

16. The layered index I₀,...,I_h according to Claim 1, comprising: a representative index I₁,...,I_h constructed such that any I_j is constructed over the representative keys of I_{j-1}.

17. The layered index I₀,...,I_h according to Claim 16, wherein I_h is fully contained in one block.

18. The layered index of Claim 3, wherein said storage medium being an external memory.

19. The layered index of Claim 18, wherein said storage medium being further an internal memory.

20. The layered index of Claim 3, wherein said storage medium being an internal memory.

21. The layered index according to Claim 3, capable of supporting the ODBC standard.

22. The layered index of Claim 4, wherein said storage medium being an external memory.

23. The layered index of Claim 22, wherein said storage medium being further an internal memory.

24. The layered index of Claim 4, wherein said storage medium being an internal memory.

25. The layered index according to Claim 4, capable of supporting the ODBC standard.

26. In a database file management system for accessing data records and being executed on data processing system; the data records are associated with a basic partitioned index arranged in blocks and being stored in a storage medium; the basic partitioned index enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks; a method for constructing a layered index arranged in blocks, comprising the steps of:

(a) providing said basic partitioned index;

(b) constructing a representative index over the representative keys of said basic partitioned index; said layered index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

27. The layered index of Claim 26, wherein said basic partitioned index being a trie.

28. In a database file management system for accessing data records and being executed on data processing system; the data records are associated with a basic partitioned index arranged in blocks and being stored in a storage medium; the basic partitioned index enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks; a method for constructing an index over the keys of the data records, the index being arranged in blocks, comprising the steps of:

(a) providing said basic partitioned index;

(b) constructing an index over the representative keys of said basic partitioned index; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

29. In a database file management system for accessing data records and being executed on data processing system; the data records are associated with a trie arranged in blocks and being stored in a storage medium; the trie enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks; a method for constructing an index over the keys of the data records, the index being arranged in blocks, comprising the steps of:

(a) providing a trie;

(b) constructing an index over the representative keys of said trie; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

30. The method of Claim 26, wherein said storage medium being an external memory.

31. The method of Claim 30, wherein said storage medium being further an internal memory.

32. The method of Claim 26, wherein said storage medium being an internal memory.

33. The method of Claim 27, wherein said trie being a PAIF trie.

34. The method of Claim 26, wherein the basic partitioned index and the representative index being substantially the same index schemes.

35. The method of Claim 26, wherein the basic partitioned index and the representative index being different index schemes.

36. The method of Claim 33, wherein the representative index being the Btree index scheme.

37. The method of Claim 35, wherein the representative index being the Btree index scheme.

38. The layered index according to Claim 33, wherein the representative index being the PAIF index scheme.

39. The layered index according to Claim 34, wherein the representative index being the PAIF index scheme.

40. The method of Claim 26, capable of supporting the ODBC standard.

41. The method of Claim 28, wherein said storage medium being an external memory.

42. The method of Claim 41, wherein said storage medium being further an internal memory.

43. The method of Claim 28, wherein said storage medium being an internal memory.

44. The method of Claim 28, capable of supporting the ODBC standard.

45. The method of Claim 26, wherein said index supports sequential operations.

46. The method of Claim 28, wherein said index supports sequential operations.

47. The method of Claim 29, wherein said index supports sequential operations.

48. The method for accessing a sought data record r by key k in the layered index of Claim 1, comprising:

(a) searching k in I_h to I_k where h ≥ k ≥ 0 and in the case it is not found in the key of a data record in order to find the block of I_{h-1} leading to k;

(b) repeating step (a) until reaching the block of I₀ that is associated with the data record with key k, if exists.

49. The method for inserting a data record r by key k in the layered index of Claim 1, comprising:

(a) searching k in I_h to I_k where h ≥ k ≥ 0 and in the case it is not found in the key of a data record in order to find the block of I_{h-1} leading to k;

(b) repeating step (a) until reaching the block B of I₀ that is associated with the data record with key k, if exists;

(c) associating r to B.

50. The method for deleting a data record r by key k in the layered index of Claim 1, comprising:

(a) searching k in I_h to I_k where h ≥ k ≥ 0 and in the case it is not found in the key of a data record in order to find the block of I_{h-1} leading to k;

(b) repeating step (a) until reaching the block B of I₀ that is associated with the data record with key k, if exists;

(c) disconnecting r from B.

51. The method for accessing a sought data record r by key k in the layered index of Claim 3, comprising:

(a) searching k in I_h to I_k where h ≥ k ≥ 0 and in the case it is not found in the key of a data record in order to find the block of I_{h-1} leading to k;

(b) repeating step (a) until reaching the block of I₀ that is associated with the data record with key k, if exists.

52. The method for inserting a data record r by key k in the layered index of Claim 3, comprising:

(a) searching k in I_h to I_k where h ≥ k ≥ 0 and in the case it is not found in the key of a data record in order to find the block of I_{h-1} leading to k;

(b) repeating step (a) until reaching the block B of I₀ that is associated with the data record with key k, if exists;

(c) associating r to B.

53. The method for deleting a data record r by key k in the layered index of Claim 3, comprising:

(a) searching k in I_h to I_k where h ≥ k ≥ 0 and in the case it is not found in the key of a data record in order to find the block of I_{h-1} leading to k;

(b) repeating step (a) until reaching the block B of I₀ that is associated with the data record with key k, if exists;

(c) disconnecting r from B.

54. The method of Claim 26, wherein said construction step (b) includes:

(a) If B (in I_{h-1}) overflows, it is split into two (or more) blocks and the representative of B in I_h is replaced by the representatives of the new blocks.

(b) If the block of I_h overflows an additional layer I_{h+1} is created and added to the layered index.

55. The method according to Claim 54, performed on the fly.

56. The method according to Claim 54, performed post factum.

57. The method of Claim 28, wherein said construction step (b) includes:

(a) If B (in I_{h-1}) overflows, it is split into two (or more) blocks and the representative of B in I_h is replaced by the representatives of the new blocks.

(b) If the block of I_h overflows an additional layer I_{h+1} is created and added to the layered index.

58. The method according to Claim 57, performed on the fly.

59. The method according to Claim 57, performed post factum.

60. The method according to Claim 26, wherein the construction step (b) includes:

(a) at least one short link among the short links of a node (hereon split node) in the block (of B_{i-1}) is deleted (hereon split link) in a way that at least two tries exist in the block;

(b) each of the sub-trees is moved to a separate block;

(c) if the block of B_i does not exist, B_i is created and a copied node of the split node is created in B_i;

(d) if the block of B_i exists and a copied node of the split node does not exist in B_i, then a copied node of the split node is created in B_i and connected to the trie of B_i such that B_{i-1}' (at the end of the split process) is accessible in a search path that includes the root node in B_i and the copied node and its labeled links according to the representative key of B_{i-1}';

(e) if the copied node has no direct link, a direct link is added from the copied node to the block B_{i-1};

(f) a far link added from the copied node to the block B_{i-1}' or if the copied node has a short link to a child node in the direction of the far link, the far link is replaced by a direct link from the child node to block B_{i-1}'.

61. In a storage medium used by a database file management system executed on data processing system, a data structure that includes at least one probabilistic access indexing file (PAIF) having a plurality of nodes and links; the leaf nodes of said PAIF are associated each with at least one data record accessible to said user application program and wherein at least portion of said data record constitutes at least one search-key; selected nodes in said PAIF represent, each, a given offset of a search key portion within said inset search key; link(s) originated from each given node from among said selected nodes, represent, each, a unique value of said search key portion; the PAIF having at least two sub-PAIFs being arranged, each, in a block; said data base file management system is further capable of arranging said blocks as a balanced structure of blocks.

62. The data processing system according to Claim 61, wherein at least some data records that are associated to said leaf nodes are held in at least one separate file.

63. The data processing system according to Claim 61, wherein at least one leaf is associated with more than one data record.

64. A method for inserting a new data record into an existing PAIF according to Claim 61 including the execution of the following steps:

i. advancing along a reference path commencing from the root node and ending at a data record associated to a leaf node (referred to as "reference data record"); in each node in the reference path, advancing along a link originated from said node if the value represented by the link equals the value of the 1-bit-long key portion at the offset specified by said node; in the case that the offset specified in the node is beyond any corresponding key portion in the key, or if there is no link with said value, advancing along an arbitrary path to any reference data record;

ii. comparing the search key of the reference data record to that of the new data record for determining the smallest offset of the search key portion that discerns the two (hereinafter discerning offset);

iii. proceed to one of the following steps (iii.0-iii.3) depending upon the value of the discerning offset:

iii.0 if the data records are equal then terminate; or

iii.1 if the discerning offset matches the offset indicated by one of the nodes in the reference path, add another link originating from said one node and assign to said link the value of the search key portion at the discerning offset taken from the search key of the new data record; or

iii.2 if the discerning offset is larger than that indicated by the leaf node that is linked, by means of a link, to the reference data record:

iii.2.1 disconnect the link from the reference data record (i.e. it remains temporarily "loose") and move the link to a new node; the new node is assigned with a value of the discerning offset;

iii.2.2 connect the reference data record and the new node (which now becomes a leaf node) and assign to the link (long link) a value of the search-key-portion at the discerning offset taken from the search key of the reference data record;

iii.2.3 connect by means of a link the new data record and the new node and assign to the link (long link) a value of the search-key-portion at the discerning offset taken from the search key of the new data record; or

iii.3 if conditions iii.0, iii.1 and iii.2 are not met, there exists, in the reference search path, a father node and a child node thereof such that the discerning offset is, at the same time, larger than the offset assigned to the father node and smaller than the offset assigned to the child node (considered case A), or all the nodes in the reference search path have a value greater than the discerning offset (considered case B); accordingly, apply the following sub-steps:

iii.3.1 for case A and B, create a new node and assign the node with the value of said discerning offset; for case A only, disconnect the link from the father node to the child node and shift the link to a new internal node (i.e. the child node remains temporarily "loose");

iii.3.2 for case A and B, connect by means of a link (long link) the new data record and said new internal node; the value assigned to the link is that of the search-key-portion at the discerning offset, as taken from the search key of the new data record;

iii.3.3 for case A and B, connect by means of a new link the new node and, for case A, the child node, for case B, the root node (i.e. the new node becomes for case A a new father node, for case B a new root node), and the value assigned to said link is the search-key-portion at the offset indicated by the new node, taken from the search key of the reference data record.

65. A method for obtaining a balanced PAIF index; the PAIF including blocks each accommodating a plurality of nodes and links originated from said nodes; leaf nodes from among said nodes are associated with data records; the method comprising executing the following steps as many times as required:

(i) replacing a block, constituting a replaced block, with at least two split blocks such that few from among the nodes of said split block are accommodated within one of said split blocks and the remaining nodes from among the nodes of said split block are accommodated within other split blocks; (ii) copying at least one node from among the nodes of said replaced block into a block such that said at least two split blocks being children blocks thereof.

66. In a computer system having a storage medium of at least an internal memory that ranges between 10 to 20 M byte or more, and an external memory; a data structure that includes an index over the keys of the data records; the index is arranged in blocks; such that for one billion data records substantially no more than two accesses to said external memory are required in order to access a block that is associated with any one of said billion data records, irrespective of the size of the key of said data records.

67. In a computer system having a storage medium of at least an internal memory that ranges between 10 to 20 M byte or more, and an external memory; a data structure that includes an index over the keys of the data records; the index is arranged in blocks; such that for one million data records substantially all the blocks of the index are accommodated in said internal memory regardless of the size of the key of said data records.

68. In a computer system having a storage medium, a data structure that includes an index over the keys of data records; the index is arranged in a balanced structure of blocks and enables to perform sequential operations on said data records; the index size is essentially not affected from the size of said keys.

69. In a storage medium used by a database file management system executed on data processing system, a data structure that includes: an index over the keys of data records; the data records being of at least two typ╬╡s wh╬╡r╬╡ data r╬╡cords of the second type are subordinated to th╬╡ data r╬╡cords of th╬╡ first type.

70. In a storage medium used by a database file managem╬╡nt syst╬╡m executed on data processing system, a data structure that includes: a designat╬╡d index over designat╬╡d k╬╡ys of data r╬╡cords; the data records, constituting designat╬╡d data r╬╡cords, b╬╡ing of at l╬╡ast two typ╬╡s where designated data records of th╬╡ second type ar╬╡ subordinat╬╡d to th╬╡ d╬╡signat╬╡d data r╬╡cords of th╬╡ first type.

71. The storage m╬╡dium of Claim 69, wherein said ind╬╡x constitutes a layered index.

72. The storag╬╡ medium of Claim 70, wherein said designat╬╡d index constitutes a layer╬╡d ind╬╡x.

73. The storage medium according to Claim 70, wherein said designated index constitutes a multi-dimensional index.

74. The storage medium according to Claim 72, wherein said designated index constitutes a multi-dimensional index.

75. The storage medium according to Claim 70, wherein said designated index constitutes a multi-model index.

76. The storage medium according to Claim 72, wherein said designated index constitutes a multi-model index.

77. The storage medium according to Claim 74, wherein said designated index constitutes a multi-model index.

78. The storage medium according to Claim 69, wherein a data record of the first type and a subordinated data record of the second type constitute a one-to-one relationship.

79. The storage medium according to Claim 70, wherein a data record of the first type and a subordinated data record of the second type constitute a one-to-many relationship.

80. The storage medium according to Claim 71, wherein a data record of the first type and a subordinated data record of the second type constitute a one-to-one relationship.

81. The storage medium according to Claim 73, wherein a data record of the first type and a subordinated data record of the second type constitute a one-to-many relationship.

82. The storage medium of Claim 69, wherein said index includes a trie.

83. The storage medium of Claim 70, wherein said index includes a trie.

84. The storage medium of Claim 71, wherein the basic partitioned index of said layered index is a trie.

85. The storage medium of Claim 69, wherein, for an accessing or updating transaction in respect of a subordinated data record having composite key K1..Kn, there exists in the index a subordinated search path that leads to the subordinated data record according to the composite key K1..Kn; the subordinated search path includes a search path to a data record having key K1..Kn-1.

86. The storage medium of Claim 70, wherein, for an accessing or updating transaction in respect of a subordinated data record having composite key K1..Kn, there exists in the index a subordinated search path that leads to the subordinated data record according to the composite key K1..Kn; the subordinated search path includes a search path to a data record having key K1..Kn-1.
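
The relationship between the two search paths in Claims 85 and 86 can be sketched as follows. This is a minimal illustration, assuming a simple in-memory trie keyed on whole key components; the class names, the record values and the example keys are hypothetical and are not taken from the specification.

```python
class Node:
    """One trie node; may hold a data record and links to deeper nodes."""

    def __init__(self):
        self.children = {}    # key component -> Node
        self.record = None    # data record stored at this node, if any


class CompositeIndex:
    """Trie over composite keys K1..Kn, one level per key component."""

    def __init__(self):
        self.root = Node()

    def insert(self, key, record):
        node = self.root
        for part in key:
            node = node.children.setdefault(part, Node())
        node.record = record

    def search_path(self, key):
        """Return the nodes visited for `key`; raises KeyError if absent."""
        node = self.root
        path = []
        for part in key:
            node = node.children[part]
            path.append(node)
        return path


index = CompositeIndex()
index.insert(("cust7",), "customer record")            # key K1
index.insert(("cust7", "order3"), "order record")      # key K1..K2, subordinated

parent_path = index.search_path(("cust7",))
child_path = index.search_path(("cust7", "order3"))
```

Here `child_path` begins with exactly the nodes of `parent_path`, so reaching the subordinated order record passes through the node that holds the customer record.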

87. The storage medium according to Claim 75, wherein said multi-model includes a relational model.

88. The storage medium according to Claim 75, wherein said multi-model includes an object-oriented model.

89. The storage medium according to Claim 75, wherein said multi-model includes an object relational model.

90. The storage medium according to Claim 75, wherein said multi-model complies with a client-server model.

91. The storage medium according to Claim 76, wherein said multi-model includes a relational model.

92. The storage medium according to Claim 76, wherein said multi-model includes an object-oriented model.

93. The storage medium according to Claim 76, wherein said multi-model includes an object relational model.

94. The storage medium according to Claim 76, wherein said multi-model complies with a client-server model.

95. In a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index being stored in the storage medium and constructed over the keys of said data records that are stored in blocks; the index being arranged in blocks with the leaf blocks being linked to data records by means of links; said index is characterized in that at least one of said links is shared by at least two data records stored in the same block.

96. The storage medium of Claim 95, wherein said index is constituted by a trie.
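
The shared link of Claim 95 can be pictured with the minimal sketch below, in which the leaf level keeps one link per data block instead of one per record. A sorted list searched with `bisect` stands in for the trie of Claim 96, and the class name, block capacity and sample records are all hypothetical.

```python
from bisect import bisect_right


class SharedLinkIndex:
    """Leaf level with one link per data block, shared by that block's records."""

    def __init__(self, records, block_capacity):
        ordered = sorted(records, key=lambda kv: kv[0])
        # Data records are stored in blocks of at most `block_capacity` records.
        self.blocks = [ordered[i:i + block_capacity]
                       for i in range(0, len(ordered), block_capacity)]
        # One index entry (lowest key) and hence one link per data block.
        self.low_keys = [block[0][0] for block in self.blocks]

    def link_for(self, key):
        """Return the number of the data block that `key` is linked to."""
        if not self.blocks:
            raise KeyError(key)
        return max(bisect_right(self.low_keys, key) - 1, 0)

    def get(self, key):
        # Follow the shared link, then scan inside the data block.
        for stored_key, value in self.blocks[self.link_for(key)]:
            if stored_key == key:
                return value
        raise KeyError(key)


index = SharedLinkIndex(
    [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)], block_capacity=2)
```

With these sample records the keys `"a"` and `"b"` resolve to the same link, and the index holds three links for five records.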

97. In a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index being stored in a storage medium and constructed over the keys of said data records that are stored in blocks; the index being arranged in blocks with the leaf blocks being linked to data records by means of links; said index is characterized in that at least one of said links is shared by at least two data records stored in the same block; said index constituting a layered index according to Claim 1, and blocks of said basic partitioned index are linked to said data records.

98. The storage medium of Claim 97, wherein said basic partitioned index is constituted by a trie.