WO2003052628A1

WO2003052628A1 - Computer-implemented method of merging at least two dimensionally modeled databases

Info

Publication number: WO2003052628A1
Application number: PCT/DK2001/000831
Authority: WO
Inventors: Bjørn Nikolaj HENRICHSEN
Original assignee: Acinta Aps
Priority date: 2001-12-16
Filing date: 2001-12-16
Publication date: 2003-06-26
Also published as: AU2002221573A1

Abstract

The invention relates to a computer-implemented method of merging at least two dimensionally modeled databases (DB1, DB2). The invention facilitates establishment of so-called hybrid dimensional models referring to dimensional models of independent data sources. In this way, data warehouse-like environments may be established on the basis of already existing dimensionally modeled data without any or very little need for data conversion.

Description

COMPUTER-IMPLEMENTED METHOD OF MERGING AT LEAST TWO DIMENSIONALLY MODELED DATABASES

Field of the invention

The invention relates to a computer-implemented method of merging at least two dimensionally modeled databases (DB1, DB2) as stated in claim 1.

Background of the invention

The invention relates to the handling of data from a variety of data sources.

A part of this data handling aspect may be referred to as business intelligence.

Business Intelligence is generally understood as the use of strategic information produced by and captured within business processes to enhance business operations.

Enterprises today use a variety of software applications, each dealing with a different activity or department inside the business: Production, human resources, sales etc. The different applications store data in different databases according to their own proprietary database schemas. This makes it difficult to integrate and consolidate data from these systems for analytical purposes since database schemas vary and different databases use different access methods and each database type has its own query language.

For the end user, this means either having IT specialists build a Data Mart/Warehouse or a lot of manual work trying to integrate and consolidate data using spreadsheets and other general desktop tools.

The user will often attempt to load the data into the spreadsheet to perform the necessary analysis. However, this poses a range of problems. For example, it is quite complex to perform table joins, and the amount of data which can be stored in a spreadsheet is limited. Furthermore, spreadsheets encourage mixing data storage with formatting which, by experience, will complicate further analysis even more. A more appropriate solution is to perform the analysis in a database which generally requires knowledge of relational database design and use of SQL, both of which are very complex to non-IT experts. Even for an expert, the time spent on transporting data, designing queries and calculations, updating data and establishing the necessary joins is considerable.

Summary of the invention

The invention relates to a computer-implemented method of merging at least two dimensionally modeled databases (DB1, DB2) comprising

at least one first dimensional model (DM1) referring to said at least one first database (DB1) by means of a first set of meta-data (DMMD1) said at least one first dimensional model (DM1) comprising at least one dimension,

at least one second dimensional model (DM2) referring to said at least one second database (DB2) by means of a second set of meta-data (DMMD2), said at least one second dimensional model (DM2) comprising at least one dimension,

merging of the at least two dimensions of the at least two different dimensional models (DM1, DM2) by adding at least one set of further meta-data (FUMD) said at least one set of further meta-data (FUMD) referring to at least one of said first set of meta-data (DMMD1) and said second set of meta-data (DMMD2).

According to the invention, the costs of implementation of business intelligence may be minimized.

According to the invention, an extraction model may be so intuitive that an end-user can define it.

Moreover, a hybrid dimensional model according to an embodiment of the invention is flexible and at the same time robust. Moreover, a hybrid dimensional model according to an embodiment of the invention is efficient for practical purposes. According to the invention, time-to-market may be minimized when expanding the analysis foundation with new data sources.

The invention relates to a computer-implemented method of merging at least two dimensionally modeled databases (DB1, DB2).

The invention facilitates establishment of so-called hybrid dimensional models referring to dimensional models of independent data sources. In this way, data warehouse-like environments may be established on the basis of already existing data without any or very little need for data conversion.

The invention supports both tightly integrated, high-performance Data Warehouses and loosely coupled, disparate dimensional and semi-dimensional data sources in the same analytical environment.

The terms "first" and "second" only serve to establish an understanding of distinct separate data sources, databases and dimensional models and should in no way be confused with a certain desired functional chronology.

A dimensional model comprises a number of dimensions.

According to the invention, the methods, the hybrid model and the graphical editor may advantageously be applied for performing so-called business intelligence, i.e. for intuitive extraction of key data from databases related to e.g. business and management purposes.

Business intelligence is well-described within the art and may be regarded as the use of strategic information produced by and captured within business processes to enhance business operations.

Further meta-data may also be referred to as hybrid meta-data in the terms of the invention.

When, as stated in claim 2, said at least one set of further meta-data (FUMD) refers to both said first set of meta-data (DMMD1) and said second set of meta-data (DMMD2), a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 3, said first set of meta-data (DMMD1) refers to said first database (DB1) comprising a first data source (DS1) and at least one associated set of first fundamental meta-data (FFMD)

whereby said second set of meta-data (DMMD2) refers to said second database (DB2) comprising a second data source (DS1) and at least one associated set of second fundamental meta-data (SFMD), a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 4, said at least two dimensional models (DM1, DM2) are established as a collection of conceptually related entities, a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 5, said entities comprise levels (L) or fact groups (FG), a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 6, at least one of said dimensional models (DM1, DM2) is established by means of dimensional modeling tools, a further advantageous embodiment of the invention has been obtained.

Applicable dimensional modeling tools may e.g. comprise ERwin by Computer Associates.

When, as stated in claim 7, at least one of said at least one first dimensional model (DM1) and said at least second dimensional model (DM2) is established by means of an associated dimension modeler (DIMMOD), a further advantageous embodiment of the invention has been obtained. A dimension modeler may be regarded as a tool provided to the user facilitating the establishment of dimensional models of selected data sources. The dimensional modeler may e.g. comprise a separate development tool or it may preferably be integrated in a hybrid dimension software.

Such hybrid dimension software may e.g. comprise a graphical user interface (GUI) and graphically interfaced selection means for selecting at least two databases to be comprised in a hybrid dimensional model.

When, as stated in claim 8, said dimension modeler (DIMMOD) comprises means for analyzing the structure of a selected database, a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 9, at least one of said at least one first dimensional model (DM1) and said at least second dimensional model (DM2) is established manually by a user, a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 10, said at least one of said at least one first dimensional model (DM1) and said at least second dimensional model (DM2) is established wholly or partly automatically by computer-implemented algorithms, a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 11, said method comprises the steps of

-establishing or retrieving a hybrid dimension identifier ID,

-establishing a reference to the databases (DB1, DB2) to be comprised in said hybrid dimensional model (HDM),

-including at least one dimensional model (DM1, DM2) of the databases (DB1, DB2) in said hybrid dimensional model (HDM), said dimensional models (DM1, DM2) each comprising dimensions (D1, D2), said dimensions (D1, D2) comprising levels (L1, L2, L3, L4, L21, L22),

-identifying at least one equivalent level (L2, L22) of a dimension (D1, D2),

-providing a new hybrid dimension (D12) comprising said at least one equivalent level (L2, L22) of at least two dimensions (D1, D2) comprised in at least two different dimensional models (DM1, DM2) and at least one set of further meta-data linking at least one level (L1, L2, L4; L21) with said at least two dimensions to one equivalent level (L2, L22),

a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 12, said at least one set of further meta-data (FUMD) defines at least one hybrid dimensional model (HDM), a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 13, the at least one set of further meta-data (FUMD) refers to at least two equivalent levels (1615, 1616) of at least two different dimensions (1617, 1618), said at least two dimensions (1617, 1618) being comprised in at least two different dimensional models, a further advantageous embodiment of the invention has been obtained.

Moreover, the invention relates to a computer-implemented hybrid dimensional model comprising at least two dimensionally modeled databases (DB1, DB2) with

at least one first dimensional model (DM1) referring to said at least one first database (DB1) by means of a first set of meta-data (DMMD1), said at least one first dimensional model (DM1) comprising at least one dimension,

at least one second dimensional model (DM2) referring to said at least one second database (DB2) by means of a second set of meta-data (DMMD2), said at least one second dimensional model (DM2) comprising at least one dimension,

at least one set of further meta-data (FUMD), said at least one set of further meta-data (FUMD) referring to at least one of said first set of meta-data (DMMD1) and said second set of meta-data (DMMD2).

When, as stated in claim 15, at least one set of further meta-data (FUMD) refers to both said first set of meta-data (DMMD1) and said second set of meta-data (DMMD2), a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 16, said first set of meta-data (DMMD1) refers to said first database (DB1) comprising a first data source (DS1) and at least one associated set of fundamental meta-data (FFMD) and

said second set of meta-data (DMMD2) refers to said second database (DB2) comprising a second data source (DS1) and at least one associated set of second fundamental meta-data (SFMD), a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 17, said at least two dimensional models (DM1, DM2) comprise a collection of conceptually related entities, a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 18, said entities comprise levels (L) or fact groups (FG), a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 19, said databases comprise ERP systems, a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 20, said hybrid dimensional model facilitates queries to at least two different databases (DB1, DB2), a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 21, the at least one set of further meta-data (FUMD) refers to at least two equivalent levels (1615, 1616) of at least two different dimensions (1617, 1618), said at least two dimensions (1617, 1618) being comprised in at least two different dimensional models, a further advantageous embodiment of the invention has been obtained.

Moreover, the invention relates to a method of performing metric calculations based on a hybrid dimensional model according to the claims 14-21, said method comprising the steps of

establishing at least one consolidation database (2030),

copying at least one subset of facts from said at least one first database (DB1) to at least one fact table (2032) in said at least one consolidation database (2030),

copying at least one subset of facts from said at least one second database (DB2) to at least one further fact table (2042) in said at least one consolidation database (2030),

copying at least one subset of dimensions from said at least one first database (1910, DB1) to at least one dimension table (1931) in said at least one consolidation database (2030),

copying at least one subset of dimensions from said at least one second database (1920, DB2) to at least one dimension table (1931) in said at least one consolidation database (2030),

performing said metric calculations on the basis of the facts comprised in said at least one fact table (2032) and said at least one dimension table (1931),

outputting said metric calculations to at least one result table (2051).

Moreover, the invention relates to a method of performing metric calculations based on a hybrid dimensional model according to the claims 14-19, said method comprising the steps of

each of said at least two dimensionally modeled databases comprising at least one set of dimensions referred to by associated meta-data,

exchanging dimensions between said at least two dimensionally modeled databases so that at least one of said at least two sets of dimensions (1911, 1921) comprises a subset of dimensions (1912, 1922) borrowed from the other dimensional model,

performing metric calculations by means of application logic associated with said at least two databases (1910, 1920) on the basis of said set of dimensions (1911, 1921),

outputting said metric calculations to at least one result table (2051).

When, as stated in claim 24, the dimensions of the at least one first database (1910) and the at least second database (1920) are identified by said further set of meta-data (FUMD), a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 25, said metric calculations are based upon user request, said user request being performed according to the terms of said hybrid dimension model, a further advantageous embodiment of the invention has been obtained.
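The consolidation-database variant of the metric calculation described above may be illustrated by the following minimal sketch. It is not the implementation of the invention: three in-memory SQLite databases stand in for the first database, the second database and the consolidation database, and all table names, column names and sample values are assumptions made for illustration. For brevity, facts are copied from one source only and a dimension from the other.

```python
import sqlite3

# Stand-in for the first database (e.g. a data warehouse holding sales facts).
db1 = sqlite3.connect(":memory:")
db1.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
db1.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 100.0), (2, 50.0), (3, 25.0), (1, 10.0)],
)

# Stand-in for the second database (e.g. a spreadsheet with customer segments).
db2 = sqlite3.connect(":memory:")
db2.execute("CREATE TABLE segments (customer_id INTEGER, segment TEXT)")
db2.executemany(
    "INSERT INTO segments VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "B")],
)

# Consolidation database with one fact table and one dimension table.
cons = sqlite3.connect(":memory:")
cons.execute("CREATE TABLE fact_sales (customer_id INTEGER, amount REAL)")
cons.execute("CREATE TABLE dim_customer (customer_id INTEGER, segment TEXT)")

# Copy a subset of facts from the first source and a subset of
# dimension data from the second source.
facts = db1.execute("SELECT customer_id, amount FROM sales").fetchall()
cons.executemany("INSERT INTO fact_sales VALUES (?, ?)", facts)
dims = db2.execute("SELECT customer_id, segment FROM segments").fetchall()
cons.executemany("INSERT INTO dim_customer VALUES (?, ?)", dims)

# Perform the metric calculation and write it to a result table.
cons.execute(
    """
    CREATE TABLE result AS
    SELECT d.segment AS segment, SUM(f.amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_customer AS d ON f.customer_id = d.customer_id
    GROUP BY d.segment
    """
)
rows = cons.execute(
    "SELECT segment, total_sales FROM result ORDER BY segment"
).fetchall()
```

In this sketch, the join key customer_id plays the role of the equivalent level shared by the two dimensional models.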

Moreover, the invention relates to a graphical user interface (GUI) comprising

graphically interfaced selection means for selecting at least two databases to be comprised in a hybrid dimensional model,

graphically interfaced means for displaying the levels (1615, 1616) of said at least two dimensions (1617, 1618) selected from different dimensional models,

graphically interfaced means for combining the selected databases into one single hybrid dimensional model (1710) comprising at least one hybrid dimension (1717), said at least one hybrid dimension (1717) comprising at least one level (1615, 1616) referring to at least two individually modeled data sources (1102, 1103).

When, as stated in claim 27, the graphical user interface comprises means for establishment of dimensional models of selected databases, a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 28, the at least two selected databases are individually and dimensionally modeled, a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 29, at least one of the data sources comprises at least one dimensionally modeled data warehouse, a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 30, at least one of the data sources comprises at least one OLAP system (1102), a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 31, at least one of the data sources comprises at least one electronic spreadsheet (1103), a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 32, at least one of the data sources comprises at least one relational or flat-file data source obtained via the Internet, a further advantageous embodiment of the invention has been obtained.

When, as stated in claim 33, the graphical means for combining the selected database into one hybrid dimensional model is established according to the method of any of the claims 1-14, a further advantageous embodiment of the invention has been obtained.

Moreover, the invention relates to a data carrier for implementation of the computer-implemented method according to any of the claims 1-13 on a computer.

A graphical interface combines multiple data sources into a hybrid model. The hybrid model provides simple graphical tools for the user in order to choose the desired data sources and the relations between data sources as defined above and business intelligence analyses based on combinations of data from the chosen data sources. The graphical interface handles the different data sources in an integrated manner so the user does not have to concern himself with the origin or type of data used in business intelligence analyses.

The invention relates to a computer-implemented method of merging at least two dimensionally modeled databases (DB1, DB2).

The figures

The invention will now be described with reference to the drawings, in which

fig. 1 illustrates the principles of dimensional modeling,

figs. 2 and 3 illustrate the basic principles of the invention,

fig. 4 illustrates the establishment of a specific hybrid dimension according to an embodiment of the invention,

figs. 5a and 5b illustrate a set of hybrid dimension defining meta-data,

figs. 6a-6f illustrate the references between meta-data,

figs. 7-18 illustrate an actual embodiment of the invention,

fig. 19 illustrates a simple, distributed calculation according to a hybrid model,

fig. 20 illustrates a remote calculation according to a hybrid model, and

fig. 21 illustrates a computer which may be applied to carry out the invention.

Basically, the invention refers to data sources which are dimensionally modeled.

Fig. 1 illustrates the principles of dimensional modeling.

A database DB1 comprises a data source and source-describing data, i.e. the so-called meta-data.

The database DB1 may e.g. comprise a single uniform database managed by a standard ERP-system or it may e.g. comprise a so-called data warehouse.

Evidently, the database may comprise several other applications within the scope of the invention.

Basically, the data of the database DB1 are represented by a relational database.

In order to facilitate conceptual understanding of the database DB1, a further layer of meta-data is applied, dimensional modeling meta-data DMMD. Together with the meta-data defining the database DB1, the meta-data defining the dimensionally modeled database DMDB constitute a dimensional model.

Dimensional modeling is a well-known technique applied for modeling of databases for analytical applications, such as Business Intelligence methods.

Dimensional modeling has several advantageous characteristics:

-It is subject-oriented instead of storage-oriented,

-Data are divided between dimensions holding categorical data like Customer, Products and Time, and facts which are business measures like sales, quantities, profits, stocks, etc.

-Relationships between dimensions only exist through the observational behavior of facts.

Dimensional models are described e.g. by Ralph Kimball. Preferably, dimensional modeling meta-data DMMD within the terms of the invention refer to the underlying database and the meta-data are applied for construction of a more intuitive reference to the underlying database.

Dimensionally modeled meta-data may typically refer to the underlying data source in terms of so-called dimensions. A dimension may in turn be subdivided into "dimension units" or "dimension levels". According to the invention, the terms dimension unit and dimension level are used interchangeably and both terms will be applied throughout the description.

The invention offers a simple, intuitive and highly effective way of combining two or more data sources regardless of type (relational databases, object-oriented databases, spreadsheets, flat files) in a hybrid project, defining relations and handling them in an integrated way with no need for programming.

In order to use data sources in a hybrid project, they have to be accessed through dimensional models, modeled or auto-modeled. A multi-dimensional model may be regarded as a collection of meta-data definitions that identify dimensional units and facts in the data source, group together dimensional units in dimensions and establish hierarchical relations between dimensional units. Units may e.g. also be referred to as levels or attributes. The meta-data definitions can be modeled by a user, resulting in a so-called modeled project, or it can be generated automatically at a basic level, resulting in a so-called auto-modeled project.
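The collection of meta-data definitions described above may be pictured as a small set of data structures. The following is a minimal sketch only; the class names, field names and the example Time hierarchy are assumptions made for illustration and are not taken from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DimensionalUnit:
    name: str                     # business name, e.g. "Month"
    source_column: str            # mapping to the underlying data source
    parent: Optional[str] = None  # next coarser unit in the hierarchy, if any


@dataclass
class Dimension:
    name: str
    units: Dict[str, DimensionalUnit] = field(default_factory=dict)

    def add_unit(self, unit: DimensionalUnit) -> None:
        self.units[unit.name] = unit

    def path_to_top(self, unit_name: str) -> List[str]:
        """Follow the hierarchical relations from a unit up to the coarsest unit."""
        path = []
        current = unit_name
        while current is not None:
            path.append(current)
            current = self.units[current].parent
        return path


@dataclass
class Fact:
    name: str
    source_column: str


@dataclass
class DimensionalModel:
    source: str  # identifies the underlying data source
    dimensions: Dict[str, Dimension] = field(default_factory=dict)
    facts: Dict[str, Fact] = field(default_factory=dict)


# Hypothetical Time dimension with a Day -> Month -> Year hierarchy.
time_dim = Dimension("Time")
time_dim.add_unit(DimensionalUnit("Year", "sales.year"))
time_dim.add_unit(DimensionalUnit("Month", "sales.month", parent="Year"))
time_dim.add_unit(DimensionalUnit("Day", "sales.day", parent="Month"))

model = DimensionalModel(source="warehouse")
model.dimensions[time_dim.name] = time_dim
model.facts["Sales"] = Fact("Sales", "sales.amount")

hierarchy = time_dim.path_to_top("Day")
```

The model holds only names and mappings; the data themselves remain in the data source.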

Users can define hybrid reports in a hybrid project through the graphical interface. A hybrid report is a business intelligence analysis based on data from one or more of the selected data sources.

The whole hybrid project can be created within a very short time, and the structure of the hybrid reports can be saved in the hybrid meta-data database for later use. Data stored by hybrid projects in hybrid meta-data refer to the location of the selected data sources and the relations defined by the user between these data sources and the logical structure of the hybrid reports. No physical data are moved from their original data source location in order to store a hybrid project.

When running a hybrid report, the final result is a temporary table holding together all data needed to show the graphical layout required by the user. If the needed data are stored in different data sources, transportation of data to a destination database is required in order to get the final temporary table with the report results. The invention includes methods of automatically deciding which data need to be transported, how to filter them and which destination database to use in order to minimize the report run time and the amount of data transported from their original data sources.

Dimensional models have limitations compared with relational, object-oriented and object-relational models. However, modeling techniques that cannot be expressed as dimensional models are seldom suitable for Business Intelligence analysis because they are generally too complex. But even in complex databases, some parts may usually fit into a dimensional model. So even though a database model may not be expressed in a dimensional model, parts of it may be suitable and useful for dimensional BI analysis.

For the remainder of this document, we shall consider dimensionally modeled databases, DMDB, and by a DMDB we mean:

1. Any database that has been fully or partly modeled by using a dimensional model. The database may physically be a relational database, a multidimensional cube or a structured flat file or any other source which has a table-structure. A technical definition of the dimensional structure of the database, called DMMD (Dimensional Model Meta-data), must be associated with the database for the remainder of this document. For a multidimensional database, e.g. a cube, the meta-data are usually part of the database, i.e. the database structure itself expresses a dimensional model, and hence automatically qualifies as a DMDB. For other databases, distinct meta-data are usually required.

2. By DMDB, we mean a physical database and associated meta-data defining the dimensional model together with a physical mapping from the dimensional model to the physical database.

3. For a non-modeled database, it is possible to automatically generate one simple dimensional model for each table in the database: For each attribute in the table, a dimension is created containing exactly one dimensional unit, corresponding and mapped to the said attribute. In addition, for each attribute in the table, a fact is created corresponding and mapped to the said attribute. Each fact is modeled into a function of all the dimensional units in the table, i.e. all other attributes in the table. This leads to a crude and dimensionally flat table model, but for many purposes this approach will yield value to a user. By using this approach, virtually any database will become subject to BI analysis, although the quality of analysis is probably lower than on a modeled database. Also, the tables in the database are not linked to each other. Therefore, for the remainder of this document, data that can be treated as table-oriented, structured data, will be considered a DMDB, albeit the dimensional model may have varying quality.
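The automatic generation of a flat dimensional model for a single table, as outlined in item 3, may be sketched as follows. The function name auto_model, the dictionary layout and the example table are assumptions made for illustration; the description does not prescribe a storage format.

```python
def auto_model(table_name, columns):
    """Build a flat dimensional model for one non-modeled table.

    Every attribute becomes a dimension holding exactly one dimensional
    unit, and every attribute also becomes a fact that is treated as a
    function of all other attributes in the table.
    """
    dimensions = {
        col: {"units": [{"name": col, "maps_to": f"{table_name}.{col}"}]}
        for col in columns
    }
    facts = {
        col: {
            "maps_to": f"{table_name}.{col}",
            "depends_on": [other for other in columns if other != col],
        }
        for col in columns
    }
    return {"table": table_name, "dimensions": dimensions, "facts": facts}


# Hypothetical spreadsheet table with three attributes.
model = auto_model("segments", ["customer_id", "segment", "credit_limit"])
```

The result contains no hierarchies and no links to other tables, which corresponds to the crude, dimensionally flat model mentioned above.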

It is important to point out that the definition of a dimensional model may be saved in many different formats. The interpretation of a dimensional model may vary, too, with respect to how data are retrieved from a database by using a dimensional model, depending on the tool used to interpret the model.

When a user has access to multiple DMDBs, he may occasionally want to perform calculations that involve data from two or more DMDBs. The problem faced by the user is that these DMDB sources are not integrated and hence, it is not possible to use a standard BI tool to analyse data across these sources. If, for example, one source is the enterprise Data Warehouse and another DMDB is a customer segmentation in a spreadsheet, the user faces a technical challenge if he wishes to analyse sales per customer segment when the sales measures are stored in the Data Warehouse. Usually, the choice is between loading the customer segmentation into the Data Warehouse, thus involving the IT department and one or more IT specialists, or to integrate the data using the tools at hand, such as a spreadsheet application or even a desktop database. Spreadsheets are not very suitable for joining data, and databases require specialist knowledge. Either way, the user loses the dimensional interpretation of his data, because data are exported out of their dimensional environment for integration in another tool.

So the preferred environment for analysis must be in the BI tool using a dimensional model, but the analysis requires data to be analysed from different DMDBs and hence, there is no unified dimensional model in which the analysis may be performed. Therefore, the BI tool will not know how to integrate these data. A normal approach to this would be to integrate the dimensional model of either DMDB into the other which would normally require data to be physically or logically located in a single database. By physical integration is meant the process of copying/replicating data into a single database. By logical integration is meant establishment of a link in one database to the tables in the other databases by means of facilities in the said database which will dynamically copy the relevant data from the linked tables whenever the linked tables are queried. Not all databases support this feature, and the end user seldom has privileges to create such a link.

In an environment in which all tables have been integrated into one database, the designer would then use his tools to redefine/enhance the dimensional model to incorporate the new data. Although this is a sound approach from a theoretical point of view, it is highly impractical in many real life situations, since it requires modifications of one of the databases - usually the Data Warehouse would be the place of choice to integrate the data - over which the user will usually have no direct control. I.e. the modifications require involvement of the IT department who may not prioritize the task because of its narrow scope from an enterprise point of view. Also the need for integration may have a very short time horizon due to the ad hoc nature of the task.

According to the invention, a new method of creation of a unified dimensional model is described and involves merging of dimensional models. Instead of building a unified dimensional model bottom-up using the databases underneath the DMDBs, the invention introduces a method of creating a unified dimensional model from different DMDBs by using the original DMDBs and information on how to integrate the existing models into a unified dimensional model, a so-called Hybrid Dimensional Model, HDM. By using existing structures, both on the model and the physical parts, the work of creating and maintaining a unified dimensional model is minimized and greatly simplified.

To create an HDM using two or more DMDBs requires the following steps:

1. Selecting the DMDB members for the HDM

2. Creating merged dimensions by identifying equivalent dimensions from the member DMDBs and logically merging the equivalent dimensional units into these dimensions.

Using this approach provides the user with a very robust model for creating an HDM on an as-needed basis, since the model will always be valid and usable, even if the user has not merged all dimensions which must logically be merged.

Merging dimensions is the process of identifying equivalent dimensions and selecting a merge-level, i.e. a dimensional level is selected from each model. If, for example, each dimensional model contains a Customer dimension, these dimensions describe the same business term and the domain of customer IDs is equivalent for the two dimensions, then these dimensions may/should be merged (unified) into one single Customer dimension. In an example, the first Customer dimension is located in the Data Warehouse and holds information about customer ID, telephone, address, sex, and age, among other things. The other Customer dimension is from a DMDB based on a spreadsheet and holds the customer ID along with credit information. Since the customers are the same in both dimensions, they should be merged into a unified Customer dimension by using the dimensional unit CustomerID from both DMDBs as the equivalent dimensional units to be merged.

This proces can be repeated any number of times between a dimension from a member DMDB and either a dimension from another member DMDB or a merged dimension in the HDM. DMDB sources are available to the user through a GUI (Graphical User Interface). Each data source is seen as an individual project. The appearance of DMDB sources is the same, regardless of type: Relational databases, object-oriented databases, flat files, spreadsheets, etc. The projects are structured in dimensions, each dimension having one or more dimensional units. The dimensional units come from the data source. The data sources can be modeled by the user, meaning that the user can define a meta-data definition of the data, including dimensional units grouped into dimensions, hierarchies for the dimensional units inside the same dimension, and facts. If the data sources are non-modeled by the user, they will automatically receive a meta-data definition which corresponds to basic modeling of the data source with each dimension having one dimensional unit and facts as described above. In this way, the user can handle both modeled projects and non-modeled database tables at the level of the graphical interface in an integrated manner regardless of the data source types. One or more of the projects defined above can be selected as members of a hybrid project. The user can define, at the level of the graphical interface, the relations between the member projects of a hybrid project. This is accomplished by identifying equivalent dimensional units from dimensions located in different member projects. Equivalent dimensional units are units referring to the same data, e.g. client code or product code. The identified equivalent dimensional units can be merged by the user at the graphical interface level. 
Merging means combining the two dimensions from different projects into one hybrid dimension containing all dimensional units from the two dimensions and a merged unit containing the two equivalent units. By merging two or more dimensional units from disparate dimensional models, the user is effectively creating a unified dimension containing a unified dimensional unit holding together the merged dimensional units while the rest of the dimensional units from the merged dimensions maintain their original relationships.
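
The structure resulting from such a merge may be sketched as follows. This is a minimal illustration only, using the Customer example given above; the names `Unit`, `HybridDimension` and `merge_dimensions` and the project labels are hypothetical and are not taken from the described embodiment.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Unit:
    project: str  # member project (DMDB) the dimensional unit comes from
    name: str     # name of the dimensional unit, e.g. "CustomerID"


@dataclass
class HybridDimension:
    name: str
    units: list = field(default_factory=list)         # all units of the merged dimensions
    merged_units: list = field(default_factory=list)  # each entry is a set of equivalent units


def merge_dimensions(name, dim_a, dim_b, unit_a, unit_b):
    """Combine two dimensions into one hybrid dimension, merged at unit_a/unit_b."""
    if unit_a not in dim_a or unit_b not in dim_b:
        raise ValueError("merge units must belong to their dimensions")
    if unit_a.project == unit_b.project:
        raise ValueError("merge units must come from different member projects")
    # All units are kept; only the two equivalent units are tied together.
    return HybridDimension(name, list(dim_a) + list(dim_b), [{unit_a, unit_b}])


# Hypothetical member projects for the Customer example.
dw = [Unit("DataWarehouse", n) for n in ("CustomerID", "Telephone", "Address", "Sex", "Age")]
xl = [Unit("Spreadsheet", n) for n in ("CustomerID", "Credit")]
customer = merge_dimensions("Customer", dw, xl, dw[0], xl[0])
```

In this sketch no data rows are touched; only references to the units of the two member projects are recorded.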

Fig. 2 illustrates the basic principles of the invention. Basically, a new access structure to the underlying data is established. The access structure benefits from the fact that the data in the underlying databases are accessed by means of already established meta-data instead of establishing a traditional new common meta-data format.

The illustrated database structure comprises a number of databases DB1, DB2, ..., DBn.

The illustrated databases DB1, DB2, DBn comprise (not shown) database-defining data, so-called meta-data.

Moreover, the illustrated DB1, DB2, DBn databases are associated with dimension-modeled meta-data, DMMD1, DMMD2, and DMMDn forming the dimensionally modeled databases DMMD1, DMMD2, and DMMDn.

Together with the databases DB1, DB2, DBn, the meta-data DMMD1, DMMD2, and DMMDn form dimensional models DM1, DM2, DMn, respectively.

The n dimensional models are "glued" together into one hybrid dimensional model HDM by means of further meta-data, the so-called hybrid dimensional meta-data HDMD. The hybrid dimensional meta-data HDMD refers to the already established dimension modeling meta-data, DMMD1, DMMD2, and DMMDn.

A hybrid dimension can hold two or more dimensions, each from a different member project of the hybrid project, the units from these dimensions, and one or more merged units. Each merged unit may contain two or more dimensional units, each from a different member project. The process of merging can be repeated for other pairs of dimensions from different projects resulting in more hybrid dimensions. A unit (dimensional unit DU or merged unit MU) from a hybrid dimension (HD) may also be merged with a unit (U) in a dimension (D) from another member project (a project different from the projects of all the dimensions already merged in the hybrid dimension). The effect of merging D and HD is that the dimension D from the member project is incorporated in the hybrid dimension and, depending on whether the user has chosen DU or MU, either a newly merged unit is created from DU and U or the existing merged unit MU gets a new member dimensional unit, U. Merging is allowed between two dimensions or between one dimension and one hybrid dimension, respecting the rules defined above. The process of merging one dimension with another dimension or a hybrid dimension can be repeated as many times as desired. The information about the hybrid project is stored in a database in a hybrid meta-data definition (hybrid meta-data is also referred to as further meta-data in the context of the application) and comprises information about the location of member projects and the relations between them as defined by the user through merging dimensions. The merging process does not lead to transport of the physical data from the member projects. Only the logic of the relations defined by the user is stored in the hybrid meta-data FUMD definition.
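
The rule for merging a further dimension D into an existing hybrid dimension HD may be sketched as follows. The dictionary layout, the function name `merge_into_hybrid` and the project labels P1 to P4 are assumptions made for illustration; the description does not prescribe a storage format for the hybrid meta-data.

```python
def merge_into_hybrid(hd, dim, target, unit):
    """Merge dimension D (`dim`) into hybrid dimension HD (`hd`).

    hd     -- {"units": [(project, unit_name), ...], "merged_units": [set, ...]}
    dim    -- units of D as (project, unit_name) tuples from one member project
    target -- a unit tuple of HD (DU) or one of HD's merged-unit sets (MU)
    unit   -- the unit U of D that is equivalent to the target
    """
    if unit not in dim:
        raise ValueError("U must be a unit of D")
    if unit[0] in {project for project, _ in hd["units"]}:
        raise ValueError("D must come from a project not yet merged into HD")
    if isinstance(target, set):
        if not any(target is mu for mu in hd["merged_units"]):
            raise ValueError("MU must be a merged unit of HD")
        target.add(unit)                            # existing MU gets the new member U
    else:
        if target not in hd["units"]:
            raise ValueError("DU must be a unit of HD")
        hd["merged_units"].append({target, unit})   # new merged unit from DU and U
    hd["units"].extend(dim)                         # D is incorporated in HD
    return hd


# Hypothetical hybrid dimension already merged from projects P1 and P2.
hd = {
    "units": [("P1", "Kategori"), ("P1", "KategoriNavn"),
              ("P2", "Kategori"), ("P2", "OmsGruppe")],
    "merged_units": [{("P1", "Kategori"), ("P2", "Kategori")}],
}
d3 = [("P3", "Kategori"), ("P3", "Budget")]  # hypothetical third member project
merge_into_hybrid(hd, d3, hd["merged_units"][0], ("P3", "Kategori"))
```

Only references to units are stored in this sketch, which mirrors the statement that merging does not transport physical data.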

Further, the graphical interface handles all elements from the member projects in an integrated manner with those merged as described above. The user can build, in a transparent way, business intelligence analyses based on data from one or more of the member projects.

Running a 'hybrid' report, i.e. a business intelligence analysis based on data coming from different sources, implies carrying out a process with the final purpose of gathering all needed data in one location and showing the data in the graphical format required by the user. The location in which all needed data are gathered is a temporary table in a database dynamically chosen during the process. When the required data are from different data sources, they need to be transported from their original location to the final, dynamically chosen, destination database called the consolidation database.

There are several steps in the process of building the final temporary table used to display the report data. Several decisions are made along this process about what data to transport, where to transport them and which filtering to use in order to get a minimum amount of data transported so that a minimum network load and time required to run the report may be obtained.

Fig. 3 illustrates the above-described merger of dimensionally modeled databases DMDB1, DMDB2, DMDBn into one merged hybrid dimensional database HDM. The combination HDM of the dimensionally modeled databases DMDB1, DMDB2, DMDBn results in a number of dimensions which is lower than the complete number of dimensions in all databases DMDB1, DMDB2, DMDBn if at least one merged dimension has been created. This feature will be explained below.

Fig. 4 illustrates a specific example of how dimensions of different dimensional models may be merged together into one merged dimension.

In fig. 4, two dimensions "Product", D1 and "Color", D2 have been established by two different dimensional models. This basically means that the dimensions may only be accessed on an individual basis and that the combined utilization of the knowledge contained in the dimensions is not facilitated.

The Product dimension D1 comprises four levels, Product Class L1, Product Group L2, Product Subgroup L3 and Product Code L4.

The Color dimension D2 comprises two levels, Color L21 and Product Group L22. The only equivalent dimension levels are Product Group L2 and Product Group L22.

Subsequently, a merged dimension D12 may be established by merging the levels L2 and L22 into Product Group L2L22, thereby "connecting" the dimension D2 level Color L21 into a merged Product dimension D12 comprising five levels, Product Class L1, the merged Product Group L2L22, Product Subgroup L3, Product Code L4 and Color L21.
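
The merger of fig. 4 may be sketched as follows, with each dimension represented as a list of (reference, name) pairs. The function name `merge_at_level` and the choice to append the remaining levels of the second dimension after those of the first are assumptions; the description lists the five resulting levels but does not fix a data structure.

```python
def merge_at_level(dim1, dim2, ref1, ref2):
    """Merge two level lists at the equivalent levels ref1 (dim1) and ref2 (dim2).

    The merged level takes the place of ref1; the remaining levels of dim2
    are appended after the levels of dim1.
    """
    names1, names2 = dict(dim1), dict(dim2)
    if ref1 not in names1 or ref2 not in names2:
        raise KeyError("merge level not found")
    merged = (ref1 + ref2, names1[ref1])
    result = [merged if ref == ref1 else (ref, name) for ref, name in dim1]
    result += [(ref, name) for ref, name in dim2 if ref != ref2]
    return result


product = [("L1", "Product Class"), ("L2", "Product Group"),
           ("L3", "Product Subgroup"), ("L4", "Product Code")]
color = [("L21", "Color"), ("L22", "Product Group")]
d12 = merge_at_level(product, color, "L2", "L22")
```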

Fig. 5 is described below in the description.

Fig. 6 illustrates the basic principle of establishment of a meta-data link-based hybrid dimension according to the invention. Fig. 6a illustrates a dimensionally modeled database DB1. A dimensional model DM1 of the first database DB1 is established by the dimensional model meta-data DMMD1. Data of the database DB1 may now be accessed via the dimensionally modeled front-end, representing the data of database DB1 in a conceptual user-perceivable way. Different reports may now be established in a relatively simple way with the purpose of investigating and revealing the status and dynamics of the data of the database DB1.

A further database, DB2, is now provided and combined access to both databases DB1, DB2 is desired.

This access may be obtained by establishment of a so-called hybrid dimensional model. The hybrid dimensional model may be regarded as a link to both (or all) relevant databases DB1, DB2. Hence, the hybrid dimensional model refers to two (or several) databases including the database defining meta-data.

The data sources referred to by the hybrid dimensional model are principally independent data sources which may be modified individually.

Initially, in fig. 6B, a dimensional model DM2 of the second database DB2 is established by means of dimensional model meta-data DMMD2.

Now, in fig. 6C, a hybrid dimensional model HDM may be established by means of a further layer of hybrid dimensional meta-data FUMD. The hybrid dimensional meta-data HMD are here illustrated as a separate overlaying structure of meta-data. The meta-data may evidently be applied as a separate overlay data-structure but it may also, obviously, be more or less comprised in e.g. one of the sets of dimensional model meta-data DMMD1 or DMMD2, if the dimensional modeling tool applied for the modeling of the specific database allows it.

However, irrespective of the "location" of these data, the basic idea according to the invention is to reuse the already established dimensional modeling meta-data; here represented by dimensional model meta-data DMMD1 and dimensional model meta-data DMMD2.

Figs. 6D, 6E and 6F correspond fully to the above-described figs. 6A, 6B and 6C, in which the structure of the databases DB1 and DB2 has been further unfolded, thereby illustrating a database, e.g. database DB1, according to the understanding of the invention, comprising a data source DS1 and an associated set of database defining meta-data FFMD.

Likewise, database DB2 comprises a data source DS2 and an associated set of database defining meta-data SFMD.

The graphical editor

Figs. 7 to 18 illustrate an actual embodiment of the invention. The figures illustrate how dimensions may be merged into a hybrid dimension by means of graphically operated software routines according to one embodiment of the invention.

The illustrated dimension handling editor may e.g. be executed on a standard computer, e.g. a 1GHz Pentium processor.

The editor comprises a window similar to the one illustrated in fig. 7, in which dimension-represented databases are illustrated on a node-tree basis.

A node represents a dimensional model of a database. In this case, the model is referred to as a project. The project is represented by a symbolic name, AC Storkøb detailløsning.

The project 70 comprises a number of members including Dimensions 71, Metrics 72, Filters 73, Layouts 74 and Reports 75. Metrics 72 represent an environment for establishment of data processing routines, i.e. the metrics related to the current project, here "AC Storkøb detailløsning" 70.

Filters 73 represent an environment for establishment of subsets of data, resulting in only the desired subsets of the dimensions and facts data being included in the metrics.

Filters 73, Layouts 74 and Reports 75 together represent an environment for establishment of the desired reports. A report may be defined by a layout defining how data are visualized and by a data filter defining a subset of the data to form the basis of the data elements of the layout (data element: dimensional unit/level or metrics).

The dimension group node 71 comprises a number of dimensions "Tid" 711, "Vare" 712, "Leverandør" 713, "Butikskampagne" 714, "Fællekampagne" 715, "Butik" 716.

In fig. 7 the node "Vare" 712 is expanded and a number of dimension levels is illustrated as a number of dimension sub-nodes 718.

Turning now to fig. 8, a report has been established comprising three columns: "Kategori" 82, "Omsætning" 83 and the column 84 indicated by arrows on the figure.

The report comprises a number of members listed in rows. Hence, each member may be regarded as a concrete value of a dimension level.

The "Kategori" code e.g. refers to the category of a product and the associated field of the "Omsætning" column e.g. refers to the turnover related to this category defined by the product. The turnover may be calculated by predefined metric calculations associated with the dimensional model AC Storkøb detailløsning. The column 82 "Kategori" represents a level of the dimension "Vare" 712 in fig. 7.

The third column 84 relates to the category name, i.e. a category name perceivable to a relevant user of the system. In other words, the name of each member in column 84 is unambiguously associated with an ID, "Kategori", in column 82 as part of the definition of the level "Kategori".

Fig. 9 illustrates a further data source. The data source is associated with database defining meta-data. The illustrated database is a simple Microsoft Excel spreadsheet database file.

The illustrated database is not comparable with the already established dimensional model illustrated in fig. 7.

The illustrated spreadsheet comprises data arranged in four columns 90, 91, 92 and 93. Column 90 represents "Kategori", column 91 represents "KategoriNavn", column 92 represents "Omsætning" and column 93 represents "OmsGruppe".

"Kategori" refers to a category, "KategoriNavn" refers to a category name and "OmsGruppe" refers to a grouping of categories.

The database of fig. 9 may represent data established by a user of the system, data provided by a supplier, etc.

As mentioned above, the already established dimensional model comprises the data level "Kategori".

However, the fourth column "OmsGruppe" 93 is not represented in the above-mentioned dimensionally modeled database illustrated in fig. 8. According to the illustrated embodiment of the invention, "OmsGruppe" 93 represents a customer-defined grouping of categories listed in "Kategori" 90. Basically, someone wishing to utilize the fourth column "OmsGruppe" 93 may now establish a dimensional model incorporating the dimension "OmsGruppe", a hybrid dimensional model, by means of the above-mentioned dimension handling editor.

This may now be done by establishment of a common dimension comprising both "Kategori" and "OmsGruppe" in which a hierarchical relationship exists between the "OmsGruppe" and "Kategori", i.e. "OmsGruppe" is a parent of "Kategori".

The process of establishing a hybrid dimensional model will now be described below with reference to an embodiment of the invention implemented in the program Data Shop by the applicant.

Initially, a dimension handling editor according to one embodiment of the invention (here: Data Shop) comprises a graphical facility for creating a hybrid dimensional model referring to two or more data sources.

This hybrid dimensional model may subsequently facilitate access to the data sources referred to by the hybrid dimensional model.

The below-described hybrid dimensional model will refer to the data sources already comprised in the dimensional model "AC Storkøb detailløsning" and the spreadsheet database file illustrated in fig. 9.

Evidently, according to the invention, several dimensions of different dimensional models may be merged into a hybrid dimensional model. For illustrative purposes, the hybrid dimensional model refers to only two dimensional models, i.e. a dimensional model of the first data source AC Storkøb detailløsning and a dimensional model of the spreadsheet database file illustrated in fig. 9.

In fig. 10, a window is applied for selection of data source. The illustrated adding facility comprises three data source describing parameters, data source name 101, data source type 102 and data source location 103. Evidently other data source describing parameters than the illustrated three may be applied within the scope of the invention.

The name of the new data source is OmsGrupper.xls, the type of data source is MS Excel and the location of the data source is under C:\Documents and Settings\nh\Dokumenter\.

Turning now to fig. 11, the editor window 1101 illustrates the two data sources in two separate groups of data; the first group comprises a dimensional model "AC Storkøb detailløsning" 1102 of a first database and the second group of data is represented by a node 1103, KatOmsGrupper.xls.

Moreover, four nodes 1106, 1107, 1108, 1109 represent the four above-mentioned columns of the selected spreadsheet, i.e. Kategori 1106, Kategorinavn 1109, Omsætning 1107, and OmsGruppe 1108.

Turning now to fig. 12, the node 1210 of the database 1103 of fig. 11 has been selected by means of the dimension handling editor.

The editor performs an initial pre-check of the second data group, and as illustrated in the data source representing window 1201, the pre-check results in an assumption of four dimensions being contained in the new data source to be added, i.e. the dimensions Kategori 1211, KategoriNavn 1214, Omsætning 1212, and OmsGruppe 1213.

The editor now facilitates rearrangement of the nodes to form the desired dimensional model. This may e.g. be performed by mouse operations. Evidently, several other model-manipulating procedures which are more or less user-friendly and more or less automatic may be applied within the scope of the invention. Basically, according to a preferred embodiment of the invention, a graphical tool should comprise means for specifying dimensions and their candidate levels and means for specifying their mutual relationships, i.e. a hierarchical level within the dimensions (parent-child relationships).
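
The pre-check and a subsequent rearrangement may be sketched as follows. The dictionary layout and the function names `basic_model` and `move_level` are hypothetical; the sketch merely assumes that the basic model gives each column its own single-level dimension and that levels within a dimension are listed parent first. Renaming of dimensions is omitted.

```python
def basic_model(project, columns):
    """Default meta-data: one dimension per column, each holding a single level."""
    return {"project": project, "dimensions": {col: [col] for col in columns}}


def move_level(model, level, target_dimension, position=0):
    """Move a single-level dimension into another dimension as a level.

    Levels are ordered parent first, so position 0 makes `level` the parent
    of every level already present in the target dimension.
    """
    dims = model["dimensions"]
    if dims.get(level) != [level]:
        raise ValueError("only an unarranged single-level dimension can be moved")
    if target_dimension not in dims or target_dimension == level:
        raise ValueError("unknown or identical target dimension")
    del dims[level]
    dims[target_dimension].insert(position, level)
    return model


model = basic_model("KatOmsGrupper.xls",
                    ["Kategori", "KategoriNavn", "Omsætning", "OmsGruppe"])
move_level(model, "OmsGruppe", "Kategori")  # OmsGruppe becomes the parent of Kategori
```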

Basically, the above-described procedural step is the one described with reference to the process of establishing a dimensional model of a selected data source in figs. 6a to 6b. Evidently, if a suitable dimensional model of a selected data source to be included in a hybrid dimensional model has already been established, there is no need to create a new dimensional model of this data source.

In fig. 13, the user has chosen to rearrange the new dimensional model of the spreadsheet table so that the model contains three dimensions only, i.e. the dimension "Vare" 1311 (the dimension "Kategori" 1211 of fig. 12 has now been renamed "Vare" to fit into the new hybrid dimensional model), the dimension "Kategorinavn" 1317 and the dimension "Omsætning" 1312. The column "OmsGruppe" 1315 of the new data source has now been included in the dimension "Vare" 1311 as a so-called level together with the level "Kategori" 1316.

The position of "OmsGruppe" above "Kategori" shows that a hierarchical relationship exists between these levels in which "OmsGruppe" is the parent of "Kategori".
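
The practical effect of such a parent-child relationship is that values recorded per "Kategori" can be aggregated per "OmsGruppe". A minimal sketch follows; the category codes, group names and turnover figures are invented for illustration and do not come from the figures.

```python
from collections import defaultdict


def roll_up(child_values, parent_of):
    """Aggregate values per child level member to the parent level."""
    totals = defaultdict(float)
    for child, value in child_values.items():
        totals[parent_of[child]] += value
    return dict(totals)


# Hypothetical data: Kategori -> OmsGruppe, and turnover per Kategori.
parent_of = {"10": "A", "20": "A", "30": "B"}
turnover = {"10": 100.0, "20": 50.0, "30": 25.0}
by_group = roll_up(turnover, parent_of)
```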

Now, an improved dimensional model of the second data source has been established.

In fig. 14, a name of the hybrid dimensional model of the first data source and the second data source is defined by means of a suitable editor window 1400. The name of this hybrid is defined by the user as "AC Storkøb (extended)".

Fig. 15 illustrates a further editor window 1500.

Again, it should be emphasized that the specific merger of the two data sources may be performed in several different ways within the scope of the invention. The editor window 1500 basically comprises two areas, a first area representing the available dimensionally modeled data sources (= projects) at the left-hand side and a second area representing the selected projects about to be merged, i.e. the members of the hybrid dimensional model. The illustrated second area now comprises two projects, "AC Storkøb detailløsning" 1511 and "OmsGruppe" 1512.

Fig. 16 illustrates a further editor window 1600.

This editor window is applied for user activation of a merge operation.

The left-hand side of the editor comprises an area 1612 in which the dimensions of the first selected project "AC Storkøb detailløsning" may be selected. The right-hand side of the editor comprises an area 1613 in which dimensions of the second selected project "OmsGruppe" 1512 may be selected.

The nodes may be expanded (+) and collapsed (-) in a well-known tree-structure manner.

In the illustrated window, the user has selected the level node "Kategori" 1615 of the dimension Vare 1617 in the project "AC Storkøb detailløsning" at the left-hand side.

At the right-hand side 1613 of the editor window 1600, the user has marked the level "Kategori" 1616 of the dimension "Vare" 1618 in the project "OmsGruppe".

By activation of the ' -" button, the selected dimensions in the two projects are merged at the specified level, i.e. "Kategori" and "Kategori", and by further activation of the "OK" button 1620, the dimensions are effectively merged. Fig. 17 now illustrates the new hybrid model (=hybrid project) in the editor window 1700. The hybrid model 1710 comprises a merged dimension "Vare" 1717 and the merged dimension "Vare" 1717 includes the level of the spreadsheet data source "OmsGruppe" 1720 and the level of the Project "AC Storkøb detailløsning" 1705.

Hence, the merged dimension "Vare" 1717 refers to the data source(s) already comprised in the dimensional model "AC Storkøb detailløsning" and the spreadsheet database file illustrated in fig. 9.

Now, the merger of the dimension level "Kategori" 1615 and the level "Kategori" 1616 has been completed and the user may access the relevant data of both the first data source "AC Storkøb detailløsning" and the second data source KatOmsGrupper.xls by means of the new dimensional model "AC Storkøb (extended)" 1710 comprising the level 1720.

Hence, the new dimensional model "AC Storkøb (extended)" 1710 comprises data not referred to by the previous dimensional model, "AC Storkøb detailløsning". Please note that "AC Storkøb detailløsning" may still be accessed by means of the project node 1705.

Subsequently, a query on the hybrid dimensional model may now be performed by the already established reporting tools, etc. on the basis of the already established query context of the previously established dimensional model 1705.

Fig. 18 illustrates a new report, including the four columns, the original column "Kategori" 182, the original "KategoriNavn" 183, the original "Omsastning" 184, and the newly added column "OmsGruppe" 181.

It should be noted that the new report is extremely easy to establish and that it refers to both data sources of the hybrid dimensional model, so that changes made in the individual data sources will be reflected in the illustrated report.

Basically, a conversion of data has been performed.

Metric calculation

A further embodiment will be described in the following.

The further embodiments of the invention relate to applicable ways of obtaining metric results from different dimensionally modeled databases on the basis of the hybrid dimensional model.

Figs. 19 and 20 illustrate further embodiments of the invention.

A first embodiment of the invention dealing with evaluation of data contained in a hybrid dimension data source is illustrated in fig. 19.

According to both of the described embodiments, metric calculations are performed on the basis of a hybrid dimensional model referring to two databases and analytical queries initiated by a data source system or a user of the system. Evidently, other numbers of databases may be applied within the scope of the invention.

The illustrated system comprises a first dimensionally modeled database 1910 and a second dimensionally modeled database 1920.

The first database comprises dimensions (data) 1911. The dimensions are referred to by meta-data (not shown).

Moreover, the first database comprises a fact table 1913. The fact table 1913 is accessed by a metric calculation node 1914. This calculation node is applied for processing data retrieved from the fact table and for dimensions retrieved from the dimensions 1911 e.g. according to externally established analytical queries; i.e. reports, etc.

The second database comprises dimensions (data) 1921. The dimensions are referred to by associated meta-data (not shown). Moreover, the second database comprises a fact table 1923. The fact table 1923 is accessed by a metric calculation node 1924. This calculation node is applied for processing data retrieved from the fact table and for dimensions retrieved from the dimensions 1921 e.g. according to externally established analytical queries; i.e. reports, etc.

Moreover, the dimensions 1911 and 1921 comprise (optional) local copies of dimensions stored by the other databases referred to by the hybrid dimensional model and established with the purpose of facilitating metric calculations in the nodes 1914 and 1924 on the basis of facts comprised in the fact tables 1913 and 1923, respectively.

Hence, the dimensions 1911 and 1921 both comprise a subgroup of dimensions, 1912 and 1922 respectively, representing the dimensions needed from the other database 1910, 1920. The exchange of dimensions between the dimensions 1911 and 1921 is controlled by a dimension transportation node 1900.

This feature enables internal metric calculations by the application logic of the database control program. Therefore, the output of the two databases 1910 and 1920 may be preprocessed in order to save further external metric calculations.

It should be noted that consolidation of the results 1934 in a third consolidation database 1930 may be avoided if the system performs metric calculations in only one of the databases 1910, 1920. In general, a consolidation calculation is needed whenever two or more metrics are being calculated in order to align the temporary results of the individual metric calculations.

According to the illustrated embodiment of the invention, transportation of the dimension data is generally preferred over transportation of fact tables, since fact tables typically comprise a far larger amount of data than the dimension data. Evidently, transportation of data from the fact tables may be preferred under certain circumstances. Such circumstances may e.g. be indicated by statistics.
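
A decision of this kind may be sketched as follows. The description does not state which statistics are used; the sketch assumes, purely for illustration, that estimated row counts after filtering are available, and the function name `choose_transport` is hypothetical.

```python
def choose_transport(dimension_rows, fact_rows):
    """Return which side to move, given estimated row counts after filtering."""
    if dimension_rows < 0 or fact_rows < 0:
        raise ValueError("row estimates must be non-negative")
    # Move the smaller data set; dimensions win ties, matching the stated preference.
    return "dimension" if dimension_rows <= fact_rows else "fact"
```

With a typical small dimension and large fact table, `choose_transport(500, 2_000_000)` selects the dimension, whereas a heavily filtered fact table may tip the choice the other way.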

The processing nodes 1914 and 1924 store processed data in worktables 1932 and 1933, respectively, of a consolidation database 1930. Moreover, dimensions are transported from the dimensions 1911 and 1921 of the databases 1910 and 1920 via a dimension transporting node 1925. The transported and merged dimensions are comprised in a hybrid dimension model 1931 associated with the evaluation database 1930.

According to the illustrated embodiment, only the dimensions needed for establishment of the final consolidation of an analytical query are transported.

The hybrid dimensions 1931 are processed (merged) on the basis of resulting data from the worktables 1932, 1933 via an evaluation node 1934 and the system outputs the resulting data in a worktable 1935.
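
The alignment of the worktables in the consolidation step may be sketched as follows. The sketch assumes that each worktable maps a member of a shared dimension level to one metric result and that missing results are shown as empty; the function name `consolidate` and the sample values are hypothetical.

```python
def consolidate(*worktables):
    """Align per-metric worktables on their shared key (a full outer join).

    Each worktable maps a dimension member to a metric value. Members missing
    from a worktable yield None in the corresponding result column.
    """
    keys = sorted(set().union(*(wt.keys() for wt in worktables)))
    return [(key, *(wt.get(key) for wt in worktables)) for key in keys]


work_1932 = {"K1": 100.0, "K2": 50.0}   # hypothetical result of the first metric
work_1933 = {"K2": 7, "K3": 3}          # hypothetical result of the second metric
result_1935 = consolidate(work_1932, work_1933)
```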

Again, it should be noted that the unique prerequisite dimension transportation node 1900 facilitates performance of all metric calculations internally by the application logic of the databases 1910, 1920, thereby avoiding superfluous and time/memory consuming external data processing. This "pre-processing" is facilitated by the exchange node 1900 since the metric calculation performed in e.g. the processing node 1914 may be performed on data present in the database 1910 or it may be used for processing queries on dimensions partly referring to both the databases 1910, 1920, and vice versa.

In a simple distributed calculation, the fact tables 1913 and 1923 are distributed in more than one database 1910, 1920. For the sake of utilizing dimension tables across databases, the involved dimensions (data) are transported to the calculation database before the metric is calculated. The data transported by the node 1900 are called borrowed dimensions. The method is preferably applied when dimension data for a metric calculation are stored in more than one database and when creation of temporary storage locally in the databases, here the databases 1910 and 1920, is allowed.
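
The principle of borrowed dimensions may be sketched with two in-memory SQLite databases. The table names, column names and values are invented for illustration, and the helper `borrow_dimension` is hypothetical; the sketch only shows a dimension being copied into temporary storage next to the fact table so that the metric can be calculated by the fact database itself.

```python
import sqlite3

db1 = sqlite3.connect(":memory:")  # holds the fact table
db2 = sqlite3.connect(":memory:")  # holds the dimension to be borrowed

db1.execute("CREATE TABLE fact_sales (kategori TEXT, amount REAL)")
db1.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [("K1", 100.0), ("K2", 50.0), ("K3", 25.0)])
db2.execute("CREATE TABLE dim_group (kategori TEXT, omsgruppe TEXT)")
db2.executemany("INSERT INTO dim_group VALUES (?, ?)",
                [("K1", "High"), ("K2", "High"), ("K3", "Low")])


def borrow_dimension(src, dst, table, columns):
    """Copy dimension rows from src into a temporary table in dst.

    Identifiers are interpolated directly, so they must come from trusted
    meta-data, never from user input.
    """
    cols = ", ".join(columns)
    rows = src.execute(f"SELECT {cols} FROM {table}").fetchall()
    dst.execute(f"CREATE TEMP TABLE borrowed_{table} ({cols})")
    marks = ", ".join("?" for _ in columns)
    dst.executemany(f"INSERT INTO borrowed_{table} VALUES ({marks})", rows)


borrow_dimension(db2, db1, "dim_group", ["kategori", "omsgruppe"])

# The metric is now calculated inside the database holding the facts.
result = db1.execute(
    "SELECT d.omsgruppe, SUM(f.amount) FROM fact_sales f "
    "JOIN borrowed_dim_group d ON d.kategori = f.kategori "
    "GROUP BY d.omsgruppe ORDER BY d.omsgruppe"
).fetchall()
```

Only the small dimension table crosses the database boundary in this sketch; the fact rows stay where they are.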

- All databases today have a query language which permits not only retrieval of data from the database but also complex calculations of the data. Using the query language built into the database has some significant advantages compared to retrieval of raw data and to carrying out calculations locally:
  o Usually data resulting from a calculation are much less comprehensive than the data involved in the calculation. Thus, by performing the calculation in the database and not locally, the network traffic will be greatly reduced. In most environments, the network is the eye of the needle with respect to performance.
  o Most databases have been tuned to the data stored in the database. The tuning comprises table indexes, field formatting and various other methods of tuning the database to a specific schema. Thus, by performing the calculations in the source database of the involved data, it is ensured that any "clever" handling of the data will be utilized.

In any environment, the largest volume of data involved in a calculation will primarily determine the total performance. In order to handle large amounts of data, a database server of an appropriate size must be used. Thus, for most practical purposes, it can be assumed that the database server holding a database has been proportioned to the volumes of data in the database, meaning that large databases are kept on high-performance servers. This implies that when accessing large volumes of data, it is safe to assume that the server on which the data are located is proportioned to handling large volumes of data, which is not the case with a general client PC. Therefore, calculation of the data is best performed in the native database rather than by copying the data to carry out the calculations locally. It should be noted that the illustrated embodiment of the invention shows evaluation by two individually modeled dimensional databases 1910, 1920. In general, any number of databases may be applied within the scope of the invention.

According to an advantageous embodiment of the invention, the transportation of dimensions between the dimensions 1911, 1921 should be minimized in order to optimize processing and minimize storage consumption.

A further embodiment of the invention dealing with evaluation of data contained in a hybrid dimension data source is illustrated in fig. 20.

This embodiment addresses e.g. hybrid dimensional models referring to databases that do not allow the creation of tables and hence do not support storage of temporary results and borrowed dimensions. Furthermore, as dimensions needed for metric calculations are stored in more than one database, it is necessary to transport temporary, partial copies of the fact tables to a consolidation database 2030. For example, some standard ERP systems use native databases in which creation of tables is not allowed or desired.

The illustrated system comprises two dimensionally modeled databases 2010 and 2020. The databases 2010 and 2020 comprise associated dimensions 2011, 2021.

Moreover, the databases comprise fact tables 2012, 2022. The fact tables are transported (copied) to external partial fact tables 2032, 2042 via a fact transportation node 2025.

Moreover, the relevant dimensions are transported to a hybrid dimension 2031 via a dimension transportation node 2015. Only dimensions relevant to the intermediate calculations and to final consolidation applying descriptors are copied.

The above-mentioned transportation nodes 2015 and 2025 are both external to the consolidation database 2030. However, according to a further embodiment of the invention, the transportation nodes may be comprised in the physical consolidation database if so desired and if the databases possess the necessary capabilities to perform these operations.

Data may be output by the partial fact tables 2032, 2042 to the metric calculation nodes 2033, 2043, respectively, and processed with the hybrid dimensions 2031 (defined by meta-data not shown), and the calculated data are output to worktables 2034, 2044. The worktable data from worktable 2034 and worktable 2044 are processed via a processing node 2050 together with the hybrid dimension data 2031, and the resulting data are output to resulting data table 2051 of the consolidation database 2030.

Basically, the illustrated embodiment represents remote handling of the databases 2010, 2020 in the sense that queries across the databases 2010 and 2020 are performed without modification of the database-defining meta-data of the databases 2010 and 2020.

The illustrated embodiment is applied when there are more than one individually modeled dimensional database, here dimensionally modeled databases 2010 and 2020, and the data sources do not allow creation of tables. All dimensions and facts relevant to the desired calculations are copied using the filter on the consolidation database prior to the metric calculations. All metric calculations may be done in the consolidation database 2030, except if all relevant dimensions for a metric calculation are located in the same database as a fact table, in which case the metric calculation is performed in the source database and only the result is transported to the consolidation database.

Again, it should be noted that the illustrated embodiment of the invention shows evaluation by two individually modeled dimensional databases 2010, 2020. In general, any number of databases may be applied within the scope of the invention. It should also be noted that numerous variations within the same theme may be applied within the scope of the invention, i.e. performing metric calculations in the involved databases controlled by a hybrid dimension model referring to at least two of the involved individual dimensionally modeled databases.

Hence, the two above-described basically different embodiments involving both remote and local metric calculations may also be combined, thus offering remote evaluation of data with respect to some of the involved databases and local evaluation of data with respect to other databases of the complete hybrid dimensional model.

Turning now to examples of output routines applicable within the scope of the invention.

When running a hybrid report, the final result is a temporary table holding together all data needed to show the graphical layout required by the user. If the needed data are stored in different places (different data sources), some of the data are transported to the destination database in order to get the final temporary table with the report results. The invention includes methods of automatically deciding which data need to be transported, how to filter them and which destination database to use in order to minimize the report run time and the amount of data transported from their original data sources.

Hybrid reports consist of a layout and a filter.

The layout determines the graphical appearance of the retrieved data, e.g. a cross-tabulated grid with the rows and columns of a spreadsheet-like representation, and the metric or metrics to be shown in the spreadsheet cells. Many graphical appearances are possible, like pie charts, bar charts, scatter diagrams, table format, etc. The layout definition contains specifications of the type of visualization and how to map the report results to the graphical components. A metric can be a simple aggregate function (like SUM, AVG, MIN) applied to a fact in a data source, or a mathematical expression using other metrics. The filter allows the user to specify conditions, which restrict the volume of analyzed data. The invention includes methods which use this filter to optimize the amount of data processed or transported by the SQL requests generated in order to get the report data.

The metrics required by the graphical layout of hybrid reports are calculated through SQL statements which store the results in temporary tables. One step includes generation of an 'abstract' SQL which contains references to the elements used in the SELECT, FROM, WHERE, GROUP BY, ORDER BY, HAVING parts of the SQL. The 'abstract' SQL is based on the required result structure (from the graphical layout of the report), on the report filter conditions applicable to the metric and on the joins needed between the tables used by the SQL. Subsequent to all metrics having been processed in this way, an execution plan is built on the basis of the data needed by each metric calculation, and the next step is to build the specific SQL statements based on the 'abstract' SQL structures and the execution plan.
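One way to picture the 'abstract' SQL is as a small structure holding the elements of each clause, from which a concrete statement is rendered later. The sketch below is an assumption made for illustration: the class `AbstractSQL`, the function `render`, the `CREATE TEMPORARY TABLE ... AS` form and all table and column names are hypothetical and are not prescribed by the description.

```python
from dataclasses import dataclass, field

@dataclass
class AbstractSQL:
    """Source-independent description of one metric query (hypothetical structure)."""
    select: list
    from_table: str
    joins: list = field(default_factory=list)
    where: list = field(default_factory=list)
    group_by: list = field(default_factory=list)
    having: list = field(default_factory=list)
    order_by: list = field(default_factory=list)

def render(sql, temp_table):
    """Build a concrete statement that stores the metric result in a temporary table."""
    clauses = [
        f"CREATE TEMPORARY TABLE {temp_table} AS",
        "SELECT " + ", ".join(sql.select),
        "FROM " + " ".join([sql.from_table] + sql.joins),
    ]
    # Optional clauses are emitted only when the abstract SQL contains elements for them.
    if sql.where:
        clauses.append("WHERE " + " AND ".join(sql.where))
    if sql.group_by:
        clauses.append("GROUP BY " + ", ".join(sql.group_by))
    if sql.having:
        clauses.append("HAVING " + " AND ".join(sql.having))
    if sql.order_by:
        clauses.append("ORDER BY " + ", ".join(sql.order_by))
    return "\n".join(clauses)

# Hypothetical metric: total sales amount per category for one year.
metric = AbstractSQL(
    select=["d.category", "SUM(f.amount) AS total"],
    from_table="sales f",
    joins=["JOIN product_category d ON d.product = f.product"],
    where=["f.year = 2001"],
    group_by=["d.category"],
)
statement = render(metric, "tmp_metric_0")
```

Keeping the clause elements separate until the execution plan is known allows table names to be substituted per data source when the specific statement is finally built.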

The method generating the execution plan makes decisions related to data transport between data sources or from a data source to the Consolidation database as detailed below. Each concrete SQL statement used in running reports requires all needed data to be present in the same data source. If an 'abstract' SQL requires elements coming from different projects (data sources), transportation of some of the data from their original location to a temporary table in the destination data source will be required. If the metric uses a fact, the destination data source at which the temporary table holding the metric result is created, will - whenever possible - be the one holding the fact used by the metric. The reason is that tables holding facts (fact tables) usually hold large amounts of data compared to the dimension tables. Transporting large amounts of data from one data source to another requires time and temporary storage space and one goal of the invention is optimising these parameters. The decisions of which destination DB to choose, whether to copy borrowed dimensions, where to perform the metric calculation and whether to copy fact tables, are based on these determinators:

• Source database, DB, type: Is temporary storage allowed? (yes/no)

• Borrowed dimensions: Are borrowed dimensions needed for the metric calculation? (yes/no)

• More than one metric: Does the analytical query, OLAP query, involve more than one metric? (yes/no)

• Different source databases for metrics: If the above is true, are the involved metrics situated in two or more different databases? (yes/no)

Based on the above decisions and determinators, optimization is used to determine how to perform each individual metric calculation.

If the user has rights to create tables for the data source holding the fact, the result of the metric calculation will be stored in a temporary table in the same data source. Otherwise (read-only data source), the result must be transported to another destination. The destination can be another data source at which the user has the right to create tables or the Consolidation database. If possible, the aggregation function is applied before transporting the data from its original data source as part of the optimisation. However, there are cases when data can only be aggregated after the transport phase, for example if the fact data source is read-only and if some of the required dimensions come from different data sources.
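The destination and aggregation rules above can be summarised in a small decision function. This is a sketch under stated assumptions: the names `MetricPlan` and `plan_metric`, the field names and the data source labels are hypothetical, and only the determinators "temporary storage allowed" and "borrowed dimensions needed" are modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricPlan:
    destination: str                # data source holding the temporary result table
    copy_borrowed_dimensions: bool  # dimension data is copied to the fact's data source
    copy_facts: bool                # (filtered) fact rows leave their data source
    aggregate_in: str               # data source in which the aggregate function runs

def plan_metric(fact_source, temp_storage_allowed, needs_borrowed_dimensions,
                writable_alternative=None, consolidation="consolidation"):
    """Decide where one metric is calculated and what has to be transported."""
    if temp_storage_allowed:
        # Tables may be created where the fact lives: calculate there and
        # bring in only the (small) borrowed dimensions, if any.
        return MetricPlan(fact_source, needs_borrowed_dimensions, False, fact_source)

    # Read-only fact source: the result must end up somewhere else.
    destination = writable_alternative or consolidation
    if needs_borrowed_dimensions:
        # Dimensions from other data sources are required, so the facts are
        # transported first and can only be aggregated after the transport.
        return MetricPlan(destination, False, True, destination)

    # All dimensions are local: aggregate in the source, transport the result only.
    return MetricPlan(destination, False, False, fact_source)

writable = plan_metric("erp", True, True)
read_only_hybrid = plan_metric("olap", False, True)
read_only_local = plan_metric("olap", False, False, writable_alternative="erp")
```

In each branch the plan avoids moving fact rows unless the fact source is read-only and foreign dimensions are needed, which reflects the stated goal of limiting transport time and temporary storage.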

In the process of generating 'abstract' SQL for calculation of metrics or transport of data from their original data sources, optimizations are done whenever possible by using filters. The criteria in the report filter are evaluated for every 'abstract' SQL and each applicable filter element is used to limit the amount of data processed or transferred.
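As an illustration of how each filter element may be evaluated per 'abstract' SQL, the sketch below keeps only those filter elements whose column is available to the statement in question and renders them as a parameterised WHERE clause. The function names, the tuple representation of a filter element and the column names are assumptions made for this example.

```python
ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">=", "<>"}

def applicable_filters(report_filter, available_columns):
    """Return the filter elements that can be evaluated on the given columns."""
    return [element for element in report_filter if element[0] in available_columns]

def where_clause(filters):
    """Render filter elements as a parameterised WHERE clause plus its parameters."""
    if not filters:
        return "", []
    for _, operator, _ in filters:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
    sql = " WHERE " + " AND ".join(f"{column} {operator} ?"
                                   for column, operator, _ in filters)
    return sql, [value for _, _, value in filters]

# Hypothetical report filter with two elements.
report_filter = [("year", "=", 2001), ("region", "=", "North")]

# Transport of fact rows: only the 'year' element applies to these columns.
fact_where = where_clause(
    applicable_filters(report_filter, {"product", "year", "amount"}))

# Transport of a dimension table: no filter element applies.
dimension_where = where_clause(
    applicable_filters(report_filter, {"product", "category"}))
```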

The specific SQL statements are generated after the execution plan is created. The steps include getting the names of tables and columns involved, formatting table names, column names and values according to the data source in which they are used. Depending on where a column is used (its original data source or another one), the appropriate table name is used.

Fig. 21 illustrates a computer system comprising a computer 1801 and a monitor 1803. The monitor features a graphical user interface GUI for illustrating the execution of program code. The computer is controlled by computer input devices, 1805 and 1805.

The illustrated computer 1801 may be applied for execution of the above-described dimension handling graphical editor.

The computer comprises means for reading data carriers (not shown), such as internal and external memory means. The data carriers may comprise the necessary database and dimensional model defining meta-data and the data sources. Moreover, the data carriers may comprise the software routines enabling the computer's data processing means to execute the above-described dimension handling graphical editor and the hybrid dimension management routines, also as described above.

Fig. 5a and fig. 5b illustrate an example of a specific implementation of meta-data linking two dimensions of different dimensional models into one merged dimension. The examples refer to the hybrid dimensional model "AC Storkøb (Extended)" created according to the figs. 7-18.

Fig. 5a is a general relational description graph of the meta-data defining the hybrid dimensional model AC Storkøb (Extended).

Fig. 5b illustrates a set of meta-data establishing the above-mentioned hybrid dimensional model "AC Storkøb (Extended)".

An explanation of the set of meta-data will now be given with reference to the two figures.

Initially, it should be emphasized that numerous meta-data setups may be applied within the scope of the invention and the below-detailed description of one of the applicable setups is only one option.

The first table HYBRID_MODEL, 501 of fig. 5a and 5b identifies the hybrid model with the name "AC Storkøb (Extended)" and the name is associated with a hybrid model ID, HModel_ID = 2.

The second table HMODEL_PROJECT, 502, defines a number of projects comprised in the "AC Storkøb (Extended)".

HMODEL_PROJECT, 502 is applied for determination of the projects comprised by HYBRID_MODEL, 501. Each row of HMODEL_PROJECT, 502 in fig. 5b represents a member project.

In this embodiment, the projects are determined by the parameters HModel_ID, HmodelProject_ID, Project_ID, Project_Type and DB_Alias.

Both projects (rows) show that the projects are associated with the same hybrid model, i.e. the above mentioned AC Storkøb (Extended) by the setting HModel_ID = 2.

This relation is indicated in fig. 5a by the relation 501 A.

Moreover, the parameter HmodelProject_ID identifies an internal identification of the involved projects, here 0 and 1.

The parameter Project_Type identifies the nature of the data source of the project. In the illustrated embodiment, the Project_Type value 0 of the first project indicates that the first data source is a high-quality modeled data source, here a well defined OLAP system, and the associated Project_ID = 3 refers to a specific OLAP system. The DB_Alias field is empty (or null) due to the fact that the OLAP system is completely located by the Project_ID. The Project_Type value of 0 determines that a relation 505A exists between the HMODEL_RELATION 505 and an OLAP meta-data project 5092.

The Project_Type value 1 of the second project indicates that the data source is a so-called table project, or more specifically a database table associated with a more primitive dimensional model possibly modified by a user. Here, the DB_Alias identifies the data source as the electronic spreadsheet KatOmsGrupper.xls, and the data to be included in the model is a table called "Omsgruppe".

The Project_Type value of 1 determines that a relation 505B exists between the HMODEL_RELATION 505 and MyCompDIMENSIONAL_UNIT 507.

Basically, the Project_ID potentially supplemented by the DB_Alias in table 507 represents a reference to the data sources included in the hybrid dimensional model.

The third table Hdimension 503 defines the hybrid dimension to be included in the hybrid model AC Storkøb (Extended). The dimensions have the parameters Hdimension_ID and Dimension_Name.

The specific, illustrated hybrid dimension has the name "Vare" and the associated Hdimension_ID is 0.

The fourth table HDIMENSIONAL_UNIT 504 defines the hybrid dimensional unit (also referred to as hybrid level) of the above defined hybrid dimension "Vare".

The defining parameters of the hybrid units are Hmodel_ID, HDIMUnit_ID, Dimension_ID and HDIMUnit_Name.

The illustrated hybrid dimension "Vare" comprises one hybrid dimensional unit, "Kategori" in this example. Basically, the above-described meta-data tables define a unit "Kategori" to be applied for merging the two projects together into one hybrid dimensional model, "AC Storkøb (Extended)".

The fifth table HMODEL_RELATION 505 establishes the basic "merging" relations between the two projects, Project_ID=3 (AC Storkøb detailløsning) and Project_ID Table "OmsGruppe" defined in table HMODEL_PROJECT, 502.

The merger is established by the parameters Hmodel_ID, HDIMUnit_ID, HModelProject_ID, Dimension_ID, DimUnit_ID, Dimension_Name and HDIMUnit_Name.

The specific establishment of a hybrid dimension unit is initially defined by the two first columns Hmodel_ID, HDIMUnit_ID in which the merged unit is associated with the above identified hybrid model "AC Storkøb (Extended)", Hmodel_ID=2, and the unit is identified as the hybrid unit defined in table 504, HDIMUnit_ID=0. Moreover, the name of the unit is defined as HDIMUnit_Name="Kategori". Likewise, the hybrid dimension name "Vare" is imported from the table 503.

Each of the above-defined projects HmodelProject_ID = 0 and 1 is defined in the HmodelProject_ID column as the "source unit" or merger unit candidate (i.e. the OLAP project AC Storkøb detailløsning and the spreadsheet project "OmsGruppe").

The HmodelProject_ID = 0, i.e. the OLAP project, identifies in table 505 a unit by the columns Dimension_ID, DimUnit_ID with Dimension_ID=1 and DimUnit_ID=13. The value of Dimension_Name is here "Vare", but it may basically be omitted, if the Dimension_ID=1 and DimUnit_ID=13 explicitly point to the "mergable" unit in the OLAP project.

Turning now to the second project, i.e. the second row of table 505, it is set to be another type of project than the above-described OLAP project by setting HmodelProject_ID = 1, i.e. the spreadsheet project defined in the second row of the HMODEL_PROJECT table 502.

The two "0"-values of the columns Dimension_ID, DimUnit_ID may basically be regarded as a "null", i.e.: not used for identification of the corresponding "mergable" unit of the second (dimensional) data source. The relevant unit is determined by the Dimension_Name "Vare" referring to the tables 503 and 504.

In short, if the HmodelProject_ID = 0, the columns Dimension_ID, DimUnit_ID are applied for pointing out the "mergable" unit.
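The lookup rule for the "mergable" unit may be illustrated as follows. The sketch represents the two rows of table 505 as plain dictionaries using the values given in the text, and a reduced form of table 502 holding only the Project_Type per member project. The function name `mergeable_unit` and the returned tuples are hypothetical; selecting the branch via Project_Type is an assumption based on the relations 505A and 505B described above.

```python
# Reduced form of HMODEL_PROJECT (table 502): HModelProject_ID -> Project_Type.
# 0 = OLAP project, 1 = table project (here the spreadsheet).
PROJECT_TYPE = {0: 0, 1: 1}

# Rows of HMODEL_RELATION (table 505) with the values described in the text.
HMODEL_RELATION = [
    {"HModel_ID": 2, "HDIMUnit_ID": 0, "HModelProject_ID": 0,
     "Dimension_ID": 1, "DimUnit_ID": 13, "Dimension_Name": "Vare"},
    {"HModel_ID": 2, "HDIMUnit_ID": 0, "HModelProject_ID": 1,
     "Dimension_ID": 0, "DimUnit_ID": 0, "Dimension_Name": "Vare"},
]

def mergeable_unit(row):
    """Return how the source unit of one relation row is pointed out."""
    if PROJECT_TYPE[row["HModelProject_ID"]] == 0:
        # OLAP project: the ID columns explicitly identify the unit.
        return ("by_id", row["Dimension_ID"], row["DimUnit_ID"])
    # Table project: the ID columns are treated as null, the name is used instead.
    return ("by_name", row["Dimension_Name"])

references = [mergeable_unit(row) for row in HMODEL_RELATION]
```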

The sixth table My_Computer 506 is applied for connecting up to the real physical data source defined by the DB_Alias (defined in table 502 as KatOmsGrupper.xls).

For the sake of simplicity, this table has not been included in fig. 5a.

The first column DB_ID defines the database ID, here: DS_XLS_37216_7052178704. The second column, DB_Alias, refers to the above-mentioned DB_Alias defined in table 502 as KatOmsGrupper.xls. The remaining columns may be applied for specifically pointing to the desired data source. The first two of these columns, DB_ConnectType_ID and DB_SourceType_ID, may be applied for pointing to more advanced data sources, e.g. variants of ODBC, and the two last columns, DB_Location and DB_Name, may be applied for pointing to more primitive data sources, such as the already described spreadsheet. Here, the column DB_Location refers to a path defining the folder in which the relevant spreadsheet file may be found, C:\Documents and settings\nh\Dokumenter\ (see also fig. 10) and the last column DB_Name identifies the name of the file KatOmsGrupper.xls.

The seventh table, MyCompDIMENSIONAL_UNIT 507 is applied for storing user-defined changes to the dimension models of individual table projects. The information stored is partial and incremental, i.e. an auto-created model is initially assumed to exist on the table and the information in MyCompDIMENSIONAL_UNIT 507 thus represents the changes to this auto-created model. For this reason, not all dimensional levels of a table project will have a reference in MyCompDIMENSIONAL_UNIT 507. If no changes are made to the auto-created dimensional model, no references (records) will be stored in MyCompDIMENSIONAL_UNIT 507.

This implementation saves storage space and load/save-time. Evidently, many other implementations are possible including storage of information about all dimension levels in a table project.
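The partial and incremental storage may be pictured as an overlay of stored changes on top of an auto-created model. The following sketch is illustrative only: the form of the auto-created model (every column becoming its own dimensional unit without a child unit) is an assumption, as are the function names and the list of spreadsheet columns.

```python
def auto_model(columns):
    """Assumed auto-created model: each column is its own unit with no child unit."""
    return {
        column: {"Dimension_Name": column,
                 "DimensionalUnit_Name": column,
                 "ChildUnit_Name": None}
        for column in columns
    }

def effective_model(columns, stored_changes):
    """Apply the stored (incremental) changes on top of the auto-created model."""
    model = auto_model(columns)
    for change in stored_changes:
        unit = change["DimensionalUnit_Name"]
        # Only units with a stored record differ from the auto-created model.
        model[unit] = {**model.get(unit, {}), **change}
    return model

# Hypothetical column list of the table project.
columns = ["OmsGruppe", "Kategori"]

# One stored record: unit "OmsGruppe" belongs to dimension "Vare"
# and has the child unit "Kategori".
changes = [{"Dimension_Name": "Vare",
            "DimensionalUnit_Name": "OmsGruppe",
            "ChildUnit_Name": "Kategori"}]

model = effective_model(columns, changes)
unchanged = effective_model(columns, [])
```

With an empty change list the effective model equals the auto-created one, which corresponds to the case in which no records are stored.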

The specific table is applied for mapping changes in the dimensional references to the relevant data source.

The table 507 comprises five columns: Project_ID, Dimension_Name, DimensionalUnit_Name, ChildUnit_Name and DB_Alias.

The Project_ID column refers to the Dimension_Name "Vare" already established in table 503. The dimensional unit's name of a column of the data source (illustrated e.g. in fig. 9) is defined as "OmsGruppe" and refers to the specific column 93 of the data source illustrated in fig. 9. Moreover, a child unit is established by ChildUnit_Name = "Kategori", see also fig. 13, nodes 1315 and 1316.

The eighth table OLAP_System 508 is applied for determination of the second (dimensional) data source, the OLAP System.

The OLAP_System table 508 comprises five columns: ConfigPrjID, OLAP_ID, OLAP_Name, DW_Name and MetaName.

Basically, the five columns explicitly point to a dimensionally modeled OLAP data warehouse, here the "AC Storkøb detailløsning". Evidently, other ways of establishing meta-data applied for linking at least two dimensional units of two different dimensionally modeled data sources may be applied within the scope of the invention.

Claims

Patent claims

1. Computer-implemented method of merging at least two dimensionally modeled databases (DB1, DB2) comprising

at least one first dimensional model (DM1) referring to said at least one first database (DB1) by means of a first set of meta-data (DMMD1) said at least one first dimensional model (DM1) comprising at least one dimension,

at least one second dimensional model (DM2) referring to said at least one second database (DB2) by means of a second set of meta-data (DMMD2) said at least one second dimensional model (DM2) comprising at least one dimension,

merging at least two dimensions of the at least two different dimensional models (DM1, DM2) by adding of at least one set of further meta-data (FUMD) said at least one set of further meta-data (FUMD) referring to at least one of said first set of meta-data (DMMD1) and said second set of meta-data (DMMD2).

2. Computer-implemented method according to claim 1, whereby said at least one set of further meta-data (FUMD) refers to both said first set of meta-data (DMMD1) and said second set of meta-data (DMMD2).

3. Computer-implemented method according to claim 1 or 2, whereby said first set of meta-data (DMMD1) refers to said first database (DB1) comprising a first data source (DS1) and at least one associated set of fundamental meta-data (FFMD) and whereby said second set of meta-data (DMMD2) refers to said second database (DB2) comprising a second data source (DS1) and at least one second associated set of fundamental meta-data (SFMD).

4. Computer-implemented method according to any of the claims 1-3, whereby said at least two dimensional models (DM1, DM2) are established as a collection of conceptually related entities.

5. Computer-implemented method according to any of the claims 1-4, whereby said entities comprise dimensions, levels or facts.

6. Computer-implemented method according to any of the claims 1-5, whereby at least one of said dimensional models (DM1, DM2) is established by means of dimensional modeling tools.

7. Computer-implemented method according to any of the claims 1-6, whereby at least one of said at least one first dimensional model (DM1) and said at least second dimensional model (DM2) is established by means of an associated dimension modeler (DIMMOD).

8. Computer-implemented method according to any of the claims 1-7, whereby said dimension modeler (DIMMOD) comprises means for analyzing the structure of a selected database.

9. Computer-implemented method according to any of the claims 1-8, whereby said at least one of said at least one first dimensional model (DM1) and said at least second dimensional model (DM2) is established manually by a user.

10. Computer-implemented method according to any of the claims 1-9, whereby said at least one of said at least one first dimensional model (DM1) and said at least second dimensional model (DM2) is established wholly or partly automatically by computer-implemented algorithms.

11. Computer-implemented method according to any of the claims 1-10, whereby said method comprises the steps of

-establishing or retrieving a hybrid dimension identifier ID

-establishing a reference to the databases (DB1, DB2) comprised in said hybrid dimensional model (HDM),

-including at least one dimensional model (DM1, DM2) of the respective databases (DB1, DB2) in said hybrid dimensional model (HDM), said dimensional models (DM1, DM2) comprising the respective dimensions (D1, D2), said dimensions (D1, D2) comprising levels (L1, L2, L3, L4; L21, L22),

-identifying at least one equivalent level (L2; L22) of the dimensions (D1, D2),

-providing a new hybrid dimension (D12) comprising said at least one equivalent level (L2, L22) of at least two dimensions (D1, D2) comprised in at least two different dimensional models (DM1, DM2) and at least one set of further meta-data linking at least one level (L1, L2, L4; L21) associated with said at least two dimensions to one equivalent level (L2, L22).

12. Computer-implemented method according to any of the claims 1-11, whereby said at least one set of further meta-data (FUMD) defines at least one hybrid dimensional model (HDM).

13. Computer-implemented method according to any of the claims 1-12, whereby the at least one set of further meta-data (FUMD) refers to at least two equivalent levels (1615, 1616) of at least two different dimensions (1617, 1618), said at least two dimensions (1617, 1618) being comprised in at least two different dimensional models.

14. Computer-implemented hybrid dimensional model comprising at least two dimensionally modeled databases (DB1, DB2)

at least one second dimensional model (DM2) referring to said at least one second database (DB2) by means of a second set of meta-data (DMMD2) said at least one second dimensional model (DM2) comprising at least one dimension and

at least one set of further meta-data (FUMD) said at least one set of further meta-data (FUMD) referring to at least one of said first set of meta-data (DMMD1) and said second set of meta-data (DMMD2).

15. Computer-implemented hybrid dimensional model according to claim 14, wherein said at least one set of further meta-data (FUMD) refers to both said first set of meta-data (DMMD1) and said second set of meta-data (DMMD2).

16. Computer-implemented hybrid dimensional model according to claim 14 or 15, wherein said first set of meta-data (DMMD1) refers to said first database (DB1) comprising a first data source (DS1) and at least one first associated set of fundamental meta-data (FFMD) and

said second set of meta-data (DMMD2) refers to said second database (DB2) comprising a second data source (DS1) and at least one second associated set of fundamental meta-data (SFMD).

17. Computer-implemented hybrid dimensional model according to any of the claims 14-16, wherein said at least two dimensional models (DM1, DM2) comprise a collection of conceptually related entities.

18. Computer-implemented hybrid dimensional model according to any of the claims 14-17 wherein said entities comprise dimension, levels or facts.

19. Computer-implemented hybrid dimensional model according to any of the claims 14-18 wherein said databases comprise ERP systems, data warehouses, etc.

20. Computer-implemented hybrid dimensional model according to any of the claims 14-19, wherein said hybrid dimensional model facilitates queries to at least two different databases (DB1, DB2).

21. Computer-implemented hybrid dimensional model according to any of the claims 14-20, wherein the at least one set of further meta-data (FUMD) refers to at least two equivalent levels (1615, 1616) of at least two different dimensions (1617, 1618), said at least two dimensions (1617, 1618) being comprised in at least two different dimensional models.

22. Method of performing metric calculations based on a hybrid dimensional model according to the claims 14-21, said method comprising the steps of

establishing at least one consolidation database (2030),

copying at least a subset of facts from said at least one first database (2010, DB1) to at least one fact table (2032) in said at least one consolidation database (2030),

copying at least a subset of facts from said at least one second database (2020, DB2) to at least one further fact table (2042) in said at least one consolidation database (2030),

copying at least one subset of dimensions from said at least one first database (2010, DB1) to at least one dimension table (2031) in said at least one consolidation database (2030),

copying at least one subset of dimensions from said at least one second database (2020, DB2) to at least one dimension table (2031) in said at least one consolidation database (2030),

performing said metric calculations on the basis of the facts comprised in said at least one fact table (2032) and said at least one dimension table (2031),

outputting said metric calculations to at least one result table (2051).

23. Method of performing metric calculations based on a hybrid dimensional model according to the claims 14-21, said method comprising the steps of

exchanging dimensions between at least two of said at least two dimensionally modeled databases (1910, 1920),

at least one of said at least two sets of dimensions (1911, 1921) comprising a subset of dimensions (1912, 1922) borrowed from the other set of dimensions,

performing metric calculations on facts from fact tables (1913, 1923) by means of application logic associated with said at least two databases (1910, 1920) on the basis of said dimensions (1911, 1921),

outputting said metric calculations to at least one result table (1935).

24. Method of performing metric calculations according to claim 22 or 23, whereby the dimensions and the facts of the at least one first database (1910) and the at least second database (1920) are identified by said further set of meta-data (FUMD).

25. Method of performing metric calculations according to any of the claims 22 to 24, whereby said metric calculations are based upon user request, said user request being performed according to the terms of said hybrid dimension model.

26. Graphical user interface (GUI)

graphically interfaced means for displaying the levels (1615, 1616) of said at least two dimensions (1617, 1618) selected from different dimensional models,

graphically interfaced means for combining the selected database into one single hybrid dimensional model (1710) comprising at least one hybrid dimension (1717), said at least one hybrid dimension (1717) comprising at least one level (1615, 1616) referring to at least two different individually modeled data sources (1102, 1103).

27. Graphical user interface (GUI) according to claim 26, whereby the graphical user interface comprises means for establishment of dimensional models of selected data bases.

28. Graphical user interface (GUI) according to the claims 26 or 27, whereby the selected at least two databases are individually and dimensionally modeled.

29. Graphical user interface (GUI) according to any of the claims 26 to 28, whereby at least one of the data sources comprises at least one dimensionally modeled data warehouse.

30. Graphical user interface (GUI) according to any of the claims 26 to 29, whereby at least one of the data sources comprises at least one OLAP system (1102).

31. Graphical user interface (GUI) according to any of the claims 26 to 30, whereby at least one of the data sources comprises at least one spreadsheet (1103).

32. Graphical user interface (GUI) according to any of the claims 26 to 31, whereby at least one of the data sources comprises at least one relational or flat-file data source obtained via the Internet.

33. Graphical user interface (GUI) according to any of the claims 26 to 32, wherein the graphical means for combining the selected database into one hybrid dimensional model is established according to the method of any of the claims 1-14.

34. Data carrier for implementation of the computer-implemented method according to any of the claims 1-13 on a computer.

35. Computer-implemented method according to any of the claims 1-13, whereby at least one of said databases comprises a hybrid-dimensional model.

36. Computer-implemented method according to any of the claims 1-13, said new hybrid dimension (D12) comprising at least two, preferably all, remaining levels (L1, L3, L4; L21) of said at least two dimensions (D1, D2) while preserving the original mutual relationships.

37. Computer-implemented method according to any of the claims 1-13, whereby at least one dimension comprises at least one level.

38. Graphical user interface (GUI) according to any of the claims 26 or 33, whereby said combination is performed on a dimension level basis.

39. Graphical user interface (GUI) according to any of the claims 26 to 29, whereby at least one of the data sources comprises at least one multidimensional database, also called a cube.