WO2002048906A1 - Query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor - Google Patents

Info

Publication number
WO2002048906A1
Authority
WO
WIPO (PCT)
Prior art keywords
query processor
domain
robot
query
entities
Application number
PCT/DK2000/000700
Other languages
French (fr)
Inventor
Morten Helles
Esben Kraq Hansen
Original Assignee
Kapow Aps
Application filed by Kapow Aps
Priority to EP00984910A (EP1342171A1)
Priority to US10/450,792 (US7698277B2)
Priority to PCT/DK2000/000700 (WO2002048906A1)
Priority to CA002431908A (CA2431908A1)
Priority to AU2001221507A (AU2001221507A1)
Publication of WO2002048906A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results

Definitions

  • the invention relates to a query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor.
  • the invention deals with accessing, i.e. reading from and/or writing to, data sources associated with a certain domain.
  • the data sources are typically web-based, which basically means that the data of the data source are made available to the user according to a serial transfer protocol, e.g. HTTP via the Internet.
  • data made available to the user by serial transfer are sometimes easy for the user to grasp, especially in the case of a simple and quite specific request.
  • a problem with data retrieval from web-based data sources is that the user must typically find one or several data sources comprising the relevant data. This search may be very time consuming and typically non-exhaustive due to the fact that several data sources may easily be overlooked.
  • the user then has to perform further queries on each site, and these queries typically have to be formulated differently from site to site.
  • a problem with the known systems applying agents is that the agents require some kind of knowledge about the data source structure, and the use of an agent requires the consent of the owner of the data source, since an agent may dig into a data source more or less out of control.
  • Another problem with the known systems applying robots is that the robots also require some kind of knowledge about the data source structure, e.g. knowledge of the structure of an HTML table of a web-based data source; if this knowledge is not available, programming such a robot is quite difficult. Hence, the number of robots that can be applied for retrieving data from such data sources is limited, and so is the amount of data of interest in the domain that can be retrieved.
  • the invention relates to a domain processor (DP) according to claim 1 comprising
  • DMR domain modeller
  • said robot modeller (RM) comprising
  • said at least one robot (R) being adapted for accessing at least one web-based data source (DS),
  • said at least one data source comprising entities comprised in a predefined domain (D)
  • said at least one domain modeller comprising means for modelling at least one domain model (DM) associated with at least one chosen domain, said domain model (DM) comprising at least one extraction model (EM) and at least one storage model (STM),
  • STM storage model
  • said at least one Query Processor Modeller comprising
  • QPE Query Processor elements
  • At least one of said query processor elements (QPE) of associated query processor elements being a Robot query processor Element (RQPE) adapted for accessing at least one web-based data source (DS).
  • the domain processor comprises at least one query processor maintenance manager (QMM), said at least one query processor maintenance manager (QMM) comprising means for executing at least one query processor (QP) established by the domain processor
  • the domain processor may advantageously comprise a tool for running a query processor established by the domain processor.
  • the query processor maintenance manager may thus be adapted for running the query processor on one or several servers.
  • Such a manager may include a visual tool illustrating the running state of the query processor and the individual elements.
  • An example of such intuitive processing is that the individual elements change color according to their state, e.g. within a color range from white to red, depending on the load of the elements.
  • the manager should preferably illustrate basic on-off conditions visually, i.e. illustrate actively if an element is working properly, and whether entities are transferred between the query processor elements and whether entities may actually be transferred between elements.
  • the latter feature may ease operation of the system significantly, since the absence of an entity flow between elements does not necessarily indicate that a fault condition has occurred; the element may simply not have been queried.
  • Determination of a "clear road" between the elements may e.g. be established by forwarding dummy (testing) queries between elements at certain intervals.
  • the Query Processor Modeller may include submenus facilitating specialized execution of the query processor.
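The two monitoring ideas above, coloring each element by its load and checking for a "clear road" by forwarding dummy test queries, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the 0.0 to 1.0 load scale, the hex color encoding and the names `load_to_color` and `first_blocked_hop` are hypothetical, and `probe` stands in for whatever mechanism actually sends a dummy query between two elements.

```python
from typing import Callable, Optional, Sequence, Tuple


def load_to_color(load: float) -> str:
    """Map an element's load (hypothetical scale: 0.0 idle, 1.0 saturated) onto white..red."""
    load = min(max(load, 0.0), 1.0)      # clamp out-of-range readings
    fade = round(255 * (1.0 - load))     # green and blue channels fade as load rises
    return f"#ff{fade:02x}{fade:02x}"


def first_blocked_hop(
    path: Sequence[str],
    probe: Callable[[str, str], bool],
) -> Optional[Tuple[str, str]]:
    """Send a dummy test query over each hop of a query path.

    Returns the first (upstream, downstream) pair whose probe failed,
    or None if the whole path is a "clear road".
    """
    for upstream, downstream in zip(path, path[1:]):
        if not probe(upstream, downstream):
            return (upstream, downstream)
    return None
```

For example, with a probe that fails only on the last hop, `first_blocked_hop(["trigger", "robot", "transformer"], probe)` returns `("robot", "transformer")`, which lets the manager tell a broken link apart from an element that is merely idle.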
  • the invention relates to a robot modeller (RM) according to claim 3 comprising
  • said at least one robot (R) being adapted for accessing at least one web-based data source (DS), said at least one data source (DS) comprising entities comprised in a predefined domain (D).
  • DMR domain modeller
  • DM domain model associated with at least one chosen domain
  • said domain model (DM) comprising at least one extraction model (EM) and at least one storage model (STM)
  • a domain model represents a structured way of defining properties of different aspects of a domain.
  • a domain model may e.g. comprise an extraction model, i.e. a definition of relevant entities and attributes to be looked for in the web-based data source. It should be noted that the extraction model may primarily describe (or mask) the data source on the basis of text strings and combinations of such strings.
  • a chosen domain may e.g. be "cars offered for sale".
  • When the domain modeller comprises means for establishing reference mapping between extracted data obtained according to said extraction model (EM) and a conceptual representation of said data, a further advantageous embodiment of the invention has been obtained.
  • When said reference mapping defines a set of reference entities describing a number of entities (E), said entities having attributes, a further advantageous embodiment of the invention has been obtained.
  • a set of reference entities may e.g. be a product catalogue.
  • Reference mapping may facilitate the possibility of adding knowledge to the retrieved entities.
  • Such information may e.g. be information deducible from a reference product catalogue.
  • the entity may be modified, e.g. validated, corrected, or supplemented with additional information about the entity.
  • a correction may e.g. be that one of the attributes of the Porsche retrieved above is false according to the product catalogue.
  • This false attribute may be detected in several different ways within the scope of the invention.
  • the reference product catalogue may e.g. initially reveal that no Porsche having a 3.0 liter engine has been made with a diesel engine.
  • the product catalogue may reveal that no Porsche has been made with a diesel engine, thereby raising the probability that the data source provider has made a mistake. The wrong attribute "Diesel" may then be corrected.
  • reference entities may be applied for different variants of classification and validation.
  • When the domain modeller comprises means for establishing at least one language domain dictionary (LDD), a further advantageous embodiment of the invention has been obtained.
  • the general language of the query processor may e.g. be regarded as the "language” defined by an object-oriented conceptual model associated with the query processor. Such language may e.g. be a preferred language or coding chosen as the general language.
  • the language domain dictionary may e.g. make it possible to have an entity that reads "wagen" or "bil" transformed into an instance of an object car.
  • said domain modeller comprises means for establishing a set of reference recognition patterns
  • the set of reference recognition patterns may e.g. comprise character patterns (also known as regular expressions) or character structures (even pictures) to be applied when identifying attributes and entities, e.g. Ltd., Corp or A/S indicating that a company attribute or entity is associated with the character pattern in English, American English and Danish, respectively.
  • character patterns also known as regular expressions
  • character structures even pictures
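A language domain dictionary (LDD) and a set of reference recognition patterns could be represented roughly as below. This Python sketch is illustrative only: the dictionary entries, the regular expressions and the names `to_concept` and `company_language` are assumptions, since the text does not prescribe a storage format for either structure.

```python
import re
from typing import Dict, Optional

# Hypothetical LDD: source-language word -> concept (object class) of the conceptual model.
LANGUAGE_DOMAIN_DICTIONARY: Dict[str, str] = {
    "car": "Car",
    "wagen": "Car",  # German
    "bil": "Car",    # Danish
}

# Hypothetical reference recognition patterns: company-form suffix per language.
COMPANY_PATTERNS = {
    "English": re.compile(r"\bLtd\b"),
    "American English": re.compile(r"\bCorp\b"),
    "Danish": re.compile(r"\bA/S\b"),
}


def to_concept(word: str) -> Optional[str]:
    """Translate a word read from a data source into a concept, if the LDD knows it."""
    return LANGUAGE_DOMAIN_DICTIONARY.get(word.strip().lower())


def company_language(text: str) -> Optional[str]:
    """Return the language whose company-suffix pattern occurs in `text`, if any."""
    for language, pattern in COMPANY_PATTERNS.items():
        if pattern.search(text):
            return language
    return None
```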
  • the invention relates to a query processor modeller (QPM) comprising
  • QPE Query Processor elements
  • CS computer system
  • RQPE Robot query processor Element
  • a domain-accessing system may be established by means of general components. Moreover, the components may rely on general knowledge about the domain of interest, thereby facilitating very fast establishment of domain-accessing systems.
  • GUI graphical user interface
  • QPE set of query processor elements
  • RQPE robot query processor element
  • TQPE trigger query processor element
  • the invention relates to a query processor maintenance manager (QMM) comprising
  • QP query processor
  • the query processor maintenance manager should be adapted for controlling the processing of an established query processor.
  • When said maintenance manager comprises means for monitoring the state or the performance of at least one query processor element (QPE), a further advantageous embodiment of the invention has been obtained.
  • the domain processor maintenance manager may comprise means for evaluating the data flow between query processor elements (QPE) of a query processor path
  • QPE query processor elements
  • the domain processor maintenance manager may comprise means for running and visually monitoring the individual modules of a query processor
  • the domain processor maintenance manager may comprise means for running and visually monitoring a query processor (QP) on an element basis
  • QP query processor
  • the elements may be advantageously monitored as visually separated elements.
  • the invention relates to a web-robot, said robot comprising means for extracting information from web-based data sources (DS) in dependency of at least one extraction model (EM), said at least one extraction model comprising reference data structures defining entities and/or entity structures of data sources in a domain.
  • DS web-based data sources
  • EM extraction model
  • said robot comprises at least one exchangeable plug-in
  • said plug-in comprising retrieving routines adapted for reading knowledge stored in said extraction model, said knowledge preferably being domain-specific
  • EM extraction model
  • the invention relates to a query processor (QP),
  • said query processor comprising a set of web-based data sources (DS), wherein at least two of said data sources (DS) comprise entities according to a domain model (DM),
  • said query processor comprising at least three query processor elements (QPE),
  • QPE query processor elements
  • RQPE robot
  • said robot being attached to at least one data source (DS) said robot comprising means for accessing information from the at least one data source (DS) according to at least one extraction model (EM) associated with said robot (RQPE),
  • At least one of said query processor elements comprising a trigger (TQPE) said trigger query processor element (TQPE) comprising means for establishing a query.
  • the web-based data sources are typically independent.
  • the trigger element may be both manually and automatically driven, i.e. by a query user or an automated query routine.
  • the query processor elements comprise a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE)
  • TAQPE transformer query processor element
  • MESQPE messenger query processor element
  • MQPE mediator query processor element
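The element types listed above can be composed into a small query processor. The following Python sketch is a hypothetical illustration, not the patented design: a trigger establishes a query, a mediator fans it out to two robots (each attached to one fake data source), and a transformer converts the extracted strings into a conceptual format. All class and function names are invented for the example, `fetch` stands in for real web access plus an extraction model, and a messenger element is not modelled.

```python
from typing import Callable, Dict, Iterable, List

Entity = Dict[str, object]  # an entity: attribute name -> attribute value


class RobotQPE:
    """Robot element attached to exactly one data source."""
    def __init__(self, fetch: Callable[[str], Iterable[Entity]]):
        self.fetch = fetch  # hypothetical stand-in for HTTP access plus an extraction model

    def handle(self, query: str) -> List[Entity]:
        return list(self.fetch(query))


class MediatorQPE:
    """Mediator element: fans one query out to several robots and merges the results."""
    def __init__(self, robots: List[RobotQPE]):
        self.robots = robots

    def handle(self, query: str) -> List[Entity]:
        merged: List[Entity] = []
        for robot in self.robots:
            merged.extend(robot.handle(query))
        return merged


class TransformerQPE:
    """Transformer element: converts extracted entities into the conceptual format."""
    def __init__(self, transform: Callable[[Entity], Entity]):
        self.transform = transform

    def handle(self, entities: List[Entity]) -> List[Entity]:
        return [self.transform(e) for e in entities]


class TriggerQPE:
    """Trigger element: establishes a query and drives it through the other elements."""
    def __init__(self, mediator: MediatorQPE, transformer: TransformerQPE):
        self.mediator = mediator
        self.transformer = transformer

    def fire(self, query: str) -> List[Entity]:
        return self.transformer.handle(self.mediator.handle(query))


# Usage with two fake, independent data sources.
def site_a(query: str) -> List[Entity]:
    return [{"make": "FIAT", "hp": "120"}] if query == "fiat" else []


def site_b(query: str) -> List[Entity]:
    return [{"make": "Fiat", "hp": "155"}] if query == "fiat" else []


def to_conceptual(e: Entity) -> Entity:
    return {"make": str(e["make"]).title(), "hp": int(str(e["hp"]))}


trigger = TriggerQPE(MediatorQPE([RobotQPE(site_a), RobotQPE(site_b)]),
                     TransformerQPE(to_conceptual))
results = trigger.fire("fiat")
```

Because each element depends only on the interface of its neighbour, a robot can be replaced or another data source added without touching the trigger or the transformer.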
  • the invention relates to a method of establishing at least one query processor (QP),
  • said query processor comprising a set of web-based data sources (DS), wherein at least two of said data sources (DS) comprise entities according to a domain model (DM),
  • said query processor comprising at least three query processor elements (QPE),
  • QPE query processor elements
  • RQPE robot
  • said robot comprising means for accessing information from the at least one data source (DS) according to at least one extraction model (EM) associated with said robot (RQPE),
  • At least one of said query processor elements comprising a trigger (TQPE)
  • TQPE trigger query processor element
  • said method comprising the step of
  • RQPE robot query processor element
  • DS data sources
  • GUI graphical user interface
  • the data source may both be regarded as an internal part or an external part of the query processor within the scope of the invention, depending on whether the associated data source is defined by its data or not.
  • QPE combined query processor elements
  • TAQPE transformer query processor element
  • MESQPE messenger query processor element
  • MQPE mediator query processor element
  • the invention relates to a method of establishing at least one query processor (QP),
  • said query processor comprising means for accessing data from web-based data sources (DS) of a domain by means of at least one user interface (UI)
  • said method comprising the steps of selecting a number of query processor elements (QPE)
  • At least one of said selected query processor elements being a robot query processor element (RQPE)
  • QPE selected query processor elements
  • RQPE robot query processor element
  • At least one of said selected query processor elements being a trigger query processor element (TQPE), attaching at least one selected robot query processor element (RQPE) to at least one of the data sources (DS) of the domain,
  • TQPE trigger query processor elements
  • QPE combined query processor elements
  • TAQPE transformer query processor element
  • MESQPE messenger query processor element
  • MQPE mediator query processor element
  • the invention relates to a method of extracting data from a web-based data source (DS), said method comprising the steps of
  • ERB entity reference base
  • a conceptual model may also include a storage database model.
  • When the method comprises at least one step of verifying whether the read instances correspond with an entity reference base (ERB) on the basis of entities represented in said conceptual entity-representing format, a further advantageous embodiment of the invention has been obtained.
  • entity reference base (ERB)
  • Micro-interpretation according to the invention may be regarded as the reading of individual string-based attributes on a web-based data source.
  • the combination of read string-based attributes into entities may also be regarded as micro-interpretation performed according to the extraction model.
  • micro-interpretation work is e.g. the job (typically performed automatically by software-based routines) of determining whether a read attribute is a "Ford" or a "Fiat".
  • a further example is the determination of whether an engine is a 75 or 155 Hp engine.
  • Entities held in an extraction format are typically string-based, e.g. Fiat, "Fiat", FIAT, FIATH, etc.
  • Fiat, "Fiat", FIAT and FIATH are all represented as a Fiat type in the conceptual format.
  • Such a Fiat type may typically involve an integer representation of a Fiat in old databases, whereas new databases may represent Fiat, "Fiat", FIAT and FIATH as "Fiat".
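Micro-interpretation of individual string-based attributes might be sketched as follows. The normalization rule (lower-case the string and keep only letters), the lookup table and the names `interpret_make` and `interpret_hp` are assumptions made for illustration; a real extraction model would carry far richer patterns.

```python
import re
from typing import Optional

# Hypothetical table: normalized spelling -> conceptual make (includes a known misspelling).
CONCEPTUAL_MAKES = {"fiat": "Fiat", "fiath": "Fiat", "ford": "Ford"}


def interpret_make(raw: str) -> Optional[str]:
    """Map a string-based make attribute onto its conceptual value, if known."""
    key = re.sub(r"[^a-z]", "", raw.lower())  # drop quotes, spaces and punctuation
    return CONCEPTUAL_MAKES.get(key)


def interpret_hp(raw: str) -> Optional[int]:
    """Read an engine power such as '155 Hp' or '75hp' as an integer."""
    match = re.search(r"(\d+)\s*hp\b", raw, re.IGNORECASE)
    return int(match.group(1)) if match else None
```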
  • Macro-interpretation according to the invention may typically be regarded as a syntax check performed on the basis of the complete and established instance. Such a check may e.g. be performed with the purpose of verifying whether the established instance of an entity is actually realistic, i.e. consistent.
  • entities held in the conceptual format may easily be grouped and filtered.
  • Conceptual representation of the entities according to the invention is typically an object-oriented representation.
  • An example of macro-interpretation work is e.g. the job (typically performed automatically by software-based routines) of determining whether read attributes combined into an entity, "Fiat", "120 Hp" and 2.0 liter engine, are actually valid.
  • a check performed on the basis of a reference base of known (valid) entity types i.e. a product catalogue, may moreover be performed with the purpose of adding information to the checked instances of entities.
  • Such a procedure may be regarded as a deduction of information, exemplified by an instance of a car: "Fiat", "155 Hp" and 2.0 liter.
  • Using a reference product catalogue associated with the car domain, such a car may be deduced to be a turbo version, i.e. "Fiat", "155 Hp", "2.0 liter" and TURBO.
  • macro-interpretation may be performed on instances held in a conceptual format.
  • An example may e.g. be the above-mentioned deduction of information exemplified by an instance of a car, "Fiat", "155 Hp" and 2.0 liter.
  • a storage model may typically be relational.
  • instances may be corrected, e.g. by omitting attributes held in the instance, or modified in one or several of the attributes forming the instance of an entity.
  • An example may e.g. be the above-mentioned deduction of information exemplified by an instance of a car, "Fiat", "120 Hp", 2.0 liter engine and Turbo.
  • the verification of the instance may result in a correction of the "Turbo" attribute, as the verification procedure may both conclude (a): no 120 HP Fiat having Turbo is in the reference catalogue (b): a 120 HP Fiat without Turbo is most likely the true intended instance of a car. Consequently, a correction routine may correct the instance accordingly or discard the entity entirely.
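The macro-interpretation steps above (validating a complete instance against a reference product catalogue, deducing a missing attribute, correcting a wrong one, or discarding the instance) can be sketched in Python. The catalogue contents, the `CarInstance` type and the rule "accept a unique catalogue match on make, power and displacement" are hypothetical simplifications chosen to reproduce the Fiat examples, not a method stated in the text.

```python
from typing import NamedTuple, Optional


class CarInstance(NamedTuple):
    make: str
    hp: int
    litres: float
    turbo: Optional[bool]  # None: the data source did not state this attribute


# Hypothetical reference product catalogue of known (valid) variants.
CATALOGUE = {
    ("Fiat", 120, 2.0, False),
    ("Fiat", 155, 2.0, True),
}


def macro_interpret(car: CarInstance) -> Optional[CarInstance]:
    """Validate, enrich or correct a complete instance; return None to discard it."""
    if car.turbo is not None and tuple(car) in CATALOGUE:
        return car  # valid exactly as read
    # Catalogue entries agreeing on everything except the turbo attribute.
    matches = [entry for entry in CATALOGUE if entry[:3] == tuple(car)[:3]]
    if len(matches) == 1:
        return CarInstance(*matches[0])  # deduce a missing turbo value or correct a wrong one
    return None  # unrealistic or ambiguous instance: discard
```

With this catalogue, an instance read as "Fiat", 155 Hp, 2.0 liter without turbo information is enriched with TURBO, while "Fiat", 120 Hp, 2.0 liter, Turbo is corrected to the non-turbo variant.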
  • the invention relates to a method of establishing a query processor
  • said query processor being adapted for accessing data on at least two different web-based data sources, selecting at least two predefined query processor elements (QPE),
  • the overall structure of a query processor may be based purely on some basic intended design rules, e.g. a robot element must be assigned to a data source, a trigger must feature a manual user interface, a database element must contain retrieved database elements, etc.
  • Such a conceptual design of a query processor should preferably be made by means of a graphically-based visual program, e.g. a drag and drop-like design program.
  • this conceptual programming of a query processor may be made on the basis of more or less structured knowledge about the domain and the data sources of the domain.
  • such a design of a query processor represents the framework for the intended query processor.
  • the query processor elements basically represent different sub-frameworks which may all be designed and performed in separate structures or routines. Therefore, the design of query processors by means of different functional properties minimizes "error cross-talk" between the elements and the elements may advantageously be put together initially without dealing with complicated details of the individual elements.
  • a query processor according to the invention is established for accessing data of at least two different independent web-based data sources.
  • a further advantage of the above-mentioned method is that a break-down of the functional features of a query processor into standardized elements, which may be configurable, may easily be conceived by a programmer.
  • a further advantage of the invention is that utilization of standardized elements facilitates the possibility of pre-configuring different variants of a certain element type, thereby offering the possibility of inserting a pre-configured element to the user.
  • An example of such pre-configuration of elements may e.g. be a trigger element.
  • Within the (type) group of trigger elements, several variants may advantageously be pre-established if such trigger elements are used often. Thus, a programmer may e.g. apply a trigger element predefined for triggering a query at certain time intervals.
  • Other types of trigger elements may e.g. be triggers comprising a statistics module applicable for triggering a query according to different system parameters.
  • a third possible type of trigger may e.g. be a manually operated trigger intended for establishing a query in cooperation with a manually operated user interface.
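The three trigger variants described above could share a common `poll()` interface, as in this Python sketch. The class names, the injectable clock and the threshold rule for the statistics trigger are assumptions for illustration; how a real trigger reads system parameters or user input is not specified in the text.

```python
import time
from typing import Callable, Optional


class IntervalTrigger:
    """Establishes a query at fixed time intervals."""
    def __init__(self, query: str, interval_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.query = query
        self.interval_s = interval_s
        self.clock = clock  # injectable so the trigger can be tested without waiting
        self._last: Optional[float] = None

    def poll(self) -> Optional[str]:
        now = self.clock()
        if self._last is None or now - self._last >= self.interval_s:
            self._last = now
            return self.query
        return None


class StatisticsTrigger:
    """Establishes a query when a monitored system parameter reaches a threshold."""
    def __init__(self, query: str, read_parameter: Callable[[], float], threshold: float):
        self.query = query
        self.read_parameter = read_parameter
        self.threshold = threshold

    def poll(self) -> Optional[str]:
        return self.query if self.read_parameter() >= self.threshold else None


class ManualTrigger:
    """Establishes a query from text entered in a manually operated user interface."""
    def __init__(self, read_user_input: Callable[[], str]):
        self.read_user_input = read_user_input

    def poll(self) -> Optional[str]:
        text = self.read_user_input().strip()
        return text or None
```

Keeping the interface identical is what would let a pre-configured variant be swapped into a query processor without changing the elements downstream of it.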
  • the invention offers a high-level language facilitating easy web-based access.
  • Different functional characteristics may e.g. be elements functioning as converters, triggers, caches, robots.
  • a query processor according to the invention may be established by means of standardized "bricks", so that establishing a web-oriented query processor is no longer extremely complicated.
  • the different elements may be configured or designed independently.
  • the individual elements may be established so as to fit the individual task(s) of the elements without inducing errors somewhere else in the processing system.
  • When said modification of the selected query processor elements comprises at least one plug-in software module, said at least one plug-in defining domain-specific properties of said element, a further advantageous embodiment of the invention has been obtained.
  • domain-specific plug-ins, e.g. product catalogues and language dictionaries, may initially be constructed as completely separate routines.
  • the individual elements, e.g. a robot, may ideally be constructed with little or no knowledge of the language of the data source, since the basic structure and functioning of the robot is language independent.
  • Product catalogues should likewise be domain specific.
  • the individual elements may be established with different plug-ins.
  • the invention relates to a method of establishing a domain-accessing routine
  • said domain comprising a plurality of web-based data sources
  • said method comprising the steps of establishing at least one robot () adapted for retrieving entities stored on said plurality of web-based data sources, establishing at least one reference catalogue,
  • When said established procedure of verification comprises a modification of the retrieved entities if the verification procedure indicates or proves that a read entity is not valid according to the at least one reference catalogue, a further advantageous embodiment of the invention has been obtained.
  • the invention relates to a query processor maintenance manager (QMM)
  • DPUI domain processor user interface
  • said manager comprising means for evaluating different modules of at least one query processor (QP),
  • said means for evaluating different subroutines of said query processor comprising means for monitoring the state of at least one query processor element (QPE).
  • QPE query processor element
  • the query processor may comprise means for monitoring e.g. the robot element, a transformer element, a trigger element, a mediator, etc.
  • When said processor comprises means for automatically forwarding messages to said at least one query processor user interface (DPUI) once certain predefined conditions are met, a further advantageous embodiment of the invention has been obtained.
  • DPUI query processor user interface
  • the predefined conditions may e.g. be conditions determining that a transformer has failed to transform extracted entities into conceptual entities.
  • a further predefined condition may be that a maximum load of an element, e.g. a cache or a robot, has been exceeded.
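Forwarding messages when predefined conditions are met can be expressed as a list of rule functions evaluated against each element's status. In this hypothetical Python sketch, `ElementStatus`, the two rules and `check_conditions` are invented names covering the two example conditions above; `notify` stands in for delivery to the user interface (DPUI).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ElementStatus:
    name: str
    kind: str                  # e.g. "robot", "cache", "transformer"
    load: float                # current load, in the element's own units
    max_load: float            # configured maximum load
    failed_transforms: int = 0


# A rule inspects one element and returns a message, or None if all is well.
Rule = Callable[[ElementStatus], Optional[str]]


def transformer_failed(s: ElementStatus) -> Optional[str]:
    if s.kind == "transformer" and s.failed_transforms > 0:
        return f"{s.name}: {s.failed_transforms} entities could not be transformed"
    return None


def load_exceeded(s: ElementStatus) -> Optional[str]:
    if s.load > s.max_load:
        return f"{s.name}: load {s.load:g} exceeds maximum {s.max_load:g}"
    return None


def check_conditions(statuses: List[ElementStatus], rules: List[Rule],
                     notify: Callable[[str], None]) -> None:
    """Forward a message to the user interface for every rule that fires."""
    for status in statuses:
        for rule in rules:
            message = rule(status)
            if message is not None:
                notify(message)
```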
  • When said manager comprises means for modifying individual query processor elements/sub-routines, a further advantageous embodiment of the invention has been obtained.
  • the means for modifying individual query processor elements/sub-routines may e.g. comprise an editor for the robots or means for modifying plug-ins centrally.
  • An example of such an editor may e.g. be the interface of a Query Processor Modeller in which the individual query processor elements may be edited simply by clicking on the elements and thereby starting the editor related to the activated element.
  • Such an editor may e.g. be a Robotmaker, if a robot is clicked on, or a domain modeller if a transformer element is clicked on.
  • When said manager comprises means for modifying the query flow in the query processor during execution of the query processor, a further advantageous embodiment of the invention has been obtained.
  • the up-time of the query processor may be maximized.
  • This real-time editor should preferably comprise means for blocking different query paths of the query processor without invoking fault conditions on the associated signal paths.
  • An example of means for modifying the query flow may e.g. comprise a mute element included in a query path.
  • the activation of such a mute element may then put the involved branch out of operation, whereas the rest of the query processor may proceed unaffected, provided that the queries or entities (i.e. data) from the muted branch are not essential for processing the query.
  • losing the queries and entities of one branch of the query processor may be preferable to shutting down the complete query processor.
  • the elements of the muted branch e.g. a robot or a transformer, may be "repaired" or updated without resulting in run-time errors.
  • a further advantageous variant of the above-mentioned modification may be a halt routine acting as the above-mentioned mute but including a memory which may catch and store queries, and subsequently resume processing by means of the cached, stored queries.
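A mute element and the halt variant with a memory can both be modelled as a gate placed in one branch of a query path. This Python sketch is a hypothetical illustration: the `BranchGate` class, its three modes and the use of plain strings for queries and entities are assumptions, and `downstream` stands in for the rest of the branch (e.g. a robot and a transformer).

```python
from collections import deque
from typing import Callable, Deque, List


class BranchGate:
    """Element placed in one branch of a query path.

    mode "pass": forward queries downstream
    mode "mute": drop queries; the branch simply contributes no entities
    mode "halt": like mute, but queries are buffered and replayed on resume()
    """
    def __init__(self, downstream: Callable[[str], List[str]]):
        self.downstream = downstream
        self.mode = "pass"
        self.buffer: Deque[str] = deque()

    def send(self, query: str) -> List[str]:
        if self.mode == "pass":
            return self.downstream(query)
        if self.mode == "halt":
            self.buffer.append(query)
        return []  # no exception raised: the rest of the query processor carries on

    def resume(self) -> List[str]:
        """Reopen the branch and process any queries caught while halted."""
        self.mode = "pass"
        results: List[str] = []
        while self.buffer:
            results.extend(self.downstream(self.buffer.popleft()))
        return results
```

While the gate is muted or halted, the elements behind it can be repaired or updated; the halt mode trades memory for not losing the queries that arrive in the meantime.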
  • fig. 1 illustrates some basic principles of a query processor system
  • fig. 2 illustrates a basic approach according to the invention when dealing with domain processing
  • fig. 3 illustrates the process of establishing a domain processor according to a preferred embodiment of the invention
  • figs. 4 to 6 illustrate the principles of one embodiment of a domain modeller according to one embodiment of the invention
  • fig. 7 illustrates the principles of an applicable robot-making program according to one embodiment of the invention
  • fig. 8 illustrates the functionality of a query processor modeller according to one embodiment of the invention.
  • fig. 9 illustrates a possible user interface of a domain execution manager.
  • Fig. 1a illustrates the basic principles of a web-based market place.
  • a web-based market place generally comprises a number of web-based data sources DS.
  • the data sources are e.g. web-sites associated with a homepage of a data source owner.
  • the data are transferred according to the HTTP protocol.
  • Other protocols, e.g. WAP or HTTPS, are also applicable.
  • the data sources DS are typically a database or they are powered by a database DB of the data owner.
  • a marketplace may moreover comprise non-web based data sources accessed by means of e.g. ODBC drivers.
  • the data sources offer information, products, services, etc. free or for sale.
  • a market place should technically deal with one domain only, but evidently, several domains may be overlaid and thereby offer a market place dealing with different domains.
  • An example of such domain may e.g. be a car market place.
  • the cars of the domain are offered for sale on the individual web-based data sources DS, and the cars may be new or used.
  • a domain may include different nationalities of data sources and be in many languages.
  • a car market place offering used cars would typically only comprise cars offered for sale in one country.
  • exemplary domains may be jobs, services, stocks, odds, boats etc.
  • web-based access to the data sources facilitates a very broad coverage of the entire domain, since web-based data sources may be accessed without any kind of cooperation between the accessing party and the data source owner. Typically, the data sources will be independent.
  • the content of the data source of the domain will be regarded as entities.
  • An entity has different properties, here defined as attributes.
  • An example of an entity is a specific car offered for sale, e.g. a Porsche, and attributes may be color, e.g. black, engine, e.g. 3.0 liters, etc.
  • Another example of an entity is a specific boat described by a number of suitable boat-describing parameters, or attributes, such as length, price, year, etc.
  • the data sources DS may be accessed both by reading and/or writing.
  • the data sources may be accessed via a domain handling system, i.e. a processing system, implemented by software in hardware on the illustrated computer system CS.
  • the computer system may comprise one central server or a number of coupled servers located centrally or decentrally.
  • Such system may be regarded as a query processor QP.
  • the query processor is adapted for querying the data sources automatically or upon a request, a query Q, made by a user U.
  • the request is performed by means of a user interface implemented on a user platform UPF.
  • a User Platform UPF typically comprises a computer-based user interface which may be manually operated by a user U.
  • a user may forward a query Q to the data sources DS via the query processor QP.
  • the query may be processed in many steps and the query processor QP may also include a data cache or a database for storing entities retrieved from the data sources DS for statistical purposes or for speeding up the query process.
  • the individual web-based data sources are accessed (i.e.: read and/or write) by means of robots attached to the data sources.
  • one robot is uniquely assigned to a corresponding data source DS.
  • a robot is a kind of automatic process established with the purpose of accessing web-based data.
  • a robot is a sub-arrangement of a so-called agent.
  • a robot is a software-based automatic process established with the purpose of accessing web-based data sources.
  • a robot may even comprise some kind of intelligence embedded in the process establishing elements. It should be noted that a robot according to this definition may even be regarded as an agent by some practitioners within the art.
  • the agent has no personality, and it is neither autonomous nor mobile; i.e. it is not transferred to and executed on the local data source servers of the data source owners.
  • a robot according to the invention is established for remote execution in relation to the data sources to be accessed and the robots will only be executed in a particular server environment. It should be noted that this particular environment may obviously include several servers located at different places.
  • non-web-based data sources may be added if desired.
  • Fig. 1c illustrates the complex nature of a data source to be accessed according to the invention.
  • the illustrated data source DS has a data structure which is initially unrevealed and incompatible with the access tools of the retrieving profile associated with the specific data source DS.
  • the character-based information of the data source DS has been converted into a number of attributes of identified text strings.
  • attributes may be encoded and decoded in various formats such as character-based formats, image-based formats and active content formats, such as Java applets, JavaScript applications or VBScript applications.
  • the text strings may e.g. be a mix of text strings identifying car names, model names, numbers, etc.
  • the data source must be evaluated and interpreted according to an extraction model in order to facilitate access to hidden information by the retrieving profile RP.
  • Fig. 1d illustrates identification and categorization of attributes of a data source according to the invention.
  • the attributes i.e. the text strings of the data source, may subsequently be interpreted and combined into so-called entities of associated attributes ASA.
  • the associated attributes may be established so as to comprise certain predefined types of attributes, i.e. categorized attributes.
  • An example of an entity is a car entity comprising the categorized attributes CA "Trabant", '88 and $100,000 where the first attribute of the category is car model, the second attribute of the category is manufacturing year and the third attribute of the category is the price.
  • the above-mentioned entity may also be referred to as an instance of an extraction model.
  • the extraction model defines and describes certain attributes and entities of interest for the domain. Each entity is established as a set of associated attributes ASA and the irrelevant attributes are filtered away.
  • the identified entities may be copied into the central database means DB in such a way that the retrieving profile initially performs a query in the database instead of visiting every involved data source DS, and lists the results to the user according to a predefined listing format. This feature ensures quick access to the search result. If the user U requires additional information, this information may be obtained by means of a link contained in the above-mentioned result list.
  • An automatically handled change may take place if, e.g., one entity has been removed from the data source and two other entities have been added, the removed entity representing a sold car and the two new entities representing cars newly offered for sale.
  • Such a change observed by the robot should of course be reflected in the database, as the sold car has to be removed and the two cars be added to the database in order to reflect the state of the data source when the data source is visited.
  • a change may likewise be stored and registered for statistical purposes in another database.
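The database update described above can be sketched as a set difference between the entity keys already stored and those observed on the latest robot visit. This is a minimal Python sketch, assuming each entity can be identified by a hypothetical string key (e.g. a listing ID); the source does not prescribe how entities are matched between visits.

```python
def sync_with_source(
    stored: set[str], observed: set[str], change_log: list[tuple[str, str]]
) -> None:
    """Make `stored` mirror the entities observed on the latest robot visit.

    Entity keys are hypothetical identifiers (e.g. listing IDs). Every change
    is also appended to `change_log`, e.g. for later statistical use.
    """
    removed = stored - observed  # e.g. cars that have been sold
    added = observed - stored    # e.g. cars newly offered for sale
    for key in sorted(removed):
        stored.discard(key)
        change_log.append(("removed", key))
    for key in sorted(added):
        stored.add(key)
        change_log.append(("added", key))
```

For the sold-car example, a database holding `{"car-1", "car-2"}` and a visit observing `{"car-2", "car-3", "car-4"}` yields one removal and two additions in the log.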
  • each data source typically requires a dedicated robot.
  • Fig.2 illustrates three entity models applied in a preferred embodiment of the invention.
  • the three entity models are an extraction model EM, a conceptual model CM and a storage model STM.
  • entities according to the three models are referred to as extraction entities EENT, conceptual entities CENT and storage entities SENT.
  • the entities are also referred to in three different formats, i.e. an extraction format, a conceptual format and a storage format.
  • the entity flow is transformed between the different formats by means of converters established for converting the data from one format into another.
  • the converters may preferably be established as so-called transformer elements which will be dealt with in detail below.
  • the entities are accessed according to an extraction model preferably common for all involved data sources of the domain.
  • the extraction entities simply comprise a serial stream of strings.
  • the strings are ordered in such a way that the receiver of the string-stream may recognize what the transmitter actually intends to transmit. This may be established both with accompanying codes or simply as a convention defining the sequence.
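The "convention defining the sequence" can be illustrated with a receiver that groups a flat string stream into extraction entities by a fixed attribute order. This is a sketch only: the attribute names in `EXTRACTION_MODEL` are hypothetical, and a real extraction model may instead use accompanying codes.

```python
from typing import Iterable, Iterator

# Hypothetical extraction model for a car domain: the attribute order is the
# agreed convention between the robot (transmitter) and the receiver.
EXTRACTION_MODEL = ("make", "color", "engine", "fuel")


def parse_extraction_stream(
    stream: Iterable[str], model: tuple[str, ...] = EXTRACTION_MODEL
) -> Iterator[dict[str, str]]:
    """Group a flat serial stream of strings into extraction entities.

    Every len(model) consecutive strings form one entity; a trailing,
    incomplete group is rejected rather than silently dropped.
    """
    buffer: list[str] = []
    for value in stream:
        buffer.append(value)
        if len(buffer) == len(model):
            yield dict(zip(model, buffer))
            buffer = []
    if buffer:
        raise ValueError(f"incomplete entity at end of stream: {buffer!r}")
```

Because the convention is purely positional, a single missing string shifts every following attribute; this is one reason the extraction format is considered fragile.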
  • the extraction model represents more than a data format. It also defines the different attributes which the robots should access when dealing with the different data sources.
  • the extraction model represents a framework in which the designers may design the robots. The robot designers may therefore concentrate fully on designing a robot capable of accessing the attributes contained in the extraction model and on combining the attributes into entities according to the extraction model, i.e. extraction entities.
  • the extraction entities may nevertheless be established e.g. wholly or partly by automated extraction routines.
  • such routines may e.g. be adapted for automatic reading of the data source representation, automatic recognition of attribute patterns of the web-based data source, and outputting of these attributes as extraction entities according to the extraction model.
  • Such automated routines may evidently be adapted for assigning the specifically discovered attribute/entity patterns of a data source to a corresponding robot.
  • the extraction model may be established by means of a domain modeller DMR.
  • the extraction entities may then be converted, e.g. by a transformer, into conceptual entities.
  • the conceptual model representation of an entity involves a conversion of the individual entity into a unique object.
  • an extraction entity comprising a string stream of "Porsche", "Red", "3.0", "Diesel" is converted into a unique car object, a conceptual entity, being a Porsche which is red and has a 3.0 liter diesel engine.
  • the conceptual format moreover offers the possibility of handling the entities in a compact way.
  • the entities may be represented in an object-oriented manner instead of a flat string format.
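A minimal sketch of the conversion from a flat, string-based extraction entity into an object-oriented conceptual entity, assuming a hypothetical `Car` class and the attribute names used in the stream example above; the normalization rules are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    """Hypothetical conceptual entity: one unique, typed car object."""
    make: str
    color: str
    engine_liters: float
    fuel: str


def to_conceptual(extraction: dict[str, str]) -> Car:
    """Convert a flat, all-string extraction entity into a typed Car object."""
    return Car(
        make=extraction["make"].strip().title(),
        color=extraction["color"].strip().lower(),
        engine_liters=float(extraction["engine"].strip()),
        fuel=extraction["fuel"].strip().lower(),
    )
```

Once typed, the engine size can be compared numerically and the object can be checked against a catalogue, which is hard to do reliably on raw strings.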
  • a conceptual approach to the entities offers the possibility of adding knowledge to the retrieved entities.
  • Such information may e.g. be information deducible from a reference product catalogue.
  • the entity may be modified, e.g. as a validation, a correction or as an insertion of additional information about the entity.
  • a correction may e.g. be that one of the attributes of the Porsche retrieved above is false according to the product catalogue.
  • This false attribute may be detected in several different ways within the scope of the invention.
  • the reference product catalogue may e.g. initially reveal that no Porsche having a 3.0 liter engine has been made with a diesel engine.
  • the product catalogue may reveal that no Porsche has been made with a diesel engine, thereby raising the probability that the data source provider has made a mistake. The wrong attribute "Diesel" may then be corrected.
  • An insertion of additional information may e.g. be the recognition that a Porsche of the above-mentioned type (now assuming that the diesel statement has not been made) has electronic injection. This information may then be inserted as a new attribute of the unique conceptual entity Porsche or filled into a text field attribute of the Porsche.
  • Validation comprises the step of evaluating whether the currently investigated conceptual entity should be regarded as a valid entity at all. Such validation basically results in the entity either being accepted as a valid entity or being discarded. A valid entity may subsequently be further processed with the purpose of deducing information about the entity, as described above.
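The correction and insertion steps can be sketched against a small reference catalogue. This is an illustrative Python sketch, not the method of the invention: `CATALOGUE`, `CarEntity` and its entries are hypothetical, and the sketch clears an implausible fuel value (marks it unknown) rather than guessing a replacement.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CarEntity:
    """Hypothetical conceptual car entity used for catalogue checks."""
    make: str
    engine_liters: float
    fuel: Optional[str]            # None means "unknown"
    features: tuple[str, ...] = ()


# Hypothetical reference product catalogue (illustrative entries only):
# make -> {(engine_liters, fuel): features known for that variant}.
CATALOGUE: dict[str, dict[tuple[float, str], tuple[str, ...]]] = {
    "Porsche": {(3.0, "petrol"): ("electronic injection",)},
}


def check_against_catalogue(car: CarEntity) -> CarEntity:
    variants = CATALOGUE.get(car.make)
    if variants is None:
        return car  # no reference knowledge: leave the entity untouched
    # Correction: a fuel type never produced for this make is treated as a
    # data-source mistake and cleared rather than guessed.
    if car.fuel is not None and all(fuel != car.fuel for _, fuel in variants):
        car = replace(car, fuel=None)
    # Insertion: if exactly one catalogue variant matches, add its features.
    matches = [
        key for key in variants
        if key[0] == car.engine_liters and car.fuel in (None, key[1])
    ]
    if len(matches) == 1:
        extra = tuple(f for f in variants[matches[0]] if f not in car.features)
        car = replace(car, features=car.features + extra)
    return car
```

With these entries, a "3.0 liter diesel Porsche" loses the diesel attribute and gains "electronic injection", while a make absent from the catalogue passes through unchanged.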
  • a discarded entity may result in a further investigation of the original data source with the purpose of evaluating whether an entity has been overlooked.
  • a realtime evaluation of the discard rate of each data source should be performed with the purpose of monitoring whether the robot or the extraction model associated with the individual data source needs an update or replacement.
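Validation and the per-source discard-rate monitoring can be sketched together. The validation rule, threshold and minimum sample size below are hypothetical assumptions chosen for illustration; the source only states that the discard rate of each data source should be evaluated.

```python
from typing import Any


def is_valid(entity: dict[str, Any]) -> bool:
    """Hypothetical validation rule: a car needs a make and a positive price."""
    price = entity.get("price")
    return bool(entity.get("make")) and isinstance(price, (int, float)) and price > 0


class DiscardMonitor:
    """Tracks the discard rate of each data source (i.e. of each robot)."""

    def __init__(self, threshold: float = 0.2, min_samples: int = 20) -> None:
        self.threshold = threshold      # hypothetical alarm level
        self.min_samples = min_samples  # avoid alarms on tiny samples
        self.seen: dict[str, int] = {}
        self.discarded: dict[str, int] = {}

    def record(self, source: str, entity: dict[str, Any]) -> bool:
        """Validate one entity, count it, and return whether it was accepted."""
        valid = is_valid(entity)
        self.seen[source] = self.seen.get(source, 0) + 1
        if not valid:
            self.discarded[source] = self.discarded.get(source, 0) + 1
        return valid

    def rate(self, source: str) -> float:
        n = self.seen.get(source, 0)
        return self.discarded.get(source, 0) / n if n else 0.0

    def sources_needing_attention(self) -> list[str]:
        """Sources whose robot or extraction model may need an update."""
        return sorted(
            s for s, n in self.seen.items()
            if n >= self.min_samples and self.rate(s) > self.threshold
        )
```

A sudden rise in one source's discard rate typically indicates that the site layout has changed and its robot no longer fits.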
  • every possible attribute of a conceptual entity should be predefined in the conceptual model.
  • the conceptual entities and attributes should be established by means of a domain modeller.
  • the conceptual model should typically be made by people having a certain kind of knowledge about the domain. It should, nevertheless, be emphasized that the establishment of relevant attributes may be heavily supported by automated procedures traversing through the domain and identifying the offered combinations of attributes.
  • the last entity model is the storage model.
  • the storage model is primarily adapted for applying traditional database structures and database handling methods to the retrieved entities. Thus, the modeling of a storage model may be performed with very little knowledge of the nature of the domain but more or less by focussing on the involved attributes and entities.
  • the distinction between the different models may be softened up a little in the sense that the conceptual model and the data storage model may more or less be incorporated in one body.
  • the invention features the possibility of performing centralized processing when data retrieved from the different data sources are represented according to a generalized entity model, e.g. a conceptual model.
  • the extraction format may be understood as an analogue format while the conceptual/storage format may be regarded as a digital format.
  • the extraction entities are typically entities extracted directly from the web-based data sources, the conceptual entities are typically the entities flowing in the heart of the query processor capable of more complex processing, and the storage entities are typically the entities represented in e.g. a relational database.
  • the different models, e.g. the above-mentioned extraction model EM, conceptual model CM and storage model STM, may facilitate an entity flow both ways: downstream, as described above, from the data sources to the user querying the query processor, or upstream from a user submitting an entity or a request, e.g. an order, to a certain data source.
  • An extraction model may thus both be defined as a way of reading the data source and it may be defined as a way of writing (submitting) entities into the data source, e.g. by means of a form into a shopping cart of the data source or a data search form associated with the relevant data source.
  • the two functions, reading and writing should be supported by two separate distinct models for the purpose of clarity, i.e. one model for reading the data source, an extraction model, and one model for writing to a data source, a submission model.
  • the first format, the extraction format, is the format in which the entities are accessed in the web-based data source. This format is evidently somewhat fragile and unwieldy, because the string-based entity stream merely carries data that are supposed to be entities and attributes of entities. The extraction format can typically not be supported significantly by validity checks, because the extracted entities are difficult to process on a large scale; such processing would involve extensive and complex string-based processing.
  • the conceptual format is established on the basis of the predefined conceptual model defining the basic nature of the entities of the domain.
  • the conceptual representation may fundamentally be regarded as an object-oriented representation of the read entities.
  • a conceptual representation of the read entities is relatively easy to process in the sense that the entities are converted into unique instances of the conceptual model, thereby offering filtering, conversion or modification of any information related to the individual instances of predefined information, e.g. attributes, types of attributes etc. consistent with the conceptual model.
  • the storage format is basically intended for storing the retrieved entities for later access.
  • the storage format represents a more handy representation of the retrieved entities of the domain in the sense that superfluous information, e.g. information contained in or related to the conceptual model may be omitted.
  • Such information may e.g. be entity information utilized for converting the extraction entities into conceptual entities.
  • Such information need no longer be present in the storage model as the entities are now conceived as unique entities.
  • the entities stored in a database according to the storage model may (and should) instead be used for statistical purposes.
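A minimal sketch of writing conceptual entities into a storage model and running a statistical query, using Python's built-in `sqlite3` module. The table layout and the extra `raw` key (standing in for conversion-only information that the storage model omits) are hypothetical.

```python
import sqlite3
from typing import Any


def store_entities(conn: sqlite3.Connection, cars: list[dict[str, Any]]) -> None:
    """Write conceptual car entities to a flat storage-model table.

    Only the storage attributes are kept; conversion-only information such as
    the hypothetical "raw" source strings is deliberately dropped.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS car ("
        "make TEXT NOT NULL, color TEXT, engine_liters REAL, price INTEGER)"
    )
    conn.executemany(
        "INSERT INTO car (make, color, engine_liters, price) VALUES (?, ?, ?, ?)",
        [(c["make"], c["color"], c["engine_liters"], c["price"]) for c in cars],
    )
    conn.commit()


def average_price_by_make(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """A simple statistical query over the stored entities."""
    return conn.execute(
        "SELECT make, AVG(price) FROM car GROUP BY make ORDER BY make"
    ).fetchall()
```

Usage: open `sqlite3.connect(":memory:")`, call `store_entities(conn, cars)`, then `average_price_by_make(conn)`.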
  • the conceptual model and the storage model may be more or less overlapping but, preferably, these formats should be dealt with separately, thereby obtaining the possibility of reusing the storage model and even the conceptual model in other applications.
  • the strict separation between the applied data models allows the individual models to be modified independently, under some circumstances without considering interaction with the other models.
  • An example of such a simple modification of a model is the modification of a classification module which may basically be established without any modification of other modules as long as no new entity attributes have been introduced or removed.
  • a part of the extraction model may be global or at least multiple in the sense that this part of the model may contain general plug-ins of the extraction model applicable for many or all data sources to be accessed.
  • An example of such general plug-ins may e.g. be a language dictionary defining different applicable languages, e.g. English, Japanese, French or Danish.
  • the language dictionary may contain a domain-specific dictionary focussing on the entities characterizing the domain.
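A shared, domain-specific language dictionary can be sketched as a lookup from (language, surface term) to a canonical domain term that every robot may use. The boat-domain entries and the `normalize_term` helper below are hypothetical illustrations.

```python
# Hypothetical domain-specific dictionary for a boat domain:
# (language code, surface term) -> canonical term of the domain model.
BOAT_DICTIONARY: dict[tuple[str, str], str] = {
    ("en", "sailboat"): "sailboat",
    ("da", "sejlbåd"): "sailboat",
    ("fr", "voilier"): "sailboat",
    ("en", "red"): "red",
    ("da", "rød"): "red",
    ("fr", "rouge"): "red",
}


def normalize_term(term: str, language: str) -> str:
    """Map a term found by a robot to its canonical form; unknown terms pass through."""
    return BOAT_DICTIONARY.get((language, term.strip().lower()), term)
```

Because the dictionary is a general plug-in rather than part of any single robot, adding a language extends every robot that uses it.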
  • Fig. 3 illustrates the process of establishing a domain processor according to a preferred embodiment of the invention.
  • This domain may e.g. be a domain comprising boats offered for sale which are either used or new.
  • the boats are offered for sale from different web-based market places, typically the homepage of a dealer or e.g. private homepages.
  • web-based data sources may be supplemented by e.g. direct reading in a dealer's database, e.g. by means of ODBC based reading. Nevertheless, the domain should basically always be located in at least two different web-based data sources.
  • the web-based data source may typically be accessed without the consent or knowledge of the web-based data source owner. Consequently, there are no strict sign-up requirements imposed by the data source owner. Therefore, the data fundament of the domain is huge, insofar as it includes more or less all entities offered for sale on the entire World Wide Web.
  • the domain modeller DMR outputs a specific Domain model DM needed for the different software modules, also named elements, to be used when establishing the query processor for the domain.
  • the domain model DM may comprise a knowledge base describing different general features and aspects of the invention so to speak.
  • Such a general knowledge "container" benefits from the fact that the knowledge describing the domain may be established centrally and thereby obtain a compact knowledge structure which may be modified centrally and basically without dealing with complicated details of the different query processor elements.
  • the domain model represents a knowledge structure that may be accessed by the different query processor elements simply by defining a so-called plug-in to the individual or some of the query processor elements.
  • the plug-in may represent a domain reading structure, e.g. JAVA code, adapted for reading a certain part of the domain suitable for the establishment and functioning of the element. Therefore, different elements may utilize different parts of the knowledge.
  • the centrally organized knowledge may be modified centrally, implying that all elements automatically utilize an updated knowledge base with little or, typically, no modification of the elements or the plug-ins.
  • some general knowledge may evidently be decentralized, i.e. put into the individual query processor elements.
  • the central knowledge base, i.e. the domain model DM, should be maximized.
  • a domain model DM may e.g. comprise a reference product catalogue describing all known products of the domain, e.g. a list of different known car models and variants of such models.
  • the domain model DM may comprise mappings between different entity models applied by the query processor, e.g. conversion mappings between extraction entities, conceptual entities and storage entities.
  • the domain model may e.g. comprise the extraction, conceptual and storage models.
  • the domain model may comprise language dictionaries, both domain-specific and more general dictionaries.
  • a change in the domain model may be reflected uniformly in the complete query processor.
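The central knowledge base with element-specific plug-ins can be sketched as one shared object plus small accessor functions, each exposing only the part a given element needs. `DomainModel`, `classifier_plugin` and `mapping_plugin` are hypothetical names; the point illustrated is that plug-ins read the model at call time, so a central change is seen by every element at once.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DomainModel:
    """Hypothetical central knowledge base for one domain."""
    catalogue: dict[str, set[str]] = field(default_factory=dict)  # make -> known models
    make_codes: dict[str, int] = field(default_factory=dict)      # EM string -> CM code


def classifier_plugin(dm: DomainModel) -> Callable[[str, str], bool]:
    """Plug-in giving a classifier element read access to the catalogue only."""
    def is_known(make: str, model: str) -> bool:
        # Looked up at call time, so central edits to `dm` take effect at once.
        return model in dm.catalogue.get(make, set())
    return is_known


def mapping_plugin(dm: DomainModel) -> Callable[[str], Optional[int]]:
    """Plug-in giving a transformer element read access to the make codes only."""
    def code_for(make: str) -> Optional[int]:
        return dm.make_codes.get(make)
    return code_for
```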
  • the next step, Create Query Processor CQP initiates the combination of different elements by means of a Query Processor Modeller QPM.
  • Some of the elements combined by the Query processor Modeller QPM are established by the domain modeller DMR and some of the components are general preestablished elements.
  • Other elements to be used may e.g. be robots intended for accessing the data of the individual sites.
  • the next step, Create Accessors CA, initiates the assignment of individual robots to specific data sources of the domain.
  • a detailed description of such a robot-generating program may be found in PCT/DK00/00163 and PCT/DK00/00429, filed by the applicant and hereby incorporated by reference.
  • the last step, Maintenance, involves the establishment of different procedures intended for maintaining the query processor.
  • Such procedures may e.g. be the establishment of robot and system monitoring.
  • Such monitoring may e.g. include the monitoring of the load of the software elements/modules and whether the robots actually fit the sites, etc.
  • such procedures may include modifying or exchanging robots if such actions are considered necessary.
  • Figs. 4 to 6 illustrate the principles of a domain modeller according to one embodiment of the invention.
  • the relations between the tables of the database are made in a selectable "edit" environment.
  • a combined view/edit environment is applicable within the scope of the invention.
  • the illustrated domain modeller comprises an interface having a menu bar with four different selectable menus: File, Edit, View and Mapping.
  • Fig. 4a illustrates that the menu View has been selected.
  • the View menu, which is a Relationships Window, may comprise several menu items: Storage model, Extraction model, Conceptual model and Submission model.
  • the models define the different entity models applied by the complete query processor.
  • Different kinds of entity models and definitions of entity models may be applied within the scope of the invention.
  • the database model may also be referred to as a storage model.
  • the view area VA appearing when selecting Storage Model View illustrates the basic components of the database attached to the domain by means of visual indications of relations between the tables.
  • the database model defines the structure of a database intended for storage and handling of the entities of the domain.
  • a database model is typically a relational database rather than a flat-file database in order to accommodate the knowledge obtained by the query processor.
  • the Relationships window may be in different "show relationships"- modes, e.g. "Show All Relationships” or “Show Direct Relationships”.
  • the first mode shows all tables of the current database.
  • the other mode shows the tables of the database within the currently selected domain. When selecting the available tables, the viewer will show the relationships to all tables related directly to the selected table.
  • this viewing area VA may operate like known visualizing tools adapted for viewing relations between tables of relational databases.
  • the viewer is in the second mode.
  • An open domain model intended for a PC distribution domain comprises a PC Equipment table PCE.
  • the illustrated PCE table comprises the fields ID, DealerID, ProdID and Price. ID is the primary key of the PCE table, while DealerID and ProdID are foreign keys to the tables DCAT and PCAT, respectively.
  • the PCE table refers to a product catalogue PCAT and a dealer's catalogue DCAT.
  • the product catalogue PCAT is a table of the products attached to the domain and intended for sale.
  • the dealer's catalogue DCAT is a table of the dealers attached to the domain.
  • the PCE table also comprises the Price field.
  • In practice, the PCE table would typically be more complex, e.g. with relations to tables containing further product characteristics such as color, comments on the products, currency, URL etc.
  • the Price field definitions appear in a dialogue box PD, which may be used for defining the Price field.
  • the illustrated Price field has the name "Price" and the field type may be selected as a string or an integer, here selected as an integer.
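The PCE table and its two catalogue tables can be expressed as SQL DDL. This sketch uses Python's `sqlite3` with foreign-key enforcement switched on; the table and key names follow the figure description, while the `Name` columns are hypothetical placeholders for the catalogue contents.

```python
import sqlite3

SCHEMA = """
CREATE TABLE DCAT (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);  -- dealer's catalogue
CREATE TABLE PCAT (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);  -- product catalogue
CREATE TABLE PCE (
    ID       INTEGER PRIMARY KEY,
    DealerID INTEGER NOT NULL REFERENCES DCAT(ID),
    ProdID   INTEGER NOT NULL REFERENCES PCAT(ID),
    Price    INTEGER  -- field type selected as integer, as in the dialogue box PD
);
"""


def create_pce_schema(conn: sqlite3.Connection) -> None:
    # SQLite only enforces REFERENCES clauses when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

With enforcement on, inserting a PCE row whose DealerID has no matching DCAT row raises `sqlite3.IntegrityError`.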
  • Fig. 4b illustrates that the menu Mapping has been selected.
  • the Mapping menu which is a table or Relationships Window, may comprise several menu items, e.g. the illustrated EM to CM, CM to STM, STM to CM or CM to SM.
  • the two former mappings deal with mappings needed for retrieval of entities from a data source, while the two latter deal with writing, i.e. submission to a data source (e.g. filling-in of a form in a data source to place an order, filling-in of a search form or insertion of a new entity in the data source).
  • the EM to CM Extraction model to Conceptual model mapping, defines the mapping between the entities and/or attributes retrieved according to the extraction model EM into entities and/or attributes according to a conceptual model CM.
  • the CM to STM Conceptual model to Storage model mapping, defines the mapping between the entities and/or attributes held according to the conceptual model CM into entities and/or attributes according to a storage model STM.
  • the STM to CM Storage model to Conceptual model mapping, defines the mapping between the entities and/or attributes represented according to the storage model STM into entities and/or attributes according to conceptual model CM.
  • the CM to SM Conceptual model to submission model mapping, defines the mapping between the entities and/or attributes represented according to the conceptual model CM into entities and/or attributes according to a submission model SM.
  • within the scope of the invention, mapping from one model to another may be performed in several ways other than the table-based method illustrated in Fig. 4b.
  • the mapping may include direct transformation of a number of associated attributes into a unique object in a relational manner. That is, the bundle of associated extractions is transformed as a whole into one unique object instead of applying the above-mentioned method of initially mapping the extraction attributes into conceptual attributes and subsequently establishing a unique entity on the basis of a reference system, e.g. a product catalog defining different possible entities of the domain.
  • the mapping from the extraction model to the conceptual model preferably involves a classifier (i.e. a classification system) that maps extracted entities into conceptual entities according to a product catalogue. That is, the product catalogue may contain the various (generic) conceptual entities existing in the domain.
  • the conceptual entities are made unique according to the extracted entities by transferring various attribute values from the extracted entities to the conceptual entities, such as price, URL, currency etc. This transfer of values from extraction entities to conceptual entities is done by selecting and configuring a transfer function that maps one or more extraction model attribute values into one or more conceptual model attribute values.
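The classifier-plus-transfer-function scheme can be sketched as follows: a generic catalogue entity is looked up, copied, and then made unique by a list of transfer functions, each mapping one or more extraction attribute values into one or more conceptual attribute values. The catalogue entries, the price format `"12,500 USD"` and all function names are hypothetical assumptions.

```python
from typing import Any, Optional

# Hypothetical product catalogue of generic conceptual entities.
PRODUCT_CATALOGUE: list[dict[str, Any]] = [
    {"make": "Porsche", "model": "911"},
    {"make": "Ford", "model": "Focus"},
]


def classify(extracted: dict[str, str]) -> Optional[dict[str, Any]]:
    """Find the generic catalogue entity matching the extracted make and model."""
    make = extracted.get("make", "").strip().lower()
    model = extracted.get("model", "").strip().lower()
    for entry in PRODUCT_CATALOGUE:
        if entry["make"].lower() == make and entry["model"].lower() == model:
            return entry
    return None


def transfer_price(extracted: dict[str, str]) -> dict[str, Any]:
    """One EM attribute -> two CM attributes; assumes the format '12,500 USD'."""
    amount, currency = extracted["price"].rsplit(" ", 1)
    return {"price": int(amount.replace(",", "")), "currency": currency}


def transfer_url(extracted: dict[str, str]) -> dict[str, Any]:
    return {"url": extracted["url"]}


TRANSFER_FUNCTIONS = [transfer_price, transfer_url]


def to_unique_entity(extracted: dict[str, str]) -> Optional[dict[str, Any]]:
    generic = classify(extracted)
    if generic is None:
        return None  # not classifiable: a candidate for discarding
    entity = dict(generic)  # copy, so the catalogue itself is never modified
    for transfer in TRANSFER_FUNCTIONS:
        entity.update(transfer(extracted))
    return entity
```

Selecting and configuring a transfer function then amounts to choosing which functions appear in `TRANSFER_FUNCTIONS`.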
  • the view area appearing when selecting EM to CM attributes illustrates the attributes to be converted into conceptual entities, e.g. in the form of a table.
  • the extraction attribute "Make" has been selected, thereby opening a mapping table where EM-CA A has been selected.
  • the table comprises different applicable mappings from extraction attributes to conceptual attributes, here exemplified by the strings Ferrari, Fiat and Ford converted into the integers 17, 18 and 19, respectively.
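The table-based value mapping for the "Make" attribute corresponds to a simple dictionary lookup. The three entries come from the example; the inverse table is a hypothetical addition illustrating how the same mapping could serve the writing (submission) direction.

```python
# Value mapping for the extraction attribute "Make", from the Fig. 4b example.
EM_TO_CM_MAKE: dict[str, int] = {"Ferrari": 17, "Fiat": 18, "Ford": 19}
# Hypothetical inverse table, e.g. for the writing (submission) direction.
CM_TO_EM_MAKE: dict[int, str] = {code: name for name, code in EM_TO_CM_MAKE.items()}


def map_make(value: str) -> int:
    """Map an extracted make string to its conceptual integer code."""
    try:
        return EM_TO_CM_MAKE[value.strip()]
    except KeyError:
        raise KeyError(f"no EM-to-CM mapping for make {value!r}") from None
```

An unmapped make raises an error instead of producing a silent default, so gaps in the mapping table surface early.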
  • Fig. 5 illustrates that the PCE table has been double-clicked.
  • a PCE dialogue box PCED appears. This dialogue box facilitates editing of the data defining the PCE table, e.g. by insertion of SQL statements associated with the PCE table, attribute names, etc.
  • the table may be generated by selecting the Table Generate tag, TAG.
  • the storage model may be modeled by known prior art database-generating tools.
  • the important thing when dealing with the database model for the specific domain is to include all necessary attributes and establish a well-structured, easily searchable and quickly accessible database. It should be noted that this structuring of the domain database may be performed independently of the rest of the domain query processor, as long as the necessary entity attributes have been defined.
  • Fig. 6 illustrates the Domain Modeller's Extraction model viewer.
  • the database in the database modeller viewer may be regarded as the representation of entities "understood" by the query processor.
  • the domain extraction model to be made by the extraction modeller may be regarded as the definition of relevant attributes included in the syntax of "raw" string-based data of the web-based data sources to be accessed as defined by the data source provider.
  • Fig. 7 illustrates the principles of an applicable robot-establishing program according to one embodiment of the invention.
  • the robots to be used in the query processor may be established and attached to a certain data source in many ways within the scope of the invention.
  • the main principle of the robot generator mentioned below is to make a robot, assign it to a certain site containing data relevant to the domain of interest by means of an address, e.g. a URL address, and generate a data reader (the robot) capable of reading the data of interest contained in the data source, e.g. a web site, and of transferring these data in a certain data format to the central control of a query processor in response to a query.
  • a new and unique robot has to be made for each web-based data source to be queried.
  • the nodes may be arranged in straightforward paths. However, the nodes are typically arranged in branched IF-THEN paths.
  • the robot generating program is adapted for establishing sequential access of a web-based data source.
  • the control of this sequential reading is e.g. established by means of a graphical path of node processors NP, each node processor NP performing some configurable processing of its input.
  • the nodes are sequenced in such a manner that a web-based data source, e.g. in HTML, may be traversed and data extracted or submitted. It should be noted that high-volume establishment of such robots is somewhat time-consuming. Hence, the robot-generating programs should be very user friendly or even automatic.
  • a node processor selector NPS is adapted for configuration to the current application in the node processor configuration view NPC. Moreover, the node processor may be attached to a certain document area by means of a document range definer DRD.
  • the robot maker viewer comprises a document view which e.g. may be adapted for viewing the XML text of the data source or a part of the data source.
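The node-processor idea (nodes that process their input and branch IF-THEN style to the next node) can be sketched as a small graph interpreter. This is not the applicant's robot maker, which is described in PCT/DK00/00163 and PCT/DK00/00429; the node names, the regular-expression row pattern and the sample page layout are hypothetical simplifications, and a real robot would use a proper HTML parser.

```python
import re
from typing import Any, Callable, Optional

State = dict[str, Any]
# A node processor transforms the robot state and names the next node
# (None = stop). IF-THEN branching is simply the choice of the next node.
Node = Callable[[State], Optional[str]]

ROW = re.compile(r"<tr><td>(.*?)</td><td>(.*?)</td></tr>")


def select_range(state: State) -> Optional[str]:
    """Document range definer: keep only the (hypothetical) listings table."""
    doc = state["document"]
    start, end = doc.find("<table"), doc.find("</table>")
    if start == -1 or end == -1:
        return "no_listings"   # IF no table THEN report
    state["document"] = doc[start:end]
    return "extract"           # ELSE extract the rows


def extract(state: State) -> Optional[str]:
    for make, price in ROW.findall(state["document"]):
        state["output"].extend([make, price])  # flat extraction-model stream
    return None


def no_listings(state: State) -> Optional[str]:
    state["errors"].append("no listing table found")
    return None


CAR_ROBOT: dict[str, Node] = {
    "range": select_range, "extract": extract, "no_listings": no_listings,
}


def run_robot(nodes: dict[str, Node], start: str, state: State, max_steps: int = 1000) -> State:
    """Traverse the node graph from `start` until a node returns None."""
    current: Optional[str] = start
    steps = 0
    while current is not None:
        if steps == max_steps:
            raise RuntimeError("robot did not terminate")
        current = nodes[current](state)
        steps += 1
    return state
```

The robot's output is deliberately left as an uninterpreted string stream, matching the extraction format that the robot hands on to the central control.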
  • the robot maker outputs robots and each robot is specialized in operating one dedicated web-based data source.
  • the robot outputs entities according to the extraction model(s), i.e. non-classified and non-interpreted data, to a central control, e.g. to a transformer query processor element.
  • the extracted strings may be converted into coded representations, e.g. as objects stored in a database, and the extracted data may then be classified.
  • the established robots may contain transforming means for transformation of extracted data into a conceptual representation, e.g. conversion of a sequence of strings "Ford", "2.0", "red" into an object stored in a database as a "car", which is a red Ford having a 2.0 liter engine. It should be noted that the preferred embodiments of the invention benefit from a more central transformation of entities into conceptual data, thereby reducing the requirements of maintaining decentral transformers.
  • a query processor modeller is intended for establishment of the "transfer function" between the user, the web data accessing machine and the data located in a web-based data source.
  • the meaning of "transfer function" involves a data flow from the user towards the data accessing machine and/or the web-based data sources.
  • the transfer function involves control of the flow of data from web-based data sources towards the web-data extraction machine and/or the user.
  • this functionality is referred to as a query process flow and the established "accessing machine" is referred to as a query processor.
  • the query processor will preferably be adapted for processing of a certain well-defined domain, e.g. a car domain. It should be noted that some kind of overlapping between the domains may be acceptable in the sense that one query processor may e.g. comprise query processor elements accessing data from different domains. Preferably, the domains should be separated since a query processor should only deal with one domain.
  • the query processor is defined below in a query processor graph by means of a visual programming tool.
  • Fig. 8 illustrates a preferred embodiment of the invention involving a visual programming tool for establishing the above-mentioned transfer function by means of a query processor graph QPG.
  • the query processor modeller comprises a visual and programmable editor.
  • the illustrated editor facilitates the combination of a number of Query Processor Elements QPE into a query processor graph.
  • the query processor elements may be of different types defined by their main functions.
  • An example of a query processor element QPE may e.g. be a robot, such as a robot query processor element RQPE.
  • a robot query processor element RQPE is adapted for accessing web-based data sources upon request.
  • a single robot may typically be attached to one single data source.
  • a robot query processor element may also be adapted for reading only or writing only if suitable.
  • a query processor element QPE may e.g. be a cache, such as a cache processor element CQPE.
  • Such an element is adapted for returning a response to a query or it may guide the query further on in the process if the cache contains no answer to the query.
  • the cache element CQPE returns a part of the response which may be established by means of the entities already contained in the cache, and forward a query further upstream in the processor in order to establish the rest of the response.
  • a further example of a query processor element QPE may e.g. be a so-called mediator query processor element MQPE.
  • This element is adapted for distributing an incoming query to other query processor elements and for gathering the response returned by these queried processor elements, e.g. robots, and returning the answer back to the processor which queried the mediator MQPE.
  • Another query processor element may be of a trigger type, i.e. a trigger processor element TPE adapted for triggering a certain operation or a query.
  • the trigger processor element TQPE is adapted for initiating a certain action, e.g. an automatically scheduled initiation of a query by an automatic trigger processor element ATPE.
  • Another applicable trigger processor element TPE may e.g. be a trigger adapted for initiation of a query upon request by a user, i.e. a manually activated trigger MTPE.
  • the latter trigger processors represent another type of query processor elements than the first.
  • the trigger query processor element is not activated by an incoming query but on its own initiative.
  • a manually operated trigger element MTPE may be regarded as an element including a user.
  • the figure illustrates a query processor adapted for processing a certain domain.
  • the domain comprises three web-based data sources.
  • the illustrated query processor QP is constructed and monitored by means of a visually programmed drag-and-drop query processor graph QPG.
  • the establishment of this query processor graph may also include the configuration of the individual query processor elements.
  • the configuration of e.g. a robot may thus be performed by means of an embedded robot modeller which may be activated via the Query Processor Modeller.
  • the illustrated query processor graph comprises three robot query processor elements RQPE1, RQPE2 and RQPE3.
  • Each robot is attached to a specific, dedicated data source, i.e. determined by the URL of the data source.
  • Each robot is made automatically or semi-automatically by means of a robot modeller RM, which is referred to as both robot maker and robot modeller RM in this application.
  • the robots RQPE1, RQPE2 and RQPE3 are adapted for accessing, i.e. reading and/or writing, the associated data source (not shown) according to a read/write pattern defined for and associated with the individual robots. This defined read/write pattern enables each robot to access the corresponding data source.
  • there is a one-to-one relationship between the robots and the data sources i.e. one web-based data source is accessed by one robot only.
  • the read/write pattern in the robot is typically highly specialized in order to fit the specific data structure of the associated data source. It should be noted that web-based data structures are typically programmed and structured independently, e.g. in HTML tables or other more or less unforeseeable data structures.
  • the establishment of a read/write pattern may also be referred to as a creation of a robot.
  • the invention offers different web-based data source owners the possibility of entering their data in a data structure which is easy to access by the query processor.
  • Such easy access may e.g. be provided to the data source owners in the form of design requirements if they want their data source to be roboted.
  • the query processor may also include data-accessing robots, e.g. by featuring direct ODBC access to the database of the data owner.
  • a standard robot type may be applied to such a generalized data source if so desired.
  • requirements to the data source owner will be kept low, thereby offering the possibility of accessing numerous different data sources.
  • this robot is dedicated to a specific web-based data source and communicates with a query processor element in the form of a cache CQPEl.
  • the cache may be activated by a trigger TQPE1.
  • This trigger element TQPE1 may initiate a certain trigger-defined query subsequently performed by the robot query processor element RQPE1.
  • the cache element CQPE1 may e.g. be provided as an encapsulation of the robot's data source. This direct and local pre-cache operation on one data source provides the possibility of reducing access time to certain data of the data source operated by the robot RQPE1.
  • this facility is attractive for the purpose of bootstrapping the cache with entities (data of the data structure of the data source) that are often queried.
  • according to a preferred embodiment of the invention, the trigger element TQPE1 should typically ensure that often-queried data are updated regularly in order to avoid a completely empty cache.
  • this control may also be integrated in the cache CQPE1 within the scope of the invention.
  • the cache CQPE1 is coupled to a mediator query processor element MQPE1. The functioning of the mediator MQPE1 will be described below.
  • the cache element CQPE1 may e.g. be adapted with the purpose of reducing the load on the specific site roboted by the robot element RQPE1 in a more strict sense, as the cache may be adapted for returning entities stored in the cache without querying the robot, even if the entities stored in the cache are not completely updated.
  • the local cache element CQPE1 may thus set a minimum interval for activation of the robot RQPE1, thereby ensuring that not every query necessarily results in a query of the data source.
  • This application of a cache may ensure that a certain site is not overloaded by the robot.
  • a further robot query processor element RQPE2 is dedicated to a specific web-based data source and communicates with a query processor element in the form of a transformer TAQPE1.
  • the transformer element TAQPE1 is adapted for receiving a query from a user-activated query element MTPE located downstream and passing it on towards the data sources located upstream.
  • the illustrated transformer element TAQPE1 channels an unmodified query further on to the robot query processor element RQPE2.
  • the response may be modified by the transformer before being returned to the connected mediator MQPE1.
  • Such a modification may e.g. be established as a trivial mapping of km: 34 to be read as km: 34,000 or the like.
  • transformers should be used for such purposes when certain data sources, e.g. web sites, use terms deviating from the general terms applied by other data source providers within the domain.
  • the system comprises a further robot query processor element RQPE3 dedicated to a specific web-based data source.
  • This robot RQPE3 is directly coupled to the mediator MQPE1.
  • the mediator MQPE1 is applied for branching the query process path into several different paths, e.g. three as illustrated. On the return path, the mediator collects the information obtained by the queried robot branches and returns the data to a transformer element TAQPE2.
  • This transformer element TAQPE2 defines a principal borderline between the upstream robots RQPE1, RQPE2 and RQPE3 and the downstream user U, as the transformer performs a transformation of data retrieved by the robots into conceptual data according to a conceptual model associated with each robot. These conceptual data are handed over from the transformer element TAQPE2 to a cache query processor element CQPE2. Typically, the conceptual model should be common for all involved elements dealing with entities in a conceptual manner.
  • the cache element CQPE2 may be regarded as the main storage means for the query processor QP intended for storage of the currently updated entities retrieved by the robots of the query processor.
  • the nature of the cache may vary significantly from application to application.
  • the cache may comprise only recently entered conceptual data, while caches in other applications may comprise a more or less complete database of the entities comprised in the data sources associated with the domain processor.
  • the cache CQPE2 may be activated by a trigger query processor TQPE2.
  • This trigger may e.g. be adapted for refreshing the cache CQPE2 according to scheduled trigger criteria.
  • the trigger criteria may be established on the basis of user query statistics and/or statistics associated with the data stored in the cache CQPE2.
  • the data contained in the cache CQPE2 are conceptual data.
  • the cache CQPE2 is coupled to a user interface, represented by a manually operated trigger element MTPE located downstream of the query processor graph, via a tracking module TMO adapted for gathering and storing data.
  • the gathered data are used for keeping track of the history of data contained in the data sources of the domain and for establishing and maintaining query statistics.
  • This tracking module is a combination of a number of query processor elements QPE.
  • the module comprises a storing query processor element SQPE1 adapted for writing data into a database query processor element DBPE1.
  • the database DBPE1 comprises entities retrieved from the associated domain of data sources and the entities are stored according to a preferred storage model.
  • the storage may also contain history-describing data or data from which the entities may be deduced.
  • the storing query processor element SQPE1 may be activated either by a user query or by a trigger query from TQPE3.
  • the trigger query processor element TQPE3 is intended to establish and maintain desired data, such as prices of cars or the like, and thereby offers the possibility of registering whether an entity comprised in a data source covered by the domain processor is offered at another price etc.
  • the illustrated query processor path comprises a transformer element TAQPE3.
  • This transformer element is primarily responsible for transforming conceptual data into storage data in the database DBPE1.
  • query processor elements should function without any knowledge of the context.
  • a cache query processor element may be implemented in many ways.
  • the cache should (as a traditional cache) contain some of the entities recently read from one or some of the data sources.
  • the idea of applying a cache should generally be that of reducing access time to the data sources.
  • the cache may be controlled in many ways, depending on the purpose.
  • the cache may be activated from time to time by an automatic trigger with the purpose of refreshing the content of the cache with respect to certain types of entities. Triggering of the cache would then imply that the triggered cache forwards a query to the relevant data sources of the domain, collects the response and writes the returned entities into the memory.
  • triggering of the cache may be constructed in numerous ways within the scope of the invention as long as the main purpose of the triggering is to obtain the best possible performance of the current application.
  • the cache should not be applied for entities exceeding a certain age, e.g. 3 minutes, if the nature of the entities contained in the domain are changing quite often.
  • An example of advantageous triggering according to the invention may e.g. be that of triggering the cache with the purpose of refreshing the cache with entities often queried by the users of the query processor. This boot-strapping ensures that start-up time is reduced by maintaining the often queried entities in the cache.
  • the statistical control may therefore imply triggering of the cache which may vary dynamically, i.e. be controlled by the user request.
  • a further possible approach may e.g. be triggering of the whole domain once a day which means that all relevant data contained in all data sources of the domain are read into the cache and that all data are updated at least once a day.
  • the cache is controlled in a manner resembling a kind of persistent database.
  • the transformer query processor element is basically an element which may transform an incoming query or entity into another query or entity. Hence, the transformer works both ways: downstream and upstream.
  • Applicable transformer elements may e.g. be transformers transforming raw extracted text-string entities received from upstream (e.g. from a robot) into entities in a conceptual representation of the entities read from the data source according to a preferred embodiment of the invention.
  • other transformer elements may e.g. be a transformer receiving conceptual entities and outputting the entities according to a data storage model.
  • a further, simpler transformer may e.g. be a mute transformer element arranged in front of a robot or in a certain branch. This mute may be adapted for blocking the entity or query stream in the respective branch.
  • Such a mute transformer may e.g. be advantageous if a certain robot must receive maintenance. An operator maintaining the query processor may then modify or exchange that robot without modifying the query process graph. Hence, a robot may be maintained without simultaneously receiving a stream of queries.
  • the transformers may be arranged in many different positions in the query graph within the scope of the invention.
  • the trigger query processor element comprises means e.g. for invoking a query in an element associated with the trigger.
  • the trigger may then comprise a schedule adapted for defining fixed time intervals which determine when to query the associated element, e.g. a cache.
  • the trigger may comprise calculation algorithms adapted for calculating suitable trigger conditions, e.g. when to query, and/or how to query. Therefore, the trigger may advantageously comprise statistical evaluation means.
  • a mediator query processor element MQPE is adapted for distributing an incoming query to other query processor elements and for gathering the response returned by these queried processor elements, e.g. robots, and returning the answer back to the processor which initially queried the mediator MQPE.
  • the mediator may show several different levels of intelligence. At one end is a simple branch element that distributes an incoming query to a number of branching elements. At the other end are quite intelligent elements capable of distributing an incoming query to the branches most likely to comprise the queried entities.
  • a mediator may deal with data according to any representation, e.g. conceptual entities, storage entities or extraction entities.
  • query processor elements may e.g. be MESQPE Messenger query process elements.
  • the messenger elements MESQPE are adapted for monitoring the process of the individual QPE's or between the QPE's. These messengers may e.g. be adapted for returning a processor's state-describing parameters to an operator responsible for the query processor or the query processor element. Messengers may e.g. be adapted for providing statistical material or fault warnings.
  • the conceptual building of the domain processor may be performed in many different ways. This means that the word "element” and the word "graph” should in no way restrict the scope of the invention in the sense that the wording primarily reflects the functional understanding of the elements.
  • examples of such combined elements may be a robot processor comprising a transformer (i.e. the robots read extraction entities, transform the data into conceptual entities and return the entities to a central control, e.g. a database), a cache comprising a transformer, a cache comprising a trigger, etc.
  • a further advantageous messenger may e.g. be a messenger adapted for raising a flag to the operator managing the query processor when the entities to be transformed into conceptual data are not contained in a reference product catalogue, thereby offering the operator the possibility of updating such a catalogue locally or globally.
  • Other advantageous elements may e.g. be elements directly adapted for reading a well-known database, i.e. by means of ODBC drivers, thereby making it possible for extracted reading of "foreign" web-based data sources to be supplemented by readings from few or several databases comprising entities included by the domain.
  • each of the present elements may be activated by clicking on the element in the editor, thereby initiating/activating the element-creating application.
  • the RobotMaker application will be activated by double-clicking on a selected robot, e.g. RQPE1, and the Domain Modeller will be activated when double-clicking on e.g. the transformer TAQPE2.
  • the graph may be saved, thereby maintaining the properties of the complete query processor QP.
  • query processor elements are defined by means of the domain modeller DMR and the Robotmaker RM.
  • some of the query processor elements are domain independent, e.g. trigger processor elements. Such elements may be included with little or no modification in the query processor graph of several different types of query processors DP. Other query processor elements are somewhat domain specific.
  • An example of a domain independent query processor element may e.g. be the aforementioned mute transformer element, which may be applied in any desired domain without pre-modification.
  • the Query Processor Modeller may even, and preferably, include query processor execution tools included in the illustrated "view" setup.
  • Such a setup may include the illustrated view which, when in run mode, illustrates the running state of the query processor and the individual elements.
  • An example of such intuitive processing is that the individual elements change color according to the state, e.g. within a color range from white to red, depending on the load of the elements.
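  • the interface, e.g. the illustrated view, should preferably illustrate basic on-off conditions visually. It should actively show whether an element is working properly, whether entities are being transferred between the query processor elements, and whether entities could actually be transferred between elements.
  • the latter feature may ease operation of the system significantly, because the absence of an entity flow between the elements does not necessarily indicate that a fault condition has occurred, since the element may simply not be queried.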
  • Determination of a "clear road" between the elements may e.g. be established by forwarding dummy (testing) queries between elements at certain intervals.
  • the Query Processor Modeller may include submenus facilitating specialized execution of the query processor. Such a submenu is illustrated in fig. 9, and it may e.g. be selected by the "run" drop down menu of the Query Processor Modeller. Moreover, the Query Processor Modeller may feature specialized visualization of certain groups of query processor elements. Thus, a "robot element” viewer may be activated, thereby offering the operator the possibility to concentrate fully on his task, e.g. maintenance or design of robot elements and thereby ignore elements dealt with by other operators.
  • a query processor according to the invention may easily comprise several hundreds of robots.
  • Fig. 9 illustrates a possible user interface of a domain processor DP.
  • a domain processor is adapted for supporting maintenance of one or several query processors QP when established.
  • the illustrated user interface of a domain processor comprises a tree-based structure monitoring area.
  • One domain processor may control execution and maintenance of several different domains.
  • This area monitors a first level of node-represented servers NL1.
  • This level illustrates the different servers applied: Webserver, RobotServer1 and RobotServer2.
  • a second node level NL2 shows the current domains controlled by the domain server, e.g. Cars, Yachts and PC's.
  • a third level NL3 illustrates different selectable query processor state-indicating functions, e.g. queries, triggers and messages. The function Messages has been selected in the illustrated view.
  • the server referred to in level 1 NL1 may either reflect a physical location of a query processor with respect to a server, or refer to a kind of virtual server comprising several different servers, each processing their part (e.g. element or groups of elements) of the query processor.
  • the illustrated viewer comprises a message viewing area MVA adapted for viewing messages forwarded automatically by e.g. different unique elements of a query process path or groups of elements.
  • the attributes of listed messages may e.g. be chosen as the illustrated Title, Date, Priority, Origin Element.
  • an operator may e.g. establish a filtering of messages from a certain element, Origin Element, or of groups of elements, e.g. mediators or transformers.
  • the viewer comprises a message detail window MDW.
  • This viewer may illustrate details about a single message or groups of selected messages in the messages view area MVA.
  • Each message may e.g. be associated with a startup facility with the purpose of activating the editor or editors associated with the individual message.
  • a query element program e.g. a robot editor, may be started directly from the domain processor DP, e.g. by automatically importing the data from an element selected in the viewer such as a specific robot.
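The cooperation of the robot, transformer, mute and mediator elements around the mediator MQPE1 can be illustrated with a minimal Python sketch. It is not the implementation of the invention. It assumes that queries and entities are plain dictionaries, and it simulates each robot with an in-memory list of records instead of a web-based data source. The class names and the km_in_thousands mapping are hypothetical.

```python
from typing import Callable, Dict, List

# Illustrative shapes (assumptions): a query is a dict of attribute
# constraints and an entity is a dict of attribute values.
Query = Dict[str, str]
Entity = Dict[str, object]


class QueryProcessorElement:
    """Common interface: every element answers a query with a list of entities."""

    def query(self, q: Query) -> List[Entity]:
        raise NotImplementedError


class RobotElement(QueryProcessorElement):
    """Stand-in for a robot bound to one data source (here an in-memory list)."""

    def __init__(self, records: List[Entity]) -> None:
        self.records = records

    def query(self, q: Query) -> List[Entity]:
        # Return copies of the records that satisfy every constraint of the query.
        return [dict(r) for r in self.records
                if all(str(r.get(k)) == v for k, v in q.items())]


class TransformerElement(QueryProcessorElement):
    """Forwards the query unmodified and rewrites each entity on the way back."""

    def __init__(self, upstream: QueryProcessorElement,
                 fn: Callable[[Entity], Entity]) -> None:
        self.upstream = upstream
        self.fn = fn

    def query(self, q: Query) -> List[Entity]:
        return [self.fn(e) for e in self.upstream.query(q)]


class MuteElement(QueryProcessorElement):
    """Blocks its branch, e.g. while the robot behind it is under maintenance."""

    def __init__(self, upstream: QueryProcessorElement) -> None:
        self.upstream = upstream  # kept, so the graph itself is left unchanged

    def query(self, q: Query) -> List[Entity]:
        return []  # neither forwards the query nor returns entities


class MediatorElement(QueryProcessorElement):
    """Distributes a query to every branch and gathers the responses."""

    def __init__(self, branches: List[QueryProcessorElement]) -> None:
        self.branches = branches

    def query(self, q: Query) -> List[Entity]:
        gathered: List[Entity] = []
        for branch in self.branches:
            gathered.extend(branch.query(q))
        return gathered


def km_in_thousands(e: Entity) -> Entity:
    """Hypothetical mapping for a source whose 'km: 34' means 34,000 km."""
    out = dict(e)
    out["km"] = int(str(out["km"])) * 1000
    return out


# Branches resembling those of the mediator MQPE1; the data are invented.
rqpe1 = RobotElement([{"make": "Ford", "km": 52000}])
taqpe1 = TransformerElement(RobotElement([{"make": "Ford", "km": 34}]), km_in_thousands)
muted = MuteElement(RobotElement([{"make": "Ford", "km": 9000}]))
mqpe1 = MediatorElement([rqpe1, taqpe1, muted])
result = mqpe1.query({"make": "Ford"})  # km values 52000 and 34000
```

Every element exposes the same query method. A mute element can therefore take the place of a branch during maintenance without the mediator or the rest of the graph being modified.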
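The cache behaviour described above combines three features. Entities have a maximum age, upstream queries are separated by a minimum interval to protect the roboted site, and a scheduled trigger refreshes the cache. A sketch follows. Several details are assumptions made for illustration:

- the parameter names max_age and min_interval;
- the 180-second default, which mirrors the 3-minute example;
- the tick-based trigger.

The clock is injectable, so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Dict, List, Tuple

Query = Dict[str, str]
Entity = Dict[str, object]
Upstream = Callable[[Query], List[Entity]]
Key = Tuple[Tuple[str, str], ...]


class CacheElement:
    """Cache in front of an upstream element, e.g. a robot or a mediator."""

    def __init__(self, upstream: Upstream, max_age: float = 180.0,
                 min_interval: float = 0.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.upstream = upstream
        self.max_age = max_age            # entries older than this are stale
        self.min_interval = min_interval  # minimum gap between upstream queries
        self.clock = clock
        self._store: Dict[Key, Tuple[float, List[Entity]]] = {}
        self._last_upstream = float("-inf")

    @staticmethod
    def _key(q: Query) -> Key:
        return tuple(sorted(q.items()))

    def _fetch(self, q: Query, now: float) -> List[Entity]:
        entities = list(self.upstream(q))
        self._last_upstream = now
        self._store[self._key(q)] = (now, entities)
        return list(entities)

    def query(self, q: Query) -> List[Entity]:
        now = self.clock()
        cached = self._store.get(self._key(q))
        if cached is not None:
            stored_at, entities = cached
            fresh = now - stored_at <= self.max_age
            # Serve stale entries rather than query the site too often.
            throttled = now - self._last_upstream < self.min_interval
            if fresh or throttled:
                return list(entities)
        # No usable answer in the cache: guide the query further upstream.
        return self._fetch(q, now)

    def refresh(self, q: Query) -> None:
        """Entry point for a trigger: re-read q regardless of the cache state."""
        self._fetch(q, self.clock())


class ScheduledTrigger:
    """Refreshes a fixed set of queries at a fixed interval when tick() is called."""

    def __init__(self, cache: CacheElement, queries: List[Query],
                 interval: float) -> None:
        self.cache = cache
        self.queries = queries
        self.interval = interval
        self.next_due = float("-inf")

    def tick(self, now: float) -> bool:
        if now < self.next_due:
            return False
        for q in self.queries:
            self.cache.refresh(q)
        self.next_due = now + self.interval
        return True
```

The min_interval check here deliberately covers all queries of the cache, not each query separately. This reflects the purpose of protecting the single site behind the robot.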

Abstract

The invention relates to a domain processor (DP) comprising: at least one robot modeller (RM); at least one domain modeller (DMR); at least one Query Processor Modeller (QPM), said robot modeller (RM) comprising: means for modelling at least one computer-based robot (R); said at least one robot (R) being adapted for accessing at least one web-based data source (DS); said at least one data source (DS) comprising entities comprised in a predefined domain (D); said at least one domain modeller (DMR) comprising: means for modelling at least one domain model (DM); means for establishing at least one extraction model (EM) associated with a chosen domain; means for establishing at least one storage model (STM) associated with said chosen domain, said at least one Query Processor Modeller (QPM) comprising: means for selecting at least two Query Processor elements (QPE) from a set of predefined query processor elements (QPE); means for combining at least two of the selected Query Processor elements (QPE); means for executing said associated query processor elements on at least one computer system (CS); at least one of said query processor elements (QPE) of the associated query processor elements being a Robot query processor Element (RQPE) adapted for accessing at least one web-based data source (DS).

Description

A QUERY PROCESSOR
The invention relates to a query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor.
Field of the invention
The invention deals with accessing, i.e. reading and/or writing, data sources associated with a certain domain. The data sources are typically web-based. This basically means that the data of the data source are made available to the user according to a serial transfer protocol, e.g. http via the Internet. Data made available to the user in this serial way are sometimes easy for the user to grasp, especially when dealing with a simple and quite specific request. A problem with data retrieval from web-based data sources is that the user must typically find one or several data sources comprising the relevant data. This search may be very time consuming and is typically non-exhaustive, because several data sources may easily be overlooked. Moreover, the user has to perform further queries on each site, and these queries typically have to be formulated differently from site to site.
This problem has been dealt with in the prior art by applying robots and agents with the purpose of collecting information within a certain domain of interest and by providing these domain data or an extraction of the data to a user in a more straightforward searchable way.
A problem with the known systems applying agents is that the agents require some kind of knowledge about the data source structure. Moreover, the use of the agent requires the acceptance of the owner of the data source, because an agent may dig into a data source more or less out of control. A problem with the known systems applying robots is that the robots also require some kind of knowledge about the data source structure, e.g. knowledge of the structure of data contained in an HTML table of a web-based data source. If this knowledge is not available, the programming of such a robot is quite difficult. Hence, the applicable number of robots retrieving data from such data sources is limited, as is the data of interest in the domain.
It is an object of the invention to provide a domain processor capable of processing even large-scale domains.
Summary of the invention
The invention relates to a domain processor (DP) according to claim 1 comprising
-at least one robot modeller (RM)
-at least one domain modeller (DMR),
-at least one Query Processor Modeller (QPM)
said robot modeller (RM) comprising
means for modelling at least one computer-based robot (R),
said at least one robot (R) being adapted for accessing at least one web-based data source (DS),
said at least one data source (DS) comprising entities comprised in a predefined domain (D),
said at least one domain modeller (DMR) comprising means for modelling at least one domain model (DM) associated with at least one chosen domain, said domain model (DM) comprising at least one extraction model (EM) and at least one storage model (STM),
means for establishing at least one extraction model (EM) associated with a chosen domain,
means for establishing at least one storage model (STM) associated with said chosen domain,
said at least one Query Processor Modeller (QPM) comprising
means for selecting at least two Query Processor elements (QPE) from a set of predefined query processor elements (QPE),
means for combining at least two of the selected Query Processor elements (QPE),
means for executing said associated query processor elements on at least one computer system (CS),
at least one of said query processor elements (QPE) of associated query processor elements being a Robot query processor Element (RQPE) adapted for accessing at least one web-based data source (DS).
When, as stated in claim 2, the domain processor (DP) comprises at least one query processor maintenance manager (QMM), said at least one query processor maintenance manager (QMM) comprising means for executing at least one query processor (QP) established by the domain processor, an advantageous embodiment has been obtained. According to the invention, the domain processor may advantageously comprise a tool for running a query processor established by the domain processor. The query processor maintenance manager may thus be adapted for running the query processor on one or several servers.
Such a manager may include a visual tool illustrating the running state of the query processor and the individual elements. An example of such intuitive processing is that the individual elements change color according to their state, e.g. within a color range from white to red, depending on the load of the elements.
Moreover, the manager should preferably illustrate basic on-off conditions visually, i.e. illustrate actively if an element is working properly, and whether entities are transferred between the query processor elements and whether entities may actually be transferred between elements. The latter feature may ease operation of the system significantly due to the fact that the absence of an entity flow between the elements does not necessarily indicate that a fault-condition has occurred simply because the element is not queried.
Determination of a "clear road" between the elements may e.g. be established by forwarding dummy (testing) queries between elements at certain intervals.
Moreover, the Query Processor Modeller may include submenus facilitating specialized execution of the query processor.
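The dummy-query probe and the white-to-red load colouring mentioned above could look like the following sketch. It is one possible approach, not the method of the invention. The probe marker DUMMY_QUERY is an assumption, as is the convention that elements answer it cheaply. An element that returns no entities is still reported as clear, reflecting that an absent entity flow is not in itself a fault.

```python
from typing import Callable, Dict, List

Query = Dict[str, str]
# Hypothetical: an element is any callable that takes a query and returns entities.
Element = Callable[[Query], List[dict]]

# Assumption: elements recognise this marker and answer it without real work,
# e.g. a robot would not hit its data source for a probe.
DUMMY_QUERY: Query = {"__probe__": "1"}


def probe_elements(elements: Dict[str, Element]) -> Dict[str, str]:
    """Send one dummy query to each element and report whether the road is clear."""
    status: Dict[str, str] = {}
    for name, element in elements.items():
        try:
            element(DUMMY_QUERY)
            status[name] = "clear"
        except Exception as exc:  # any failure counts as a fault condition
            status[name] = f"fault: {type(exc).__name__}"
    return status


def load_to_colour(load: float) -> str:
    """Map a load in [0, 1] to a colour from white (#ffffff) to red (#ff0000)."""
    load = min(max(load, 0.0), 1.0)
    green_blue = round(255 * (1.0 - load))
    return f"#ff{green_blue:02x}{green_blue:02x}"
```

In a running system, probe_elements would be called at certain intervals, and load_to_colour would be fed with whatever load measure the element reports. Both of these points are outside the sketch.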
Moreover, the invention relates to a robot modeller (RM) according to claim 3 comprising
means for modelling at least one computer-based robot (R),
said at least one robot (R) being adapted for accessing at least one web-based data source (DS), said at least one data source (DS) comprising entities comprised in a predefined domain (D).
Moreover, the invention relates to a domain modeller (DMR) according to claim 4 comprising
means for modelling at least one domain model (DM) associated with at least one chosen domain, said domain model (DM) comprising at least one extraction model (EM) and at least one storage model (STM),
means for establishing at least one extraction model (EM) associated with a chosen domain,
means for establishing at least one storage model (STM) associated with said chosen domain.
Thus, a domain model represents a structured way of defining properties of different aspects of a domain.
A domain model may e.g. comprise an extraction model, i.e. a definition of relevant entities and attributes to be looked for in the web-based data source. It should be noted that the extraction model may primarily describe (or mask) the data source on the basis of text strings and combinations of such strings.
A chosen domain may e.g. be "cars offered for sale".
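As an illustration only, an extraction model for the car domain might name the attributes found at fixed positions in a row of raw strings, such as the sequence "Ford", "2.0", "red" mentioned earlier. A separate step then turns the extracted strings into a conceptual car. The positional model, the EXTRACTION_MODEL list and the Car fields are hypothetical. Real extraction models would be established per domain and data source.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical extraction model: attribute names for the raw text fields found,
# in this order, in one listing row of a particular car data source.
EXTRACTION_MODEL: List[str] = ["make", "engine", "colour"]


def extract(row: Sequence[str], model: List[str]) -> Dict[str, str]:
    """Apply the extraction model; the result still holds plain strings."""
    if len(row) != len(model):
        raise ValueError(f"expected {len(model)} fields, got {len(row)}")
    return {attr: value.strip() for attr, value in zip(model, row)}


@dataclass
class Car:
    """Hypothetical conceptual representation of a car entity."""
    make: str
    engine_litres: float
    colour: str


def to_concept(entity: Dict[str, str]) -> Car:
    """Convert an extraction entity into a conceptual Car."""
    return Car(make=entity["make"],
               engine_litres=float(entity["engine"]),
               colour=entity["colour"])


car = to_concept(extract(["Ford", "2.0", "red"], EXTRACTION_MODEL))
# car == Car(make='Ford', engine_litres=2.0, colour='red')
```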
When the domain modeller comprises means for establishing reference mapping between extracted data obtained according to said extraction model (EM) and a conceptual representation of said data, a further advantageous embodiment of the invention has been obtained. When said reference mapping defines a set of reference entities describing a number of entities (E), said entities having attributes, a further advantageous embodiment of the invention has been obtained.
A set of reference entities may e.g. be a product catalogue.
Reference mapping may facilitate the possibility of adding knowledge to the retrieved entities. Such information may e.g. be information deducible from a reference product catalogue. Thus, if an entity is matched to an entity type of the product catalogue, the entity may be modified, e.g. validated, corrected or supplemented with additional information about the entity.
A correction may e.g. be that one of the attributes of the Porsche retrieved above is false according to the product catalogue. This false attribute may be detected in several different ways within the scope of the invention. The reference product catalogue may e.g. initially reveal that no Porsche having a 3.0 liter engine has been made with a diesel engine. Moreover, the product catalogue may reveal that no Porsche has been made with a diesel engine at all, thereby raising the probability that the data source provider has made a mistake. The wrong attribute "Diesel" may then be corrected.
Furthermore, the reference entities may be applied for different variants of classification and validation.
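A minimal sketch of catalogue-based validation follows, using the Porsche example above. The catalogue contents only mirror the example in the text and are not real product data. The correction rule is an assumption: an impossible value is replaced when the catalogue allows exactly one alternative, and is otherwise only reported. A returned note could be what a messenger element forwards to the operator.

```python
from typing import Dict, List, Set, Tuple

Entity = Dict[str, str]

# Illustrative reference catalogue mirroring the example in the text:
# make -> fuel types the catalogue lists for that make.
CATALOGUE: Dict[str, Set[str]] = {
    "Porsche": {"Petrol"},
    "Ford": {"Petrol", "Diesel"},
}


def validate(entity: Entity,
             catalogue: Dict[str, Set[str]]) -> Tuple[Entity, List[str]]:
    """Check an entity against the catalogue; return a possibly corrected copy and notes."""
    fixed = dict(entity)
    notes: List[str] = []
    known = catalogue.get(entity.get("make", ""))
    if known is None:
        notes.append("make not in reference catalogue")
        return fixed, notes
    fuel = entity.get("fuel")
    if fuel not in known:
        if len(known) == 1:
            (only,) = tuple(known)  # the single value the catalogue allows
            fixed["fuel"] = only
            notes.append(f"corrected fuel {fuel!r} -> {only!r}")
        else:
            notes.append(f"fuel {fuel!r} not listed for {entity['make']}")
    return fixed, notes
```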
When the domain modeller (DMR) comprises means for establishing at least one language domain dictionary (LDD), a further advantageous embodiment of the invention has been obtained.
When said at least one language domain dictionary (LDD) maps the language of the extracted entities into the general language of the query processor (QP), a further advantageous embodiment of the invention has been obtained. The general language of the query processor may e.g. be regarded as the "language" defined by an object-oriented conceptual model associated with the query processor. Such language may e.g. be a preferred language or coding chosen as the general language. Hence, the language domain dictionary may e.g. make it possible to have an entity that reads "wagen" or "bil" transformed into an instance of an object car.
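The mapping of "wagen" or "bil" into an instance of the object car may be sketched as a dictionary lookup followed by instantiation of the corresponding conceptual class. The dictionary entries, the CONCEPTS registry and the function names are hypothetical.

```python
from typing import Dict, Type

# Hypothetical language domain dictionary for the car domain: terms found in
# extracted entities (German, Danish, English) -> general term of the query processor.
LANGUAGE_DOMAIN_DICTIONARY: Dict[str, str] = {
    "wagen": "car",
    "bil": "car",
    "car": "car",
}


class Car:
    """Stand-in for the object 'car' of the conceptual model."""


# Registry from general terms to conceptual classes (an assumption).
CONCEPTS: Dict[str, Type] = {"car": Car}


def to_general_term(term: str) -> str:
    """Translate an extracted term; unknown terms raise KeyError."""
    key = term.strip().lower()
    if key not in LANGUAGE_DOMAIN_DICTIONARY:
        raise KeyError(f"no general term for {term!r}")
    return LANGUAGE_DOMAIN_DICTIONARY[key]


def instantiate(term: str) -> object:
    """Create an instance of the conceptual class named by the translated term."""
    return CONCEPTS[to_general_term(term)]()
```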
When, said domain modeller (DMR) comprises means for establishing a set of reference recognition patterns, a further advantageous embodiment of the invention has been obtained.
The set of reference recognition patterns may e.g. comprise character patterns (also known as regular expressions) or character structures (even pictures) to be applied when identifying attributes and entities, e.g. Ltd., Corp or A/S indicating that a company attribute or entity is associated with the character pattern in English, American English and Danish, respectively.
Evidently, such reference patterns will typically be domain specific or at least language specific.
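The company-suffix example can be expressed as regular expressions. The sketch below assumes one pattern per language tag; the tags, the patterns and the `recognize_company` helper are illustrative assumptions rather than a defined pattern set.

```python
import re

# Hypothetical reference recognition patterns: company-name suffixes that
# indicate a company attribute in English (Ltd.), American English (Corp)
# and Danish (A/S). The negative lookahead stops "Corp" matching "Corporation".
COMPANY_PATTERNS = {
    "en": re.compile(r"\bLtd\.?(?!\w)"),
    "en-US": re.compile(r"\bCorp\.?(?!\w)"),
    "da": re.compile(r"\bA/S(?!\w)"),
}

def recognize_company(text: str):
    """Return the language tags whose company pattern matches the text."""
    return [lang for lang, pattern in COMPANY_PATTERNS.items() if pattern.search(text)]
```

For instance, `recognize_company("Bilhuset A/S")` returns `["da"]`, while a street name such as "Corporation Street" matches no pattern.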
Moreover, the invention relates to a query processor modeller (QPM) comprising
means for selecting at least two Query Processor elements (QPE) from a set of predefined query processor elements (QPE),
means for combining at least two of the selected Query Processor elements (QPE),
means for executing said associated query processor elements on at least one computer system (CS), at least one of said query processor elements (QPE) of the associated query processor elements being a Robot query processor Element (RQPE) adapted for accessing at least one web-based data source (DS).
According to the invention, a domain-accessing system may be established by means of general components. Moreover, the components may rely on general knowledge about the domain of interest, thereby facilitating very fast establishment of domain-accessing systems.
When the Query Processor Modeller comprises a graphical user interface (GUI) in the form of a visual programming tool, a further advantageous embodiment of the invention has been obtained.
When said set of query processor elements (QPE) comprises at least two different types of query processor elements,
at least one type being a robot query processor element (RQPE) and at least one type being a trigger query processor element (TQPE), a further advantageous embodiment of the invention has been obtained.
Moreover, the invention relates to a query processor maintenance manager (QMM) comprising
means for executing at least one query processor (QP) established by the domain processor.
According to the invention, the query processor maintenance manager should be adapted for controlling the processing of an established query processor.
When said maintenance manager (QMM) comprises means for monitoring the state of at least one query processor element (QPE) or the performance of at least one query processor element (QPE), a further advantageous embodiment of the invention has been obtained.
When said domain processor maintenance manager (QMM) comprises means for evaluating the data flow between query processor elements (QPE) of a query processor path, a further advantageous embodiment of the invention has been obtained.
When said domain processor maintenance manager (QMM) comprises means for running and visual monitoring of the individual modules of a query processor, a further advantageous embodiment of the invention has been obtained.
When said domain processor maintenance manager (QMM) comprises means for running and visual monitoring of a query processor (QP) on element basis, a further advantageous embodiment of the invention has been obtained.
According to the invention, the elements may be advantageously monitored as visually separated elements.
Moreover, the invention relates to a web-robot, said robot comprising means for extracting information from web-based data sources (DS) in dependency of at least one extraction model (EM), said at least one extraction model comprising reference data structures defining entities and/or entity structures of data sources in a domain.
When said robot comprises at least one exchangeable plug-in, said plug-in comprising retrieving routines adapted for reading knowledge stored in said extraction model, said knowledge preferably being domain-specific, a further advantageous embodiment of the invention has been obtained. When said plug-in defines reference mapping between extracted data obtained according to said extraction model (EM) and conceptual representation of said data, a further advantageous embodiment of the invention has been obtained.
When said extraction model (EM) is shared between at least two robots, a further advantageous embodiment of the invention has been obtained.
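One way to picture a shared extraction model and an exchangeable plug-in is sketched below. It assumes, purely for illustration, that an extraction model is a list of attribute names and that a plug-in is a function mapping raw fields onto those names; `ExtractionModel`, `positional_plugin`, `Robot` and the example hosts are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class ExtractionModel:
    """Hypothetical extraction model: the attributes of a domain entity."""
    entity: str
    attributes: tuple

# A plug-in reads the domain knowledge in the extraction model and maps the
# raw fields read by a robot onto the model's attributes.
Plugin = Callable[[ExtractionModel, List[str]], Dict[str, str]]

def positional_plugin(model: ExtractionModel, fields: List[str]) -> Dict[str, str]:
    return dict(zip(model.attributes, fields))

class Robot:
    def __init__(self, source: str, model: ExtractionModel, plugin: Plugin):
        self.source, self.model, self.plugin = source, model, plugin

    def extract(self, rows: List[List[str]]) -> List[Dict[str, str]]:
        return [self.plugin(self.model, row) for row in rows]

# Two robots attached to different data sources share one extraction model.
CAR_MODEL = ExtractionModel("Car", ("make", "year", "price"))
robot_a = Robot("dealer-a.example", CAR_MODEL, positional_plugin)
robot_b = Robot("dealer-b.example", CAR_MODEL, positional_plugin)
```

Because the model object is shared, a change to the car attributes is made once and picked up by every robot, whereas the plug-in can be swapped per robot.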
Moreover, the invention relates to a query processor (QP),
said query processor (QP) comprising a set of web-based data sources (DS), wherein at least two of said data sources (DS) comprise entities according to a domain model (DM),
said query processor (QP) comprising at least three query processor elements (QPE),
at least two of said query processor elements (QPE) comprising a robot (RQPE)
said robot (RQPE) being attached to at least one data source (DS) said robot comprising means for accessing information from the at least one data source (DS) according to at least one extraction model (EM) associated with said robot (RQPE),
at least one of said query processor elements (QPE) comprising a trigger (TQPE) said trigger query processor element (TQPE) comprising means for establishing a query.
The web-based data sources are typically independent.
The trigger element may be driven both manually and automatically, i.e. by a query user or by an automated query routine. When at least one of the query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE), a further advantageous embodiment of the invention has been obtained.
Moreover, the invention relates to a method of establishing at least one query processor (QP),
said query processor (QP) comprising a set of web-based data sources (DS), wherein at least two of said data sources (DS) comprise entities according to a domain model (DM),
said query processor (QP) comprising at least three query processor elements (QPE),
at least two of said query processor elements (QPE) comprising a robot (RQPE),
said robot comprising means for accessing information from the at least one data source (DS) according to at least one extraction model (EM) associated with said robot (RQPE),
at least one of said query processor elements (QPE) comprising a trigger (TQPE),
said trigger query processor element (TQPE) comprising means for establishing a query,
said method comprising the steps of
attaching at least one selected robot query processor element (RQPE) to at least one of the data sources (DS) of the domain, combining the selected query processor elements into a query processor (QP) by means of a graphical user interface (GUI).
It should be noted that the data source may both be regarded as an internal part or an external part of the query processor within the scope of the invention, depending on whether the associated data source is defined by its data or not.
When said graphical user interface (GUI) defines a query processor element path visually on a drag-and-drop basis, a further advantageous embodiment of the invention has been obtained.
When at least one of the combined query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE), a further advantageous embodiment of the invention has been obtained.
Moreover, the invention relates to a method of establishing at least one query processor (QP),
said query processor comprising means for accessing data from web-based data sources (DS) of a domain by means of at least one user interface (UI),
said method comprising the steps of selecting a number of query processor elements (QPE),
at least one of said selected query processor elements (QPE) being a robot query processor element (RQPE),
at least one of said selected query processor elements (QPE) being a trigger query processor element (TQPE), attaching at least one selected robot query processor element (RQPE) to at least one of the data sources (DS) of the domain,
combining the selected query processor elements into at least one query path defining the data flow in the query processor (QP) between the user interface (UI) and the web-based data sources of the domain, said method comprising a further step of
customizing the at least one individual robot query processor element (RQPE) to the corresponding attached data sources (DS),
customizing at least one of the trigger query processor elements (TQPE) to the query processor (QP).
When at least one of the combined query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE), a further advantageous embodiment of the invention has been obtained.
Moreover, the invention relates to a method of extracting data from a web-based data source (DS), said method comprising the steps of
-identifying and reading attributes and entities of a web-based data source,
-converting the read entities into instances of conceptual entities,
-verifying whether the read instances correspond with an entity reference base (ERB).
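The three steps of the method can be sketched as three small functions. This is a minimal sketch under assumptions: the page text, the regular expression, the `Car` class and the contents of the entity reference base are all hypothetical stand-ins for a real extraction model and reference base.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Car:
    make: str
    engine_litres: float

# Hypothetical entity reference base (ERB): known (make, engine size) pairs.
ENTITY_REFERENCE_BASE = {("Fiat", 2.0), ("Porsche", 3.0)}

ROW = re.compile(r"(?P<make>[A-Za-z]+)\s+(?P<engine>\d+(?:\.\d+)?)\s*l", re.IGNORECASE)

def read_attributes(page: str):
    """Step 1: identify and read attribute strings on the data source."""
    return [m.groupdict() for m in ROW.finditer(page)]

def to_instance(raw: dict) -> Car:
    """Step 2: convert the read strings into a conceptual entity."""
    return Car(make=raw["make"].capitalize(), engine_litres=float(raw["engine"]))

def verify(car: Car) -> bool:
    """Step 3: check the instance against the entity reference base."""
    return (car.make, car.engine_litres) in ENTITY_REFERENCE_BASE

page = "FIAT 2.0 l, 120 Hp ... porsche 3.0L ... Fiat 9.9 l"
cars = [to_instance(raw) for raw in read_attributes(page)]
```

Here the third extracted car, a 9.9 liter Fiat, fails verification because no such entity exists in the assumed reference base, while the first two pass.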
According to the above-mentioned embodiment of the invention, very advantageous entity processing has been obtained. A conceptual model may also include a storage database model.
When the method comprises at least one step of verifying whether the read instances correspond with an entity reference base (ERB) on the basis of entities represented in said conceptual entity-representing format, a further advantageous embodiment of the invention has been obtained.
According to the invention, very advantageous processing of entities has been obtained. Hence, a conceptual check of the data may be performed on compactly represented data, thereby reducing processing significantly. Hence, according to the invention, the micro-interpretation of the read entities and attributes is made separately from, and prior to, the macro-interpretation of the entities.
Micro-interpretation according to the invention may be regarded as the reading of individual string-based attributes on a web-based data source. According to the preferred embodiment of the invention, the combination of read string-based attributes into entities may also be regarded as micro-interpretation performed according to the extraction model.
An example of micro-interpretation work is e.g. the job (typically performed automatically by software-based routines) of determining whether a read attribute is a "Ford" or a "Fiat". A further example is the determination of whether an engine is a 75 or 155 Hp engine.
Entities held in an extraction format are typically string-based, e.g. Fiat, "Fiat", FIAT, FIATH, etc.
Entities held in a conceptual format are typically held in an object-like format. Hence Fiat, "Fiat", FIAT and FIATH are all represented as a Fiat type in the conceptual format. Such a Fiat type may typically involve an integer representation of a Fiat in old databases, whereas new databases may represent Fiat, "Fiat", FIAT and FIATH as "Fiat". Macro-interpretation according to the invention may typically be regarded as a syntax check performed on the basis of the complete and established instance. Such a check may e.g. be performed with the purpose of verifying whether the established instance of an entity is actually realistic, i.e. consistent.
Moreover, entities held in the conceptual format may easily be grouped and filtered.
Conceptual representation of the entities according to the invention is typically an object-oriented representation.
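Mapping string variants such as Fiat, "Fiat", FIAT and FIATH onto one conceptual type can be approximated with string-similarity matching. The sketch below uses Python's standard `difflib.get_close_matches`; the choice of this technique, the list of known makes and the 0.8 cutoff are assumptions for illustration only.

```python
import difflib

# Hypothetical list of conceptual make types known to the car domain.
KNOWN_MAKES = ["Fiat", "Ford", "Porsche", "Trabant"]

def normalize_make(raw: str, cutoff: float = 0.8):
    """Map a string-based make onto a conceptual make type, or return None
    if no known make is similar enough (difflib similarity ratio)."""
    cleaned = raw.strip().strip('"').lower()
    lookup = {make.lower(): make for make in KNOWN_MAKES}
    match = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[match[0]] if match else None
```

A misspelling such as FIATH is still close enough to "fiat" to be accepted, whereas an unknown make returns `None` and can be left for macro-interpretation or manual review.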
An example of macro-interpretation work is e.g. the job (typically performed automatically by software-based routines) of determining whether read attributes combined into an entity "Fiat", "120 Hp" and 2.0 liter engine are actually valid. Such a check performed on the basis of a reference base of known (valid) entity types, i.e. a product catalogue, may moreover be performed with the purpose of adding information to the checked instances of entities. Such procedure may be regarded as a deduction of information exemplified by an instance of a car, "Fiat", "155 Hp" and 2.0 liter. When compared with a reference product catalogue associated with the car domain, such a car may be deduced to be a turbo version, i.e. "Fiat", "155 Hp", "2.0 liter" and TURBO.
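The turbo deduction can be sketched as a lookup of the complete instance in a reference catalogue. The catalogue contents, the `Car` fields and the `macro_interpret` helper below are hypothetical; the sketch only shows how a missing attribute may be filled in and how an unknown combination may be rejected.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Car:
    make: str
    hp: int
    litres: float
    turbo: Optional[bool] = None  # None: not stated on the data source

# Hypothetical reference product catalogue for the car domain.
CATALOGUE = [
    Car("Fiat", 120, 2.0, turbo=False),
    Car("Fiat", 155, 2.0, turbo=True),
]

def macro_interpret(car: Car) -> Optional[Car]:
    """Return the instance completed with catalogue knowledge, or None if
    the combination of attributes matches no known entity type."""
    for ref in CATALOGUE:
        if (ref.make, ref.hp, ref.litres) == (car.make, car.hp, car.litres):
            if car.turbo is None:
                return replace(car, turbo=ref.turbo)  # deduce the missing attribute
            return car if car.turbo == ref.turbo else None
    return None
```

Under these assumptions, `macro_interpret(Car("Fiat", 155, 2.0))` returns the same car with `turbo=True`, mirroring the deduction described above.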
According to the invention, macro-interpretation may be performed on instances held in a conceptual format.
When the verified instances are modified according to the entity reference base (ERB) by adding information associated with the instances corresponding to said entity reference base, a further advantageous embodiment of the invention has been obtained. Hence, information may be added to the instances, e.g. by adding further attributes or by slightly modifying one or several of the attributes forming the instance of an entity.
An example may e.g. be the above-mentioned deduction of information exemplified by an instance of a car, "Fiat", "155 Hp" and 2.0 liter. When compared with a reference product catalogue associated with the car domain, such a car may be deduced to be a turbo version, i.e. "Fiat", "155 Hp", "2.0 liter" and TURBO.
A storage model may typically be relational.
When the verified instances are corrected according to the entity reference base (ERB) by correcting information associated with said instances corresponding to said entity reference base, a further advantageous embodiment of the invention has been obtained.
Hence, instances may be corrected, e.g. by omitting attributes held in the instance or by modifying one or several of the attributes forming the instance of an entity.
An example may e.g. be an instance of a car read as "Fiat", "120 Hp", 2.0 liter engine and Turbo. When compared with a reference product catalogue associated with the car domain, the verification of the instance may result in a correction of the "Turbo" attribute, as the verification procedure may conclude both (a) that no 120 Hp Fiat with Turbo is in the reference catalogue and (b) that a 120 Hp Fiat without Turbo is most likely the intended instance of a car. Consequently, a correction routine may correct the instance accordingly or discard the entity entirely.
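A correction routine of this kind might look as follows. It is a sketch under the assumption that the catalogue can be keyed by (make, power, displacement) and that only the turbo flag is corrected; `CATALOGUE`, `correct` and the `discard_unknown` switch are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Car:
    make: str
    hp: int
    litres: float
    turbo: bool

# Hypothetical reference catalogue: (make, hp, litres) -> turbo flag.
CATALOGUE = {("Fiat", 120, 2.0): False, ("Fiat", 155, 2.0): True}

def correct(car: Car, discard_unknown: bool = True) -> Optional[Car]:
    """Correct the turbo attribute against the catalogue.

    If the same make, power and displacement is catalogued, that entry is
    assumed to be the intended car and its turbo flag is applied. Unknown
    combinations are discarded (None) or passed through unchanged.
    """
    expected = CATALOGUE.get((car.make, car.hp, car.litres))
    if expected is None:
        return None if discard_unknown else car
    if car.turbo != expected:
        return replace(car, turbo=expected)
    return car
```

The `discard_unknown` flag reflects the two outcomes named in the text: correcting the instance or discarding the entity entirely.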
Moreover, the invention relates to a method of establishing a query processor,
said query processor being adapted for accessing data on at least two different web-based data sources, selecting at least two predefined query processor elements (QPE),
combining the selected query processor elements into a desired query processor structure.
According to the invention, the overall structure of a query processor may be based purely on some basic design rules, i.e. a robot element must be assigned to a data source, a trigger must feature a manual user interface, a database element must contain retrieved data, etc.
Such a conceptual design of a query processor should preferably be made by means of a graphically-based visual program, e.g. a drag and drop-like design program.
Evidently, this conceptual programming of a query processor may be made on the basis of more or less structured knowledge about the domain and the data sources of the domain.
Basically, such a design of a query processor represents the framework for the intended query processor.
The query processor elements basically represent different sub-frameworks which may all be designed and performed in separate structures or routines. Therefore, designing query processors from elements with different functional properties minimizes "error cross-talk" between the elements, and the elements may advantageously be put together initially without dealing with complicated details of the individual elements.
A query processor according to the invention is established for accessing data of at least two different independent web-based data sources. A further advantage of the above-mentioned method is that a break-down of the functional features of a query processor into standardized elements, which may be configurable, may easily be conceived by a programmer.
A further advantage of the invention is that the use of standardized elements makes it possible to pre-configure different variants of a certain element type, so that the user may simply insert a pre-configured element.
An example of such pre-configuration is a trigger element. Within the (type) group of trigger elements, several variants may advantageously be pre-established if such trigger elements are used often. A programmer may e.g. apply a trigger element predefined for triggering a query at certain time intervals. Other types of trigger elements may e.g. comprise a statistics module for triggering a query according to different system parameters. A third possible type of trigger may e.g. be a manually operated trigger intended for establishing a query in cooperation with a manually operated user interface.
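The three trigger variants can share one small interface. The sketch below assumes a `should_fire()` method as that interface; the class names are hypothetical, and the statistics-based trigger is simplified to a single threshold on one system parameter.

```python
import time
from typing import Callable, Optional

class IntervalTrigger:
    """Fires a query when at least `interval_s` seconds have elapsed."""
    def __init__(self, interval_s: float, clock: Callable[[], float] = time.monotonic):
        self.interval_s, self.clock = interval_s, clock
        self.last: Optional[float] = None

    def should_fire(self) -> bool:
        now = self.clock()
        if self.last is None or now - self.last >= self.interval_s:
            self.last = now
            return True
        return False

class ThresholdTrigger:
    """Simplified statistics trigger: fires when a parameter exceeds a threshold."""
    def __init__(self, read_parameter: Callable[[], float], threshold: float):
        self.read_parameter, self.threshold = read_parameter, threshold

    def should_fire(self) -> bool:
        return self.read_parameter() > self.threshold

class ManualTrigger:
    """Fires when a user has submitted a query through a user interface."""
    def __init__(self):
        self.pending: list = []

    def submit(self, query: str) -> None:
        self.pending.append(query)

    def should_fire(self) -> bool:
        return bool(self.pending)
```

Injecting the clock into `IntervalTrigger` is a design choice that lets a pre-configured variant be tested without waiting in real time.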
Basically, the invention offers a high-level language facilitating easy web-based access.
When said at least two predefined query processor elements have different functional characteristics, an advantageous embodiment of the invention has been obtained.
Different functional characteristics may e.g. be elements functioning as converters, triggers, caches, robots.
Hence, a query processor according to the invention may be established by means of standardized "bricks", thereby making the establishment of a web-oriented query processor far less complicated. When the selected query processor elements are modified according to the data structure of said web-based data sources, a further advantageous embodiment of the invention has been obtained.
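Combining standardized bricks into a query path can be sketched as function composition. The uniform element interface (a function from a list of items to a list of items) and the helpers `robot`, `converter`, `filter_element` and `query_path` are assumptions made for this illustration, not the interface of the described system.

```python
from typing import Callable, Iterable, List

# Assumed uniform interface: every element maps a list of items to a list.
Element = Callable[[List[dict]], List[dict]]

def robot(rows: Iterable[dict]) -> Element:
    """Stand-in robot element returning the entities of one data source."""
    return lambda _query: [dict(r) for r in rows]

def converter(field: str, fn: Callable) -> Element:
    return lambda items: [{**it, field: fn(it[field])} for it in items]

def filter_element(pred: Callable[[dict], bool]) -> Element:
    return lambda items: [it for it in items if pred(it)]

def query_path(*elements: Element) -> Element:
    """Chain the elements so that each one's output feeds the next."""
    def run(items: List[dict]) -> List[dict]:
        for element in elements:
            items = element(items)
        return items
    return run

qp = query_path(
    robot([{"make": "fiat", "price": "9000"}, {"make": "ford", "price": "15000"}]),
    converter("price", int),
    filter_element(lambda car: car["price"] < 10000),
)
```

Because each element only depends on the shared interface, one brick can be reconfigured or replaced without touching the others.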
According to the invention, the different elements may be configured or designed independently. Hence, the individual elements may be established so as to fit the individual task(s) of the elements without inducing errors somewhere else in the processing system.
When said modification of the selected query processor elements comprises at least one plug-in software module, said at least one plug-in defining domain-specific properties of said element, a further advantageous embodiment of the invention has been obtained.
Hence, domain-specific plug-ins may initially be constructed, e.g. product catalogues, language dictionaries, as completely separate routines. Moreover, the individual elements may be ideally constructed, e.g. a robot, with no or only little knowledge of the language of the data source due to the fact that the basic structure and functioning of the robot is language independent. Product catalogues should likewise be domain specific.
Moreover, the individual elements may be established with different plug-ins.
Moreover, the invention relates to a method of establishing a domain-accessing routine,
said domain comprising a plurality of web-based data sources,
said method comprising the steps of establishing at least one robot () adapted for retrieving entities stored on said plurality of web-based data sources, establishing at least one reference catalogue,
establishing at least one procedure of verifying the retrieved entities by comparing the read entities with the at least one reference catalogue.
Thereby, an ideal way of retrieving information from a web-based data source has been obtained.
When said method comprises the steps of
establishing at least one storage means
establishing a data-exchanging interface between said at least one robot and at least one storage means, a further advantageous embodiment of the invention has been obtained.
When said reference catalogue is a product catalogue, a further advantageous embodiment of the invention has been obtained.
When said established procedure of verification comprises a modification of the retrieved entities if the verification procedure indicates or proves that a read entity is not valid according to the at least one reference catalogue, a further advantageous embodiment of the invention has been obtained.
Moreover, the invention relates to a query processor maintenance manager (QMM)
comprising at least one domain processor user interface (DPUI)
said manager (QMM) comprising means for evaluating different modules of at least one query processor (QP),
said means for evaluating different subroutines of said query processor comprising means for monitoring the state of at least one query processor element (QPE).
Hence, the query processor may comprise means for monitoring e.g. a robot element, a transformer element, a trigger element, a mediator, etc.
When said processor comprises means for automatically forwarding messages to said at least one query processor user interface (DPUI) when certain predefined conditions are met, a further advantageous embodiment of the invention has been obtained.
The predefined conditions may e.g. be conditions determining that a transformer has failed to transform extracted entities into conceptual entities.
A further predefined condition may be that a maximum load of an element, e.g. a cache or a robot, has been exceeded.
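Forwarding messages when predefined conditions are met can be sketched as a list of condition functions checked against element snapshots. The `ElementState` fields, the two conditions and the `notify` callback standing in for the user interface are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

@dataclass
class ElementState:
    """Snapshot of one query processor element (hypothetical fields)."""
    name: str
    load: int                   # e.g. number of queued requests
    max_load: int
    failed_transforms: int = 0  # entities a transformer could not convert

# A predefined condition inspects a state and returns a message or None.
Condition = Callable[[ElementState], Optional[str]]

def overload(s: ElementState) -> Optional[str]:
    if s.load > s.max_load:
        return f"{s.name}: load {s.load} exceeds {s.max_load}"
    return None

def transform_failure(s: ElementState) -> Optional[str]:
    if s.failed_transforms:
        return f"{s.name}: {s.failed_transforms} entities failed to transform"
    return None

class MaintenanceManager:
    def __init__(self, conditions: List[Condition], notify: Callable[[str], None]):
        self.conditions, self.notify = conditions, notify

    def check(self, states: Iterable[ElementState]) -> None:
        """Forward a message to the user interface for every met condition."""
        for state in states:
            for condition in self.conditions:
                message = condition(state)
                if message:
                    self.notify(message)
```

New conditions can be added to the list without changing the manager itself.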
When said manager (QMM) comprises means for modifying individual query processor elements/sub-routines, a further advantageous embodiment of the invention has been obtained.
The means for modifying individual query processor elements/sub-routines may e.g. comprise an editor for the robots or means for modifying plug-ins centrally.
An example of such an editor may e.g. be the interface of a Query Processor Modeller in which the individual query processor elements may be edited simply by clicking on the elements and thereby starting the editor related to the activated element. Such an editor may e.g. be a Robotmaker, if a robot is clicked on, or a domain modeller if a transformer element is clicked on. When said manager (QMM) comprises means for modifying the query flow in the query processor during execution of the query processor, a further advantageous embodiment of the invention has been obtained.
Allowing real-time editing in the query processor maximizes the up-time of the query processor. This real-time editor should preferably comprise means for blocking individual query paths of the query processor without invoking fault conditions on the associated signal paths.
An example of means for modifying the query flow is a mute element included in a query path. Activating such a mute element takes the involved branch out of operation, while the rest of the query processor may proceed unaffected, provided that the queries or entities (i.e. data) from the muted branch are not essential for proceeding with the query. Typically, missing the queries and entities of one branch of the query processor is preferable to shutting down the complete query processor.
Meanwhile, the elements of the muted branch, e.g. a robot or a transformer, may be "repaired" or updated without resulting in run-time errors.
A further advantageous variant of the above-mentioned modification may be a halt routine acting like the mute element but including a memory which catches and stores incoming queries and subsequently resumes processing by means of the stored queries.
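The mute and halt behaviours can be sketched as a wrapper around one branch of a query path. `BranchSwitch`, its three modes and the `resume` method are hypothetical names; the sketch only shows that a muted branch drops queries, a halted branch stores them, and a repaired branch replays the stored queries.

```python
from collections import deque
from typing import Callable, Deque, List, Optional

class BranchSwitch:
    """Wraps one branch of a query path.

    mode "run":  queries pass through to the branch.
    mode "mute": queries are dropped; the branch contributes no results.
    mode "halt": queries are stored and replayed when the branch resumes.
    """
    def __init__(self, branch: Callable[[str], List[str]]):
        self.branch = branch
        self.mode = "run"
        self.held: Deque[str] = deque()

    def __call__(self, query: str) -> List[str]:
        if self.mode == "run":
            return self.branch(query)
        if self.mode == "halt":
            self.held.append(query)
        return []  # muted or halted: the rest of the processor carries on

    def resume(self, branch: Optional[Callable[[str], List[str]]] = None) -> List[str]:
        """Optionally swap in a repaired branch, then replay held queries."""
        if branch is not None:
            self.branch = branch
        self.mode = "run"
        results: List[str] = []
        while self.held:
            results.extend(self.branch(self.held.popleft()))
        return results
```

Swapping the branch inside `resume` corresponds to repairing a robot or transformer while it is out of operation.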
The figures
The invention will be described below with reference to the drawings of which
fig. 1 illustrates some basic principles of a query processor system,
fig. 2 illustrates a basic approach according to the invention when dealing with domain processing,
fig. 3 illustrates the process of establishing a domain processor according to a preferred embodiment of the invention,
figs. 4 to 6 illustrate the principles of one embodiment of a domain modeller according to one embodiment of the invention,
fig. 7 illustrates the principles of an applicable robot-making program according to one embodiment of the invention,
fig. 8 illustrates the functionality of a query processor modeller according to one embodiment of the invention and
fig. 9 illustrates a possible user interface of a domain execution manager.
Detailed description
Fig. 1a illustrates the basic principles of a web-based market place.
A web-based market place generally comprises a number of web-based data sources DS. The data sources are e.g. web sites associated with a homepage of a data source owner. Typically, the data are transferred according to the HTTP protocol. Other protocols, e.g. WAP or HTTPS, are also applicable.
The data sources DS typically are, or are powered by, a database DB of the data owner.
It should be noted that a marketplace may moreover comprise non-web based data sources accessed by means of e.g. ODBC drivers.
The data sources offer information, products, services, etc. free or for sale.
According to the invention, a market place should technically deal with one domain only, but evidently, several domains may be overlaid and thereby offer a market place dealing with different domains.
An example of such a domain may e.g. be a car market place. The cars of the domain are offered for sale on the individual web-based data sources DS, and the cars may be new or used. A domain may include data sources of different nationalities and in many languages. On the other hand, a car market place offering used cars would typically only comprise cars offered for sale in one country.
Other exemplary domains may be jobs, services, stocks, odds, boats etc.
It should be noted that web-based access to the data sources facilitates a very broad coverage of the entire domain due to the fact that web-based data sources may be accessed without any kind of cooperation between the accessing party and the data source owner. Typically, the data sources will be independent.
According to the invention, the content of the data source of the domain will be regarded as entities. An entity has different properties, here defined as attributes.
An example of an entity is a specific car offered for sale, e.g. a Porsche, and attributes may be color, e.g. black, engine, e.g. 3.0 liters, etc.
Another example of an entity is a specific boat described by a number of suitable boat-describing parameters, or attributes, such as length, price, year, etc.
When reading a web-based data source DS, a combination of attributes will typically be read and interpreted as a car. Such reading of attributes may be regarded as an extraction of information from the web-based data source according to the invention.
The data sources DS may be accessed for reading and/or writing.
The data sources may be accessed via a domain handling system, i.e. a processing system, implemented in software on the illustrated computer system CS. The computer system may comprise one central server or a number of coupled servers located centrally or decentrally. Such a system may be regarded as a query processor QP. The query processor is adapted for querying the data sources automatically or upon a request, a query Q, made by a user U. The request is made by means of a user interface implemented on a user platform UPF.
As illustrated in fig. 1b, a User Platform UPF typically comprises a computer-based user interface which may be manually operated by a user U.
Hence, a user may forward a query Q to the data sources DS via the query processor QP. The query may be processed in many steps and the query processor QP may also include a data cache or a database for storing entities retrieved from the data sources DS for statistical purposes or for speeding up the query process.
The individual web-based data sources are accessed (i.e. read and/or written) by means of robots attached to the data sources. Typically, each robot is uniquely assigned to a corresponding data source DS.
The definition of a robot differs significantly from the somewhat popular definitions and the more scientific definitions.
The definition adopted in this application is that a robot is a kind of automatic process established with the purpose of accessing web-based data. A robot is a sub-arrangement of a so-called agent.
According to the invention, a robot is a software-based automatic process established with the purpose of accessing web-based data sources. According to the invention, a robot may even comprise some kind of intelligence embedded in the process establishing elements. It should be noted that a robot according to this definition may even be regarded as an agent by some practitioners within the art.
According to the invention, the agent has no personality, and it is not autonomous, nor mobile, in the sense that the agent is free to be transferred and processed on the local data source servers of the data source owners. A robot according to the invention is established for remote execution in relation to the data sources to be accessed and the robots will only be executed in a particular server environment. It should be noted that this particular environment may obviously include several servers located at different places.
Again, it should be noted that non-web-based data sources may be added if desired.
Fig. 1c illustrates the complex nature of a data source to be accessed according to the invention. The illustrated data source DS has a data structure which is initially unrevealed and incompatible with the access tools of the retrieving profile associated with the specific data source DS.
According to the illustrated embodiment, the character-based information of the data source DS has been converted into a number of attributes of identified text strings. Evidently, attributes may be encoded and decoded in various formats such as character-based formats, image-based formats and active content formats, such as Java applets, JavaScript applications or VBScript applications.
The text strings may e.g. be a mix of text strings identifying car names, model names, numbers, etc.
Subsequently, the data source must be evaluated and interpreted according to an extraction model in order to facilitate access to hidden information by the retrieving profile RP.
Fig. 1d illustrates identification and categorization of attributes of a data source according to the invention.
The attributes, i.e. the text strings of the data source, may subsequently be interpreted and combined into so-called entities of associated attributes ASA. The associated attributes may be established so as to comprise certain predefined types of attributes, i.e. categorized attributes.
An example of an entity is a car entity comprising the categorized attributes CA "Trabant", '88 and $100,000 where the first attribute of the category is car model, the second attribute of the category is manufacturing year and the third attribute of the category is the price. The above-mentioned entity may also be referred to as an instance of an extraction model. The extraction model defines and describes certain attributes and entities of interest for the domain. Each entity is established as a set of associated attributes ASA and the irrelevant attributes are filtered away.
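Categorizing text strings and combining them into an entity of associated attributes can be sketched with one recognizer per category. The categories follow the Trabant example; the regular expressions and the `categorize` helper are assumptions, and a real extraction model would be far richer.

```python
import re
from typing import Dict, List, Optional

# Hypothetical extraction model for a car entity: one recognizer per
# attribute category. Strings that match no category are filtered away.
CATEGORIES = {
    "model": re.compile(r"^(Trabant|Fiat|Porsche|Ford)$"),
    "year": re.compile(r"^'\d{2}$"),
    "price": re.compile(r"^\$[\d,]+$"),
}

def categorize(strings: List[str]) -> Optional[Dict[str, str]]:
    """Combine categorized attributes into one entity (a set of associated attributes)."""
    entity: Dict[str, str] = {}
    for s in strings:
        for category, pattern in CATEGORIES.items():
            if category not in entity and pattern.match(s):
                entity[category] = s
                break
    # Only a complete set of associated attributes forms an entity.
    return entity if len(entity) == len(CATEGORIES) else None
```

For the strings "Used cars", "Trabant", "'88", "Call now" and "$100,000", the two irrelevant strings are dropped and the remaining three form the car entity.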
Evidently, the establishment of entities of associated attributes may be performed in several different ways, and more or less automatically, within the scope of the invention. It should be noted that the preferred embodiment of the invention implies a completely automatic establishment of as many robots as possible.
A detailed description of a semi-automatic robot establishment according to one embodiment of the invention is described with reference to figs.7 to 9.
Subsequently, the identified entities may be copied into the central database DB means in such a way that the retrieving profile initially performs a query in the database instead of visiting every involved data source DS and lists the results to the user according to a predefined listing format. This feature ensures quick access to the search result. If the user U requires additional information, this information may be obtained by means of a link contained in the above-mentioned result list.
When the entities have been copied to the database and associated with the retrieving profile, further information is added to the retrieving profile in the form of a robot adapted to the data structure of the specific data source. This robot is associated with the retrieving profile in order to visit the data source according to certain trigger criteria and to re-evaluate the data source in order to determine whether its contents have changed. Hence, the robot will access the data source e.g. at certain intervals and update the contents of the database if changes have occurred.
Such an automatically handled change may take place if e.g. one entity has been removed from the data source and replaced by two other entities, the removed entity representing a sold car and the two new entities representing cars introduced for sale. Such a change observed by the robot should of course be reflected in the database: the sold car has to be removed and the two new cars added, so that the database reflects the state of the data source at the time of the visit.
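The reconciliation performed by the robot on a revisit can be sketched as a set difference between the stored and the freshly extracted entities. This is a minimal Python sketch under stated assumptions: the central database is replaced by an in-memory dictionary, entities are reduced to hashable tuples, and `sync_source` and `change_log` are hypothetical names. The change log corresponds to the registration of changes for statistical purposes.

```python
# Stand-in for the central database: data source id -> set of entity keys.
# Entities are reduced to hashable tuples here purely for illustration.
database = {"dealer-a": {("Fiat", 1995, 900), ("Ford", 1998, 4500)}}
change_log = []  # e.g. kept in a separate database for statistics


def sync_source(db, source_id, extracted, log):
    """Reconcile stored entities of one data source with a fresh robot visit."""
    stored = db.get(source_id, set())
    removed = stored - extracted   # e.g. cars that have been sold
    added = extracted - stored     # e.g. cars newly offered for sale
    db[source_id] = set(extracted)
    log.append((source_id, frozenset(removed), frozenset(added)))
    return removed, added


# The robot revisits dealer-a: the Fiat is gone, two new cars appeared.
removed, added = sync_source(
    database,
    "dealer-a",
    {("Ford", 1998, 4500), ("Volvo", 2000, 7000), ("Saab", 1999, 5200)},
    change_log,
)
```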
A change may likewise be stored and registered for statistical purposes in another database.
If, on the other hand, the data structure of the data source has changed in such a way that the robot is no longer able to extract the correct information, an error is reported to the retrieving profile. Such an error results in the establishment of a new robot fitting the new structure of the data source.
It should be noted that each data source typically requires a dedicated robot.
Fig. 2 illustrates three entity models applied in a preferred embodiment of the invention.
The three entity models are an extraction model EM, a conceptual model CM and a storage model STM.
For reasons of simplicity, entities according to the three models are referred to as extraction entities EENT, conceptual entities CENT and storage entities SENT. The entities are also referred to in three different formats, i.e. an extraction format, a conceptual format and a storage format.
The entity flow is transformed between the different formats by means of converters established for converting the data from one format into another. According to the invention, the converters may preferably be established as so-called transformer elements which will be dealt with in detail below.
Starting from the upstream, web-based data source end, the entities are accessed according to an extraction model, preferably common to all involved data sources of the domain. The extraction entities simply comprise a serial stream of strings. According to the extraction model, the strings are ordered in such a way that the receiver of the string stream may recognize what the transmitter actually intends to transmit. This may be achieved either with accompanying codes or simply by a convention defining the sequence.
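The two ways of making a serial string stream interpretable, a positional convention or accompanying codes, can be contrasted in a short Python sketch. The field order, the `code=value` syntax and the `END` marker are assumptions made for illustration only.

```python
from itertools import islice

# Hypothetical convention: every entity is four consecutive strings in this order.
CONVENTION = ("make", "colour", "engine_litres", "fuel")


def decode_by_convention(stream, fields=CONVENTION):
    """Split a flat string stream into entities purely by position."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, len(fields)))
        if not chunk:
            return
        if len(chunk) < len(fields):
            raise ValueError("truncated entity at end of stream")
        yield dict(zip(fields, chunk))


def decode_with_codes(stream, end_marker="END"):
    """Decode a stream where each string carries a code, e.g. 'make=Porsche'."""
    entity = {}
    for item in stream:
        if item == end_marker:
            yield entity
            entity = {}
        else:
            code, _, value = item.partition("=")
            entity[code] = value
    if entity:
        raise ValueError("stream ended without an end marker")


positional = list(decode_by_convention(
    ["Porsche", "Red", "3.0", "Diesel", "Fiat", "Blue", "1.2", "Petrol"]))
coded = list(decode_with_codes(
    ["fuel=Diesel", "make=Porsche", "colour=Red", "engine_litres=3.0", "END"]))
```

The coded variant tolerates a different ordering of attributes at the cost of a slightly larger stream, while the positional variant depends entirely on both ends sharing the same convention.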
In fact, the extraction model represents more than a data format. It also defines the different attributes which the robots should access when dealing with the different data sources. In other words, the extraction model represents a framework in which the designers may design the robots. The robot designers may therefore concentrate fully on designing a robot capable of accessing the attributes contained in the extraction model and on combining the attributes into entities according to the extraction model, i.e. extraction entities.
The extraction entities may nevertheless be established e.g. wholly or partly by automated extraction routines. For a given web-based data source, such routines may e.g. be adapted for automatically reading the data source representation, automatically recognizing attribute patterns of the web-based data source, and outputting these attributes as extraction entities according to the extraction model.
Moreover, such automated routines may evidently be adapted for assigning the specifically discovered attribute/entity patterns of a data source to a corresponding robot.
According to the preferred embodiment of the invention, the extraction model may be established by means of a domain modeller DMR.
The extraction entities may then be converted, e.g. by a transformer, into conceptual entities. Among other things, representing an entity according to the conceptual model involves converting the individual entity into a unique object. Put simply, an extraction entity comprising the string stream "Porsche", "Red", "3.0", "Diesel" is converted into a unique car object, a conceptual entity, namely a red Porsche with a 3.0 liter diesel engine.
The conceptual format moreover offers the possibility of handling the entities in a compact way. Now, the entities may be represented in an object-oriented manner instead of a flat string format.
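A transformer converting the flat string stream into an object can be sketched in Python as follows. The `Car` class, its field names and the normalisation rules (title-case make, lower-case colour and fuel, numeric engine size) are hypothetical choices for illustration, assuming the positional convention make, colour, engine size, fuel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    """Hypothetical conceptual entity for the car domain."""
    make: str
    colour: str
    engine_litres: float
    fuel: str


def to_conceptual(strings):
    """Transform one extraction entity (a string stream) into a Car object.

    Assumes the positional convention make, colour, engine size, fuel.
    """
    make, colour, engine, fuel = (s.strip() for s in strings)
    return Car(make=make.title(), colour=colour.lower(),
               engine_litres=float(engine), fuel=fuel.lower())


car = to_conceptual(["Porsche", "Red", "3.0", "Diesel"])
```

Once converted, the entity can be compared, filtered or corrected as a typed object rather than by string manipulation.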
Moreover, a conceptual approach to the entities offers the possibility of adding knowledge to the retrieved entities. Such information may e.g. be information deducible from a reference product catalogue. Thus, if an entity is matched with an entity type of the product catalogue, the entity may be modified, e.g. as a validation, a correction or as an insertion of additional information about the entity.
A correction may e.g. be that one of the attributes of the Porsche retrieved above is false according to the product catalogue. This false attribute may be detected in several different ways within the scope of the invention. The reference product catalogue may e.g. initially reveal that no Porsche having a 3.0 liter engine has been made with a diesel engine. Moreover, the product catalogue may reveal that no Porsche has been made with a diesel engine, thereby raising the probability that the data source provider has made a mistake. The wrong attribute "Diesel" may then be corrected.
Insertion of additional information may e.g. consist in recognizing that a Porsche of the above-mentioned type (now assuming that the diesel statement has not been made) has electronic injection. This information may then be inserted as a new attribute of the unique conceptual entity Porsche or used to fill in a text field attribute of the Porsche.
Validation comprises the step of evaluating whether the currently investigated conceptual entity should be regarded as a valid entity at all. Such validation may basically result in the entity being accepted as a valid entity or being discarded. Subsequently, a valid entity may be further processed in order to deduce information about the entity as described above.
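Validation, correction and insertion against a reference product catalogue can be combined in one step, as in the following Python sketch. The catalogue content is made-up illustrative data mirroring the Porsche example above, and the rule that a single-fuel variant overrides a conflicting fuel attribute is an assumption, not a rule stated by the invention.

```python
# Illustrative reference catalogue (made-up data): make -> known variants.
CATALOGUE = {
    "Porsche": [
        {"engine_litres": 3.0, "fuels": {"petrol"}, "injection": "electronic"},
    ],
}


def check_against_catalogue(car):
    """Validate, correct and enrich a conceptual entity (a dict here).

    Returns None if the entity is discarded as invalid, otherwise a new dict.
    """
    variants = CATALOGUE.get(car["make"], [])
    matches = [v for v in variants if v["engine_litres"] == car["engine_litres"]]
    if not matches:
        return None                       # validation: no such product known
    variant = matches[0]
    result = dict(car)
    if result["fuel"] not in variant["fuels"] and len(variant["fuels"]) == 1:
        # correction: the catalogue allows only one fuel for this variant
        result["fuel"] = next(iter(variant["fuels"]))
    # insertion: add knowledge the data source did not state
    result.setdefault("injection", variant["injection"])
    return result


corrected = check_against_catalogue(
    {"make": "Porsche", "engine_litres": 3.0, "fuel": "diesel"})
discarded = check_against_catalogue(
    {"make": "Porsche", "engine_litres": 9.9, "fuel": "petrol"})
```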
A discarded entity may result in a further investigation of the original data source in order to evaluate whether an entity has been overlooked. Evidently, a real-time evaluation of the discard rate of each data source should be performed in order to monitor whether the robot or the extraction model associated with the individual data source needs an update or replacement.
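A per-source discard-rate monitor can be sketched in a few lines of Python. The class name `DiscardMonitor`, the threshold of 20 % and the minimum sample size are assumptions chosen for illustration; in practice these values would be tuned per domain.

```python
class DiscardMonitor:
    """Track the share of discarded entities per data source."""

    def __init__(self, threshold=0.2, min_samples=20):
        self.threshold = threshold      # assumed acceptable discard rate
        self.min_samples = min_samples  # avoid alarms on tiny samples
        self.seen = {}
        self.discarded = {}

    def record(self, source, valid):
        self.seen[source] = self.seen.get(source, 0) + 1
        if not valid:
            self.discarded[source] = self.discarded.get(source, 0) + 1

    def rate(self, source):
        n = self.seen.get(source, 0)
        return self.discarded.get(source, 0) / n if n else 0.0

    def needs_attention(self):
        """Sources whose robot or extraction model may need an update."""
        return sorted(s for s, n in self.seen.items()
                      if n >= self.min_samples and self.rate(s) > self.threshold)


monitor = DiscardMonitor()
for i in range(20):
    monitor.record("dealer-a", valid=(i % 4 != 0))   # 5 of 20 discarded
    monitor.record("dealer-b", valid=(i != 0))       # 1 of 20 discarded
```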
Typically, every possible attribute of a conceptual entity should be predefined in the conceptual model. According to a preferred embodiment of the invention, the conceptual entities and attributes should be established by means of a domain modeller.
The conceptual model should typically be made by people with some knowledge of the domain. It should nevertheless be emphasized that the establishment of relevant attributes may be heavily supported by automated procedures traversing through the domain and identifying the offered combinations of attributes.
The last entity model is the storage model. The storage model is primarily adapted for applying traditional database structures and database handling methods to the retrieved entities. Thus, the modeling of a storage model may be performed with very little knowledge of the nature of the domain but more or less by focussing on the involved attributes and entities.
Evidently, other entity format approaches may be applied within the scope of the invention. Specifically, the distinction between the different models may be softened up a little in the sense that the conceptual model and the data storage model may more or less be incorporated in one body. Evidently, the invention features the possibility of performing centralized processing when data retrieved from the different data sources are represented according to a generalized entity model, e.g. a conceptual model.
The extraction format may be understood as an analogue format while the conceptual/storage format may be regarded as a digital format.
The extraction entities are typically entities extracted directly from the web-based data sources, the conceptual entities are typically the entities flowing in the heart of the query processor capable of more complex processing, and the storage entities are typically the entities represented in e.g. a relational database.
It should be emphasized that the different models, e.g. the above-mentioned extraction model EM, conceptual model CM and storage model STM, may facilitate an entity flow in both directions: downstream, as described above, from the data sources to the user querying the query processor, or upstream from a user submitting an entity or a request, e.g. an order, to a certain data source.
If, for instance, a user wants to buy an item found in the domain, he may then submit an order associated with a chosen entity, e.g. a PC, car, etc. This order would comprise the selected item as a storage or conceptual entity which is subsequently converted in the query processor and submitted to the relevant data source according to the extraction model. An extraction model according to the invention may thus both be defined as a way of reading the data source and it may be defined as a way of writing (submitting) entities into the data source, e.g. by means of a form into a shopping cart of the data source or a data search form associated with the relevant data source.
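Writing an entity into a data source can be sketched as a mapping from conceptual attributes to the form fields of one particular site. In the following Python sketch, `DEALER_A_SUBMISSION_MODEL` and its field names are hypothetical; only `urllib.parse.urlencode` from the standard library is used to build a form-encoded body.

```python
from urllib.parse import urlencode

# Hypothetical submission model for one dealer's order form:
# conceptual attribute -> name of the HTML form field on that site.
DEALER_A_SUBMISSION_MODEL = {"make": "mk", "model": "mdl", "price": "amount"}


def to_submission(entity, submission_model):
    """Map a conceptual entity onto the form fields of one data source."""
    missing = [attr for attr in submission_model if attr not in entity]
    if missing:
        raise KeyError(f"entity lacks attributes required by the form: {missing}")
    return {field: str(entity[attr]) for attr, field in submission_model.items()}


order = {"make": "Ford", "model": "Focus", "price": 15000, "colour": "red"}
form = to_submission(order, DEALER_A_SUBMISSION_MODEL)
body = urlencode(form)  # e.g. the body of a POST to the dealer's order form
```

Keeping this mapping separate from the reading side mirrors the preference, stated below, for a distinct submission model.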
Preferably, the two functions, reading and writing, should be supported by two separate, distinct models for the purpose of clarity, i.e. one model for reading the data source, an extraction model, and one model for writing to a data source, a submission model. The first format, the extraction format, is the format in which the entities are accessed in the web-based data source. This format is evidently somewhat fragile and unwieldy, since the string-based entity stream merely carries data assumed to represent entities and attributes of entities. This fragile extraction format can typically not be supported significantly by validity checks, because the extracted entities are difficult to process on a large scale; such processing would involve major, complex string-based processing.
The conceptual format is established on the basis of the predefined conceptual model defining the basic nature of the entities of the domain. The conceptual representation may fundamentally be regarded as an object-oriented representation of the read entities. A conceptual representation of the read entities is relatively easy to process in the sense that the entities are converted into unique instances of the conceptual model, thereby allowing filtering, conversion or modification of any information related to the individual instances, e.g. attributes or types of attributes, consistent with the conceptual model.
The storage format is basically intended for storing the retrieved entities for later access. The storage format is a more convenient representation of the retrieved entities of the domain in the sense that superfluous information, e.g. information contained in or related to the conceptual model, may be omitted. Such information may e.g. be entity information utilized for converting the extraction entities into conceptual entities. This information need no longer be present in the storage model, as the entities are now conceived as unique entities.
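The step from conceptual to storage format, in which conversion-only information is dropped, can be sketched with Python's standard `sqlite3` module. The column list `STORAGE_COLUMNS`, the table `car` and the metadata keys `raw_strings` and `catalogue_variant` are hypothetical.

```python
import sqlite3

# Hypothetical storage model: only these columns are persisted.
STORAGE_COLUMNS = ("make", "colour", "engine_litres", "fuel", "price")


def to_storage_row(conceptual):
    """Keep the storage columns; drop conversion-only metadata."""
    return tuple(conceptual.get(col) for col in STORAGE_COLUMNS)


conceptual = {
    "make": "Porsche", "colour": "red", "engine_litres": 3.0,
    "fuel": "petrol", "price": 40000,
    # information only needed while converting extraction entities:
    "raw_strings": ["Porsche", "Red", "3.0", "Diesel"],
    "catalogue_variant": 0,
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (make TEXT, colour TEXT, engine_litres REAL,"
             " fuel TEXT, price INTEGER)")
conn.execute("INSERT INTO car VALUES (?, ?, ?, ?, ?)", to_storage_row(conceptual))
stored = conn.execute("SELECT * FROM car").fetchall()
```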
The entities stored in a database according to the storage model may (and should) also be used for statistical purposes.
The conceptual model and the storage model may overlap to some extent, but these formats should preferably be dealt with separately, making it possible to reuse the storage model and even the conceptual model in other applications. Moreover, the strict separation between the applied data models allows, under some circumstances, the individual models to be modified without considering interaction with the other models. An example of such a simple modification is the modification of a classification module, which may basically be performed without any modification of other modules as long as no entity attributes are introduced or removed.
A part of the extraction model may be global or at least multiple in the sense that this part of the model may contain general plug-ins of the extraction model applicable for many or all data sources to be accessed. An example of such general plug-ins may e.g. be a language dictionary defining different applicable languages, e.g. English, Japanese, French or Danish. Moreover, the language dictionary may contain a domain-specific dictionary focussing on the entities characterizing the domain.
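A global dictionary plug-in shared by all robots could normalise terms from sources in different languages onto one vocabulary, with a domain-specific part taking precedence. The following Python sketch uses small illustrative word lists; the names `GENERAL_DICTIONARY`, `DOMAIN_DICTIONARY` and `normalise` are hypothetical.

```python
# Illustrative general dictionary plug-in shared by all data sources:
# language -> {local term: canonical term}.
GENERAL_DICTIONARY = {
    "en": {"red": "red", "blue": "blue"},
    "da": {"rød": "red", "blå": "blue"},
    "fr": {"rouge": "red", "bleu": "blue"},
    "ja": {"赤": "red", "青": "blue"},
}
# Illustrative domain-specific part focusing on car terminology.
DOMAIN_DICTIONARY = {"cabrio": "convertible", "cabriolet": "convertible"}


def normalise(term, language):
    """Map a term from any data source onto one canonical vocabulary."""
    key = term.strip().lower()
    if key in DOMAIN_DICTIONARY:
        return DOMAIN_DICTIONARY[key]
    return GENERAL_DICTIONARY.get(language, {}).get(key, key)
```

Unknown terms are returned unchanged (in lower case), so that they can later be reviewed and added to the dictionary.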
Fig. 3 illustrates the process of establishing a domain processor according to a preferred embodiment of the invention.
It should be noted that the establishment of the components and logistics needed for collecting data from a domain and the maintenance of the components may be performed in other ways within the scope of the invention.
Initially, the main steps introduced below with reference to fig. 3 will be described briefly. A thorough discussion of the steps and their meaning will be given below with reference to the subsequent figures.
Initially, it has been decided that a new domain must be established. This domain may e.g. be a domain comprising boats offered for sale which are either used or new.
The boats are offered for sale from different web-based market places, typically the homepage of a dealer or e.g. private homepages. As discussed later, web-based data sources may be supplemented by e.g. direct reading of a dealer's database, e.g. by means of ODBC-based reading. Nevertheless, the domain should basically always span at least two different web-based data sources.
Moreover, the web-based data source may typically be accessed without the consent or knowledge of the web-based data source owner. Consequently, there are no strict sign-up requirements imposed by the data source owner. Therefore, the data fundament of the domain is huge, insofar as it more or less includes all entities offered for sale on the entire World Wide Web.
The decision that a new domain ND is to be established initially invokes the Domain modeller DMR to establish the characteristics of the domain. These characteristics are to be used when establishing the different technical measures needed for accessing the web-based data sources. Details of the functioning of the very important Domain Modeller DMR will be discussed later. It should be noted that the domain modeller may operate more or less automatically.
According to a preferred embodiment of the invention, the domain modeller DMR outputs a specific Domain model DM needed for the different software modules, also named elements, to be used when establishing the query processor for the domain.
Hence, the elements described later may advantageously utilize the domain model DM for different or overlapping purposes. The domain model DM may comprise a knowledge base describing different general features and aspects of the invention, so to speak. Such a general knowledge "container" benefits from the fact that the knowledge describing the domain may be established centrally, resulting in a compact knowledge structure that may be modified centrally, basically without dealing with complicated details of the different query processor elements.
Therefore, the domain model represents a knowledge structure that may be accessed by the different query processor elements simply by defining a so-called plug-in to some or all of the query processor elements. The plug-in may represent a domain reading structure, e.g. JAVA code, adapted for reading the part of the domain suitable for the establishment and functioning of the element. Therefore, different elements may utilize different parts of the knowledge. Moreover, the centrally organized knowledge may be modified centrally, with the result that all elements automatically utilize an updated knowledge base with little or typically no modification of the elements or the plug-ins.
According to the invention, some general knowledge may evidently be decentralized, i.e. put into the individual query processor elements. However, according to a preferred embodiment of the invention, the central knowledge base, or the domain model DM, should be maximized.
A domain model DM may e.g. comprise a reference product catalogue describing all known products of the domain, e.g. a list of different known car models and variants of such models.
Furthermore, the domain model DM may comprise mappings between different entity models applied by the query processor, e.g. conversion mappings between extraction entities, conceptual entities and storage entities.
Furthermore, the domain model may e.g. comprise the extraction, conceptual and storage models.
Also, the domain model may comprise language dictionaries, both domain-specific and more general dictionaries.
By applying a domain model, a change in the domain model may be reflected uniformly in the complete query processor.
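The effect of a central domain model accessed through plug-ins can be sketched in Python: each plug-in holds a reference to the shared model rather than a copy, so a single central change becomes visible to every element. `DomainModel` and `CataloguePlugin` are hypothetical names, and the source suggests e.g. JAVA code for the actual plug-ins; Python is used here only for consistency with the other sketches.

```python
class DomainModel:
    """Hypothetical central knowledge base for one domain."""

    def __init__(self):
        self.product_catalogue = {}   # e.g. make -> list of known models
        self.mappings = {}            # e.g. ("EM", "CM") -> mapping table
        self.dictionaries = {}        # language -> term table


class CataloguePlugin:
    """Gives one query processor element read access to part of the model.

    The plug-in holds a reference, not a copy, so a central update is
    visible to every element using it without changing the element.
    """

    def __init__(self, model):
        self._model = model

    def is_known(self, make, model_name):
        return model_name in self._model.product_catalogue.get(make, [])


domain = DomainModel()
domain.product_catalogue["Ford"] = ["Focus"]
robot_plugin = CataloguePlugin(domain)       # used by e.g. a robot element
classifier_plugin = CataloguePlugin(domain)  # used by e.g. a transformer

before = classifier_plugin.is_known("Ford", "Mondeo")
domain.product_catalogue["Ford"].append("Mondeo")  # one central change
after = (robot_plugin.is_known("Ford", "Mondeo"),
         classifier_plugin.is_known("Ford", "Mondeo"))
```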
The next step, Create Query Processor CQP, initiates the combination of different elements by means of a Query Processor Modeller QPM. Some of the elements combined by the Query Processor Modeller QPM are established by the domain modeller DMR, and some are general, pre-established elements. Other elements to be used may e.g. be robots intended for accessing the data of the individual sites.
The next step, Create Accessors CA, initiates the assignment of individual robots to specific data sources of the domain. A detailed description of such a robot-generating program may be found in PCT/DK00/00163 and PCT/DK00/00429 filed by the applicant and is hereby incorporated by reference.
The last step, Maintenance, involves the establishment of different procedures intended for maintaining the query processor. Such procedures may e.g. comprise robot and system monitoring. Such monitoring may e.g. include monitoring the load of the software elements/modules and whether the robots actually fit the sites.
Moreover, such procedures may include modifying or exchanging robots if such actions are considered necessary.
Evidently, the chronology of the above-mentioned steps may be modified within the scope of the invention, e.g. by establishing the robots before the query processor is combined in the Create Query Processor step.
Figs. 4 to 6 illustrate the principles of a domain modeller according to one embodiment of the invention.
Evidently, the user interface providing the domain modelling features to the user may be established in numerous variants within the scope of the invention.
According to the illustrated embodiment, the relations between the tables of the database are made in a selectable "edit" environment. Evidently, a combined view/edit environment is applicable within the scope of the invention. The illustrated domain modeller comprises an interface having a menu bar with four selectable menus: File, Edit, View and Mapping.
Fig. 4a illustrates that the menu View has been selected. The View menu, which is a Relationships Window, may comprise several menu items: Storage model, Extraction model, Conceptual model and Submission model. The models define the different entity models applied by the complete query processor. Evidently, different kinds of entity models and definitions of entity models may be applied within the scope of the invention.
The term database model may also be referred to as a storage model.
In fig. 4a the Database model view has been selected.
The view area NA appearing when selecting Storage Model View illustrates the basic components of the database attached to the domain by means of visual indications of relations between the tables. The database model defines the structure of a database intended for storage and handling of the entities of the domain. A database model typically describes a relational database rather than a flat-file database in order to accommodate the knowledge obtained by the query processor.
The Relationships window may be in different "show relationships"- modes, e.g. "Show All Relationships" or "Show Direct Relationships".
The first mode shows all tables of the current database. The other mode shows the tables of the database within the currently selected domain. When one of the available tables is selected, the viewer shows the relationships to all tables directly related to it.
Basically, this viewing area NA may operate like known visualizing tools adapted for viewing relations between tables of relational databases. According to the illustrated embodiment, the viewer is in the second mode. An open domain model intended for attachment to a PC distributing domain comprises a PC Equipment table PCE. The illustrated PCE table comprises the fields ID, DealerID, ProdID and Price. ID is the primary key of the PCE table, while DealerID and ProdID are foreign keys to the tables DCAT and PCAT, respectively.
The PCE table refers to a product catalogue PCAT and a dealer's catalogue DCAT. The product catalogue PCAT is a table of the products attached to the domain and intended for sale. The dealer's catalogue DCAT is a table of the dealers attached to the domain. Finally, the PCE table holds the price.
Evidently, such a PCE table would typically be more complex, e.g. with relations to tables holding further product characteristics such as color, comments on the products, currency, URL etc.
When double-clicking on the Price field of the PCE table, the Price field definitions appear in a dialogue box PD. This dialogue box may be used for defining the Price field. The illustrated Price field has the name "Price", and the field type may be selected as a string or an integer, here selected as an integer.
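The illustrated PCE, PCAT and DCAT relations can be expressed as a relational schema, sketched here with Python's standard `sqlite3` module. The `Name` columns, the sample rows and the column types other than the integer Price are assumptions; the source only names ID, DealerID, ProdID and Price.

```python
import sqlite3

# A sketch of the PCE, PCAT and DCAT tables; Name columns and types are assumptions.
SCHEMA = """
CREATE TABLE DCAT (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE PCAT (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE PCE (
    ID       INTEGER PRIMARY KEY,
    DealerID INTEGER NOT NULL REFERENCES DCAT(ID),
    ProdID   INTEGER NOT NULL REFERENCES PCAT(ID),
    Price    INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO DCAT VALUES (1, 'Example Computers')")
conn.execute("INSERT INTO PCAT VALUES (1, 'Desktop PC')")
conn.execute("INSERT INTO PCE VALUES (1, 1, 1, 999)")

rows = conn.execute("""
    SELECT d.Name, p.Name, e.Price
    FROM PCE AS e
    JOIN DCAT AS d ON e.DealerID = d.ID
    JOIN PCAT AS p ON e.ProdID = p.ID
""").fetchall()

try:
    conn.execute("INSERT INTO PCE VALUES (2, 99, 1, 500)")  # unknown dealer
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```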
Fig. 4b illustrates that the menu Mapping has been selected. The Mapping menu, which is a table or Relationships Window, may comprise several menu items, e.g. the illustrated EM to CM, CM to STM, STM to CM or CM to SM.
The first-mentioned mappings, EM to CM and CM to STM, deal with mappings needed for retrieval of entities from a data source, while the two latter deal with writing, i.e. submission to a data source (e.g. filling-in of a form in a data source to place an order, filling-in of a search form or e.g. insertion of a new entity in the data source). The EM to CM, Extraction model to Conceptual model mapping, defines the mapping between the entities and/or attributes retrieved according to the extraction model EM into entities and/or attributes according to a conceptual model CM.
The CM to STM, Conceptual model to Storage model mapping, defines the mapping between the entities and/or attributes held according to the conceptual model CM into entities and/or attributes according to a storage model STM.
The STM to CM, Storage model to Conceptual model mapping, defines the mapping between the entities and/or attributes represented according to the storage model STM into entities and/or attributes according to conceptual model CM.
The CM to SM, Conceptual model to Submission model mapping, defines the mapping between the entities and/or attributes represented according to the conceptual model CM into entities and/or attributes according to a submission model SM.
Evidently, the mapping from one model to another may be performed in several other ways than the table-based method illustrated in fig. 4b within the scope of the invention.
Thus, the mapping may include direct transformation of a number of associated attributes into a unique object in a relational manner. That is, the bundle of associated extracted attributes is transformed as a whole into one unique object, instead of applying the above-mentioned method of initially mapping the extraction attributes into conceptual attributes and subsequently establishing a unique entity on the basis of a reference system, e.g. a product catalogue defining the different possible entities of the domain.
The mapping from the extraction model to the conceptual model preferably involves a classifier (i.e. a classification system) that will map extracted entities into conceptual entities according to a product catalogue. That is; the product catalogue may contain various (generic) conceptual entities existing in the domain. After classification, if a classifier is at all available in the domain, the conceptual entities are made unique according to the extracted entities by transferring various attribute values from the extracted entities to the conceptual entities, such as price, URL, currency etc. This transfer of values from extraction entities to conceptual entities is done by selecting and configuring a transfer function that maps one or more extraction model attribute values into one or more conceptual model attribute values.
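Classification against a product catalogue followed by configured transfer functions can be sketched in Python. The catalogue entries, the naive title-matching classifier and the three transfer functions are illustrative assumptions; a real classifier would be considerably more robust.

```python
# Illustrative product catalogue of generic conceptual entities.
CATALOGUE = [
    {"make": "Ford", "model": "Focus", "body": "hatchback"},
    {"make": "Ford", "model": "Mondeo", "body": "saloon"},
]


def classify(extracted, catalogue=CATALOGUE):
    """Very naive classifier: match make and model words in the ad title."""
    title = extracted["title"].lower()
    for generic in catalogue:
        if generic["make"].lower() in title and generic["model"].lower() in title:
            return dict(generic)   # copy, so the catalogue entry stays generic
    return None


def parse_price(text):
    return int(text.replace("$", "").replace(",", "").strip())


# Configured transfer functions: extraction attributes -> conceptual attribute.
TRANSFERS = [
    (("price",), "price", parse_price),
    (("url",), "url", str.strip),
    (("currency",), "currency", str.upper),
]


def make_unique(extracted):
    conceptual = classify(extracted)
    if conceptual is None:
        return None                # no catalogue match: not classifiable
    for sources, target, fn in TRANSFERS:
        conceptual[target] = fn(*(extracted[s] for s in sources))
    return conceptual


car = make_unique({
    "title": "Ford Focus 1.6, low mileage",
    "price": "$12,500",
    "url": " http://example.com/ad/1 ",
    "currency": "usd",
})
```

Each transfer takes a tuple of source attributes, so a function combining several extraction attribute values into one conceptual attribute fits the same scheme.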
In fig. 4b, the EM to CM has been selected.
The view area appearing when selecting EM to CM illustrates the attributes to be converted into conceptual entities, e.g. in the form of a table.
In fig. 4b, the extraction attribute "Make" has been selected, thereby opening a mapping table where EM-CA A has been selected. The table comprises different applicable mappings from extraction attributes to conceptual attributes, here exemplified by the strings Ferrari, Fiat and Ford converted into the integers 17, 18 and 19, respectively.
Fig. 5 illustrates that the PCE table has been double-clicked. A PCE dialogue box PCED appears. This dialogue box facilitates editing of the data defining the PCE table, e.g. by insertion of SQL statements associated with the PCE table, attribute names, etc. Finally, the table may be generated by selecting the Table Generate tag, TAG.
Basically, the storage model may be modelled by known prior art database-generating tools. The important thing when dealing with the database model for the specific domain is to include all necessary attributes and establish a well-structured, easily searchable and quickly accessible database. It should be noted that this structuring of the domain database may be performed independently of the rest of the domain query processor, as long as the necessary entity attributes have been defined.
Fig. 6 illustrates the Domain Modeller's Extraction model viewer.
In fig. 6, the Domain Modeller's Extraction model viewer has been selected.
While the database in the database modeller viewer may be regarded as the representation of entities "understood" by the query processor, the domain extraction model to be made by the extraction modeller may be regarded as the definition of relevant attributes included in the syntax of "raw" string-based data of the web-based data sources to be accessed, as defined by the data source provider.
ROBOTMAKER
Fig. 7 illustrates the principles of an applicable robot-establishing program according to one embodiment of the invention.
Evidently, the robots to be used in the query processor may be established and attached to a certain data source in many ways within the scope of the invention.
The main principle of the robot generator described below is to make a robot and assign it to a certain site containing data relevant to the domain of interest, i.e. assign the robot to the site by means of an address, e.g. a URL, and generate a data reader (the robot) capable of reading the data of interest contained in the data source, e.g. a web site, and transferring these data in a certain data format to the central control of a query processor in response to a query.
Hence, according to a preferred embodiment of the invention, a new and unique robot has to be made for each web-based data source to be queried.
Turning now to fig. 7, a short overview of this program will be described. A detailed description of such a robot-generating program may be found in PCT/DK00/00163 and PCT/DK00/00429 filed by the applicant and is hereby incorporated by reference.
The robot generating program is adapted for establishing sequential access to a web-based data source. This sequential reading is e.g. controlled by means of a graphical path of node processors NP, each node processor NP performing some configurable processing of its input. The nodes are sequenced in such a manner that a web-based data source, e.g. in HTML, may be traversed and data extracted or submitted. The nodes may be arranged in straightforward paths; however, they are typically arranged in branched IF-THEN paths.
It should be noted that high-volume establishment of such robots is somewhat time-consuming. Hence, the robot-generating programs should be very user-friendly or even automatic.
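A path of node processors with an IF-THEN branch can be sketched in Python. This is not the applicant's robot-generating program; it is a minimal illustration in which the `Node` class, the simulated `PAGES` dictionary (standing in for HTTP access) and the regular expressions are all hypothetical. The branch follows a "next page" link if one exists and otherwise finishes.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """One node processor: configurable action plus optional IF-THEN branch."""
    name: str
    action: Callable[[dict], dict]
    condition: Optional[Callable[[dict], bool]] = None
    then_node: Optional["Node"] = None
    else_node: Optional["Node"] = None


def run(start, state):
    """Traverse the node graph from `start`, returning final state and trace."""
    node, trace = start, []
    while node is not None:
        state = node.action(state)
        trace.append(node.name)
        if node.condition is None or node.condition(state):
            node = node.then_node
        else:
            node = node.else_node
    return state, trace


# Simulated web site with two result pages (stands in for HTTP access).
PAGES = {
    "/cars?p=1": '<tr><td>Fiat</td><td>900</td></tr><a class="next" href="/cars?p=2">',
    "/cars?p=2": '<tr><td>Ford</td><td>1500</td></tr>',
}


def load(state):
    state["html"] = PAGES[state["url"]]
    return state


def extract(state):
    state["entities"] += re.findall(r"<tr><td>(.*?)</td><td>(\d+)</td></tr>", state["html"])
    return state


def follow_next(state):
    state["url"] = re.search(r'class="next" href="([^"]+)"', state["html"]).group(1)
    return state


def has_next(state):
    return 'class="next"' in state["html"]


load_node = Node("load", load)
extract_node = Node("extract", extract, condition=has_next)
next_node = Node("follow-next", follow_next, then_node=load_node)
done_node = Node("done", lambda s: s)
load_node.then_node = extract_node
extract_node.then_node = next_node   # IF a next link exists THEN follow it
extract_node.else_node = done_node   # ELSE finish

state, trace = run(load_node, {"url": "/cars?p=1", "entities": []})
```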
A node processor selector NPS allows a node processor to be selected and configured for the current application in the node processor configuration view NPC. Moreover, the node processor may be attached to a certain document area by means of a document range definer DRD.
Finally, the robot maker viewer comprises a document view which e.g. may be adapted for viewing the XML text of the data source or a part of the data source.
Basically, the robot maker outputs robots and each robot is specialized in operating one dedicated web-based data source.
According to the preferred embodiment of the invention, the robot outputs entities according to the extraction model(s), i.e. non-classified, uninterpreted data, to a central control, e.g. to a transformer query processor element. Here, the extracted strings may be converted into coded representations, e.g. as objects stored in a database, and the extracted data may then be classified. Evidently, according to additional/other embodiments of the invention, the established robots may contain transforming means for transformation of extracted data into a conceptual representation, e.g. conversion of a sequence of strings "Ford", "2.0", "red" into an object stored in a database as a "car", which is a red Ford having a 2.0 liter engine. It should be noted that the preferred embodiments of the invention benefit from a more central transformation of entities into conceptual data, thereby reducing the requirements of maintaining decentralized transformers.
QUERY PROCESSOR MODELLER
A query processor modeller according to the invention is intended for establishment of the "transfer function" between the user, the web data accessing machine and the data located in a web-based data source. The "transfer function" involves a data flow from the user towards the data accessing machine and/or the web-based data sources. Moreover, the transfer function involves control of the flow of data from web-based data sources towards the web data extraction machine and/or the user.
According to a preferred embodiment of the invention, this functionality is referred to as a query process flow, and the established "accessing machine" is referred to as a query processor. The query processor will preferably be adapted for processing of a certain well-defined domain, e.g. a car domain. It should be noted that some overlap between the domains may be acceptable in the sense that one query processor may e.g. comprise query processor elements accessing data from different domains. Preferably, however, the domains should be kept separate, so that each query processor deals with one domain only.
The query processor is defined below as a query processor graph by means of a visual programming tool. Fig. 8 illustrates a preferred embodiment of the invention involving a visual programming tool for establishing the above-mentioned transfer function by means of a query processor graph QPG.
According to a preferred embodiment of the invention, the query processor modeller comprises a visual and programmable editor. The illustrated editor facilitates the combination of a number of Query Processor Elements QPE into a query processor graph. The query processor elements may be of different types defined by their main functions.
Initially, a short introduction of query processor elements will be provided.
An example of a query processor element QPE may e.g. be a robot, such as a robot query processor element RQPE. A robot query processor element RQPE is adapted for accessing web-based data sources upon request. A single robot may typically be attached to one single data source.
Evidently, a robot query processor element may also be adapted for reading only or writing only if suitable.
Another example of a query processor element QPE may e.g. be a cache, such as a cache query processor element CQPE. Such an element is adapted for returning a response to a query, or for passing the query further on in the process if the cache contains no answer to it. A further possibility is that the cache element CQPE returns the part of the response which can be established from the entities already contained in the cache, and forwards a query further upstream in the processor in order to establish the rest of the response.
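The partial-answer behaviour of the cache can be illustrated as follows. This is a minimal Python sketch under the assumption that a query is a set of entity keys and that the upstream element is a plain callable. `CacheQPE`, `query` and `robot` are hypothetical names, not an API from the source.

```python
from __future__ import annotations

from typing import Callable, Dict, Iterable, Set

Entity = dict
Upstream = Callable[[Set[str]], Dict[str, Entity]]


class CacheQPE:
    """Hypothetical cache element; a query is modelled as a set of entity keys."""

    def __init__(self, upstream: Upstream):
        self.upstream = upstream          # next element upstream, e.g. a robot
        self.store: Dict[str, Entity] = {}

    def query(self, keys: Iterable[str]) -> Dict[str, Entity]:
        keys = set(keys)
        # Part of the response that the cache can establish on its own.
        response = {k: self.store[k] for k in keys if k in self.store}
        missing = {k for k in keys if k not in response}
        if missing:
            # Forward only the unanswered part of the query upstream.
            fetched = self.upstream(missing)
            self.store.update(fetched)
            response.update(fetched)
        return response


calls = []

def robot(keys: Set[str]) -> Dict[str, Entity]:
    calls.append(sorted(keys))
    return {k: {"id": k} for k in keys}

cache = CacheQPE(robot)
cache.query({"a", "b"})
cache.query({"b", "c"})
print(calls)  # [['a', 'b'], ['c']]
```

The second query only reaches the robot for the entity `c`. The entity `b` is answered from the cache.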
A further example of a query processor element QPE may e.g. be a so-called mediator query processor element MQPE. This element is adapted for distributing an incoming query to other query processor elements and for gathering the response returned by these queried processor elements, e.g. robots, and returning the answer back to the processor which queried the mediator MQPE.
Another query processor element may be of a trigger type, i.e. a trigger query processor element TQPE adapted for triggering a certain operation or a query.
The trigger query processor element TQPE is adapted for initiating a certain action, e.g. an automatically scheduled initiation of a query, in which case it is an automatic trigger processor element ATPE. Another applicable trigger processor element may e.g. be a trigger adapted for initiating a query upon request by a user, i.e. a manually activated trigger MTPE. It should be noted that trigger processor elements represent a different type of query processor element than those described above: a trigger query processor element is not activated by an incoming query but acts on its own initiative. Hence, a manually operated trigger element MTPE may be regarded as an element including a user.
Turning now to fig. 8, the figure illustrates a query processor adapted for processing a certain domain. According to the illustrated embodiment, the domain comprises three web-based data sources. The illustrated query processor QP is constructed and monitored by means of a visually programmed drag-and-drop query processor graph QPG. The establishment of this query processor graph may also include the configuration of the individual query processor elements. The configuration of e.g. a robot may thus be performed by means of an embedded robot modeller which may be activated via the Query Processor Modeller.
The illustrated query processor graph comprises three robot query processor elements RQPE1, RQPE2 and RQPE3.
Each robot is attached to a specific, dedicated data source, i.e. one determined by the URL of the data source. Each robot is made, automatically or semi-automatically, by means of a robot modeller RM, which is referred to in this application both as the robot maker and as the robot modeller RM. The robots RQPE1, RQPE2 and RQPE3 are adapted for accessing, i.e. reading and/or writing, the associated data source (not shown) according to a read/write pattern defined for and associated with the individual robot. This defined read/write pattern enables each robot to access the corresponding data source. According to a preferred embodiment of the invention, there is a one-to-one relationship between the robots and the data sources, i.e. one web-based data source is accessed by one robot only. The read/write pattern of a robot is typically highly specialized in order to fit the specific data structure of the associated data source. It should be noted that web-based data structures are typically programmed and structured independently, e.g. in HTML tables or other more or less unforeseeable data structures.
The establishment of a read/write pattern may also be referred to as a creation of a robot.
Evidently, the invention offers the owners of web-based data sources the possibility of entering their data in a data structure which is easy for the query processor to access. Such easy access may e.g. be offered to data source owners in the form of design requirements if they want their data source to be roboted. Likewise, the query processor may also include data-accessing robots, e.g. robots featuring direct ODBC access to the database of the data owner. Thus, it will sometimes be possible to assign a standard robot type to such generalized data sources if so desired.
According to a preferred embodiment of the invention, the requirements on data source owners will be kept low, thereby making it possible to access numerous different data sources.
Turning now to the defined robot query processor element RQPE1, this robot is dedicated to a specific web-based data source and communicates with a query processor element in the form of a cache CQPE1. The cache may be activated by a trigger TQPE1. This trigger element TQPE1 may initiate a certain trigger-defined query that is subsequently performed by the robot query processor element RQPE1. The cache element CQPE1 may e.g. be provided as an encapsulation of the robot's data source. This direct and local pre-cache operation on one data source makes it possible to reduce the access time to certain data of the data source operated by the robot RQPE1. Evidently, this facility is attractive for the purpose of bootstrapping the cache with entities (data of the data structure of the data source) that are often queried. According to a preferred embodiment of the invention, the trigger element TQPE1 should typically ensure that often-queried data are updated regularly, in order to avoid a completely empty cache. Evidently, this control may also be integrated in the cache CQPE1 within the scope of the invention. The cache CQPE1 is coupled to a mediator query processor element MQPE1, the functioning of which will be described below. Moreover, the cache element CQPE1 may e.g. be adapted for the more specific purpose of reducing the load on the site roboted by the robot element RQPE1: the cache may return the entities it has stored without querying the robot, even if these entities are not completely up to date. The local cache element CQPE1 may thus set a minimum interval for activation of the robot RQPE1, thereby ensuring that not every query necessarily results in a query of the data source. This application of a cache may ensure that a certain site is not overloaded by the robot.
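The minimum activation interval can be sketched as a cache that serves possibly out-of-date entities until the interval has elapsed. This is a minimal Python sketch. Two choices are assumptions rather than statements from the source: the interval is tracked per distinct query, and the clock is injectable so the behaviour can be tested. `ThrottledCacheQPE` is a hypothetical name.

```python
from __future__ import annotations

import time
from typing import Callable, Dict, List


class ThrottledCacheQPE:
    """Hypothetical cache: activates its robot at most once per `min_interval` seconds per query."""

    def __init__(self, robot: Callable[[str], List[str]], min_interval: float,
                 clock: Callable[[], float] = time.monotonic):
        self.robot = robot
        self.min_interval = min_interval
        self.clock = clock                      # injectable for testing
        self.entries: Dict[str, List[str]] = {}
        self.last_fetch: Dict[str, float] = {}

    def query(self, q: str) -> List[str]:
        now = self.clock()
        last = self.last_fetch.get(q)
        if last is None or now - last >= self.min_interval:
            self.entries[q] = self.robot(q)     # query the data source via the robot
            self.last_fetch[q] = now
        # Otherwise serve the cached entities, even if they may be slightly out of date.
        return self.entries[q]


t = [0.0]
hits: List[str] = []

def robot(q: str) -> List[str]:
    hits.append(q)
    return [f"{q}-result-{len(hits)}"]

cache = ThrottledCacheQPE(robot, min_interval=60, clock=lambda: t[0])
cache.query("ford")      # robot activated
t[0] = 30
cache.query("ford")      # within 60 s: answered from the cache
t[0] = 61
cache.query("ford")      # interval elapsed: robot activated again
print(len(hits))         # 2
```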
A further robot query processor element RQPE2 is dedicated to a specific web-based data source and communicates with a query processor element in the form of a transformer TAQPE1. The transformer element TAQPE1 is adapted for receiving a query that originates from the user-activated query element MTPE located downstream, and for passing it on towards the data sources located upstream. The illustrated transformer element TAQPE1 channels the query unmodified to the robot query processor element RQPE2. Subsequently, when the robot RQPE2 returns a reply to the query, the response may be modified by the transformer before being returned to the connected mediator MQPE1. Such a modification may e.g. be a trivial mapping of "km: 34" to be read as "km: 34,000" or the like. Preferably, transformers should be used for such purposes when certain data sources, e.g. web sites, use terms deviating from the general terms applied by other data source providers within the domain.
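The "km: 34" to "km: 34,000" mapping can be sketched as a transformer that passes the query through unchanged and rewrites only the response. This is a minimal Python sketch under several assumptions: records are dictionaries with a `"km"` field, the site quotes mileage in thousands, and `KmTransformerQPE` is a hypothetical name.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, object]


class KmTransformerQPE:
    """Hypothetical transformer in front of a robot: queries pass unchanged, responses are normalised."""

    def __init__(self, robot: Callable[[str], List[Record]]):
        self.robot = robot

    def query(self, q: str) -> List[Record]:
        records = self.robot(q)                  # the query is channelled unmodified
        return [self._normalise(r) for r in records]

    @staticmethod
    def _normalise(record: Record) -> Record:
        out = dict(record)
        # Assumption: this site quotes mileage in thousands of km, so "34" means 34,000 km.
        m = re.fullmatch(r"\s*(\d+)\s*", str(record.get("km", "")))
        if m:
            out["km"] = int(m.group(1)) * 1000
        return out


site = lambda q: [{"model": "Ford", "km": "34"}]
print(KmTransformerQPE(site).query("ford"))  # [{'model': 'Ford', 'km': 34000}]
```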
The system comprises a further robot query processor element RQPE3 dedicated to a specific web-based data source. This robot RQPE3 is directly coupled to the mediator MQPEl.
The mediator MQPE1 is applied for branching the query process path into several different paths, e.g. three as illustrated. On the return path, the mediator collects the information obtained by the queried robot branches and returns the data to a transformer element TAQPE2.
This transformer element TAQPE2 defines a principal borderline between the upstream robots RQPE1, RQPE2 and RQPE3 and the downstream user U, as the transformer transforms the data retrieved by the robots into conceptual data according to a conceptual model associated with each robot. These conceptual data are handed over from the transformer element TAQPE2 to a cache query processor element CQPE2. Typically, the conceptual model should be common to all involved elements dealing with entities in a conceptual manner.
The cache element CQPE2 may be regarded as the main storage means for the query processor QP intended for storage of the currently updated entities retrieved by the robots of the query processor.
The nature of the cache may vary significantly from application to application. In some applications, the cache may comprise only recently entered conceptual data, while caches in other applications may comprise a more or less complete database of the entities comprised in the data sources associated with the domain processor.
The cache CQPE2 may be activated by a trigger query processor element TQPE2. This trigger may e.g. be adapted for refreshing the cache CQPE2 according to scheduled trigger criteria. The trigger criteria may be established on the basis of user query statistics and/or statistics associated with the data stored in the cache CQPE2.
The data contained in the cache CQPE2 are conceptual data.
The cache CQPE2 is coupled, via a tracking module TMO adapted for gathering and storing data, to a user interface represented by a manually operated trigger element MTPE located downstream in the query processor graph. The gathered data are used for keeping track of the history of the data contained in the data sources of the domain and for establishing and maintaining query statistics. This tracking module is a combination of a number of query processor elements QPE.
Basically, the module comprises a storing query processor element SQPE1 adapted for writing data into a database query processor element DBPE1. The database DBPE1 comprises entities retrieved from the associated domain of data sources, and the entities are stored according to a preferred storage model. The storage may also contain history-describing data or data from which the entities may be deduced. The storing query processor element SQPE1 may be activated either by a user query or by a trigger query from the trigger element TQPE3. The trigger query processor element TQPE3 is intended to establish and maintain desired data, such as prices of cars or the like. This makes it possible to register whether an entity in a data source covered by the domain processor has changed, e.g. is now offered at a different price.
Finally, the illustrated query processor path comprises a transformer element TAQPE3. This transformer element is primarily responsible for transforming conceptual data into storage data in the database DBPE1.
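Registering a price change can be sketched as an append-only history per entity. This is a minimal Python sketch, not the storage model of the invention. `PriceHistoryDB`, its `store` method, the entity id and the dates are hypothetical and illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PriceHistoryDB:
    """Hypothetical stand-in for the database element: an append-only price history per entity."""
    history: Dict[str, List[Tuple[str, int]]] = field(default_factory=dict)

    def store(self, entity_id: str, date: str, price: int) -> Optional[int]:
        """Record an observation; return the previous price if it changed, else None."""
        rows = self.history.setdefault(entity_id, [])
        previous = rows[-1][1] if rows else None
        rows.append((date, price))
        if previous is not None and previous != price:
            return previous
        return None


db = PriceHistoryDB()
db.store("car-17", "2000-12-01", 95000)
print(db.store("car-17", "2000-12-02", 89000))  # 95000 (the price has changed)
```

Keeping every observation, instead of overwriting the old one, preserves the history that the tracking module is meant to provide.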
Short explanations of some of the above-mentioned query processor elements will be provided below. Generally, according to a preferred embodiment of the invention, the query processor elements should function without any knowledge of the context.
THE CACHE QUERY PROCESSOR ELEMENT
A cache query processor element according to the invention may be implemented in many ways. Generally, the cache should (like a traditional cache) contain some of the entities recently read from one or more of the data sources. The purpose of applying a cache is generally to reduce the access time to the data sources. The cache may be controlled in many ways, depending on the purpose. Thus, the cache may be activated from time to time by an automatic trigger with the purpose of refreshing the content of the cache with respect to certain types of entities. Triggering the cache then implies that the triggered cache forwards a query to the relevant data sources of the domain, collects the response and writes the returned entities into its memory. Obviously, triggering of the cache may be constructed in numerous ways within the scope of the invention, as long as the main purpose of the triggering is to obtain the best possible performance of the current application. Evidently, in some domains the cache should not be applied for entities exceeding a certain age, e.g. 3 minutes, if the entities of the domain change frequently.
An example of advantageous triggering according to the invention may e.g. be triggering the cache with the purpose of refreshing it with entities often queried by the users of the query processor. This bootstrapping reduces start-up time by keeping the often-queried entities in the cache. Such statistical control implies that the triggering of the cache may vary dynamically, i.e. be controlled by user requests.
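A trigger-refreshed cache with a maximum entity age can be sketched as follows. This is a minimal Python sketch. The 180-second default mirrors the 3-minute example. The list-of-callables model of the domain's data sources, the injectable clock and the name `ExpiringCacheQPE` are assumptions.

```python
from __future__ import annotations

import time
from typing import Callable, Dict, List, Tuple

Source = Callable[[str], List[str]]


class ExpiringCacheQPE:
    """Hypothetical cache that never serves entities older than `max_age` seconds."""

    def __init__(self, sources: List[Source], max_age: float = 180.0,
                 clock: Callable[[], float] = time.monotonic):
        self.sources = sources          # the relevant data sources of the domain
        self.max_age = max_age
        self.clock = clock
        self.memory: Dict[str, Tuple[float, List[str]]] = {}

    def refresh(self, q: str) -> None:
        """What a trigger invokes: forward the query, collect the response, store it."""
        collected: List[str] = []
        for source in self.sources:
            collected.extend(source(q))
        self.memory[q] = (self.clock(), collected)

    def query(self, q: str) -> List[str]:
        entry = self.memory.get(q)
        if entry is None or self.clock() - entry[0] > self.max_age:
            self.refresh(q)             # missing or too old: read the data sources again
        return self.memory[q][1]


now = [0.0]
src_a = lambda q: [f"A:{q}@{now[0]:.0f}"]
src_b = lambda q: [f"B:{q}@{now[0]:.0f}"]
cache = ExpiringCacheQPE([src_a, src_b], max_age=180, clock=lambda: now[0])
cache.refresh("yachts")            # e.g. fired by an automatic trigger
now[0] = 120
print(cache.query("yachts"))       # ['A:yachts@0', 'B:yachts@0']
now[0] = 200
print(cache.query("yachts"))       # ['A:yachts@200', 'B:yachts@200']
```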
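Statistics-driven bootstrapping can be sketched by counting past queries and refreshing the most frequent ones. This is a minimal Python sketch. It assumes that the query log is a list of query keys and that the cache exposes a refresh callable. `entities_to_prefetch` and `bootstrap` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Iterable, List


def entities_to_prefetch(query_log: Iterable[str], top_n: int) -> List[str]:
    """Return the `top_n` most frequently queried keys, most frequent first."""
    return [key for key, _ in Counter(query_log).most_common(top_n)]


def bootstrap(refresh: Callable[[str], None], query_log: Iterable[str], top_n: int) -> None:
    """Trigger action: refresh the cache for the most popular queries."""
    for key in entities_to_prefetch(query_log, top_n):
        refresh(key)


log = ["ford", "audi", "ford", "bmw", "ford", "audi"]
refreshed: List[str] = []
bootstrap(refreshed.append, log, top_n=2)
print(refreshed)  # ['ford', 'audi']
```

Because the log grows with user requests, the set of prefetched entities changes over time. This is the dynamic, user-controlled triggering described above.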
A further possible approach may e.g. be triggering of the whole domain once a day which means that all relevant data contained in all data sources of the domain are read into the cache and that all data are updated at least once a day. Evidently, according to the latter strategy, the cache is controlled in a manner resembling a kind of persistent database.
THE TRANSFORMER QUERY PROCESSOR ELEMENT
The transformer query processor element is basically an element which may transform an incoming query or entity into another query or entity. Hence, the transformer works both ways: downstream and upstream.
Applicable transformer elements may e.g. be transformers that transform raw extracted text-string entities received from upstream (e.g. from a robot) into a conceptual representation of the entities read from the data source, according to a preferred embodiment of the invention.
Further possible transformer elements may e.g. be a transformer receiving conceptual entities and outputting them according to a data storage model.
A further, simpler transformer may e.g. be a mute transformer element arranged in front of a robot or in a certain branch. This mute may be adapted for blocking the entity or query stream in the respective branch. Such a mute transformer may e.g. be advantageous if a certain robot must undergo maintenance, as it allows an operator maintaining a query processor to modify or exchange a certain robot without modifying the query processor graph. Hence, a robot may be maintained without simultaneously receiving a stream of queries. It should be noted that transformers may be arranged in many different positions in the query processor graph within the scope of the invention.
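A mute element can be sketched as a pass-through with a blocking switch. This is a minimal Python sketch. `MuteTransformerQPE` and its `muted` flag are hypothetical, and "blocking" is assumed here to mean returning no entities.

```python
from typing import Callable, List


class MuteTransformerQPE:
    """Hypothetical element placed in front of a robot or branch that can block the query stream."""

    def __init__(self, upstream: Callable[[str], List[dict]]):
        self.upstream = upstream
        self.muted = False

    def query(self, q: str) -> List[dict]:
        if self.muted:
            return []                 # blocked: the query never reaches the robot
        return self.upstream(q)


received: List[str] = []

def robot(q: str) -> List[dict]:
    received.append(q)
    return [{"query": q}]

mute = MuteTransformerQPE(robot)
mute.query("ford")
mute.muted = True                     # operator starts maintenance on the robot
print(mute.query("audi"))             # []
print(received)                       # ['ford']
```

The element holds no domain knowledge. This is why such a mute can be placed in any branch of any domain without prior modification.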
TRIGGER QUERY PROCESSOR ELEMENT
The trigger query processor element comprises means e.g. for invoking a query in an element associated with the trigger. The trigger may comprise a schedule defining fixed time intervals which determine when to query the associated element, e.g. a cache. Likewise, the trigger may comprise calculation algorithms adapted for calculating suitable trigger conditions, e.g. when and/or how to query. Therefore, the trigger may advantageously comprise statistical evaluation means.
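A fixed-interval schedule can be sketched as a trigger that is ticked with the current time. This is a minimal Python sketch. `ScheduledTriggerQPE`, `tick` and the decision to fire only once after missed slots are assumptions. A statistically driven trigger would replace the fixed `interval` with a computed one.

```python
from typing import Callable


class ScheduledTriggerQPE:
    """Hypothetical trigger that queries its associated element on a fixed time grid."""

    def __init__(self, target: Callable[[str], object], query_text: str,
                 interval: float, start: float = 0.0):
        if interval <= 0:
            raise ValueError("interval must be positive")
        self.target = target            # e.g. the refresh entry point of a cache
        self.query_text = query_text
        self.interval = interval
        self.next_due = start

    def tick(self, now: float) -> bool:
        """Fire at most once if the schedule is due; return True if it fired."""
        if now < self.next_due:
            return False
        self.target(self.query_text)
        # Advance to the next slot on the fixed grid, skipping any missed slots.
        while self.next_due <= now:
            self.next_due += self.interval
        return True


fired = []
trigger = ScheduledTriggerQPE(fired.append, "refresh cars", interval=3600)
for now in (0, 1800, 3600, 9000):
    trigger.tick(now)
print(len(fired), trigger.next_due)  # 3 10800.0
```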
MEDIATOR QUERY PROCESSOR ELEMENT
A mediator query processor element MQPE is adapted for distributing an incoming query to other query processor elements and for gathering the response returned by these queried processor elements, e.g. robots, and returning the answer back to the processor which initially queried the mediator MQPE.
The mediator may thus show several different levels of intelligence. These range from a simple branch element that merely distributes an incoming query to a number of branching elements, to quite intelligent elements capable of distributing an incoming query to the branches most likely to comprise the queried entities.
- A mediator may deal with data according to any representation, e.g. conceptual entities, storage entities or extraction entities.
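Both ends of this range can be sketched with one class and an optional router. This is a minimal Python sketch. `MediatorQPE`, the `router` parameter and the keyword-based routing rule are hypothetical, and the branches are queried sequentially for simplicity.

```python
from typing import Callable, Dict, List, Optional

Branch = Callable[[str], List[dict]]


class MediatorQPE:
    """Hypothetical mediator: distributes a query to branches and gathers the answers."""

    def __init__(self, branches: Dict[str, Branch],
                 router: Optional[Callable[[str], List[str]]] = None):
        self.branches = branches
        self.router = router    # None: plain branch element that queries every branch

    def query(self, q: str) -> List[dict]:
        names = self.router(q) if self.router else list(self.branches)
        gathered: List[dict] = []
        for name in names:      # sequential here; a real mediator might query in parallel
            gathered.extend(self.branches[name](q))
        return gathered


branches = {
    "RQPE1": lambda q: [{"src": "site1", "q": q}],
    "RQPE2": lambda q: [{"src": "site2", "q": q}],
    "RQPE3": lambda q: [{"src": "site3", "q": q}],
}
simple = MediatorQPE(branches)
print([r["src"] for r in simple.query("ford")])   # ['site1', 'site2', 'site3']

# A more "intelligent" mediator: route yacht queries to the one branch assumed to hold yachts.
smart = MediatorQPE(branches,
                    router=lambda q: ["RQPE2"] if "yacht" in q else list(branches))
print([r["src"] for r in smart.query("yacht")])   # ['site2']
```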
MESSENGER QUERY PROCESSOR ELEMENT
Other possible types of query processor elements to be included in the query processor graph may e.g. be messenger query processor elements MESQPE. The messenger elements MESQPE are adapted for monitoring the processes within the individual QPEs or between the QPEs. These messengers may e.g. be adapted for returning a processor's state-describing parameters to an operator responsible for the query processor or the query processor element. Messengers may e.g. be adapted for providing statistical material or fault warnings. It should be noted that the conceptual building of the domain processor may be performed in many different ways. This means that the words "element" and "graph" should in no way restrict the scope of the invention, as the wording primarily reflects the functional understanding of the elements. Evidently, other types of elements may be derived within the scope of the invention, e.g. elements combined on the basis of the above-mentioned elements. Examples of such derivatives within the scope of the invention may e.g. be:
- a robot processor comprising a transformer, i.e. the robots read extraction entities, transform the data to conceptual entities, and return the entities to a central control, e.g. a database;
- a cache comprising a transformer;
- a cache comprising a trigger, etc.
A further advantageous messenger may e.g. be a messenger adapted for raising a flag to the operator managing the query processor when the entities to be transformed into conceptual data are not contained in a reference product catalogue, thereby offering the operator the possibility of updating such a catalogue locally or globally.
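A messenger that raises a flag for entities missing from a reference product catalogue can be sketched as follows. This is a minimal Python sketch. `Message`, `CatalogueMessengerQPE`, the priority value and the catalogue contents are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Message:
    """Hypothetical message record for the operator."""
    title: str
    priority: str
    origin_element: str


class CatalogueMessengerQPE:
    """Hypothetical messenger: raises a flag for entities missing from a reference catalogue."""

    def __init__(self, catalogue: Set[str], origin: str):
        self.catalogue = catalogue
        self.origin = origin                # element whose output is being inspected
        self.outbox: List[Message] = []     # messages waiting for the operator

    def inspect(self, models: Iterable[str]) -> None:
        for model in models:
            if model not in self.catalogue:
                self.outbox.append(
                    Message(f"Unknown model '{model}'", "medium", self.origin))


messenger = CatalogueMessengerQPE({"Ford Focus", "Audi A4"}, origin="TAQPE2")
messenger.inspect(["Ford Focus", "Ford Fokus"])
print([m.title for m in messenger.outbox])  # ["Unknown model 'Ford Fokus'"]
```

The operator can then decide whether "Ford Fokus" is a misspelling to be mapped or a new model to be added to the catalogue.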
Other advantageous elements may e.g. be elements directly adapted for reading a well-known database, i.e. by means of ODBC drivers. This makes it possible to supplement the extracted readings of "foreign" web-based data sources with readings from one or several databases comprising entities of the domain.
According to the invention, each of the present elements may be activated by clicking on the element in the editor, thereby initiating/activating the element-creating application. Hence, the RobotMaker application will be activated by double-clicking on a selected robot, e.g. RQPE1, and the Domain Modeller will be activated when double-clicking on e.g. the transformer TAQPE2.
When the query processor graph QPG has been established, the graph may be saved, thereby maintaining the properties of the complete query processor QP.
The structure and functioning of the individual query processor elements are defined by means of the domain modeller DMR and the RobotMaker RM. Evidently, some of the query processor elements are domain-independent in the sense that they may be included, with little or no modification, in the query processor graphs of several different types of query processors QP, e.g. trigger processor elements, whereas other query processor elements are somewhat domain-specific. An example of a domain-independent query processor element may e.g. be the aforementioned mute transformer element, which may be applied in any desired domain without prior modification.
It should be noted that the Query Processor Modeller may, and preferably does, include query processor execution tools in the illustrated "view" setup. Such a setup may include the illustrated view which, when in run mode, illustrates the running state of the query processor and of the individual elements. An example of such intuitive visualization is that the individual elements change color according to their state, e.g. within a color range from white to red, depending on the load of the elements.
Moreover, the interface, e.g. the illustrated view, should preferably visually illustrate basic on-off conditions. It should actively show whether an element is working properly, whether entities are transferred between the query processor elements and, preferably, whether entities can actually be transferred between elements. The latter feature may ease operation of the system significantly, because the absence of an entity flow between elements does not necessarily indicate that a fault condition has occurred; the element may simply not be queried.
Determination of a "clear road" between the elements may e.g. be established by forwarding dummy (testing) queries between elements at certain intervals.
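The "clear road" check and the white-to-red load colouring can be sketched together. This is a minimal Python sketch. `probe_edges`, `load_colour`, the `send` callable, the dummy-query payload and the linear colour scale are all hypothetical. Running the probe "at certain intervals" would be the job of a scheduler, which is not shown.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]


def probe_edges(edges: List[Edge],
                send: Callable[[str, str, dict], bool]) -> Dict[Edge, bool]:
    """Send one dummy (test) query along every edge and record whether it got through."""
    status: Dict[Edge, bool] = {}
    for src, dst in edges:
        try:
            status[(src, dst)] = bool(send(src, dst, {"dummy": True}))
        except Exception:               # any failure counts as a blocked road
            status[(src, dst)] = False
    return status


def load_colour(load: float) -> str:
    """Map a load in [0, 1] to a colour from white (#ffffff, idle) to red (#ff0000)."""
    load = min(max(load, 0.0), 1.0)
    green_blue = round(255 * (1.0 - load))
    return f"#ff{green_blue:02x}{green_blue:02x}"


broken = {("MQPE1", "RQPE3")}
send = lambda src, dst, q: (src, dst) not in broken
print(probe_edges([("MQPE1", "RQPE1"), ("MQPE1", "RQPE3")], send))
# {('MQPE1', 'RQPE1'): True, ('MQPE1', 'RQPE3'): False}
print(load_colour(0.0), load_colour(1.0))  # #ffffff #ff0000
```

With such a probe, an idle edge that passes the dummy query can be shown as healthy. It is then not mistaken for a fault merely because no real entities are flowing.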
Moreover, the Query Processor Modeller may include submenus facilitating specialized execution of the query processor. Such a submenu is illustrated in fig. 9, and it may e.g. be selected by the "run" drop down menu of the Query Processor Modeller. Moreover, the Query Processor Modeller may feature specialized visualization of certain groups of query processor elements. Thus, a "robot element" viewer may be activated, thereby offering the operator the possibility to concentrate fully on his task, e.g. maintenance or design of robot elements and thereby ignore elements dealt with by other operators.
It should be noted that a query processor according to the invention may easily comprise several hundreds of robots.
Likewise, other designers may advantageously activate a "no robot view" while designing the main body of the query processor.
It should also be noted that the above-mentioned examples of elements may be combined into macro-elements, e.g. a robot element comprising a transformer, etc.
Fig. 9 illustrates a possible user interface of a domain processor DP. A domain processor is adapted for supporting maintenance of one or several query processors QP once they have been established.
The illustrated user interface of a domain processor comprises a tree-based structure monitoring area. One domain processor may control execution and maintenance of several different domains.
This area monitors a first level of node-represented servers NL1. This level illustrates the different servers applied: Webserver, RobotServer1 and RobotServer2. A second node level NL2 shows the current domains controlled by the domain server, e.g. Cars, Yachts and PC's. A third level NL3 illustrates different selectable query processor state-indicating functions, e.g. queries, triggers and messages. The function Messages has been selected in the illustrated view. It should be noted that the term server referred to in level 1 NL1 may either reflect the physical location of a query processor with respect to a server, or refer to a kind of virtual server comprising several different servers, each processing their part (e.g. element or groups of elements) of the query processor.
Moreover, the illustrated viewer comprises a message viewing area MVA adapted for viewing messages forwarded automatically by e.g. individual elements of a query process path or by groups of elements. The attributes of listed messages may e.g. be chosen as the illustrated Title, Date, Priority and Origin Element.
The viewer may moreover facilitate filtering of the messages according to their originating element. Hence, an operator may e.g. filter messages from a certain element (Origin Element) or from groups of elements, e.g. mediators or transformers.
Moreover, the viewer comprises a message detail window MDW. This window may illustrate details about a single message or groups of selected messages in the message viewing area MVA. Each message may e.g. be associated with a start-up facility for activating the editor or editors associated with the individual message.
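Filtering by origin element or by element group can be sketched as follows. This is a minimal Python sketch. Two things are assumptions: the group of an element is derived from its name by stripping trailing digits (e.g. `MQPE1` becomes `MQPE`), and the sample messages are illustrative only. `Message`, `element_type` and `filter_messages` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Message:
    title: str
    date: str
    priority: str
    origin_element: str


def element_type(name: str) -> str:
    """Assumed naming convention: 'MQPE1' -> 'MQPE', 'TAQPE2' -> 'TAQPE'."""
    return name.rstrip("0123456789")


def filter_messages(messages: Iterable[Message], origin: Optional[str] = None,
                    group: Optional[str] = None) -> List[Message]:
    """Keep messages from one origin element and/or from one group (type) of elements."""
    return [m for m in messages
            if (origin is None or m.origin_element == origin)
            and (group is None or element_type(m.origin_element) == group)]


msgs = [  # illustrative messages only
    Message("Robot timeout", "2000-12-11", "high", "RQPE3"),
    Message("Unknown model", "2000-12-11", "low", "TAQPE2"),
    Message("Queue full", "2000-12-12", "medium", "MQPE1"),
]
print([m.title for m in filter_messages(msgs, group="TAQPE")])   # ['Unknown model']
print([m.title for m in filter_messages(msgs, origin="RQPE3")])  # ['Robot timeout']
```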
A query element program, e.g. a robot editor, may be started directly from the domain processor DP, e.g. by automatically importing the data from an element selected in the viewer such as a specific robot.

Claims

Patent Claims
1. Domain processor (DP) comprising
-at least one robot modeller (RM)
-at least one domain modeller (DMR),
-at least one Query Processor Modeller (QPM)
said robot modeller (RM) comprising
means for modelling at least one computer-based robot (R),
said at least one robot (R) being adapted for accessing at least one web-based data source (DS),
said at least one data source (DS) comprising entities comprised in a predefined domain (D),
said at least one domain modeller (DMR) comprising
means for modelling at least one domain model (DM) associated with at least one chosen domain, said domain model (DM) comprising at least one extraction model (EM) and at least one storage model (STM),
means for establishing at least one extraction model (EM) associated with a chosen domain,
means for establishing at least one storage model (STM) associated with said chosen domain, said at least one Query Processor Modeller (QPM) comprising
means for selecting at least two Query Processor elements (QPE) from a set of predefined query processor elements (QPE),
means for combining at least two of the selected Query Processor elements (QPE),
means for executing said associated query processor elements on at least one computer system (CS),
at least one of said query processor elements (QPE) of associated query processor elements being a Robot query processor Element (RQPE) adapted for accessing at least one web-based data source (DS).
2. Domain processor (DP) according to claim 1, wherein the domain processor (DP) comprises at least one query processor maintenance manager (QMM), said at least one query processor maintenance manager (QMM) comprising means for executing at least one query processor (QP) established by the domain processor.
3. Robot modeller (RM) comprising
means for modelling at least one computer-based robot (R),
said at least one robot (R) being adapted for accessing at least one web-based data source (DS),
said at least one data source (DS) comprising entities comprised in a predefined domain (D).
4. Domain modeller (DMR) comprising
means for modelling at least one domain model (DM) associated with at least one chosen domain, said domain model (DM) comprising at least one extraction model (EM) and at least one storage model (STM),
means for establishing at least one extraction model (EM) associated with a chosen domain,
means for establishing at least one storage model (STM) associated with said chosen domain,
5. Domain modeller (DMR) according to claim 4, wherein
said domain modeller comprises means for establishing reference mapping between extracted data obtained according to said extraction model (EM) and a conceptual representation of said data.
6. Domain modeller (DMR) according to claim 4 or 5, wherein
said reference mapping defines a set of reference entities describing a number of entities (E), said entities having attributes.
7. Domain modeller (DMR) according to claims 4 to 6, wherein said domain modeller (DMR) comprises means for establishing at least one language domain dictionary (LDD).
8. Domain modeller (DMR) according to claims 4-7, wherein said at least one language domain dictionary (LDD) maps the language of the extracted entities into the general language of the query processor (QP).
9. Domain modeller (DMR) according to claims 4-6, wherein said domain modeller (DMR) comprises means for establishing a set of reference recognition patterns.
10. Query Processor Modeller (QPM) comprising
means for selecting at least two Query Processor elements (QPE) from a set of predefined query processor elements (QPE),
means for combining at least two of the selected Query Processor elements (QPE),
means for executing said associated query processor elements on at least one computer system (CS),
at least one of said query processor elements (QPE) of the associated query processor elements being a Robot query processor Element (RQPE) adapted for accessing at least one web-based data source (DS).
11. Query Processor Modeller (QPM) according to claim 10, wherein the Query Processor Modeller comprises a graphical user interface (GUI) in the form of a visual programming tool.
12. Query Processor Modeller (QPM) according to claim 10 or 11 wherein said set of query processor elements (QPE) comprises at least two different types of query processor elements,
at least one type being a robot query processor element (RQPE) and at least one type being a trigger query processor element (TQPE).
13. Query processor maintenance manager (QMM) comprising
means for executing at least one query processor (QP) established by the domain processor.
14. Query processor maintenance manager (QMM) according to claim 13, wherein said maintenance manager (QMM) comprises means for monitoring the state of at least one query processor element (QPE) or the performance of at least one query processor element (QPE).
15. Query processor maintenance manager (QMM) according to claim 13 or 14, wherein said domain processor maintenance manager (QMM) comprises means for evaluating the data flow between query processor elements (QPE) of a query processor path.
16. Query processor maintenance manager (QMM) according to claims 13 - 15, wherein said domain processor maintenance manager (QMM) comprises means for running and visual monitoring of the individual modules of a query processor.
17. Query processor maintenance manager (QMM) according to claims 13-16, wherein said domain processor maintenance manager (QMM) comprises means for running and visual monitoring of a query processor (QP) on element basis.
18. Web-robot said robot comprising means for extracting information from web-based data sources (DS) in dependency of at least one extraction model (EM), said at least one extraction model comprising reference data structures defining entities and/or entity structures of data sources in a domain.
19. Web-robot according to claim 18,
said robot comprising at least one exchangeable plug-in, said plug-in comprising retrieving routines adapted for reading knowledge stored in said extraction model, said knowledge preferably being domain-specific.
20. Web-robot according to claim 18 or 19, wherein said plug-in defines a reference mapping between extracted data obtained according to said extraction model (EM) and a conceptual representation of said data.
21. Web-robot according to claims 18 - 20, wherein said extraction model (EM) is shared between at least two robots.
22. Query processor (QP),
said query processor (QP) comprising a set of web-based data sources (DS), wherein at least two of said data sources (DS) comprise entities according to a domain model (DM),
said query processor (QP) comprising at least three query processor elements (QPE),
at least two of said query processor elements (QPE) comprising a robot (RQPE)
said robot (RQPE) being attached to at least one data source (DS) said robot comprising means for accessing information from the at least one data source (DS) according to at least one extraction model (EM) associated with said robot (RQPE),
at least one of said query processor elements (QPE) comprising a trigger (TQPE) said trigger query processor element (TQPE) comprising means for establishing a query.
23. Query processor (QP) according to claim 22, wherein at least one of the query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE).
24. Method of establishing at least one query processor (QP),
said query processor (QP) comprising a set of web-based data sources (DS), wherein at least two of said data sources (DS) comprise entities according to a domain model (DM),
said query processor (QP) comprising at least three query processor elements (QPE),
at least two of said query processor elements (QPE) comprising a robot (RQPE),
said robot comprising means for accessing information from the at least one data source (DS) according to at least one extraction model (EM) associated with said robot (RQPE),
at least one of said query processor elements (QPE) comprising a trigger (TQPE),
said trigger query processor element (TQPE) comprising means for establishing a query,
said method comprising the steps of
attaching at least one selected robot query processor element (RQPE) to at least one of the data sources (DS) of the domain,
combining the selected query processor elements into a query processor (QP) by means of a graphical user interface (GUI).
25. Method of establishing at least one query processor (QP) according to claim 24, wherein said graphical user interface (GUI) defines a query processor element path visually on a drag-and-drop basis.
26. Method of establishing at least one query processor (QP) according to claim 24 or 25, wherein at least one of the combined query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE).
27. Method of establishing at least one query processor (QP),
said query processor comprising means for accessing data from web-based data sources (DS) of a domain by means of at least one user interface (UI),
said method comprising the steps of selecting a number of query processor elements (QPE),
at least one of said selected query processor elements (QPE) being a robot query processor element (RQPE),
at least one of said selected query processor elements (QPE) being a trigger query processor element (TQPE),
attaching at least one selected robot query processor element (RQPE) to at least one of the data sources (DS) of the domain,
combining the selected query processor elements into at least one query path defining the data flow in the query processor (QP) between the user interface (UI) and the web-based data sources of the domain, said method comprising a further step of customizing the at least one individual robot query processor element (RQPE) to the corresponding attached data sources (DS),
customizing at least one of the trigger query processor elements (TQPE) to the query processor (QP).
28. Method of establishing at least one query processor (QP) according to claim 27, wherein at least one of the combined query processor elements (QPE) comprises a transformer query processor element (TAQPE), a messenger query processor element (MESQPE) or a mediator query processor element (MQPE).
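The query path of claims 27 and 28 can be sketched as an ordered chain of elements through which data flows from the user interface to a data source and back. This is a hypothetical illustration only: the `QueryPath` class, the `kind` labels and the `make_robot` customization helper are assumptions. Requiring the path to start with a trigger is a design choice of this sketch; the claims only require at least one trigger and one robot element.

```python
from typing import Any, Callable, Dict, List, Tuple

Stage = Callable[[Any], Any]


class QueryPath:
    """Hypothetical query path: data flows from the UI through each element in order."""
    def __init__(self) -> None:
        self.stages: List[Tuple[str, Stage]] = []

    def add(self, kind: str, fn: Stage) -> "QueryPath":
        self.stages.append((kind, fn))
        return self  # allows chaining

    def validate(self) -> None:
        kinds = [kind for kind, _ in self.stages]
        if not kinds or kinds[0] != "trigger":
            raise ValueError("the path must start with a trigger element")
        if "robot" not in kinds:
            raise ValueError("the path must contain at least one robot element")

    def run(self, ui_input: Any) -> Any:
        self.validate()
        data = ui_input
        for _, fn in self.stages:
            data = fn(data)
        return data


def make_robot(source: List[Dict[str, str]], price_field: str) -> Stage:
    # Customization step: the robot is told how its attached source labels prices.
    return lambda item: [float(row[price_field]) for row in source if row["item"] == item]


source = [{"item": "lens", "pris": "40"}, {"item": "lens", "pris": "55"}]
path = (QueryPath()
        .add("trigger", lambda text: text.strip().lower())  # normalize the UI query
        .add("robot", make_robot(source, "pris"))
        .add("transformer", min))                            # keep the cheapest offer
print(path.run("  Lens "))  # 40.0
```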
29. Method of extracting data from a web-based data source (DS), said method comprising the steps of
- identifying and reading attributes and entities of a web-based data source,
- converting the read entities into instances of conceptual entities,
- verifying whether the read instances correspond with an entity reference base (ERB).
30. Method of extracting data from a web-based data source according to claim 29, whereby
- the read instances are verified to determine whether they correspond with an entity reference base (ERB) on the basis of entities represented in said conceptual entity-representing format.
31. Method of extracting data from a web-based data source according to claim 29 or 30, whereby the verified instances are modified according to the entity reference base (ERB) by adding information associated with said instances corresponding to said entity reference base.
32. Method of extracting data from a web-based data source according to any of claims 29-31, said method comprising correction of the verified instances according to the entity reference base (ERB) by correcting information associated with said instances corresponding to said entity reference base.
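The extraction method of claims 29 to 32 (read, convert, verify, then add or correct information) is sketched below. The entity reference base (ERB) is assumed to be a dictionary keyed by lower-cased product name, a simplification chosen for this example. The `Product` type and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Product:
    """Hypothetical conceptual entity."""
    name: str
    price: float
    manufacturer: Optional[str] = None
    verified: bool = False


# Hypothetical entity reference base (ERB), keyed by lower-cased product name.
ERB: Dict[str, Dict[str, str]] = {
    "acme x100": {"name": "Acme X100", "manufacturer": "Acme"},
}


def read_entities(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # Identify and read entities; rows lacking the expected attributes are skipped.
    return [row for row in rows if "title" in row and "price" in row]


def to_concept(raw: Dict[str, str]) -> Product:
    # Convert a read entity into an instance of the conceptual entity.
    return Product(name=raw["title"].strip(), price=float(raw["price"]))


def verify(product: Product, erb: Dict[str, Dict[str, str]]) -> Product:
    # Verify against the ERB; on a match, correct the name (claim 32)
    # and add the manufacturer (claim 31).
    ref = erb.get(product.name.lower())
    if ref is not None:
        product.name = ref["name"]
        product.manufacturer = ref["manufacturer"]
        product.verified = True
    return product


rows = [{"title": " ACME x100 ", "price": "199.5"},
        {"title": "Unknown gadget", "price": "5"},
        {"nav": "Home"}]
products = [verify(to_concept(r), ERB) for r in read_entities(rows)]
print([(p.name, p.manufacturer, p.verified) for p in products])
# [('Acme X100', 'Acme', True), ('Unknown gadget', None, False)]
```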
33. Method of establishing a query processor,
said query processor being adapted for accessing data on at least two different web-based data sources,
said method comprising the steps of
selecting at least two predefined query processor elements (QPE),
combining the selected query processor elements into a desired query processor structure.
34. Method of establishing a query processor according to claim 33,
said at least two predefined query processor elements having different functional characteristics.
35. Method of establishing a query processor according to claims 33 and 34, said method comprising the step of modifying the selected query processor elements according to the data structure of said web-based data sources.
36. Method of establishing a query processor according to claims 33-35,
wherein said modification of the selected query processor elements comprises at least one plug-in software module, said at least one plug-in defining domain-specific properties of said element.
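Claims 33 to 36 describe combining predefined elements with different functions and adapting them through plug-in software modules. The following is a minimal sketch, assuming that a plug-in only needs to supply a mapping from conceptual attributes to the labels used by a given source. Every name and label in it is hypothetical.

```python
from typing import Dict, Protocol


class DomainPlugIn(Protocol):
    """Hypothetical plug-in interface carrying domain-specific properties (claim 36)."""
    def field_map(self) -> Dict[str, str]: ...


class BookPlugIn:
    def field_map(self) -> Dict[str, str]:
        return {"title": "Titel", "author": "Forfatter"}


class CarPlugIn:
    def field_map(self) -> Dict[str, str]:
        return {"model": "Model", "year": "Årgang"}


class PredefinedRobotQPE:
    """Domain-neutral element; the plug-in adapts it to a source's data structure (claim 35)."""
    def __init__(self, plugin: DomainPlugIn) -> None:
        self.plugin = plugin

    def __call__(self, record: Dict[str, str]) -> Dict[str, str]:
        return {concept: record[label]
                for concept, label in self.plugin.field_map().items()
                if label in record}


class PredefinedFilterQPE:
    """A second predefined element with a different function (claim 34)."""
    def __init__(self, required: str) -> None:
        self.required = required

    def __call__(self, entity: Dict[str, str]) -> Dict[str, str]:
        # Pass the entity on only if it carries the required attribute.
        return entity if self.required in entity else {}


def combine(*elements):
    # Claim 33: combine the selected elements into a simple linear structure.
    def processor(record):
        for element in elements:
            record = element(record)
        return record
    return processor


books = combine(PredefinedRobotQPE(BookPlugIn()), PredefinedFilterQPE("author"))
cars = combine(PredefinedRobotQPE(CarPlugIn()), PredefinedFilterQPE("year"))
print(books({"Titel": "Kongens Fald", "Forfatter": "J. V. Jensen", "Pris": "99"}))
# {'title': 'Kongens Fald', 'author': 'J. V. Jensen'}
print(cars({"Model": "T"}))  # {}
```

The same predefined element class serves both domains; only the plug-in passed to it changes.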
37. Method of establishing a domain-accessing routine,
said domain comprising a plurality of web-based data sources,
said method comprising the steps of establishing at least one robot adapted for retrieving entities stored on said plurality of web-based data sources,
establishing at least one reference catalogue,
establishing at least one procedure of verifying the retrieved entities by comparing the read entities with the at least one reference catalogue.
38. Method of establishing a domain-accessing routine according to claim 37,
said method comprising the steps of
establishing at least one storage means,
establishing a data-exchanging interface between said at least one robot and at least one storage means.
39. Method of establishing a domain-accessing routine according to claims 37-38, wherein said reference catalogue is a product catalogue.
40. Method of establishing a domain-accessing routine according to claims 37-39, wherein said established procedure of verification comprises modification of the retrieved entities if the verification procedure indicates or proves that a read entity is not valid according to the at least one reference catalogue.
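A domain-accessing routine in the sense of claims 37 to 40 could be sketched as follows. A product catalogue serves as the reference catalogue, and an in-memory SQLite database (Python's standard `sqlite3` module) serves as the storage means. The catalogue contents, the table layout and the decision to drop entities that have no catalogue entry are assumptions of this example.

```python
import sqlite3
from typing import Dict, List, Tuple

Entity = Dict[str, str]

# Hypothetical product catalogue used as the reference catalogue (claim 39).
CATALOGUE: Dict[str, str] = {"P-100": "Acme X100", "P-200": "Acme Z200"}


def robot(sources: List[List[Entity]]) -> List[Entity]:
    """Stand-in robot: collects entities from several pre-parsed data sources."""
    return [entity for source in sources for entity in source]


def verify(entity: Entity, catalogue: Dict[str, str]) -> Tuple[Entity, bool]:
    """Compare with the catalogue and modify the entity if it is not valid (claim 40)."""
    canonical = catalogue.get(entity["id"])
    if canonical is None:
        return entity, False                    # unknown product id
    if entity["name"] != canonical:
        entity = {**entity, "name": canonical}  # correct the name
    return entity, True


def store(conn: sqlite3.Connection, entities: List[Entity]) -> None:
    """Data-exchanging interface between robot output and storage (claim 38)."""
    conn.execute("CREATE TABLE IF NOT EXISTS product (id TEXT, name TEXT, price REAL)")
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(e["id"], e["name"], float(e["price"])) for e in entities])
    conn.commit()


sources = [[{"id": "P-100", "name": "acme x-100", "price": "199"}],
           [{"id": "P-999", "name": "mystery", "price": "1"},
            {"id": "P-200", "name": "Acme Z200", "price": "299"}]]
conn = sqlite3.connect(":memory:")
checked = [verify(e, CATALOGUE) for e in robot(sources)]
store(conn, [e for e, ok in checked if ok])  # this sketch simply drops unverified entities
print(conn.execute("SELECT id, name FROM product ORDER BY id").fetchall())
# [('P-100', 'Acme X100'), ('P-200', 'Acme Z200')]
```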
41. Query processor maintenance manager (QMM) comprising at least one domain processor user interface (DPUI),
said manager (QMM) comprising means for evaluating different modules of at least one query processor (QP),
said means for evaluating different sub-routines of said query processor comprising
means for monitoring the state of at least one query processor element (QPE).
42. Query processor maintenance manager (QMM) according to claim 41, said manager (QMM) comprising means for automatically forwarding messages to said at least one domain processor user interface (DPUI) when certain predefined conditions are met.
43. Query processor maintenance manager (QMM) according to claim 41 or 42, said manager (QMM) comprising means for modifying individual query processor elements/sub-routines.
44. Query processor maintenance manager (QMM) according to claims 41-43, said manager (QMM) comprising means for modifying the query flow in the query processor during execution of the query processor.
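The maintenance manager of claims 41 to 44 can be illustrated by a small monitor that counts consecutive failures per element, forwards a message when a predefined condition is met, and removes the failing element from the active query flow. The failure threshold, the message text and the callback standing in for the domain processor user interface (DPUI) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ElementState:
    name: str
    failures: int = 0
    enabled: bool = True


@dataclass
class MaintenanceManager:
    notify: Callable[[str], None]   # stand-in for the domain processor user interface
    max_failures: int = 3           # hypothetical predefined condition (claim 42)
    states: Dict[str, ElementState] = field(default_factory=dict)

    def register(self, name: str) -> None:
        self.states[name] = ElementState(name)

    def report(self, name: str, ok: bool) -> None:
        # Monitor the state of one query processor element (claim 41).
        state = self.states[name]
        state.failures = 0 if ok else state.failures + 1
        if state.enabled and state.failures >= self.max_failures:
            state.enabled = False   # take the element out of the query flow (claim 44)
            self.notify(f"{name} disabled after {state.failures} consecutive failures")

    def active_path(self, path: List[str]) -> List[str]:
        return [name for name in path if self.states[name].enabled]


messages: List[str] = []
qmm = MaintenanceManager(notify=messages.append)
for name in ("trigger", "robot_a", "robot_b"):
    qmm.register(name)
for _ in range(3):
    qmm.report("robot_b", ok=False)
print(qmm.active_path(["trigger", "robot_a", "robot_b"]))  # ['trigger', 'robot_a']
print(messages)  # ['robot_b disabled after 3 consecutive failures']
```

Because the element is only flagged as disabled, the message is forwarded once, and the query flow can be changed while the query processor keeps running.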
PCT/DK2000/000700 2000-12-14 2000-12-14 Query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor WO2002048906A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP00984910A EP1342171A1 (en) 2000-12-14 2000-12-14 Query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor
US10/450,792 US7698277B2 (en) 2000-12-14 2000-12-14 Query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor
PCT/DK2000/000700 WO2002048906A1 (en) 2000-12-14 2000-12-14 Query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor
CA002431908A CA2431908A1 (en) 2000-12-14 2000-12-14 Query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor
AU2001221507A AU2001221507A1 (en) 2000-12-14 2000-12-14 Query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/DK2000/000700 WO2002048906A1 (en) 2000-12-14 2000-12-14 Query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor

Publications (1)

Publication Number Publication Date
WO2002048906A1 true WO2002048906A1 (en) 2002-06-20

Family

ID=8149409

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/DK2000/000700 WO2002048906A1 (en) 2000-12-14 2000-12-14 Query processor, query processor elements and a method of establishing such a query processor and query processor elements and a domain processor

Country Status (5)

Country Link
US (1) US7698277B2 (en)
EP (1) EP1342171A1 (en)
AU (1) AU2001221507A1 (en)
CA (1) CA2431908A1 (en)
WO (1) WO2002048906A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7650608B2 (en) * 2002-05-16 2010-01-19 Bea Systems, Inc. System and method for application and resource data integration
US20060004848A1 (en) * 2004-05-25 2006-01-05 Williams Evelyn L Methods and system for presenting attributes and associations of managed objects
US8020110B2 (en) * 2005-05-26 2011-09-13 Weisermazars Llp Methods for defining queries, generating query results and displaying same
JP4144806B2 (en) * 2005-08-30 2008-09-03 株式会社プロフィールド Information editing apparatus, information editing system, information editing method, and program
US8261189B2 (en) * 2005-11-30 2012-09-04 International Business Machines Corporation Database monitor replay
US8275312B2 (en) * 2005-12-31 2012-09-25 Blaze Mobile, Inc. Induction triggered transactions using an external NFC device
US7698257B2 (en) * 2006-05-16 2010-04-13 Business Objects Software Ltd. Apparatus and method for recursively rationalizing data source queries
US8423561B2 (en) * 2009-07-02 2013-04-16 Catavolt, Inc. Method and system for simplifying object mapping for a user interface
US8983984B2 (en) 2009-07-02 2015-03-17 Catavolt, Inc. Methods and systems for simplifying object mapping for external interfaces
US8918436B2 (en) 2011-12-22 2014-12-23 Sap Ag Hybrid database table stored as both row and column store
US20160019316A1 (en) 2014-07-21 2016-01-21 Splunk Inc. Wizard for creating a correlation search
US20160342431A1 (en) * 2015-05-22 2016-11-24 Bank Of America Corporation Interactive help interface
US11249710B2 (en) * 2016-03-31 2022-02-15 Splunk Inc. Technology add-on control console

Citations (1)

Publication number Priority date Publication date Assignee Title
WO2000074294A2 (en) * 1999-05-31 2000-12-07 Webnara Co., Ltd. General-purpose robot agent and real-time search method

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US6085186A (en) * 1996-09-20 2000-07-04 Netbot, Inc. Method and system using information written in a wrapper description language to execute query on a network
JP3655714B2 (en) * 1996-11-15 2005-06-02 株式会社ニューズウオッチ Information filtering apparatus and recording medium
JP2873222B2 (en) * 1997-05-12 1999-03-24 川崎重工業株式会社 Robot information processing device
US5999940A (en) * 1997-05-28 1999-12-07 Home Information Services, Inc. Interactive information discovery tool and methodology
US6301584B1 (en) * 1997-08-21 2001-10-09 Home Information Services, Inc. System and method for retrieving entities and integrating data
WO2000073942A2 (en) * 1999-05-27 2000-12-07 Mobile Engines, Inc. Intelligent agent parallel search and comparison engine
US6564210B1 (en) * 2000-03-27 2003-05-13 Virtual Self Ltd. System and method for searching databases employing user profiles
US6674450B1 (en) * 2000-04-14 2004-01-06 Trilogy Development Group, Inc. Interactive data-bound control

Also Published As

Publication number Publication date
EP1342171A1 (en) 2003-09-10
AU2001221507A1 (en) 2002-06-24
CA2431908A1 (en) 2002-06-20
US20040030685A1 (en) 2004-02-12
US7698277B2 (en) 2010-04-13

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ CZ DE DE DK DK DM DZ EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2000984910

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2001221507

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: 10450792

Country of ref document: US

Ref document number: 2431908

Country of ref document: CA

WWP Wipo information: published in national office

Ref document number: 2000984910

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: JP