WO2001090934A1

WO2001090934A1 - Automatic and secure data search method using a data transmission network

Info

Publication number: WO2001090934A1
Application number: PCT/FR2000/001407
Authority: WO
Inventors: Daniel Vinsonneau
Original assignee: Daniel Vinsonneau
Priority date: 2000-05-23
Filing date: 2000-05-23
Publication date: 2001-11-29
Also published as: EP1290578B1; EP1290578A1; DE60037681D1; US7043482B1; AU2000252256A1; DE60037681T2

Abstract

The invention concerns a method for searching data stored in at least one database (32, 34) accessible through at least one external server of a data transmission network (14). The method consists in sending a request (36) for each database from a local server associated with at least one local database (12), the request comprising fields containing general criteria concerning the type of data sought, the data content and/or the dates associated with the data, the fields being able to be linked by Boolean operators, and then downloading the data meeting these criteria. The downloading comprises the following steps: generating, with the local server, scripts consisting of a series of commands based on the general criteria and enabling the generation of pointers identifying the data fields to be downloaded; and activating automata for downloading the fields pointed to by the pointers as well as other data fields associated by predefined relationships with the pointed data fields.

Description

Automated and secure data search method using a data transmission network

Technical Field: The present invention relates to methods and systems of document retrieval in which a user has access, by means of a local server of a data transmission network, to databases accessible by servers of the network; the data meeting user-defined criteria are then downloaded and stored in a database associated with the local server.

State of the art: There are a large number of databases offering access to the most diverse information, such as patents, stock market prices, airline reservations, etc. These databases are generally accessible via the Internet or by direct connection via the telephone network. A database can be queried either by typing the queries on the keyboard or by using a file containing at least one query, called an automatic query script. The resulting data are then viewed interactively, printed, or saved to a file for future reference.

One widely used type of database contains patent publications, which currently make up 80% of the world's written information. Patent databases can be classified into two families. General databases are built from a documentary collection in which information is grouped by family; this type of database limits the number of records while favoring reading in a particular language. This is the case of the ESPACENET database of the European Patent Office, where bibliographic information is translated into English. National databases, on the other hand, are limited to the patents of a given country (France, Germany) but are more complete, in the sense that they provide, for example, the patent in its entirety. A search therefore consists of making a first request on a general database of the ESPACENET type to select the publication numbers, then making a second request on one or more national databases to obtain more complete information.

Although several downloads from patent databases concerning the same subject can be grouped together, with duplicates removed and possibly patents outside the subject deleted, each request made on the Web is independent, and so are the HTML pages obtained. Consequently, the user has no synthetic overview of the results.

The current way of querying databases, and in particular patent databases, is therefore not very effective on an industrial scale. The data received as a result of a search on the Internet are slow to obtain, difficult to consult and handle, and sometimes erroneous. These drawbacks arise from the fact that each request is a manual request limited to a single interrogation, which makes it necessary to multiply the number of manual requests.

Finally, a major drawback of data search methods performed on the Internet is the lack of confidentiality. An information server keeps a log file from which the content of current requests can be viewed, and the server also knows the IP address from which each request is sent. An attacker who can view a request can therefore learn the subject on which the person querying the server is working.

Statement of the invention:

The object of the invention is therefore to provide an automated method of searching for data in databases accessible by a data transmission network, which allows rapid and efficient access to a plurality of databases without having to formulate a multitude of manual requests.

Another object of the invention is to provide a secure method of searching for data in databases accessible by a data transmission network, in which the original request is a general request that does not reveal precise information about the characteristics of the search.

According to a first object, the invention therefore relates to a method of searching for data stored in at least one database accessible by at least one external server of a data transmission network, consisting in sending a request for each database from a local server associated with at least one local database, the request comprising fields containing general criteria relating to the type of information sought, the content of the information, and/or the dates associated with the information, the various fields being able to be linked by Boolean operators, and then downloading the data meeting these criteria. The downloading comprises the following steps: generation by the local server of scripts composed of a series of commands based on the general criteria and allowing the creation of pointers identifying the data fields to be downloaded, and activation of automata for downloading the data fields pointed to by the pointers as well as other data fields associated by predefined relationships with the pointed data fields.

According to a second object, after the data meeting the general criteria have been downloaded from the external server to the local server, a syntactic analysis of the downloaded data is performed according to specific criteria, different from the general criteria, which makes it possible to create pointers to specific data fields of the downloaded data before the downloaded data and the pointers are stored in the local database.

Brief description of the figures:

The aims, objects and characteristics of the invention will appear more clearly on reading the description which follows with reference to the drawings in which:

FIG. 1 schematically represents a data transmission network to which are connected a local server and two external servers with databases that the user associated with the local server wishes to access according to the method of the invention,

FIG. 2 is a block diagram representing the system for implementing the method according to the invention, FIG. 3 represents an example of a request that can be used in the method according to the invention, FIG. 4 represents an example of a selection page that can be used in the method according to the invention, FIG. 5 represents an example of an interrogation page that can be used in the method according to the invention, and FIG. 6 is a flowchart representing the different steps implemented in the method according to the invention.

Detailed description of the invention:

As illustrated in FIG. 1, the method according to the invention can be implemented in a local server 10 having a local database 12, the local server being connected to a network 14 such as an IP network, in particular the Internet. The local server 10 sends data search requests to the external servers 16 and 18, which have databases 20, 22, 24 and databases 26, 28 respectively. It should be noted that the local database 12 could be replaced by several databases without departing from the scope of the invention.

The general block diagram of the invention is illustrated in FIG. 2. In the following description, the method of the invention is applied to the search for information in patent databases. In the user interface 29, which is generally a workstation, a home page 30 is displayed on the display device of the local server when the browser is opened. This home page displays one or more forms giving access to one or more external databases 32, 34 respectively, each form being intended to build a request 36 to the selected database. It should be noted that a database could also be accessible through several forms.

Request 36 can be a simple list of patents or any other list, such as all the patents of a given inventor or company. More sophisticated search criteria can be used, such as searches on words or text using Boolean operators, restricted to predetermined fields or not.

FIG. 3 describes the elements that can be used for a text search on external data servers. The query possibilities are adjusted to the capabilities of each server but retain certain common characteristics, so predefined templates simplify the user interface. The example shown contains a first pair of fields 38 and 40, respectively a natural-language text field and the type of field in which the search is to be performed. Depending on the server, Boolean operators can be included in the text. A second pair 42 and 44 can optionally be used and combined with the first pair by a Boolean operator 46. Finally, a date field 48 can be defined to limit searches. This list of fields is not exhaustive: other facilities can be used depending on the server.
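As an illustration only, the request of FIG. 3 could be modeled as the data structure below. The class and attribute names, and the `field=(text)` query syntax produced by `to_query`, are hypothetical: each real server has its own syntax, which is why the description relies on per-server templates.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TextCriterion:
    text: str        # natural-language text (field 38 or 42)
    field_type: str  # field searched, e.g. "title" (field 40 or 44)


@dataclass
class Request:
    first: TextCriterion                          # pair 38/40
    second: Optional[TextCriterion] = None        # optional pair 42/44
    operator: str = "AND"                         # Boolean operator 46
    date_range: Optional[Tuple[str, str]] = None  # date field 48 (ISO dates)

    def to_query(self) -> str:
        """Render the request in a hypothetical 'field=(text)' syntax."""
        if self.operator not in {"AND", "OR", "NOT"}:
            raise ValueError(f"unsupported operator: {self.operator}")
        query = f"{self.first.field_type}=({self.first.text})"
        if self.second is not None:
            query += f" {self.operator} {self.second.field_type}=({self.second.text})"
        if self.date_range is not None:
            start, end = self.date_range
            query = f"({query}) AND date={start}:{end}"
        return query


if __name__ == "__main__":
    req = Request(
        TextCriterion("data search", "title"),
        TextCriterion("network", "abstract"),
        "AND",
        ("1995-01-01", "2000-05-23"),
    )
    print(req.to_query())
    # (title=(data search) AND abstract=(network)) AND date=1995-01-01:2000-05-23
```

A per-server template would replace `to_query` with that server's own syntax while keeping the same request object.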

Once the fields of the request have been defined, the "enter" command or a specific button on the user interface 29 transmits the request to the server associated with the database concerned. Since the server may work in a language other than the user's usual language, the request can be composed in the user's language, which simplifies its creation, and then translated by the generation system.

The external database 32 or 34 then returns the search results, which can also be translated into the user's language. The responses take the form of hypertext links and pages that can be accessed interactively with the browser on the local server.

At this stage, the user selects the type of data he wants to download on an HTML selection page 50, which is displayed, for example, in an area of the home page 30 of the user interface 29. This selection page, illustrated in FIG. 4, generally lists the elements worth downloading: the cover page 51, the citations 52, the drawings 54, the claims 56, and the description 58. An annex field 60 makes it possible to define when (time, date) the download will be performed.
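A minimal sketch, under stated assumptions, of how the choices made on selection page 50 could be turned into the pointers carried by the download scripts. `Pointer`, `Selection`, `generate_script` and the field names are hypothetical; the list of documents stands in for the results returned by the request, and `scheduled_at` plays the role of annex field 60.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set

# Elements offered on selection page 50 (FIG. 4), in display order
SELECTABLE = ("cover_page", "citations", "drawings", "claims", "description")


@dataclass(frozen=True)
class Pointer:
    """Identifies one data field of one document on one external database."""
    database: str
    document: str
    field: str


@dataclass
class Selection:
    items: Set[str]
    scheduled_at: Optional[datetime] = None  # annex field 60; None = immediately


def generate_script(database: str, documents: List[str],
                    selection: Selection) -> List[Pointer]:
    """Build a (hypothetical) script: one pointer per document and selected field."""
    unknown = selection.items - set(SELECTABLE)
    if unknown:
        raise ValueError(f"unknown elements: {sorted(unknown)}")
    # Keep the selection-page order so that scripts are reproducible
    fields = [f for f in SELECTABLE if f in selection.items]
    return [Pointer(database, doc, f) for doc in documents for f in fields]


if __name__ == "__main__":
    sel = Selection({"claims", "cover_page"})
    for p in generate_script("db32", ["US1", "US2"], sel):
        print(p)
```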

The user then launches the download sequence by clicking on the corresponding button. The download begins with the generation of scripts, which are composed of a series of commands based on the search criteria of the request and which allow the creation of pointers identifying the data fields to be downloaded. These scripts are used by processes 62, 64, associated respectively with databases 32 and 34, to fetch the information from the databases. Each process uses computer automata (66, 68 for process 62, and 70 for process 64), whose number depends on the volume of data sought; between 10 and 100 automata working in parallel may thus query the same database. If, for example, the request returns more than 10,000 responses while the server can supply at most 500, several automata are used, each of them combining the request by a logical AND with its own publication-date window. Such logical ANDs can also combine several fields. Each automaton thus stays within the authorized limit, and the results, instead of being partial as with a conventional process, are complete and are supplied to the user in a completely transparent manner.
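The date-window technique of the example above can be sketched as follows, assuming the server offers a way to count hits for a date range (`count`) and a capped retrieval call (`fetch`); both are hypothetical stand-ins for the real server interactions. The range is bisected until each window stays under the server limit, then one worker thread, acting as an automaton, handles each window.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable, List, Tuple

Window = Tuple[date, date]  # inclusive publication-date window


def split_windows(start: date, end: date,
                  count: Callable[[date, date], int], limit: int) -> List[Window]:
    """Bisect [start, end] until each window returns at most `limit` hits.

    Empty windows are dropped. A single day that still exceeds the limit
    cannot be split further on dates alone; another field would have to be
    ANDed in (not handled in this sketch)."""
    n = count(start, end)
    if n == 0:
        return []
    if n <= limit or start == end:
        return [(start, end)]
    mid = start + (end - start) // 2
    return (split_windows(start, mid, count, limit)
            + split_windows(mid + timedelta(days=1), end, count, limit))


def run_automata(query: str, windows: List[Window],
                 fetch: Callable[[str, Window], List[str]],
                 max_workers: int = 10) -> List[str]:
    """One automaton per window, each sending `query AND <date window>`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, query, w) for w in windows]
        results: List[str] = []
        for future in futures:  # merge in window order
            results.extend(future.result())
    return results


if __name__ == "__main__":
    # Simulated server: 10,000 hits, at most 500 returned per request
    docs = [(f"D{i}", date(2000, 1, 1) + timedelta(days=i % 100))
            for i in range(10_000)]

    def count(s: date, e: date) -> int:
        return sum(1 for _, d in docs if s <= d <= e)

    def fetch(q: str, w: Window) -> List[str]:
        return [n for n, d in docs if w[0] <= d <= w[1]][:500]

    windows = split_windows(date(2000, 1, 1), date(2000, 12, 31), count, 500)
    print(len(windows), "windows,", len(run_automata("pump", windows, fetch)), "hits")
```

Because the windows do not overlap and each stays under the cap, the merged result is complete rather than truncated at 500.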

The automata work in the background, outside the browser of the local server. This is possible because each database has known display formats that allow the automata to find the different pages of a patent easily. A minimal parsing of the pages is nevertheless needed to extract the information required to reach the following pages; for example, the USPTO US patent database creates temporary directories for each request. In addition, the automata determine the number of pages of a patent, and the location and numbers of the drawing pages, by reading the underlying information in a patent page. In other words, the automata download the data fields pointed to by the pointers created by the scripts, but also other data fields associated with the pointed data fields by predefined relationships. According to a variant, only the data fields that are not yet in the local database are downloaded, in order to minimize the volume of data to be downloaded.
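The expansion of pointers through predefined relationships, together with the variant that skips fields already held locally, can be sketched as a breadth-first traversal. The relationship table and field names below are hypothetical examples (a cover page leading to a drawing index, which leads to the drawing pages); in practice the automata discover these links by parsing each page, whereas this sketch treats them as a static table.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Item = Tuple[str, str]  # (document number, field name)

# Hypothetical predefined relationships: downloading a field implies
# downloading the related fields listed here.
RELATIONSHIPS: Dict[str, List[str]] = {
    "cover_page": ["drawing_index"],
    "drawing_index": ["drawing_pages"],
    "claims": [],
}


def expand(pointers: Iterable[Item], relationships: Dict[str, List[str]],
           already_stored: Set[Item]) -> List[Item]:
    """Return the items to download: the pointed fields plus every field
    reachable through the relationships, minus those already stored locally."""
    seen: Set[Item] = set()
    to_download: List[Item] = []
    queue = deque(pointers)
    while queue:
        doc, field = queue.popleft()
        if (doc, field) in seen:
            continue
        seen.add((doc, field))
        if (doc, field) not in already_stored:
            to_download.append((doc, field))
        # Related fields are still explored: they may be missing locally
        # even when their parent field is already stored.
        for related in relationships.get(field, []):
            queue.append((doc, related))
    return to_download


if __name__ == "__main__":
    stored = {("US1", "claims")}
    print(expand([("US1", "cover_page"), ("US1", "claims")], RELATIONSHIPS, stored))
```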

After the download phase, the downloaded pages are analyzed by a syntactic analyzer (parser) 72 to find or create fields such as the patent number, its publication date, the names of the inventors, etc. This step can be performed in parallel with the download step as soon as enough data is available to process at least one patent. The data provided by the analyzer 72 can then be used to send a new request. For example, the non-US patents cited in a US patent in the USPTO database can be selected and searched for in the ESPACENET database. Similarly, analyzing the various extensions (foreign counterparts) of a patent can lead to choosing the best source for a given client.
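A minimal parser sketch, assuming the downloaded pages have been reduced to a plain `Label: value` layout. Real pages are HTML with a layout specific to each database, so the labels and the regular expression here are hypothetical. The sketch extracts a few fields and builds a follow-up request for the non-US citations, mirroring the USPTO-to-ESPACENET example; the `pn=(...)` syntax is invented for illustration.

```python
import re
from typing import Any, Dict, List, Optional

# One "Label: value" pair per line (hypothetical simplified page layout)
FIELD_RE = re.compile(r"^(?P<key>[A-Za-z ]+):[ \t]*(?P<value>.*)$", re.MULTILINE)


def _split(value: str) -> List[str]:
    return [part.strip() for part in value.split(";") if part.strip()]


def parse_page(text: str) -> Dict[str, Any]:
    """Extract the number, publication date, inventors and citations."""
    raw = {m["key"].strip().lower(): m["value"].strip()
           for m in FIELD_RE.finditer(text)}
    return {
        "number": raw.get("patent number"),
        "publication_date": raw.get("publication date"),
        "inventors": _split(raw.get("inventors", "")),
        "cited": _split(raw.get("cited", "")),
    }


def follow_up_request(record: Dict[str, Any]) -> Optional[str]:
    """Turn the non-US citations into a new request for another database."""
    foreign = [c for c in record["cited"] if not c.startswith("US")]
    return " OR ".join(f"pn=({c})" for c in foreign) or None


if __name__ == "__main__":
    page = ("Patent Number: US1234567\n"
            "Publication Date: 2000-01-04\n"
            "Inventors: A. Smith; B. Jones\n"
            "Cited: US1111111; EP2222222; FR3333333\n")
    record = parse_page(page)
    print(record)
    print(follow_up_request(record))  # pn=(EP2222222) OR pn=(FR3333333)
```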

Once the parsing has been carried out for at least one patent, the local database 12 is populated by filling in the fields for each patent. Note that the patent object in the local database includes a large number of fields: it is the union (in the set-theoretic sense) of the fields available separately in each accessible database. For example, US patents in the ESPACENET database do not contain the US classification codes, while the same patents in the USPTO database do not contain the extensions or the ECLA code.
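The union of fields can be illustrated by merging the records that two databases return for the same patent. In this sketch, whose conflict rule is an assumption rather than the described implementation, empty values are ignored and, when two sources provide different non-empty values, the first source passed in wins; the field names and values are invented for the example.

```python
from typing import Any, Dict

EMPTY = (None, "", [], ())  # values treated as "field not provided"


def merge_records(*records: Dict[str, Any]) -> Dict[str, Any]:
    """Union of the fields of one patent as provided by several databases.

    Empty values are skipped; for conflicting non-empty values, the record
    given first takes precedence."""
    merged: Dict[str, Any] = {}
    for record in records:
        for key, value in record.items():
            if value in EMPTY:
                continue
            merged.setdefault(key, value)
    return merged


if __name__ == "__main__":
    espacenet = {"number": "US1234567", "ecla": "G06F17/30",
                 "family": ["EP1", "FR1"], "us_class": None}
    uspto = {"number": "US1234567", "us_class": "707/3", "family": []}
    print(merge_records(espacenet, uspto))
```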

Finally, the home page 30 in the user interface 29 also includes an interrogation page 74 for querying the local database 12, either immediately or at a later time.

This page displays a grid of boxes to be filled in. Each box corresponds to a patent field present in the database (for example international classification, US classification, applicant, etc.). An additional box may be provided to enter the download number, if desired, and a last box to enter a query directly in SQL or the name of an SQL file containing preprogrammed queries (SQL script). Logical operators can be applied between fields. Display and sorting parameters can also be specified for the information sought (for example sorted by company, sorted by publication date, etc.). These parameters make it possible to display and/or group the patents in one or more user-definable trees, navigated by means of hyperlinks.

An interrogation page generally includes the list of fields illustrated in FIG. 5: patent number, title, inventor(s), applicant, date of issue, date of publication, abstract, claims, description, US classification, cited patents, international classification, ECLA classification, priority country, priority number, family, filing date and filing number, patent attorney and, finally, the name of the primary examiner. This list is only an example, and new items may be added to it.
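Claim 11 states that the selections made on the interrogation page are converted into SQL. A minimal sketch using Python's built-in `sqlite3` module, assuming a single `patent` table holding a subset of the fields listed above (table and column names are hypothetical). Values are passed as query parameters and field names are checked against a whitelist, so the filled-in boxes cannot inject SQL.

```python
import sqlite3
from typing import Dict, List, Tuple

# A subset of the fields of the interrogation page; names are assumptions
COLUMNS = ("number", "title", "applicant", "publication_date",
           "int_class", "us_class", "priority_country")


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE patent ({', '.join(c + ' TEXT' for c in COLUMNS)})")
    return conn


def grid_to_sql(grid: Dict[str, str], operator: str = "AND",
                order_by: str = "publication_date") -> Tuple[str, List[str]]:
    """Convert the filled-in boxes into a parameterized SQL query."""
    if operator not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {operator}")
    if order_by not in COLUMNS:
        raise ValueError(f"unknown sort field: {order_by}")
    filled = [(k, v) for k, v in grid.items() if v]  # empty boxes are ignored
    for key, _ in filled:
        if key not in COLUMNS:
            raise ValueError(f"unknown field: {key}")
    where = f" {operator} ".join(f"{k} LIKE ?" for k, _ in filled) or "1"
    sql = f"SELECT number, title FROM patent WHERE {where} ORDER BY {order_by}"
    return sql, [f"%{v}%" for _, v in filled]


if __name__ == "__main__":
    conn = create_db()
    conn.executemany(
        f"INSERT INTO patent VALUES ({', '.join('?' * len(COLUMNS))})",
        [("FR1", "Data search", "ACME", "2000-05-23", "G06F", "", "FR"),
         ("US2", "Pump", "Other", "1999-01-01", "F04B", "417/1", "US"),
         ("EP3", "Search robot", "ACME", "1998-02-02", "G06F", "", "FR")],
    )
    sql, params = grid_to_sql({"applicant": "ACME", "int_class": "G06F", "title": ""})
    print(sql)
    print(conn.execute(sql, params).fetchall())
    # [('EP3', 'Search robot'), ('FR1', 'Data search')]
```

The generated SQL string is also what the description proposes to display so that the user can edit or archive it.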

To make the method according to the invention easier to follow, its different steps are shown in FIG. 6. After the home page is displayed (76) in the user interface, the user determines whether he wishes to send a request to an external server or a query to the local database (78). If it is a request, it is sent over the transmission network to the selected external server (80), and the HTML pages found are then displayed (82).

Next, the user selects the type of information he wants to download by filling in the selection page (84). As seen previously, an additional field can be provided in the selection page to define whether or not the download is deferred (86). This corresponds to a batch mode, which avoids saturating the network and the server when the download is large. Another advantage of deferred downloading is that all the downloads concerning a given server can be grouped together, which further obscures the objective of the request. If the download is deferred, a time counter is started (88), and the process continues only when a predefined time has expired.
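The deferred (batch) mode can be sketched as follows: pending downloads are grouped per server and released together once the time counter has expired. The job representation, server names and delay are illustrative assumptions, not part of the description.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Batch = Tuple[datetime, List[str]]  # (start time, pointers to download)


def plan_batches(jobs: List[Tuple[str, str]], now: datetime,
                 delay: timedelta) -> Dict[str, Batch]:
    """Group (server, pointer) jobs per server; every batch starts at now + delay."""
    start = now + delay  # expiry of the time counter (step 88)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for server, pointer in jobs:
        grouped[server].append(pointer)
    return {server: (start, pointers) for server, pointers in grouped.items()}


def due_servers(batches: Dict[str, Batch], now: datetime) -> List[str]:
    """Servers whose batch may now be downloaded (step 90 onwards)."""
    return sorted(s for s, (start, _) in batches.items() if start <= now)


if __name__ == "__main__":
    now = datetime(2000, 5, 23, 12, 0)
    batches = plan_batches([("s16", "p1"), ("s18", "p2"), ("s16", "p3")],
                           now, timedelta(hours=10))
    print(batches)
    print(due_servers(batches, now + timedelta(hours=10)))
```

Grouping per server means the server sees one combined batch rather than a sequence of requests that could be tied to a single search.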

Once the download is triggered, the next step is to generate the scripts (90), followed by the actual download by the computer automata (92). The process then passes to the syntactic analysis (parsing) step (94), which makes it possible to define new data fields (96). Note that these data fields can in turn be used as search fields in a new request to the same database or to another database.

The downloaded data fields and the pointers to the new data fields defined during the parsing step are then stored in the local database (98). At this stage, the local database may or may not be queried immediately (100). If not, the process returns to displaying the home page (76). If the user wants an immediate query, as is generally the case, the interrogation page is filled in (102) and a structured display of the data fields selected on the interrogation page takes place in the user interface (104).
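The flowchart of FIG. 6 can be summarized as a transition table keyed by its reference numerals. This is only a reading aid resting on two assumptions not stated in the text: a local query chosen at step 78 goes straight to the interrogation page (102), and the process ends after the structured display (104).

```python
from typing import Dict, Iterable, List, Optional

# Steps of FIG. 6, keyed by their reference numerals
STEPS: Dict[int, str] = {
    76: "display home page", 78: "request or local query?",
    80: "send request to external server", 82: "display HTML pages found",
    84: "fill in selection page", 86: "deferred download?",
    88: "start time counter", 90: "generate scripts",
    92: "download by automata", 94: "parse downloaded pages",
    96: "define new data fields", 98: "store in local database",
    100: "immediate query?", 102: "fill in interrogation page",
    104: "structured display",
}

# Unconditional transitions; None marks the (assumed) end of the process
LINEAR: Dict[int, Optional[int]] = {
    76: 78, 80: 82, 82: 84, 84: 86, 88: 90, 90: 92, 92: 94,
    94: 96, 96: 98, 98: 100, 102: 104, 104: None,
}


def next_step(step: int, answer: object = None) -> Optional[int]:
    """Step following `step`; `answer` resolves decision steps 78, 86 and 100."""
    if step == 78:
        return 80 if answer == "request" else 102  # local query: 102 (assumed)
    if step == 86:
        return 88 if answer else 90  # deferred download starts the time counter
    if step == 100:
        return 102 if answer else 76  # no immediate query: back to home page
    return LINEAR[step]


def walk(answers: Iterable[object]) -> List[int]:
    """Follow the flowchart from step 76, consuming one answer per decision."""
    pending = iter(answers)
    step: Optional[int] = 76
    path: List[int] = []
    while step is not None:
        path.append(step)
        answer = next(pending) if step in (78, 86, 100) else None
        step = next_step(step, answer)
    return path


if __name__ == "__main__":
    for s in walk(["request", False, True]):
        print(s, STEPS[s])
```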

When the data have been stored in the local database (98), they are converted into a sequence of SQL statements. When a query is made, the raw data from the local database are converted into an HTML page comprising an index and the query grid of the interrogation page. This index is a series of hyperlinks (numbers and titles of the patents found, etc., according to the display parameters) to a process that displays the content of a patent. The SQL script of the query is also displayed so that it can be modified or archived in a file. A number is associated with each query, which makes it possible to combine several queries; at the end of the session, these numbers and the corresponding SQL scripts are destroyed or kept, as the user chooses. The local database query process can be iterated to refine the query. The result of the query can also be exported to a directory, for example to burn a CD-ROM, to generate an intranet site, or for printing.

The foregoing description shows that the method of the invention (and the system implementing it) collects data automatically, efficiently and quickly thanks to a plurality of automata that can work simultaneously on the same database, on several databases of the same server, or on databases of different servers. In addition, the method of the invention is secure: although all of the data are downloaded, the most precise search is never performed on the external servers but only on the local server. A general request containing a single word or classification code can thus be made to download all the patents meeting this criterion; the downloaded patents are then analyzed with a syntactic analyzer, and the local database can be queried confidentially to obtain precise data.
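A sketch of the index page described above, assuming query results arrive as (number, title) pairs; the `/patent?number=` link target is a hypothetical address for the process that displays a patent. Every value is HTML-escaped, and the SQL script is shown verbatim so that it can be edited or archived.

```python
from html import escape
from typing import List, Tuple
from urllib.parse import quote


def index_page(query_number: int, rows: List[Tuple[str, str]], sql: str) -> str:
    """Build an index of hyperlinks to the patents found, followed by the SQL."""
    items = "\n".join(
        f'<li><a href="/patent?number={escape(quote(number), quote=True)}">'
        f"{escape(number)} - {escape(title)}</a></li>"
        for number, title in rows
    )
    return (f"<h1>Query {query_number}</h1>\n<ul>\n{items}\n</ul>\n"
            f"<pre>{escape(sql)}</pre>")


if __name__ == "__main__":
    print(index_page(3, [("EP3", "Search robot"), ("FR1", "Data search")],
                     "SELECT number, title FROM patent WHERE applicant LIKE '%ACME%'"))
```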

Modifications can be made to the method just described without departing from the scope of the invention. Thus, a request can subsequently be replayed at regular intervals, for example every month, in order to take into account automatically the updates made to the external servers. It suffices to activate, in the date field of the request (see FIG. 3), the automatic update function, optionally with the update frequency. The local database then mirrors the external databases with a slight delay, and a message is generated to notify the user of any update.
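The automatic update can be sketched as two small functions: one computing when a stored request should be replayed, the other comparing the replayed results with the local database and building the notification. Reading "every month" as every 30 days, and identifying documents by their numbers alone, are assumptions made for this example.

```python
from datetime import date, timedelta
from typing import Iterable, List, Optional, Tuple


def next_update(last: date, every_days: int = 30) -> date:
    """Date of the next automatic replay of a stored request."""
    return last + timedelta(days=every_days)


def apply_update(local_numbers: Iterable[str],
                 fresh_numbers: Iterable[str]) -> Tuple[List[str], Optional[str]]:
    """Return the newly published numbers and the message for the user
    (None when the external databases have nothing new)."""
    new = sorted(set(fresh_numbers) - set(local_numbers))
    if not new:
        return new, None
    return new, f"{len(new)} new document(s): {', '.join(new)}"


if __name__ == "__main__":
    print(next_update(date(2000, 5, 23)))           # 2000-06-22
    print(apply_update(["A", "B"], ["B", "C", "D"]))  # (['C', 'D'], '2 new document(s): C, D')
```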

Claims

1. Method for searching for data stored in at least one database (32, 34) accessible by at least one external server of a data transmission network (14), consisting in sending a request (36) for each database from a local server (10) associated with at least one local database (12), said request comprising fields containing general criteria relating to the type of information sought, the content of the information, and/or the dates associated with the information, said fields being able to be linked by Boolean operators; said method being characterized in that it further comprises, after the data meeting said general criteria have been downloaded from said external server to said local server, a step of syntactic analysis of the downloaded data according to specific criteria different from said general criteria, making it possible to create pointers to specific data fields of said downloaded data, before said downloaded data and said pointers are stored in said local database.

2. Method for searching for data stored in at least one database (32, 34) accessible by at least one external server of a data transmission network, consisting in sending a request (36) for each database from a local server (10) associated with at least one local database (12), said request comprising fields containing general criteria relating to the type of information sought, the content of the information, and/or the dates associated with the information, said fields being able to be linked by Boolean operators, and then in downloading the data meeting said general criteria from said external server to said local server; said method being characterized in that the downloading step comprises the following steps:

- generation by said local server of scripts composed of a series of commands based on said general criteria and allowing the creation of pointers identifying the data fields to be downloaded, and

- activation of automata for downloading the data fields pointed to by said pointers as well as other data fields associated by predefined relationships with said pointed data fields.

3. Method according to claim 2, further comprising, after the downloading step, a step of syntactic analysis of the downloaded data according to specific criteria different from said general criteria, making it possible to create pointers to specific data fields of said downloaded data, before said downloaded data and said pointers are stored in said local database.

4. The method of claim 3, wherein said specific data fields created by said syntactic analysis step are used to send a new request to the same database or to another database.

5. Method according to one of claims 1, 2, 3 or 4, further comprising a selection step using a selection page (50) in the user interface, allowing the user to select some of said data fields meeting said general criteria after the HTML data pages returned by the request have been displayed.

6. The method of claim 5, wherein said selection page (50) comprises an annex field containing the time and/or date at which the downloading step is to be carried out in deferred mode.

7. Method according to one of claims 2 to 6, in which the number of automata activated for the downloading step depends on the volume of data sought, so that the quantity of data downloaded by said automata is not limited by parameters associated with the database in which said data are located.

8. The method of claim 7, wherein each of said automata performs a syntactic analysis of the data pages to be downloaded so as to extract the information necessary for retrieving the subsidiary or underlying data of said data pages.

9. Method according to one of claims 2 to 7, further comprising an interrogation step using an interrogation page (74) in the user interface (29), allowing the user to access data fields stored in said local database (12).

10. The method of claim 9, wherein said query page (74) identifies data fields that the user can select to be displayed in the user interface, said data fields being able to be linked by logical operators.

11. The method of claim 10, wherein the information selected in said query page is automatically converted into SQL language before being communicated to said local database (12).

12. Method according to any one of the preceding claims, in which said request (36) includes a date field allowing it to be updated regularly in order to take automatically into account the updates in said external databases (32, 34).

13. Method according to any one of the preceding claims, in which said external databases (32, 34) are patent databases.

14. System for searching for data in external databases comprising means suitable for implementing the steps of the method according to one of the preceding claims.