WO1990008360A1 - System and method for retrieving information from a plurality of databases - Google Patents

System and method for retrieving information from a plurality of databases


Publication number
WO1990008360A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
database
search
search request
databases
Prior art date
Application number
PCT/US1990/000037
Other languages
French (fr)
Inventor
Daniel E. Meyer
Richard P. Kollin
Gerald A. Francis
Original Assignee
Telebase Systems, Inc.
Priority date
Filing date
Publication date
Application filed by Telebase Systems, Inc.
Publication of WO1990008360A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G06F16/1794 Details of file format conversion

Definitions

  • This invention relates to the field of information retrieval, and especially the retrieval of information from one or more databases.
  • Groups of related and unrelated databases are sometimes arranged in "families".
  • a single vendor, or "host", may provide access to a family of databases, all of which can be reached through the same telephone number, and with the same identification number and password.
  • Dialog Information Services, Inc. of California, provides access to a large family of databases, dealing with many different subjects, under the service mark DIALOG.
  • family means a set of related or unrelated databases available on one particular host.
  • the terms "family" and "host" are virtually interchangeable.
  • To use each database one must know, in general, a telephone number, an identification number, and (in most cases) a password. One must also know the "language" of the database being searched.
  • a database language includes the particular syntax applicable to search requests in that database.
  • a search request includes a word or group of words, connected by various logical (typically Boolean) operators (e.g. AND, OR, NOT, etc.).
  • the user transmits a search request to a database in order to retrieve all items in the database which contain the specified logical combination of terms of the search request.
  • a database language includes a command set.
  • command set means a group of commands used to conduct searches in a database.
  • a command is needed to direct the performance of each aspect of database searching.
  • a command is needed to tell the system to begin the search, and another command is needed to direct the system to display the results of a search.
  • a database language includes the field structure of the database.
  • Each database is arranged by fields, i.e. searchable segments of documents. For example, one might want to search for articles in a database by searching according to authors or titles or abstracts.
  • the "author", "title", and "abstract" segments of a document are called "fields".
  • each database provides a different set of fields which can be searched, and a different set of "field tags" which identify particular fields.
  • each database could have a different syntax, a different command set, and a different field structure, since each database is, in general, created by a different entity.
  • U.S. Patent No. 4,774,655, the disclosure of which is incorporated by reference herein, addresses the problems of nonuniformity among databases and database families.
  • the system described in that patent is an intermediary between the user and the databases being searched.
  • the system described in the patent automatically selects a database in which to search, and then performs the search.
  • the user does not make direct contact with the database, and does not need to know the particular syntax of the database, or even its identity.
  • the system accepts, from the user, a search request, written in a simple and standardized format, and automatically translates the search request into the syntax appropriate to the database in which the system has chosen to search.
  • the system minimizes the time that it is directly connected ("on line") to the database, and presents the search results to the user after the connection with the database has been terminated.
  • the user selects a database directly, but still uses the system as an intermediary, thereby avoiding the need to establish a separate account with each database. It is also possible to take advantage of the ability of the system to translate a search request into the syntax of a selected database or database family, while still maintaining full control over the identity of the database or family selected.
  • the present specification discloses enhancements to the basic system described in the cited patent. One of these enhancements allows a user to search through a large number of databases in one session.
  • Another embodiment improves on the standardization described in the cited patent, by enabling the user to search through many different selected databases using a standardized set of commands.
  • Other features reduce the likelihood of null search results, and also deal with the problem of too many retrieved documents.
  • Another feature provides an economical method of determining, in advance, which retrieved documents are likely to be the most relevant. With these features, the system is even more convenient to use than systems of the prior art, and database searches are more likely to yield meaningful results.
  • the present invention enables a user to obtain information from one or more databases.
  • the user communicates with a central computer, programmed according to the invention, and the computer establishes direct communication with the databases.
  • the user is not located at the same place as the central computer, and the central computer is not at the same location as the various databases.
  • the invention is not limited by the distances between the user, the central computer, and the databases.
  • the user first transmits an area of interest and a search request to the central computer.
  • the area of interest can be selected from a menu of available subject areas.
  • the search request includes one or more words, connected by logical operators, such as Boolean operators or proximity connectors.
  • the object of the search is to locate all documents which contain the indicated logical combination of words of the search request.
  • the central computer executes the search request in each of a set of databases associated with the user's area of interest.
  • the databases in this set are predetermined, and are stored in the memory of the central computer, one set being associated with each area of interest.
  • the databases may be located on different hosts, or they may reside on the same host.
  • the central computer establishes communication separately with each database, translates the search request into the syntax of that database, if necessary, and executes the search. Normally, the searches are done sequentially, in each database, but it is also possible to do the searches simultaneously if the central computer is capable of establishing multiple simultaneous connections.
  • the system displays, to the user, the number of items retrieved from each of the databases, and gives the user the opportunity to view these items in detail.
  • the system may simply reconnect to the indicated database, search for the indicated item, and download the item for display to the user. It is also possible to allow the user to browse through the retrieved item while the central computer is still connected to the database.
  • the system provides a standardized set of commands for searching in databases.
  • "commands" mean the instructions given by the user to guide the operation of a search. Commands are used to generate searches, display results, review the sets of documents retrieved, terminate the search session, etc.
  • the system translates the standardized commands entered by the user into the commands appropriate to the specific database selected by the user.
  • the central computer establishes the connection with the selected database, and the user does not directly deal with that database.
  • the standardized command set can be used even in the case where the system chooses the database for the user. Also, the user may select a database but leave the translation of both the search request and the commands to the system. In the latter case, the user need not know anything about the database other than its name.
  • the system guides the user in reformulating a search, in those cases where a search produces no postings or too many postings.
  • the system identifies words of the search request which were not found in any document, and gives the user the opportunity to modify or delete those words.
  • the system provides suggestions to the user for broadening the search.
  • the system provides the user with suggestions on how to narrow the search, such as by imposing more stringent field or proximity restrictions on portions of the search request.
  • the invention also includes a system and method for determining which retrieved documents are likely to be the most relevant. This method is performed without browsing through the actual documents. Instead, the system considers, for example, the fields in which the search terms were found. Thus, if a search term is found in the title of the document, it is likely that the document is more relevant than if the term appeared only in an abstract.
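The field-based relevance idea above can be illustrated with a minimal sketch. The source states only that a title hit suggests more relevance than an abstract-only hit; the numeric weights, the function names, and the sample document identifiers below are assumptions made for illustration, not part of the disclosed method.

```python
# Hypothetical weights: the text says only that a hit in the title suggests
# more relevance than a hit in the abstract. The numbers are assumptions.
FIELD_WEIGHTS = {"title": 2, "abstract": 1}


def relevance_score(hit_fields):
    """Score a document from the set of fields in which search terms were found."""
    return sum(FIELD_WEIGHTS.get(field, 0) for field in hit_fields)


def rank_documents(hits):
    """Return document ids sorted from most to least relevant.

    `hits` maps a document id to the set of fields containing a search term.
    No document text is needed, only the field information.
    """
    return sorted(hits, key=lambda doc_id: relevance_score(hits[doc_id]), reverse=True)


# Illustrative data only.
hits = {
    "doc1": {"abstract"},
    "doc2": {"title", "abstract"},
    "doc3": {"title"},
}
ranked = rank_documents(hits)
print(ranked)
```

Because the ranking uses only field information, it can be computed without downloading the documents themselves, which matches the economy the text describes.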
  • Figure 1 is a block diagram showing a possible configuration of the system of the present invention.
  • Figure 2 is a flow chart showing the operation of the embodiment of the invention wherein the system executes a search automatically in each of several databases.
  • Figure 3 is a block diagram showing an alternative configuration of the system, wherein the central computer can perform searches simultaneously in more than one database.
  • Figure 4 is a chart containing a hypothetical standardized set of commands, used for searching in databases, and showing the meaning of each command.
  • Figure 5 is a flow chart illustrating an embodiment of the invention which assists the user in reformulating a search that has retrieved too few or too many documents.
  • Figure 1 is a block diagram showing one possible configuration of the system of the present invention.
  • the system may be operated from terminal 5, having modem 7.
  • Terminal 5 may be simply a personal computer, or it may be a so-called dedicated terminal (or "dumb terminal"), whose sole capability is to transmit and receive information.
  • Modem 7 also converts incoming analog signals from line 9 into signals having digital format, which can then be interpreted by terminal 5.
  • Switching unit 11 allocates the call to one of a plurality of identically-programmed computers 13, each computer having an incoming modem 15.
  • Switching unit 11 is of standard design, and is pre-programmed to indicate to the caller if all available computers are currently busy.
  • computers 13 comprise identical central processing units (CPUs) which may be conveniently arranged in the same housing. Other arrangements are possible, however.
  • Each computer is connected to an outgoing modem 17.
  • Outgoing modems 17 are also of standard design, and are equipped with automatic telephone dialing mechanisms.
  • the computers 13 can establish communication, via telephone lines 19, with one of many remote commercial databases, illustrated symbolically as blocks 20, 21, 22, 23, and 24.
  • connection between terminal 5 and computer 13 may include a conventional analog connection between the terminal and a given node in a telecommunications network, and a digital connection between that node and one of computers 13. Such alternative arrangements are within the scope of the present invention.
  • Each of the databases 20-24 represents, in general, a separate and independent computer system at some remote location.
  • the information stored in each database may be kept on disks or other storage media, and the searching through the respective databases is governed by the particular computer system for that database.
  • the system of the present invention assumes the existence of such multiple databases, and operates with any or all of them, regardless of the specific configuration of the computer system of each database. The system communicates with these databases in the same manner as would any other user of such databases.
  • Master CPU 30 is a computer which controls the overall operation of the system.
  • the master CPU continually checks that all computers 13 are operating properly. It also can be programmed, in conjunction with switching unit 11, to guide an incoming user call to the appropriate computer.
  • the master CPU 30 also serves to direct the various computers to retrieve information from disk 31, or to print billing information on printer 32.
  • the components shown between dotted lines 1 and 2 indicate the components which are "local", i.e. grouped at a central location.
  • the components shown outside the dotted lines are "remote", i.e. located elsewhere. Normally, terminal 5 will be located at a substantial distance from the system, possibly thousands of miles away.
  • Databases 20-24 are also, in general, found in computers located in other places. However, it should be understood that it is possible that one or more of the "remote" components could be physically located near the central location, without changing the manner of operation of the system. Thus, it is possible to operate the system from a terminal located near computers 13. It is also possible that one or more of databases 20-24 could be located in the same general area.
  • the invention will operate regardless of the physical locations of the remote terminal and the databases. Also, the number of CPUs can be varied. The invention will work with even one central CPU, having a modem connected to the line coming from the user, and another modem connected to a line connected to a database.
  • the databases 20-24 can be individual databases, or they can be database families. In the latter case, the system determines which database to search, within a particular family.
  • U.S. Patent No. 4,774,655 describes several methods of using the system illustrated in Figure 1.
  • computers 13 are programmed to display, to the user, one or more menus from which the user selects an area of interest.
  • the user selects one item from each such menu, whereupon the system automatically chooses one database for searching.
  • the choice of a database is made from a stored "decision tree" which associates one database with every possible combination of menu selections by the user.
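One simple way to picture the stored "decision tree" is as a lookup keyed by the sequence of menu selections. The sketch below is an assumption about representation only; the menu labels, database names, and the `choose_database` function are hypothetical and do not come from the cited patent.

```python
# Hypothetical menu selections and database names, for illustration only.
# Each complete path of menu choices maps to exactly one database.
DECISION_TREE = {
    ("Business", "Company news"): "DB_A",
    ("Business", "Market research"): "DB_B",
    ("Science", "Chemistry"): "DB_C",
}


def choose_database(selections):
    """Return the single database associated with a sequence of menu choices."""
    key = tuple(selections)
    if key not in DECISION_TREE:
        raise KeyError(f"no database stored for menu path {key!r}")
    return DECISION_TREE[key]


print(choose_database(["Science", "Chemistry"]))
```

A nested dictionary would model the tree more literally; a flat table keyed by the full path is used here because it makes the "one database per combination" property easy to see.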
  • the user then enters a search request, which comprises one or more words, connected by logical operators.
  • the aim of the search is to find documents which contain the indicated logical combination of words in the search request.
  • the words of the search request can also be connected by non-Boolean operators, and all such alternatives should be considered equivalents for purposes of this invention.
  • the system described in the cited patent then automatically dials the telephone number of the selected database, and establishes a connection, using an identification number and password which have been stored in memory. Note that it is the system, not the ultimate user, which is the customer of the database.
  • the system automatically translates the search request into the search syntax of the selected database, and transmits the search request to the database.
  • the system may then download, to its own memory, some or all of the results of the search, and terminates the connection with the database.
  • the system displays the downloaded results of the search to the user.
  • This displaying step may include allowing the user to browse electronically through one or more articles or other documents retrieved in the search. This browsing is therefore done after the system is disconnected ("off-line") from the database.
  • the system can also print a bill, using printer 32, based on credit card information previously supplied by the user. Note that the user need establish only one account, i.e. a credit card account, to gain access to a wide variety of databases. The individual databases do not "see" the user as a customer.
  • the database selection step may therefore also include the step of choosing a host on which to search.
  • the choice of a host can be made in the following manner. For each database, the system stores a list of all hosts on which that database is available. The hosts on each list are ranked in a predetermined order which is based on considerations of economics and efficiency. When the system decides to search a particular database, it will attempt to gain access to that database through the first host on the list. If the first choice host is busy or otherwise unavailable, the system then tries to connect to the second host on the list, and so on until a connection to a host is made.
  • the rankings of hosts may vary depending on the time of the day or the day of the week, or on other factors. For example, the host which is least expensive on a weekday may not be the least expensive on a weekend.
  • the system can be programmed with several lists of hosts, for each database, each list being appropriate to a different time or day. Before connecting to a database, the system checks the time of day and/or day of the week, and then refers to the ranking of hosts which is appropriate to the day or time. Thus, the system is not limited to a single ranking of hosts for each database.
  • the criteria for ranking hosts can vary, and need not be limited to considerations of economics. It may be that, for a given database, a particular host is most efficient, despite the fact that it is not the least expensive. Also, the hosts can be chosen by means other than from predetermined lists or rankings.
  • the system can connect to a host other than the first choice on the list.
  • the first choice host may be temporarily busy or unavailable.
  • the data telecommunications network, or "common carrier", which links the system to a database or host may be unavailable.
  • the system may have exhausted its supply of available passwords for a particular host, due to a large number of users doing searches on databases in that host, and it may be impossible to obtain further access to the host for this reason.
  • the user is unaware of which host the system chooses.
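The ranked-host fallback described above, including separate rankings by day, can be sketched as follows. The host names, the weekday/weekend split, and the `try_connect` callback are assumptions for illustration; the source says only that rankings may vary with the time of day or day of the week and that the system works down the list until a connection is made.

```python
from datetime import datetime

# Hypothetical rankings: one ordered host list per database and per period.
HOST_RANKINGS = {
    "DB_A": {
        "weekday": ["HOST_1", "HOST_2", "HOST_3"],
        "weekend": ["HOST_2", "HOST_1", "HOST_3"],
    },
}


def ranking_for(database, when):
    """Pick the host ranking appropriate to the day of the week."""
    period = "weekend" if when.weekday() >= 5 else "weekday"
    return HOST_RANKINGS[database][period]


def connect_to_first_available(database, when, try_connect):
    """Try hosts in ranked order; return the first that accepts a connection.

    `try_connect(host)` is a stand-in for dialing and logging in. It returns
    False when the host is busy, the carrier is down, or no password is free.
    """
    for host in ranking_for(database, when):
        if try_connect(host):
            return host
    return None  # no host on the list could be reached


busy = {"HOST_1"}
chosen = connect_to_first_available(
    "DB_A", datetime(1990, 1, 1), lambda host: host not in busy
)
print(chosen)
```

The caller never needs to inspect which host was chosen, consistent with the statement that the user is unaware of the selection.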
  • Figure 2 is a flow chart illustrating the operation of an embodiment of the present invention which is a modification of the procedures described above and in the cited patent. It is assumed, in Figure 2, that the user has already established connection with the system, and has transmitted satisfactory credit card information, or other identification. Then, in block 40, the user begins by selecting an area of interest from a menu displayed by the system. In block 42, the user enters a search request. The functions of blocks 40 and 42 could be performed in reverse order, if desired. In block 44, the system chooses a set of databases in which searches will be made. The system stores, in its memory, a data file comprising a list of areas of interest and databases.
  • the system associates a set of databases, all of which contain information relating to that area.
  • This set of databases is fixed for each possible selection of area of interest by the user.
  • the user need not know, in advance, what databases to search, and the selection of databases is entirely automatic.
  • the system searches through each database in the selected set.
  • Each search may be performed in substantially the same manner as described for an individual search in the cited patent. That is, the system automatically translates the search request to conform to the syntax of the database being searched, establishes connection with the database (transmitting an appropriate identification number and password, if necessary), and transmits the translated search request to the database. If a particular database is a member of a family, the system gains access to the family, choosing that family according to the method described above, and then transmits the correct database identifier, so as to connect to the desired database.
  • the system does not immediately present retrieved documents to the user. Instead, the system temporarily stores the number of documents, or other items, retrieved from the database and disconnects from the database. The process is repeated for all of the databases in the set to be searched.
  • After searching through the entire set of selected databases, the system displays, to the user, in block 48, a summary showing the name of each database that has been searched and the number of items retrieved from each database.
  • the system then asks, in test 50, if the user wants to browse through any or all of the retrieved items. If the answer is no, then the program will stop, as shown in block 52. If the answer is yes, the system accepts a choice, from the user, in block 54, indicating which item(s) from which database should be viewed.
  • the user's choice, made in block 54, can include a direction to download all documents retrieved from one of the databases.
  • this alternative can be very expensive if the number of documents is large, as the database charge is based, in part, on the number of items downloaded.
  • the system is preferably programmed to allow the user to specify which documents from each database should be displayed. For example, the user may specify the desire to view "document numbers 3-6", or some other subset, retrieved from "database number 6". Many other equivalent retrieval schemes can also be used.
  • the system reconnects to the selected database, in block 56.
  • the system translates and executes the search again, in block 58, but this time the system downloads the selected items to its memory, as indicated in block 60, and terminates the connection with the database.
  • the system displays the retrieved documents to the user.
  • the function of block 62 may include an interactive display, allowing the user to browse electronically through the document(s). During this displaying step, the user remains connected to the system but the system is not connected to any database, so no additional database charges are incurred.
  • the program can return to test 50, and the document-viewing process can be repeated.
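The first pass of the Figure 2 procedure (choose the database set for an area of interest, search each database in turn, keep only the item counts, then show a summary) can be sketched as below. The area names, database names, and the `run_search` callback are hypothetical; the callback stands in for the connect, translate, execute, and disconnect steps, which are not modeled here.

```python
# Hypothetical data file associating each area of interest with a fixed
# set of databases. Names are illustrative only.
AREA_TO_DATABASES = {
    "Medicine": ["DB_MED_1", "DB_MED_2"],
    "Law": ["DB_LAW_1", "DB_LAW_2", "DB_LAW_3"],
}


def search_area(area, request, run_search):
    """Search every database in the set for `area`, one at a time.

    `run_search(database, request)` is assumed to connect, translate and
    execute the request, disconnect, and return the number of items found.
    Only the counts are stored; no documents are downloaded at this stage.
    """
    counts = {}
    for database in AREA_TO_DATABASES[area]:
        counts[database] = run_search(database, request)
    return counts


def summary_lines(counts):
    """Format the per-database summary shown to the user."""
    return [f"{database}: {count} item(s)" for database, count in counts.items()]


# Stand-in results so the sketch runs without any real connection.
fake_results = {"DB_MED_1": 12, "DB_MED_2": 0}
counts = search_area(
    "Medicine", "ASPIRIN AND HEADACHE", lambda db, req: fake_results[db]
)
for line in summary_lines(counts):
    print(line)
```

Deferring the download until the user picks specific items is what keeps database charges low in this design: the second pass reconnects only to the databases the user actually chooses.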
  • the system searches a set of databases one at a time. It is also possible to search through many databases virtually simultaneously. However, in the latter case, it would be necessary for each of computers 13 to be connected, through separate modems, to a plurality of databases. Alternatively, computers 13 can be arranged as one larger computer. In either case, the computer would be subdivided into separate processors, or otherwise programmed on a time-sharing basis, so that signals could be passed back and forth between the system and each of several databases virtually simultaneously. The computer could also remain connected to all of the databases continuously, especially if there is a constant stream of search requests covering all or most of the available databases or database families.
  • FIG. 3 illustrates one possible arrangement for the latter embodiment.
  • Terminal 70 is connected, by a telephone line, to CPU 72, which can establish connections with databases such as 78, 80, and 82.
  • Dotted lines 74 and 76 indicate the boundaries between the system and the components external to the system.
  • CPU 72 can send data through any of modems 84, 86, 88, 90, and 92.
  • the CPU automatically connects to as many modems as it needs to do the simultaneous searches. The remaining modems can be used for the searches of another user.
  • the number of modems can be varied; if a single large CPU is used for all users, the number of modems will be quite large, in general.
  • if the system needs to search through two or more databases on the same host, the searches can be done sequentially, using the modem which is connected to that host.
  • the CPU can appropriate another modem, so that two modems may be connected to the same host (but not the same database) at the same time.
  • Dotted lines 94 and 96 symbolically illustrate the variability of the number of modems being employed for a given user at any one time.
  • the CPU is searching only three databases, and modems 90 and 92 are not being used. But one or both of these modems could have been used if it had been necessary to search four or five databases. What is important is that the CPU be programmed to take command of the number of modems necessary to establish the desired simultaneous connections with databases and/or hosts.
  • the selection of an area of interest, in block 40, need not be done with only one menu.
  • the system can be programmed to display one or more further menus, in response to the user's previous selection.
  • the system always selects two or more databases to be searched, and the databases searched are a function solely of the user's responses to the menu(s).
  • the selection made in block 40 can also be done without conventional menus. Any other means by which the user can indicate, to the system, an area of interest is included within the scope of block 40 of Figure 2, and should be deemed an equivalent.
  • Another embodiment of the present invention provides an additional level of standardization for database searching.
  • the user employs one standardized set of commands for conducting searches in disparate databases.
  • it is helpful to explain the concept of a "command set" for database searching.
  • each commercial database, or family of databases, has its own rules of syntax which govern the construction of search requests.
  • Each set of rules of syntax includes a set of logical operators used to connect the terms of a search request. Examples of rules of syntax for various databases or database families are given in U.S. Patent No. 4,774,655.
  • the user transmits commands selected from a standardized set.
  • the system translates the search request into the equivalent command applicable to the database or database family being searched.
  • Figure 4 is a table showing a hypothetical standardized set of commands, with a brief indication of the meaning of each.
  • the table also shows the equivalents of each command in the BRS, DIALOG, and VU/TEXT families of databases.
  • the hypothetical standardized command which directs the system to search the database is "FIND".
  • in DIALOG, the corresponding command is "SELECT".
  • in BRS, it is "..SEARCH".
  • SHOW is used to display the results of a search.
  • the corresponding BRS command is "..PRINT".
  • the DIALOG command is "TYPE".
  • STOP ends the searching session; the corresponding command in BRS is "..OFF" and in DIALOG it is "LOGOFF".
  • the system of the present invention stores, in a suitable memory device, a table of the type shown in Figure 4.
  • the system selects the corresponding command from the command set for the database or database family being searched, and transmits that command to the database.
  • the user need not "see" the actual command which is transmitted to the database. It is as if the user is communicating directly with the database using only the standardized commands.
  • the "FIND" command is followed by a search request.
  • the system translates the word "FIND" into the appropriate command, and may also translate the search request itself into the syntax of the particular database.
  • the command "SHOW" may be followed by a specification of the documents desired to be displayed.
  • the system is programmed to display an error message to the user if the command does not include the required number or type of parameters.
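The stored command table and the parameter check can be sketched as a simple lookup. The FIND, SHOW, and STOP equivalents for BRS and DIALOG are the ones quoted in the text; the VU/TEXT column is omitted because its entries are not given here. The rule that only FIND requires an argument, and the function name `translate_command`, are assumptions for illustration. The argument is passed through unchanged; translating the search request itself into a database's syntax is a separate step not shown.

```python
# Standardized command -> native command, per database family.
# Entries are those quoted in the text for BRS and DIALOG.
COMMAND_TABLE = {
    "FIND": {"BRS": "..SEARCH", "DIALOG": "SELECT"},
    "SHOW": {"BRS": "..PRINT", "DIALOG": "TYPE"},
    "STOP": {"BRS": "..OFF", "DIALOG": "LOGOFF"},
}

# Assumption: FIND must be followed by a search request; SHOW may be
# followed by a document specification; STOP takes nothing.
REQUIRES_ARGUMENT = {"FIND": True, "SHOW": False, "STOP": False}


def translate_command(line, family):
    """Translate one standardized command line for the given database family."""
    parts = line.strip().split(maxsplit=1)
    if not parts:
        raise ValueError("empty command")
    verb = parts[0].upper()
    argument = parts[1] if len(parts) > 1 else ""
    if verb not in COMMAND_TABLE:
        raise ValueError(f"unknown command: {verb}")
    if REQUIRES_ARGUMENT[verb] and not argument:
        raise ValueError(f"{verb} requires a parameter")
    native = COMMAND_TABLE[verb][family]
    return f"{native} {argument}".strip()


print(translate_command("FIND DESKTOP AND PUBLISHING", "DIALOG"))
print(translate_command("STOP", "BRS"))
```

In this sketch a `ValueError` plays the role of the error message shown to the user when a command lacks its required parameters.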
  • the present invention resides not in the specific commands which form the standardized command set, but in the concept of providing standardization. Any other choice of a standardized command set could be used.
  • the standardized command set can be combined with other variations of the invention.
  • one way of using the standardized commands is in the embodiment of the cited patent wherein the user selects a particular database.
  • the user may know the names and general coverage of a variety of databases, but may not know (and may not want to learn) the command sets appropriate for each database.
  • the system of the present invention can therefore translate standardized commands into the commands recognized by the selected database or database family.
  • the user conducts the search while the system remains connected, or "on-line", with respect to the database. From the viewpoint of the user, it is as if he or she is directly connected to that database.
  • the user employs the standardized command set, while the system acts as an intermediary, translating all commands into the commands appropriate to the particular database.
  • the system would then reconnect to the database, perform the search again, download the requested document, disconnect from the database, and display the document to the user.
  • the above-described arrangement could be used with the embodiment of Figure 2, wherein multiple databases are searched, as well as with the embodiments of the cited patent, which involve searching in a single database at one time.
  • the above-described arrangement could also be done where the system remains connected to the database, and wherein it is not necessary to disconnect and reconnect.
  • the user does not know the syntax of a particular database, but does know the command set. This and other similar alternatives are within the scope of the invention.
  • the concept of a database "language" also includes commands relating to field structures.
  • the documents stored in most databases are arranged in segments or "fields". It is usually possible to search a database by fields. Thus, one can request all documents containing the name "Smith" in the "author" field.
  • each database contains a different set of fields, and each database or database family uses a different set of field tags.
  • the invention also includes providing a standardized set of field tags.
  • the system can then translate the command into the format appropriate to the database being searched.
  • the system can be programmed to translate the operand (e.g. "SMITH") into a format appropriate to a particular database.
  • some databases contain author information with the last name first, and others place the last name last.
  • the invention thus includes at least three possible levels of translation.
  • the system can translate search requests, commands, and field commands. Any combination of these types of translations can be incorporated into a given embodiment of the invention.
  • the user, not the system, selects a database, wherein the user is expected to know the search syntax of that database, but wherein the system translates search commands from a standardized set.
  • This variation is appropriate for sophisticated users who are familiar with the search syntax of their favorite databases, but who do not want to memorize different sets of search commands.
  • One can even provide standardization of field commands only.
  • the invention is not limited to the above-described combinations, however.
  • A posting occurs when a word in the search request is found in a document contained in the database.
  • the term "document" is used herein in a general sense, and includes a record retrievable from a database, whether it be an article, a patent, or information recorded in any other format.
  • a search which yields no postings is usually of no value to the user, unless the user is trying to verify that a document does not exist.
  • a search which yields a large number of postings, e.g. more than one hundred, is almost as valueless, because it is usually not economical to browse through all the retrieved documents.
  • the database (or host) provides a step-by-step display of search postings. That is, suppose that the search request is "DESKTOP AND PUBLISHING". Then the host will display not only the number of documents in the database containing both "DESKTOP" and "PUBLISHING", but will also separately display the number of documents containing "DESKTOP" and the number containing "PUBLISHING". Many of the databases now available provide this kind of intermediate display.
  • Searches which yield no postings can be grouped into one of two categories.
  • In Condition A, at least one of the units of the search request is not found in the database. It is assumed that any given search can be represented as one or more "units", or groups of terms, connected by an "AND" operator or its equivalent.
  • a unit can be a word, or a group of words joined by the "OR" operator.
  • search requests "(ENERGY OR CRISIS) AND (OIL OR PETROLEUM)" and "ENERGY AND PETROLEUM" both contain two units.
  • the search request includes four words, joined by "AND" (or, alternatively, by a proximity connector specifying that the words must be within a certain number of words of each other), and if, say, the third word is not in the database
  • the search will yield no postings because of that third word.
  • the system will know that it was the third word which caused the search to fail.
  • the system displays a message to the user, stating that there were no postings for this word.
  • the system asks the user if the word is spelled correctly or whether it was entered in the proper format.
  • By "format" it is meant that the word may have been restricted to a particular field (e.g. title, author, or abstract), and the field designator may be incorrectly entered. If the spelling or format is incorrect, the user can enter the corrected term, and the search will be executed again.
  • the system gives the user suggestions for modifying the search.
  • These suggestions can include 1) entering a related term instead of the term that caused the search to fail, 2) relaxing field restrictions (e.g. searching for all occurrences of the word instead of limiting the search to the title, author, or abstract), and 3) deleting the term from the search.
  • the user may also choose to abandon the search at this point.
  • the suggestions are preferably arranged in a menu, and the user can easily make one or more choices.
  • the system would attempt to resolve each problem separately before the search is resubmitted. That is, the system performs all the operations described above for each search term or unit which yielded no postings, before the search can be performed again.
  • The second category of searches which yield no postings is called "Condition B".
  • In Condition B, all of the units of the search request yield postings, but the full search request yields no postings.
  • the reason for the null result is that the restrictions on the units of the search request are too strict.
  • the system first displays to the user the number of postings for each of the intermediate steps of the search, so that the user has the opportunity to make changes.
  • one or more search terms can be modified, replaced, or deleted.
  • the field restrictions can be changed or modified. For example, the user may decide not to restrict the search to occurrences of the terms in the title.
  • the user can also relax the combination restrictions. Thus, for example, if the system had chosen to try "LEVERAGED (2W) BUYOUT", meaning that it searched for documents in which "LEVERAGED" occurs within two words of "BUYOUT", the user may want to relax this restriction to a larger "window". The user could even decide to replace the search with "LEVERAGED AND BUYOUT", which will retrieve all documents containing both words in any location within the document. Finally, the user is also given the opportunity to abandon the search entirely.
  • the above examples of search syntax are only hypothetical.
  • the system could use any other means of informing the user about the restrictions that were initially and automatically placed on the search, and can then give the user the chance to relax such restrictions.
  • the system identifies the probable points at which the search failed, and offers options for correction.
  • the system is programmed to display to the user only the options appropriate to a particular search. Thus, for example, if no field restrictions were entered by the system, the user would not be given the option of broadening the field restrictions. If there are no proximity connectors in the search, the user would not be asked to broaden them. Also, if the search request consists of only one term, the option to delete a term would not be presented.
  • Condition A problem and then encounter Condition B.
  • Resolution of either Condition A or Condition B could result in the problem of too many postings, to be discussed below.
  • the system should be programmed to place a limit on the number of failed searches that can be performed by one user in one session.
  • the system presents a menu of the following choices to the user.
  • the user can choose to view the first ten documents, usually arranged in reverse chronological order, with the most recent items first.
  • the user can be given the opportunity to add terms to the search, thereby narrowing its scope.
  • the user can narrow the search by limiting one or more search words to a particular field, or by tightening proximity connectors. For example, instead of searching for "MONOCLONAL" and "ANTIBODIES", the user could search for only those items that contain these words in the title.
  • the user could search for only those items in which these words are no more than two words apart.
  • the user can be given the option of viewing the ten most recent items resulting from one term or phrase of the search request.
  • the user can also be given the option of abandoning the search entirely.
  • Figure 5 is a flow chart which summarizes the embodiment described above, for assisting the user in the case of no postings or too many postings.
  • the system executes a search in block 120. If there are no postings, as determined in test 122, the system determines, in test 124, whether each search term (or each unit of a search request) generated postings. If the answer is no, the system displays the number of postings for each unit, in block 126, and displays a menu of choices to the user, in block 128. The user is given the option of abandoning the search. Test 130 determines whether the user wants to abandon, and, if so, the system stops in block 132. If the user wants to modify the search, the modification is done in block 134, and the search is executed again.
  • the system proceeds through blocks 136 and 138, and test 140, in similar fashion.
  • the user may abandon the search, in block 142, or enter a modification, in block 144.
  • Block 148 can include any other display steps that may be desired. If there are too many postings, then the system displays the number of postings in block 150 and asks the user for a choice, in block 152. The user may abandon the search, through test 154 and block 156, or may modify the search in block 158.
  • Another embodiment of the invention is useful both in the case of too many postings and in the case of a "successful" search.
  • the system ranks the retrieved documents in order of apparent relevance. The ranking is done without actually browsing through the documents.
  • the principle used in ranking retrieved documents can be illustrated with a simple example.
  • the user wants information on laptop computers.
  • the system searches for documents containing "LAPTOP" and "COMPUTERS" within two words of each other.
  • 87 documents are retrieved.
  • the search is narrowed by specifying that both words must appear in the title.
  • the number of documents may be reduced to 23.
  • the search is again narrowed by specifying that the words be within one word of each other.
  • the number of retrieved documents is reduced to 12. It is very likely that these 12 documents are the most relevant of the original 87.
  • a hypothetical set of criteria could be as follows. First, a document can be ranked according to whether a search term appears in the title and descriptor, the title alone, the descriptor alone, or the abstract.
  • the "descriptor" field is a field containing key words from the document, and is provided in many databases. Thus, if a search term is found in both the title and the descriptor field of a document, the document is considered the most relevant. If the term is found only in the abstract, the document is considered the least relevant.
  • the relevance of a retrieved document is related to the number of words of the search request which appear in the same field of the document.
  • a document in which all of the words of the search request appear in the title is more relevant than one in which the words of the request appear in different portions.
  • a retrieved document is considered most relevant if the words are adjacent, and least relevant if the words are far apart.
  • An algorithm can therefore be constructed which ranks the documents retrieved.
  • the simplest such algorithm simply examines the field in which a search term appears (e.g. title, abstract, etc.), and ranks the documents as described above.
  • a more complex algorithm takes into account the other criteria described above. It has been found, in practice, that only a relatively small number of indicators of relevance are necessary. The most useful criterion is to determine the field in which the search term appears. The number of terms appearing in a given field has also been found to be a useful criterion of relevance.
  • the system in order to rank the documents by relevance, it may be necessary for the system to perform a search, for each given word, more than once.
  • the system might search for occurrences of a given word, and then might repeat the same search, limiting the second search to, say, the "title" field.
  • the cost of search time is usually relatively small compared to the cost of displaying search results.
  • a method of ranking retrieved documents according to assumed relevance can be summarized as follows.
  • the system performs a search for documents containing the words "PARALLEL", "PROCESSING", and "COMPUTERS". It also performs searches for documents which contain the above-mentioned words in the title of the document. These extra searches are not "seen" by the user.
  • the corresponding numbers are 100, 150, and 300, respectively, and that ten documents contain all three words in the title.
  • the system has actually done eight searches, i.e. two searches for each word separately, and two searches for the combination of the three words.
  • the ten documents last retrieved are likely to be the most relevant.
  • the search results are far more useful, as a result of the system having performed the extra searches, as compared with the results obtained by simply searching for documents containing all three words.
  • more direct search time was expended, as compared with a simple search for the three words, the search results are much more useful, and it was not necessary to display a multiplicity of documents in order to find the relevant ones.
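
The extra searches in the "PARALLEL", "PROCESSING", "COMPUTERS" example can be enumerated mechanically. The following is a minimal sketch, assuming a hypothetical `plan_searches` helper and a hypothetical `AND` connector in the standardized syntax; neither name comes from the specification. It only lists the searches to run (each word alone and the full combination, once unrestricted and once limited to the title) and does not contact any database.

```python
from typing import List, Optional, Tuple

# A planned search: (query text, field restriction or None for "any field").
Search = Tuple[str, Optional[str]]


def plan_searches(words: List[str], field: str = "title") -> List[Search]:
    """List the searches needed to rank results by field (hypothetical helper)."""
    queries = list(words)
    if len(words) > 1:
        # The combination of all words; "AND" is an assumed connector.
        queries.append(" AND ".join(words))
    plan: List[Search] = []
    for query in queries:
        plan.append((query, None))   # unrestricted search
        plan.append((query, field))  # same search limited to one field
    return plan


plan = plan_searches(["PARALLEL", "PROCESSING", "COMPUTERS"])
for query, restriction in plan:
    print(query, "in", restriction or "any field")
```

For three words the plan holds two searches per word plus two for the combination, which matches the count of eight given in the example.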

Abstract

This invention enables a user (5) to obtain information from a large number of commercial databases (20-24). In practicing the invention, the user (5) selects an area of interest (40) and enters a search request (42). The search request (42) includes at least one word for which the user (5) desires to search. In one embodiment of the invention, the system selects a set of at least two databases (two of 20-24), automatically executes the search request (46) in each database, and presents the results to the user (5, 48). In another aspect of the invention, the user (5) selects a database directly, and employs a set of standardized commands (see figure 4) for any database selected. The system translates these standardized commands into the equivalent commands recognized by each database, without the intervention or knowledge of the user (5). The user (5) can thus communicate with a variety of databases (20-24) using the same command set (see figure 4). In another embodiment, the invention guides the user (5) in reformulating a search which retrieved either no documents or too many documents. The invention also includes a method of determining which of the retrieved documents are likely to be the most relevant.

Description

SYSTEM AND METHOD FOR RETRIEVING INFORMATION FROM A PLURALITY OF DATABASES
CROSS-REFERENCE TO PRIOR APPLICATIONS
This is a continuation-in-part of U.S. Patent Application Serial No. 231,055, filed August 11, 1988, entitled "System for Retrieving Information From a Plurality of Remote Databases", which is a continuation of Serial No. 664,167, filed Oct. 24, 1984, now U.S. Patent No. 4,774,655.
BACKGROUND OF THE INVENTION
This invention relates to the field of information retrieval, and especially the retrieval of information from one or more databases.
The field of information retrieval has advanced to the point where anyone having a personal computer, a modem, and access to a telephone line can obtain information on virtually any topic, from thousands of commercial databases, without leaving home. Many owners of large compilations of information have provided their information in the form of computer databases, and these databases can be interrogated, for a fee, by a remote user.
Groups of related and unrelated databases are sometimes arranged in "families". A single vendor, or "host", may provide access to a family of databases, all of which can be reached through the same telephone number, and with the same identification number and password. For example, Dialog Information Services, Inc., of California, provides access to a large family of databases, dealing with many different subjects, under the service mark DIALOG. In this specification, the term "family" means a set of related or unrelated databases available on one particular host. As used herein, the terms "family" and "host" are virtually interchangeable.
To use each database, one must know, in general, a telephone number, an identification number, and (in most cases) a password. One must also know the "language" of the database being searched. As applied to database searching, the term "language" includes at least three aspects. First, a database language includes the particular syntax applicable to search requests in that database. A search request includes a word or group of words, connected by various logical (typically Boolean) operators (e.g. AND, OR, NOT, etc.). The user transmits a search request to a database in order to retrieve all items in the database which contain the specified logical combination of terms of the search request. In general, different database families have different rules of search syntax.
Secondly, a database language includes a command set. As used in this specification, the term "command set" means a group of commands used to conduct searches in a database. A command is needed to direct the performance of each aspect of database searching. Thus, for example, a command is needed to tell the system to begin the search, and another command is needed to direct the system to display the results of a search.
Thirdly, a database language includes the field structure of the database. Each database is arranged by fields, i.e. searchable segments of documents. For example, one might want to search for articles in a database by searching according to authors or titles or abstracts. The "author", "title", and "abstract" segments of a document are called "fields". In general, each database provides a different set of fields which can be searched, and a different set of "field tags" which identify particular fields.
In theory, each database could have a different syntax, a different command set, and a different field structure, since each database is, in general, created by a different entity. Thus, to use all of the hundreds or thousands of available databases, it would be necessary to learn all of the rules for syntax, command sets, and field structures for each one.
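
The three aspects of a database "language" can be pictured as one record per database or family. The sketch below is only illustrative: the `DatabaseLanguage` class, the hosts `HOST_X` and `HOST_Y`, and every operator, command, and field tag in it are invented for this example and do not describe any real database.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DatabaseLanguage:
    """The three aspects of a database 'language' (all values are made up)."""
    and_operator: str           # search syntax: how a logical AND is written
    commands: Dict[str, str]    # command set: action -> native command
    field_tags: Dict[str, str]  # field structure: field name -> native tag
    field_template: str         # how a term is restricted to a field


HOST_X = DatabaseLanguage(
    and_operator="AND",
    commands={"search": "FIND", "display": "SHOW"},
    field_tags={"author": "AU", "title": "TI"},
    field_template="{tag}={term}",
)
HOST_Y = DatabaseLanguage(
    and_operator="&",
    commands={"search": "S", "display": "D"},
    field_tags={"author": "AUTH", "title": "TITLE"},
    field_template="{term}/{tag}",
)


def field_term(lang: DatabaseLanguage, field: str, term: str) -> str:
    """Write a field-restricted term in the conventions of one database."""
    return lang.field_template.format(tag=lang.field_tags[field], term=term)


print(field_term(HOST_X, "author", "SMITH"))
print(field_term(HOST_Y, "author", "SMITH"))
```

The same request for "SMITH" in the author field comes out differently for each host, which is the burden a user faces when every database defines its own conventions.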
The problem of nonuniformity of database languages was ameliorated somewhat by the creation of database families, such as DIALOG, mentioned above. Within a particular family, the syntax and command sets are generally the same for each database. As long as a user wishes to search only within one database family, the user needs to know only one search syntax and one command set. Of course, the user would also need a database catalog, in order to know what databases are available.
However, even within a given database family, the field structures of the various databases are, in general, very different. Moreover, if one desires to search in other families of databases, or in single databases not belonging to a family, one must, in general, learn a new search syntax, a new command set, and a new field structure. It is also necessary to establish a separate account with each different database or database family. And, of course, it is necessary for the user to know, in advance, what databases are available and what they contain.
Database searching presents additional problems not discussed above. Frequently, a search yields no documents, or "postings". The null result could be due to misspelled words in the search request, or to other factors. Just as frustrating is the case where a search produces too many postings. It is tedious and expensive to browse through one hundred or more documents retrieved from a database search. Even if the number of postings is not unreasonably large, it is still necessary to browse through them individually to determine which documents are most relevant. The fees incurred for viewing documents can often exceed the costs of performing the search, because most databases or hosts charge a fixed fee for each document displayed to the user. Search systems of the prior art have not provided an efficient way of determining, in advance, which retrieved documents are likely to be the most relevant. The present invention is therefore also directed to the solution of the above-mentioned problems.
U.S. Patent No. 4,774,655, the disclosure of which is incorporated by reference herein, addresses the problems of nonuniformity among databases and database families. The system described in that patent is an intermediary between the user and the databases being searched. The system described in the patent automatically selects a database in which to search, and then performs the search. The user does not make direct contact with the database, and does not need to know the particular syntax of the database, or even its identity. The system accepts, from the user, a search request, written in a simple and standardized format, and automatically translates the search request into the syntax appropriate to the database in which the system has chosen to search. The system minimizes the time that it is directly connected ("on line") to the database, and presents the search results to the user after the connection with the database has been terminated. In an alternative embodiment, also described in the patent, the user selects a database directly, but still uses the system as an intermediary, thereby avoiding the need to establish a separate account with each database. It is also possible to take advantage of the ability of the system to translate a search request into the syntax of a selected database or database family, while still maintaining full control over the identity of the database or family selected.
The present specification discloses enhancements to the basic system described in the cited patent. One of these enhancements allows a user to search through a large number of databases in one session. Another embodiment improves on the standardization described in the cited patent, by enabling the user to search through many different selected databases using a standardized set of commands. Other features reduce the likelihood of null search results, and also deal with the problem of too many retrieved documents. Another feature provides an economical method of determining, in advance, which retrieved documents are likely to be the most relevant. With these features, the system is even more convenient to use than systems of the prior art, and database searches are more likely to yield meaningful results.
SUMMARY OF THE INVENTION
The present invention enables a user to obtain information from one or more databases. The user communicates with a central computer, programmed according to the invention, and the computer establishes direct communication with the databases. In general, the user is not located at the same place as the central computer, and the central computer is not at the same location as the various databases. However, the invention is not limited by the distances between the user, the central computer, and the databases.
In one aspect of the invention, the user first transmits an area of interest and a search request to the central computer. The area of interest can be selected from a menu of available subject areas. The search request includes one or more words, connected by logical operators, such as Boolean operators or proximity connectors. The object of the search is to locate all documents which contain the indicated logical combination of words of the search request. The central computer executes the search request in each of a set of databases associated with the user's area of interest. The databases in this set are predetermined, and are stored in the memory of the central computer, one set being associated with each area of interest. The databases may be located on different hosts, or they may reside on the same host. The central computer establishes communication separately with each database, translates the search request into the syntax of that database, if necessary, and executes the search. Normally, the searches are done sequentially, in each database, but it is also possible to do the searches simultaneously if the central computer is capable of establishing multiple simultaneous connections. After completing the searches, the system displays, to the user, the number of items retrieved from each of the databases, and gives the user the opportunity to view these items in detail. When the user wishes to view a specific retrieved item, the system may simply reconnect to the indicated database, search for the indicated item, and download the item for display to the user. It is also possible to allow the user to browse through the retrieved item while the central computer is still connected to the database.
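
The sequential case described in this paragraph can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the area names, the database names, and the `translate` and `execute` callables are all hypothetical stand-ins, and the posting counts are fabricated solely so the sketch can run without a real connection.

```python
from typing import Callable, Dict, List

# Hypothetical predetermined sets of databases, one set per area of interest.
AREA_DATABASES: Dict[str, List[str]] = {
    "business": ["DB_ONE", "DB_TWO"],
    "medicine": ["DB_THREE", "DB_FOUR", "DB_FIVE"],
}


def search_area(
    area: str,
    request: str,
    translate: Callable[[str, str], str],
    execute: Callable[[str, str], int],
) -> Dict[str, int]:
    """Run one request in every database of an area; return postings per database."""
    counts: Dict[str, int] = {}
    for database in AREA_DATABASES[area]:  # sequential, one database at a time
        native_request = translate(request, database)
        counts[database] = execute(database, native_request)
    return counts


# Stand-ins so the sketch runs without connecting to anything.
def demo_translate(request: str, database: str) -> str:
    # Pretend DB_TWO writes the AND operator as "&" (an invented rule).
    return request.replace(" AND ", " & ") if database == "DB_TWO" else request


FAKE_POSTINGS = {"DB_ONE": 12, "DB_TWO": 0}  # made-up numbers for the demo


def demo_execute(database: str, native_request: str) -> int:
    return FAKE_POSTINGS[database]


print(search_area("business", "LEVERAGED AND BUYOUT", demo_translate, demo_execute))
```

Passing the translation and execution steps in as functions keeps the loop itself independent of any particular host, which mirrors the role of the central computer as an intermediary.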
In another aspect of the invention, the system provides a standardized set of commands for searching in databases. As used herein, "commands" means the instructions given by the user to guide the operation of a search. Commands are used to generate searches, display results, review the sets of documents retrieved, terminate the search session, etc. In this embodiment, the system translates the standardized commands entered by the user into the commands appropriate to the specific database selected by the user. Thus, although the user may want to select a particular database for searching, the user need not learn the particular command set for that database. The central computer establishes the connection with the selected database, and the user does not directly deal with that database.
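
Command translation of this kind amounts to a lookup table per database. The sketch below assumes a hypothetical standardized command set (`FIND`, `SHOW`, `QUIT`) and two invented hosts with invented native commands; none of these names are taken from Figure 4 or from any real service.

```python
from typing import Dict

# Hypothetical tables: standardized command -> native command, per host.
COMMAND_TABLES: Dict[str, Dict[str, str]] = {
    "HOST_X": {"FIND": "LOCATE", "SHOW": "LIST", "QUIT": "BYE"},
    "HOST_Y": {"FIND": "S", "SHOW": "D", "QUIT": "OFF"},
}


def translate_command(line: str, host: str) -> str:
    """Replace the standardized verb with the host's own verb; keep the rest."""
    verb, _, rest = line.strip().partition(" ")
    table = COMMAND_TABLES[host]
    try:
        native = table[verb.upper()]
    except KeyError:
        raise ValueError(f"unknown standardized command: {verb!r}") from None
    return f"{native} {rest}".strip()


print(translate_command("find laptop computers", "HOST_X"))
print(translate_command("find laptop computers", "HOST_Y"))
```

The user types the same line for either host, and only the table consulted by the system changes.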
The latter embodiment may be combined with other embodiments. Thus, the standardized command set can be used even in the case where the system chooses the database for the user. Also, the user may select a database but leave the translation of both the search request and the commands to the system. In the latter case, the user need not know anything about the database other than its name.
In another embodiment of the invention, the system guides the user in reformulating a search, in those cases where a search produces no postings or too many postings. The system identifies words of the search request which were not found in any document, and gives the user the opportunity to modify or delete those words. In cases where each word of the search request yielded postings, but the overall request did not yield any postings, the system provides suggestions to the user for broadening the search. In the case of too many postings, the system provides the user with suggestions on how to narrow the search, such as by imposing more stringent field or proximity restrictions on portions of the search request.
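
The three cases in this paragraph can be told apart from the posting counts alone. The following is a minimal sketch, assuming a hypothetical `diagnose` helper, invented outcome labels, and an assumed threshold of 100 postings for "too many" (the specification mentions one hundred only as an example).

```python
from typing import Dict, List, Tuple


def diagnose(
    unit_postings: Dict[str, int], total: int, too_many: int = 100
) -> Tuple[str, List[str]]:
    """Classify a search outcome and name the units that found nothing."""
    if total == 0:
        missing = [unit for unit, count in unit_postings.items() if count == 0]
        if missing:
            # Some word or unit was not found at all: modify or delete it.
            return ("modify_or_delete_terms", missing)
        # Every unit has postings, so the combination is too restrictive.
        return ("broaden_search", [])
    if total > too_many:  # assumed threshold, not fixed by the text
        return ("narrow_search", [])
    return ("show_results", [])


print(diagnose({"LEVERAGED": 40, "BYOUT": 0}, total=0))
print(diagnose({"LEVERAGED": 40, "BUYOUT": 55}, total=0))
print(diagnose({"LAPTOP": 900}, total=900))
```

In the first call the misspelled unit is reported back, so a menu of corrections could be offered for that unit only.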
The invention also includes a system and method for determining which retrieved documents are likely to be the most relevant. This method is performed without browsing through the actual documents. Instead, the system considers, for example, the fields in which the search terms were found. Thus, if a search term is found in the title of the document, it is likely that the document is more relevant than if the term appeared only in an abstract.
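
A field-based ranking of this kind can be sketched as a simple sort. The example assumes the hypothetical ordering given elsewhere in this document (title and descriptor together, then title alone, then descriptor alone, then abstract), and assumes the system has already learned, from field-restricted searches, which fields matched for each document. The function names and document identifiers are invented.

```python
from typing import Dict, List, Set


def field_rank(fields: Set[str]) -> int:
    """Lower numbers mean higher assumed relevance (hypothetical criteria)."""
    if {"title", "descriptor"} <= fields:
        return 0
    if "title" in fields:
        return 1
    if "descriptor" in fields:
        return 2
    if "abstract" in fields:
        return 3
    return 4  # term found only in some other field


def rank_documents(hits: Dict[str, Set[str]]) -> List[str]:
    """Order document identifiers by the fields in which the term was found."""
    return sorted(hits, key=lambda doc: (field_rank(hits[doc]), doc))


hits = {
    "doc1": {"abstract"},
    "doc2": {"title", "descriptor"},
    "doc3": {"title"},
}
print(rank_documents(hits))
```

Ties are broken by document identifier here only to make the order deterministic; a fuller version might break ties by the number of terms found in a field or by proximity.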
It is therefore an object of the invention to provide a system and method for retrieving information from databases, wherein a user's search request is automatically executed in a group of databases selected by the system.
It is another object to provide a method and system which chooses a group of databases which are relevant to an area of interest selected by a user.
It is another object to provide a method and system wherein a search can be executed in several databases simultaneously.
It is another object to provide a method and system which includes automatic selection of a host, where the desired database is found on more than one host.
It is another object to enable a user to communicate with a plurality of commercial databases using a standardized set of commands.
It is another object to provide a system wherein the standardized set of commands includes a standardized set of field "tags".
It is another object to provide a system wherein the user can choose a particular database, but wherein the user does not need to learn the search syntax or command set for that database, and wherein the user does not need to establish a separate account with that database.
It is another object to provide a method and system for alleviating the problems associated with too few or too many documents retrieved in a database search.
It is another object to provide an economical method of evaluating which documents, retrieved from a search, are likely to be the most relevant.
It is another object to increase the ease and effectiveness of database searching.
Other objects and advantages of the invention will be apparent to those skilled in the art from a reading of the following brief description of the drawings, the detailed description of the invention, and the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram showing a possible configuration of the system of the present invention.
Figure 2 is a flow chart showing the operation of the embodiment of the invention wherein the system executes a search automatically in each of several databases.
Figure 3 is a block diagram showing an alternative configuration of the system, wherein the central computer can perform searches simultaneously in more than one database.
Figure 4 is a chart containing a hypothetical standardized set of commands, used for searching in databases, and showing the meaning of each command.
Figure 5 is a flow chart illustrating an embodiment of the invention which assists the user in reformulating a search that has retrieved too few or too many documents.
DETAILED DESCRIPTION OF THE INVENTION
Figure 1 is a block diagram showing one possible configuration of the system of the present invention. The system may be operated from terminal 5, having modem 7. Terminal 5 may be simply a personal computer, or it may be a so-called dedicated terminal (or "dumb terminal"), whose sole capability is to transmit and receive information. Modem 7, which is of conventional design, and which is assumed to include a telephone and dialing mechanism, converts the digital signals from the terminal 5 into analog signals suitable for transmission over telephone line 9. Modem 7 also converts incoming analog signals from line 9 into signals having digital format, which can then be interpreted by terminal 5.
When modem 7 dials the telephone number of the system, the call is answered by switching unit 11. Switching unit 11 allocates the call to one of a plurality of identically-programmed computers 13, each computer having an incoming modem 15. Switching unit 11 is of standard design, and is pre-programmed to indicate to the caller if all available computers are currently busy. In the preferred embodiment, computers 13 comprise identical central processing units (CPUs) which may be conveniently arranged in the same housing. Other arrangements are possible, however.
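
The allocation performed by switching unit 11 can be modeled as picking the first idle computer. This is a sketch only: the specification says the unit is of standard design and does not state its selection rule, so the first-idle policy and the `allocate_call` name are assumptions.

```python
from typing import List, Optional


def allocate_call(busy: List[bool]) -> Optional[int]:
    """Return the index of the computer that takes the call, or None if all are busy."""
    for index, is_busy in enumerate(busy):
        if not is_busy:
            busy[index] = True  # the chosen computer is now occupied
            return index
    return None  # the caller would be told that every computer is busy


computers = [True, False, False]  # True means the computer is handling a call
print(allocate_call(computers))
print(allocate_call(computers))
print(allocate_call(computers))
```

Because the computers are identically programmed, any idle one can serve any caller, which is what makes so simple a rule workable.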
Each computer is connected to an outgoing modem 17. Outgoing modems 17 are also of standard design, and are equipped with automatic telephone dialing mechanisms. Through modems 17, the computers 13 can establish communication, via telephone lines 19, with one of many remote commercial databases, illustrated symbolically as blocks 20, 21, 22, 23, and 24.
The modems shown in Figure 1 can be replaced by equivalents. Thus, it is possible to provide direct digital communication between computers 13 and the various databases, by the use of alternative technologies which are available, such as packet assemblers and disassemblers ("pads"). Also, the connection between terminal 5 and computer 13 may include a conventional analog connection between the terminal and a given node in a telecommunications network, and a digital connection between that node and one of computers 13. Such alternative arrangements are within the scope of the present invention.
Each of the databases 20-24 represents, in general, a separate and independent computer system at some remote location. The information stored in each database may be kept on disks or other storage media, and the searching through the respective databases is governed by the particular computer system for that database. The system of the present invention assumes the existence of such multiple databases, and operates with any or all of them, regardless of the specific configuration of the computer system of each database. The system communicates with these databases in the same manner as would any other user of such databases.
Connected in parallel to all the computers 13 is master CPU 30. Master CPU 30 is a computer which controls the overall operation of the system. The master CPU continually checks that all computers 13 are operating properly. It also can be programmed, in conjunction with switching unit 11, to guide an incoming user call to the appropriate computer. The master CPU 30 also serves to direct the various computers to retrieve information from disk 31, or to print billing information on printer 32.
The components shown between dotted lines 1 and 2 indicate the components which are "local", i.e. grouped at a central location. The components shown outside the dotted lines are "remote", i.e. located elsewhere. Normally, terminal 5 will be located at a substantial distance from the system, possibly thousands of miles away. Databases 20-24 are also, in general, found in computers located in other places. However, it should be understood that it is possible that one or more of the "remote" components could be physically located near the central location, without changing the manner of operation of the system. Thus, it is possible to operate the system from a terminal located near computers 13. It is also possible that one or more of databases 20-24 could be located in the same general area. The invention will operate regardless of the physical locations of the remote terminal and the databases. Also, the number of CPUs can be varied. The invention will work with even one central CPU, having a modem connected to the line coming from the user, and another modem connected to a line connected to a database. The databases 20-24 can be individual databases, or they can be database families. In the latter case, the system determines which database to search, within a particular family.
U.S. Patent No. 4,774,655 describes several methods of using the system illustrated in Figure 1. In one such method, computers 13 are programmed to display, to the user, one or more menus from which the user selects an area of interest. The user selects one item from each such menu, whereupon the system automatically chooses one database for searching. The choice of a database is made from a stored "decision tree" which associates one database with every possible combination of menu selections by the user. The user then enters a search request, which comprises one or more words, connected by logical operators. The aim of the search is to find documents which contain the indicated logical combination of words in the search request. The words of the search request can also be connected by non-Boolean operators, and all such alternatives should be considered equivalents for purposes of this invention. The system described in the cited patent then automatically dials the telephone number of the selected database, and establishes a connection, using an identification number and password which has been stored in memory. Note that it is the system, not the ultimate user, which is the customer of the database. The system automatically translates the search request into the search syntax of the selected database, and transmits the search request to the database. The system may then download, to its own memory, some or all of the results of the search, and terminates the connection with the database. The system then displays the downloaded results of the search to the user. This displaying step may include allowing the user to browse electronically through one or more articles or other documents retrieved in the search. This browsing is therefore done after the system is disconnected ("off-line") from the database. The system can also print a bill, using printer 32, based on credit card information previously supplied by the user.
Note that the user need establish only one account, i.e. a credit card account, to gain access to a wide variety of databases. The individual databases do not "see" the user as a customer.
Further variations to the above-described process are explained in the cited patent. In general, each such variation uses substantially the same general arrangement of hardware, the difference being in the programming of the computers.
Some databases are included in more than one database family, i.e. they are present on more than one host. The database selection step may therefore also include the step of choosing a host on which to search.
The choice of a host can be made in the following manner. For each database, the system stores a list of all hosts on which that database is available. The hosts on each list are ranked in a predetermined order which is based on considerations of economics and efficiency. When the system decides to search a particular database, it will attempt to gain access to that database through the first host on the list. If the first choice host is busy or otherwise unavailable, the system then tries to connect to the second host on the list, and so on until a connection to a host is made.
The rankings of hosts may vary depending on the time of the day or the day of the week, or on other factors. For example, the host which is least expensive on a weekday may not be the least expensive on a weekend. Thus, the system can be programmed with several lists of hosts, for each database, each list being appropriate to a different time or day. Before connecting to a database, the system checks the time of day and/or day of the week, and then refers to the ranking of hosts which is appropriate to the day or time. Thus, the system is not limited to a single ranking of hosts for each database.
The criteria for ranking hosts can vary, and need not be limited to considerations of economics. It may be that, for a given database, a particular host is most efficient, despite the fact that it is not the least expensive. Also, the hosts can be chosen by means other than from predetermined lists or rankings.
Various factors can cause the system to connect to a host other than the first choice on the list. As mentioned above, the first choice host may be temporarily busy or unavailable. Also, the data telecommunications network, or "common carrier", which links the system to a database or host, may be unavailable. Also, the system may have exhausted its supply of available passwords for a particular host, due to a large number of users doing searches on databases in that host, and it may be impossible to obtain further access to the host for this reason. Of course, the user is unaware of which host the system chooses.
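The host-selection procedure described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the cited patent: the database name, the host names, the two schedule labels, and the `is_available` callback are all hypothetical, and a real system could use any number of time-dependent rankings.

```python
from datetime import datetime

# Hypothetical ranking table: database -> schedule label -> hosts in preferred order.
HOST_RANKINGS = {
    "EXAMPLE_DB": {
        "weekday": ["HOST_A", "HOST_B"],
        "weekend": ["HOST_B", "HOST_A"],
    },
}


def schedule_for(when: datetime) -> str:
    """Pick the ranking that applies at the given moment (assumed: weekday vs. weekend)."""
    return "weekend" if when.weekday() >= 5 else "weekday"


def choose_host(database, when, is_available):
    """Return the first reachable host in the applicable ranking, or None.

    is_available(host) stands in for an actual connection attempt; it should
    return False when the host is busy, the carrier is down, or no password
    is free for that host.
    """
    for host in HOST_RANKINGS[database][schedule_for(when)]:
        if is_available(host):
            return host
    return None
```

For example, `choose_host("EXAMPLE_DB", datetime(2024, 1, 3), lambda h: h != "HOST_A")` falls through to the second-ranked host because the first choice is reported as unavailable.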
Figure 2 is a flow chart illustrating the operation of an embodiment of the present invention which is a modification of the procedures described above and in the cited patent. It is assumed, in Figure 2, that the user has already established connection with the system, and has transmitted satisfactory credit card information, or other identification. Then, in block 40, the user begins by selecting an area of interest from a menu displayed by the system. In block 42, the user enters a search request. The functions of blocks 40 and 42 could be performed in reverse order, if desired. In block 44, the system chooses a set of databases in which searches will be made. The system stores, in its memory, a data file comprising a list of areas of interest and databases. With each area of interest, the system associates a set of databases, all of which contain information relating to that area. This set of databases is fixed for each possible selection of area of interest by the user. Thus, as in the cited patent, the user need not know, in advance, what databases to search, and the selection of databases is entirely automatic.
In block 46, the system then searches through each database in the selected set. Each search may be performed in substantially the same manner as described for an individual search in the cited patent. That is, the system automatically translates the search request to conform to the syntax of the database being searched, establishes connection with the database (transmitting an appropriate identification number and password, if necessary), and transmits the translated search request to the database. If a particular database is a member of a family, the system gains access to the family, choosing that family according to the method described above, and then transmits the correct database identifier, so as to connect to the desired database.
In the embodiment of Figure 2, however, the system does not immediately present retrieved documents to the user. Instead, the system temporarily stores the number of documents, or other items, retrieved from the database and disconnects from the database. The process is repeated for all of the databases in the set to be searched.
After searching through the entire set of selected databases, the system displays, to the user, in block 48, a summary showing the name of each database that has been searched and the number of items retrieved from each database. The system then asks, in test 50, if the user wants to browse through any or all of the retrieved items. If the answer is no, then the program will stop, as shown in block 52. If the answer is yes, the system accepts a choice, from the user, in block 54, indicating which item(s) from which database should be viewed.
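The first pass of Figure 2 (blocks 44 through 48) can be illustrated with a short sketch. The area name, the database names, and the `search_count` callback are hypothetical; the callback stands in for the whole connect, translate, search, and disconnect cycle, and is assumed to return only the number of items found.

```python
# Hypothetical data file: each area of interest maps to a fixed set of databases.
AREA_TO_DATABASES = {
    "business": ["DB_ONE", "DB_TWO", "DB_THREE"],
}


def count_postings_everywhere(area, request, search_count):
    """Search every database tied to the area and keep only the item counts.

    search_count(database, request) is an assumed callback that connects,
    runs the translated request, disconnects, and returns an integer.
    """
    return {db: search_count(db, request) for db in AREA_TO_DATABASES[area]}


def format_summary(counts):
    """Build the summary shown to the user: one numbered line per database."""
    return "\n".join(
        f"{number}. {db}: {items} item(s)"
        for number, (db, items) in enumerate(counts.items(), start=1)
    )
```

Nothing is downloaded in this pass; the user picks items from the summary, and only then does the system reconnect to fetch them.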
The user's choice, made in block 54, can include a direction to download all documents retrieved from one of the databases. However, this alternative can be very expensive if the number of documents is large, as the database charge is based, in part, on the number of items downloaded. Thus, the system is preferably programmed to allow the user to specify which documents from each database should be displayed. For example, the user may specify the desire to view "document numbers 3-6", or some other subset, retrieved from "database number 6". Many other equivalent retrieval schemes can also be used.
After receiving a choice from the user, the system reconnects to the selected database, in block 56. As before, the system translates and executes the search again, in block 58, but this time the system downloads the selected items to its memory, as indicated in block 60, and terminates the connection with the database. In block 62, the system displays the retrieved documents to the user. The function of block 62 may include an interactive display, allowing the user to browse electronically through the document(s). During this displaying step, the user remains connected to the system but the system is not connected to any database, so no additional database charges are incurred. When the user is finished viewing the item, the program can return to test 50, and the document-viewing process can be repeated.
In the procedure discussed above, the system searches a set of databases one at a time. It is also possible to search through many databases virtually simultaneously. However, in the latter case, it would be necessary for each of computers 13 to be connected, through separate modems, to a plurality of databases. Alternatively, computers 13 can be arranged as one larger computer. In either case, the computer would be subdivided into separate processors, or otherwise programmed on a time-sharing basis, so that signals could be passed back and forth between the system and each of several databases virtually simultaneously. The computer could also remain connected to all of the databases continuously, especially if there is a constant stream of search requests covering all or most of the available databases or database families.
Figure 3 illustrates one possible arrangement for the latter embodiment. Terminal 70 is connected, by a telephone line, to CPU 72, which can establish connections with databases such as 78, 80, and 82. Dotted lines 74 and 76 indicate the boundaries between the system and the components external to the system. CPU 72 can send data through any of modems 84, 86, 88, 90, and 92. The CPU automatically connects to as many modems as it needs to do the simultaneous searches. The remaining modems can be used for the searches of another user. Clearly the number of modems can be varied; if a single large CPU is used for all users, the number of modems will be quite large, in general.
If, in the embodiment of Figure 3, the system needs to search through two or more databases on the same host, the searches can be done sequentially, using the modem which is connected to that host. Alternatively, the CPU can appropriate another modem, so that two modems may be connected to the same host (but not the same database) at the same time.
Dotted lines 94 and 96 symbolically illustrate the variability of the number of modems being employed for a given user at any one time. In the example of Figure 3, the CPU is searching only three databases, and modems 90 and 92 are not being used. But one or both of these modems could have been used if it had been necessary to search four or five databases. What is important is that the CPU be programmed to take command of the number of modems necessary to establish the desired simultaneous connections with databases and/or hosts.
The selection of an area of interest, in block 40, need not be done with only one menu. The system can be programmed to display one or more further menus, in response to the user's previous selection. However, in this embodiment, the system always selects two or more databases to be searched, and the databases searched are a function solely of the user's responses to the menu(s).
The selection made in block 40 can also be done without conventional menus. Any other means by which the user can indicate, to the system, an area of interest is included within the scope of block 40 of Figure 2, and should be deemed an equivalent.
Another embodiment of the present invention provides an additional level of standardization for database searching. In this embodiment, the user employs one standardized set of commands for conducting searches in disparate databases. Before describing this embodiment, it is helpful to explain the concept of a "command set" for database searching.
As explained above, each commercial database, or family of databases, has its own rules of syntax which govern the construction of search requests. Each set of rules of syntax includes a set of logical operators used to connect the terms of a search request. Examples of rules of syntax for various databases or database families are given in U.S. Patent No. 4,774,655.
It is not enough to learn the search syntax for a particular database or database family. It is also necessary to know a set of commands needed for conducting a search. Such commands are used for such functions as instructing the system to begin a search, displaying the results of a search, providing information on the types of searching available, terminating a research session, and more. In general, each database, or database family, uses a different set of commands.
In the present invention, the user transmits commands selected from a standardized set. The system translates each such command into the equivalent command applicable to the database or database family being searched.
Figure 4 is a table showing a hypothetical standardized set of commands, with a brief indication of the meaning of each. The table also shows the equivalents of each command in the BRS, DIALOG, and VU/TEXT families of databases. For example, the hypothetical standardized command which directs the system to search the database is "FIND". In DIALOG, the corresponding command is "SELECT". In BRS, it is "..SEARCH". In the hypothetical standardized command set, "SHOW" is used to display the results of a search. The corresponding BRS command is "..PRINT", the DIALOG command is "TYPE", and the VU/TEXT command is "PL". In the standardized command set, "STOP" ends the searching session; the corresponding command in BRS is "..OFF" and in DIALOG it is "LOGOFF".
The system of the present invention stores, in a suitable memory device, a table of the type shown in Figure 4. When the user enters a command from the standardized set, the system selects the corresponding command from the command set for the database or database family being searched, and transmits that command to the database. The user need not "see" the actual command which is transmitted to the database. It is as if the user is communicating directly with the database using only the standardized commands.
Some of the commands listed in Figure 4 must be used with other parameters. For example, the "FIND" command is followed by a search request. The system translates the word "FIND" into the appropriate command, and may also translate the search request itself into the syntax of the particular database. In another example, the command "SHOW" may be followed by a specification of the documents desired to be displayed. Preferably, the system is programmed to display an error message to the user if the command does not include the required number or type of parameters.
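A table of the kind shown in Figure 4 lends itself to a simple lookup. The sketch below contains only the command equivalents quoted in the text above; entries the text does not give (for example, the VU/TEXT equivalents of "FIND" and "STOP") are deliberately left out rather than guessed. The function name, the rule that only "FIND" requires a parameter, and the choice to pass the parameter through untranslated are assumptions made for illustration.

```python
# Standardized command -> database family -> native command (from the text's examples).
COMMAND_TABLE = {
    "FIND": {"BRS": "..SEARCH", "DIALOG": "SELECT"},
    "SHOW": {"BRS": "..PRINT", "DIALOG": "TYPE", "VU/TEXT": "PL"},
    "STOP": {"BRS": "..OFF", "DIALOG": "LOGOFF"},
}

# Assumed: commands that are meaningless without a parameter.
REQUIRES_PARAMETER = {"FIND"}


def translate_command(line, family):
    """Translate one standardized command line into the family's native command.

    Raises ValueError with a user-facing message when the command is unknown,
    has no known equivalent in the family, or lacks a required parameter.
    """
    verb, _, parameter = line.strip().partition(" ")
    verb = verb.upper()
    parameter = parameter.strip()
    if verb not in COMMAND_TABLE:
        raise ValueError(f"unknown command: {verb}")
    if family not in COMMAND_TABLE[verb]:
        raise ValueError(f"no {family} equivalent stored for {verb}")
    if verb in REQUIRES_PARAMETER and not parameter:
        raise ValueError(f"{verb} must be followed by a parameter")
    native = COMMAND_TABLE[verb][family]
    return f"{native} {parameter}" if parameter else native
```

In a fuller system the parameter of "FIND" would itself go through the search-syntax translation before transmission; here it is forwarded unchanged to keep the sketch short.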
The present invention resides not in the specific commands which form the standardized command set, but in the concept of providing standardization. Any other choice of a standardized command set could be used.
The standardized command set can be combined with other variations of the invention. For example, one way of using the standardized commands is in the embodiment of the cited patent wherein the user selects a particular database. The user may know the names and general coverage of a variety of databases, but may not know (and may not want to learn) the command sets appropriate for each database. The system of the present invention can therefore translate standardized commands into the commands recognized by the selected database or database family. The user conducts the search while the system remains connected, or "on-line", with respect to the database. From the viewpoint of the user, it is as if he or she is directly connected to that database. However, the user employs the standardized command set, while the system acts as an intermediary, translating all commands into the commands appropriate to the particular database.
Note that some of the standardized commands (such as "FIND", in the example given above), relate to the process of performing the search, whereas other commands (such as "SHOW", in the above example) relate to the displaying of results. Thus, not all of the standardized commands would be available to the user, in every variation of the invention. For example, if the system selects databases and performs searches entirely automatically, the user would never need to enter a "FIND" command, as the system would do this automatically. But the standardized command set could still be used for directing the display of search results, after the searching is concluded. After the system disconnects from the database and displays the number of documents retrieved, the user can then transmit, for example, a "SHOW" command, followed by the number of the document desired to be displayed. The system would then reconnect to the database, perform the search again, download the requested document, disconnect from the database, and display the document to the user. The above-described arrangement could be used with the embodiment of Figure 2, wherein multiple databases are searched, as well as with the embodiments of the cited patent, which involve searching in a single database at one time. The above-described arrangement could also be used where the system remains connected to the database, and wherein it is not necessary to disconnect and reconnect.
In another alternative, one could provide a system in which the search request syntax is translated but the search commands are not translated. In this embodiment, the user does not know the syntax of a particular database, but does know the command set. This and other similar alternatives are within the scope of the invention.
As stated above, the concept of a database "language" also includes commands relating to field structures. The documents stored in most databases are arranged in segments or "fields". It is usually possible to search a database by fields. Thus, one can request all documents containing the name "Smith" in the "author" field. One way of specifying a field is to use a command known as a "field tag". In the above example, one might enter "AU=SMITH". This command would cause the system to search each author field of each item in the database for the word "SMITH". In general, each database contains a different set of fields, and each database or database family uses a different set of field tags. For example, the "author", "title", and "date" fields could be represented by "AU=", "TI=", and "DA=", respectively. Some database families employ field tags which are suffixes, e.g. "SMITH/AU".
In addition to standardizing commands for displaying search results, the invention also includes providing a standardized set of field tags. When the user wants to search by author, or title, or by some other field, the user can enter the search command in a standardized format (such as "AU=SMITH"), using a standardized set of field tags. The expression "AU=SMITH" is called a "field command" in this specification. The system can then translate the command into the format appropriate to the database being searched. Also, the system can be programmed to translate the operand (e.g. "SMITH") into a format appropriate to a particular database. For example, some databases contain author information with the last name first, and others place the last name last. The system could be programmed to "know" that, for certain databases, the standardized command "AU=JOHN SMITH" must be translated into "AU=SMITH, JOHN".
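The two translations just described, of the field tag and of the operand, can be sketched together. The database names and the per-database rules below are hypothetical; the sketch assumes the standardized format is always "TAG=OPERAND" and that an author operand written as "FIRST LAST" contains no comma.

```python
# Hypothetical per-database rules: tag placement and author-name ordering.
FIELD_RULES = {
    "PREFIX_DB": {"style": "prefix", "last_name_first": False},
    "LASTFIRST_DB": {"style": "prefix", "last_name_first": True},
    "SUFFIX_DB": {"style": "suffix", "last_name_first": False},
}


def translate_field_command(command, database):
    """Translate a standardized field command such as "AU=JOHN SMITH"."""
    tag, separator, operand = command.partition("=")
    tag, operand = tag.strip().upper(), operand.strip()
    if not separator or not tag or not operand:
        raise ValueError(f"not a field command: {command!r}")
    rules = FIELD_RULES[database]

    # Operand translation: reorder "FIRST LAST" into "LAST, FIRST" where required.
    if tag == "AU" and rules["last_name_first"] and " " in operand and "," not in operand:
        first, _, last = operand.rpartition(" ")
        operand = f"{last}, {first}"

    # Tag translation: some families write the tag as a suffix.
    if rules["style"] == "suffix":
        return f"{operand}/{tag}"
    return f"{tag}={operand}"
```

Keeping the rules in a table rather than in code means a new database can be supported by adding one entry, which matches the table-driven approach the text describes for commands.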
The invention thus includes at least three possible levels of translation. The system can translate search requests, commands, and field commands. Any combination of these types of translations can be incorporated into a given embodiment of the invention. Thus, for example, one can provide a system in which the user, not the system, selects a database, wherein the user is expected to know the search syntax of that database, but wherein the system translates search commands from a standardized set. This variation is appropriate for sophisticated users who are familiar with the search syntax of their favorite databases, but who do not want to memorize different sets of search commands. One can even provide standardization of field commands only. The invention is not limited to the above-described combinations, however.
One of the problems in database searching is the retrieval of too few or too many documents, or "postings". As used herein, a "posting" occurs when a word in the search request is found in a document contained in the database. The term "document" is used herein in a general sense, and includes a record retrievable from a database, whether it be an article, a patent, or information recorded in any other format. A search which yields no postings is usually of no value to the user, unless the user is trying to verify that a document does not exist. A search which yields a large number of postings, e.g. more than one hundred, is almost as valueless, because it is usually not economical to browse through all the retrieved documents.
In the embodiment to be described, it is assumed that the database (or host) provides a step-by-step display of search postings. That is, suppose that the search request is "DESKTOP AND PUBLISHING". Then the host will display not only the number of documents in the database containing both "DESKTOP" and "PUBLISHING", but will also separately display the number of documents containing "DESKTOP" and the number containing "PUBLISHING". Many of the databases now available provide this kind of intermediate display.
The following embodiment addresses the problems wherein a search yields either no postings or too many postings. The term "too many" is, of course, a subjective matter, but is defined, for purposes of discussion, as one hundred or more postings.
Searches which yield no postings can be grouped into one of two categories. In the first category, called "Condition A", at least one of the units of the search request is not found in the database. It is assumed that any given search can be represented as one or more "units", or groups of terms, connected by an "AND" operator or its equivalent. A unit can be a word, or a group of words joined by the "OR" operator. For example the search requests "(ENERGY OR CRISIS) AND (OIL OR PETROLEUM)" and "ENERGY AND PETROLEUM" both contain two units. Thus, if the search request includes four words, joined by "AND", (or, alternatively, by a proximity connector specifying that the words must be within a certain number of words of each other), and if, say, the third word is not in the database, the search will yield no postings because of that third word. In the latter example, the system will know that it was the third word which caused the search to fail. The system then displays a message to the user, stating that there were no postings for this word. The system asks the user if the word is spelled correctly or whether it was entered in the proper format. By "format" it is meant that the word may have been restricted to a particular field (e.g. title, author, or abstract), and the field designator may be incorrectly entered. If the spelling or format is incorrect, the user can enter the corrected term, and the search will be executed again.
If the spelling and format are already correct, the system gives the user suggestions for modifying the search. These suggestions can include 1) entering a related term instead of the term that caused the search to fail, 2) relaxing field restrictions (e.g. searching for all occurrences of the word instead of limiting the search to the title, author, or abstract), and 3) deleting the term from the search. The user may also choose to abandon the search at this point. The suggestions are preferably arranged in a menu, and the user can easily make one or more choices.
If more than one term would cause the search to fail, the system attempts to resolve each problem separately before the search is resubmitted. That is, the system performs all the operations described above for each search term or unit which yielded no postings, before the search can be performed again.
The second category of searches which yield no postings is called "Condition B". In Condition B, all of the units of the search request yield postings, but the full search request yields no postings. In many cases, the reason for the null result is that the restrictions on the units of the search request are too strict. Thus, if Condition B occurs, the system first displays to the user the number of postings for each of the intermediate steps of the search, so that the user has the opportunity to make changes.
For example, suppose that the user wants to find articles about leveraged buyouts of large corporations. The search request entered by the user is "LEVERAGED BUYOUT". Now suppose that the system automatically translates this request as "LEVERAGED/TI AND BUYOUT/TI", i.e. the system performs the search for all documents containing the words "LEVERAGED" and "BUYOUT" in the title. The step-by-step results displayed to the user might look as follows:
    500  LEVERAGED/TI
    600  BUYOUT/TI
      0  LEVERAGED/TI AND BUYOUT/TI

That is, the system retrieved 500 documents with "LEVERAGED" in the title, and 600 documents with "BUYOUT" in the title, but found no documents containing both words in the title.
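The distinction between Condition A and Condition B follows directly from the step-by-step counts. A minimal sketch, assuming the host's intermediate display has already been parsed into a mapping from each unit to its posting count (the function name and return shape are illustrative only):

```python
def classify_empty_search(unit_postings, total_postings):
    """Classify a search that returned nothing.

    unit_postings maps each unit of the request to its own posting count.
    Returns None when the search did return postings, ("A", failed_units)
    when at least one unit has no postings, and ("B", []) when every unit
    has postings but their combination does not.
    """
    if total_postings != 0:
        return None
    failed_units = [unit for unit, count in unit_postings.items() if count == 0]
    if failed_units:
        return ("A", failed_units)
    return ("B", [])
```

Applied to the leveraged-buyout example, `classify_empty_search({"LEVERAGED/TI": 500, "BUYOUT/TI": 600}, 0)` reports Condition B, since both units produced postings on their own.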
The user is now given several options, which can be selected from a menu. First, one or more search terms can be modified, replaced, or deleted. Secondly, the field restrictions can be changed or modified. For example, the user may decide not to restrict the search to occurrences of the terms in the title. Thirdly, the user can also relax the combination restrictions. Thus, for example, if the system had chosen to try "LEVERAGED (2W) BUYOUT", meaning that it searched for documents in which "LEVERAGED" occurs within two words of "BUYOUT", the user may want to relax this restriction to a larger "window". The user could even decide to replace the search with "LEVERAGED AND BUYOUT", which will retrieve all documents containing both words in any location within the document. Finally, the user is also given the opportunity to abandon the search entirely.
The above examples of search syntax are only hypothetical. In practice, the system could use any other means of informing the user about the restrictions that were initially and automatically placed on the search, and can then give the user the chance to relax such restrictions. Note that in the cases of both Condition A and Condition B, the system identifies the probable points at which the search failed, and offers options for correction.
The system is programmed to display to the user only the options appropriate to a particular search. Thus, for example, if no field restrictions were entered by the system, the user would not be given the option of broadening the field restrictions. If there are no proximity connectors in the search, the user would not be asked to broaden them. Also, if the search request consists of only one term, the option to delete a term would not be presented.
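The rule that only applicable options are offered can be expressed as a small menu builder. The option wording and the three inputs are assumptions chosen for illustration; a real system would derive them from the translated search request.

```python
def zero_posting_options(term_count, has_field_restrictions, has_proximity_connectors):
    """Return the menu choices that make sense for a search with no postings."""
    options = ["modify or replace a term"]
    if term_count > 1:
        # Deleting the only term would leave nothing to search for.
        options.append("delete a term")
    if has_field_restrictions:
        options.append("relax field restrictions")
    if has_proximity_connectors:
        options.append("widen proximity connectors")
    options.append("abandon the search")
    return options
```

A single-term search with no restrictions therefore yields just two choices: modify the term, or abandon the search.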
It is quite possible that the user could resolve a Condition A problem and then encounter Condition B. Also, resolution of either Condition A or Condition B could result in the problem of too many postings, to be discussed below. In anticipation of either case, the system should be programmed to place a limit on the number of failed searches that can be performed by one user in one session.
The problem of too many postings is almost as serious as the problem of no postings at all. When the number of postings is greater than 99 (or any other arbitrarily selected number), the system presents a menu of the following choices to the user. First, the user can choose to view the first ten documents, usually arranged in reverse chronological order, with the most recent items first. Secondly, the user can be given the opportunity to add terms to the search, thereby narrowing its scope. Thirdly, the user can narrow the search by limiting one or more search words to a particular field, or by tightening proximity connectors. For example, instead of searching for "MONOCLONAL" and "ANTIBODIES", the user could search for only those items that contain these words in the title. Or instead of searching for items containing the words "PARTICLE" and "ACCELERATOR", the user could search for only those items in which these words are no more than two words apart. The user can be given the option of viewing the ten most recent items resulting from one term or phrase of the search request. Finally, the user can also be given the option of abandoning the search entirely.
Figure 5 is a flow chart which summarizes the embodiment described above, for assisting the user in the case of no postings or too many postings. The system executes a search in block 120. If there are no postings, as determined in test 122, the system determines, in test 124, whether each search term (or each unit of a search request) generated postings. If the answer is no, the system displays the number of postings for each unit, in block 126, and displays a menu of choices to the user, in block 128. The user is given the option of abandoning the search. Test 130 determines whether the user wants to abandon, and, if so, the system stops in block 132. If the user wants to modify the search, the modification is done in block 134, and the search is executed again.
If all the search terms or units generated postings, the system proceeds through blocks 136 and 138, and test 140, in similar fashion. The user may abandon the search, in block 142, or enter a modification, in block 144.
If the search yields a nonzero number of postings, the system checks whether there are too many postings, in test 146. If not, the system displays the results, in block 148. Block 148 can include any other display steps that may be desired. If there are too many postings, then the system displays the number of postings in block 150 and asks the user for a choice, in block 152. The user may abandon the search, through test 154 and block 156, or may modify the search in block 158.
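The branching of Figure 5 can be sketched as a single decision function. The function name `next_step` and the branch labels it returns are hypothetical names chosen for this illustration; the threshold of 99 comes from the text, which notes that any other number could be selected. The user's abandon-or-modify choice (tests 130, 140 and 154) is left to the caller.

```python
TOO_MANY = 99  # threshold named in the text; any other value could be chosen


def next_step(total_postings, postings_per_unit, too_many=TOO_MANY):
    """Return the branch of Figure 5 that applies to one executed search.

    postings_per_unit maps each search term (or unit) to its own posting count.
    """
    if total_postings == 0:
        empty = [unit for unit, n in postings_per_unit.items() if n == 0]
        if empty:
            # Test 124 answers "no": at least one unit found nothing
            # (blocks 126 and 128).
            return ("show_unit_counts_and_menu", empty)
        # Every unit found postings, but the combined request did not
        # (blocks 136 and 138).
        return ("show_combination_menu", [])
    if total_postings > too_many:
        # Test 146 answers "yes" (blocks 150 and 152).
        return ("show_count_and_narrowing_menu", [])
    # Block 148: display the results.
    return ("display_results", [])
```

For instance, a request whose combined result is empty because one word never occurs in the database would take the first branch and report that word back to the user.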
Another embodiment of the invention is useful both in the case of too many postings and in the case of a "successful" search. In this embodiment, the system ranks the retrieved documents in order of apparent relevance. The ranking is done without actually browsing through the documents.
The principle used in ranking retrieved documents can be illustrated with a simple example. Suppose the user wants information on laptop computers. The system searches for documents containing "LAPTOP" and "COMPUTERS" within two words of each other. Suppose that 87 documents are retrieved. Now suppose that the search is narrowed by specifying that both words must appear in the title. Now, the number of documents may be reduced to 23. Suppose further that the search is again narrowed by specifying that the words be within one word of each other. Now, the number of retrieved documents is reduced to 12. It is very likely that these 12 documents are the most relevant of the original 87.
In general, it is apparent that a document that contains a word of the search request in its title is probably more relevant than a document containing the word in the abstract or elsewhere. A document in which two words of a search request are adjacent is probably more relevant than one in which the words are widely separated. It is therefore possible to construct a set of criteria which can be used to rank retrieved documents, based only on the intermediate step information returned by the database or host.
A hypothetical set of criteria could be as follows. First, a document can be ranked according to whether a search term appears in the title and descriptor, the title alone, the descriptor alone, or the abstract. The "descriptor" field is a field containing key words from the document, and is provided in many databases. Thus, if a search term is found in both the title and the descriptor field of a document, the document is considered the most relevant. If the term is found only in the abstract, the document is considered the least relevant.
Secondly, the relevance of a retrieved document is related to the number of words of the search request which appear in the same field of the document. Thus, a document in which all of the words of the search request appear in the title is more relevant than one in which the words of the request appear in different portions.
Thirdly, if the search contains a proximity operator (i.e. if the search is trying to retrieve documents containing two or more words which are separated by a given number of words), a retrieved document is considered most relevant if the words are adjacent, and least relevant if the words are far apart.
An algorithm can therefore be constructed which ranks the documents retrieved. The simplest such algorithm simply examines the field in which a search term appears (e.g. title, abstract, etc.), and ranks the documents as described above. A more complex algorithm takes into account the other criteria described above. It has been found, in practice, that only a relatively small number of indicators of relevance are necessary. The most useful criterion is to determine the field in which the search term appears. The number of terms appearing in a given field has also been found to be a useful criterion of relevance.
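A ranking along these lines might be sketched as below, using the two criteria the text identifies as most useful: the field in which a term appears, and the number of terms appearing in the same field. The numeric weights, the names `FIELD_RANK`, `field_score` and `rank`, and the tuple layout are assumptions made for illustration. The field information is assumed to come from field-restricted searches, not from reading the documents.

```python
# Order follows the hypothetical criteria in the text: title plus descriptor
# ranks highest, abstract alone lowest. The numbers themselves are assumed.
FIELD_RANK = {
    frozenset({"title", "descriptor"}): 3,
    frozenset({"title"}): 2,
    frozenset({"descriptor"}): 1,
    frozenset({"abstract"}): 0,
}


def field_score(fields_hit):
    """Score a document by the fields in which a search term was found."""
    key = frozenset(fields_hit) & {"title", "descriptor"}
    if not key:
        # Found only in the abstract (or elsewhere): least relevant.
        key = frozenset({"abstract"})
    return FIELD_RANK[key]


def rank(documents):
    """Sort (doc_id, fields_hit, terms_in_same_field) tuples, best first."""
    return sorted(
        documents,
        key=lambda d: (field_score(d[1]), d[2]),
        reverse=True,
    )
```

The field criterion is applied first and the count of terms sharing a field only breaks ties, which is one possible reading of the relative importance the text assigns to the two indicators.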
It is important to note that the ranking of documents by relevance is done without actually viewing the documents. On the contrary, the ranking is done only by determining which terms, or combinations of terms, appear in which portions of the documents. Such information can almost always be obtained without examining the document. Even if the database does not display intermediate steps, one can "probe" the database by entering a search for a given word, restricted to a given field, and observing the number of documents retrieved.
In general, in order to rank the documents by relevance, it may be necessary for the system to perform a search, for each given word, more than once. Thus, the system might search for occurrences of a given word, and then might repeat the same search, limiting the second search to, say, the "title" field. While the use of such duplicate searching adds somewhat to the cost of the search, the cost of search time is usually relatively small compared to the cost of displaying search results. Thus, for a slightly higher cost of doing the search, one can obtain considerably more information about the relevance of the documents retrieved, and can thereby avoid the more significant costs of displaying documents that turn out to be nonrelevant.
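The "probing" technique can be sketched as a loop that repeats each word search once per field of interest. The callable `search(word, field)` is a hypothetical stand-in for whatever mechanism sends a request to the host and returns a posting count; it is not an interface described in the text, and the default field names are assumptions.

```python
def probe(search, words, fields=("title", "descriptor", "abstract")):
    """Run one unrestricted and one field-restricted search per word.

    `search(word, field)` is a hypothetical callable that returns the number
    of postings; field=None means the search is not restricted to a field.
    """
    counts = {}
    for word in words:
        counts[(word, None)] = search(word, None)
        for field in fields:
            # Repeat the same search, limited to one field, and record
            # only the count; no document is displayed.
            counts[(word, field)] = search(word, field)
    return counts
```

Because only counts are collected, the extra searches add search time but none of the display cost that the text identifies as the larger expense.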
A method of ranking retrieved documents according to assumed relevance can be summarized as follows. Suppose that the searcher wants to find information about parallel processing computers. The system performs a search for documents containing the words "PARALLEL", "PROCESSING", and "COMPUTERS". It also performs searches for documents which contain the above-mentioned words in the title of the document. These extra searches are not "seen" by the user. Suppose that, in the database being searched, there are 1500 documents containing "PARALLEL", 1700 documents containing "PROCESSING", 2500 documents containing "COMPUTERS", and 500 documents containing all three words. Suppose further that, for the searches which are restricted to the title, the corresponding numbers are 100, 150, and 300, respectively, and that ten documents contain all three words in the title. Note that the system has actually done eight searches, i.e. two searches for each word separately, and two searches for the combination of the three words. Clearly, the ten documents last retrieved are likely to be the most relevant. The search results are far more useful, as a result of the system having performed the extra searches, as compared with the results obtained by simply searching for documents containing all three words. Although more direct search time was expended, as compared with a simple search for the three words, the search results are much more useful, and it was not necessary to display a multiplicity of documents in order to find the relevant ones.
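The count of eight searches in this example can be checked with a short sketch. The posting counts are those given in the example; the names `PLAN`, `COUNTS` and `most_relevant_set`, and the rule of preferring the title-restricted combination whenever it is non-empty, are assumptions made for illustration.

```python
from itertools import product

WORDS = ("PARALLEL", "PROCESSING", "COMPUTERS")

# Search units: each word alone, plus the combination of all three words.
UNITS = [(w,) for w in WORDS] + [WORDS]
SCOPES = ("anywhere", "title")

# Each unit is searched once unrestricted and once restricted to the title.
PLAN = list(product(UNITS, SCOPES))

# Posting counts taken from the example in the text.
COUNTS = {
    (("PARALLEL",), "anywhere"): 1500, (("PARALLEL",), "title"): 100,
    (("PROCESSING",), "anywhere"): 1700, (("PROCESSING",), "title"): 150,
    (("COMPUTERS",), "anywhere"): 2500, (("COMPUTERS",), "title"): 300,
    (WORDS, "anywhere"): 500, (WORDS, "title"): 10,
}


def most_relevant_set(counts, words):
    """Pick the narrowest non-empty result set for the full combination."""
    for scope in ("title", "anywhere"):
        n = counts.get((tuple(words), scope), 0)
        if n > 0:
            return scope, n
    return None
```

Here `len(PLAN)` is 8, matching the four units searched in two scopes, and `most_relevant_set(COUNTS, WORDS)` selects the ten title-restricted documents.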
The following is a list of features, one or more of which can be included in a particular embodiment of the present invention:
1. Automatic selection of one or more databases
2. Automatic selection of host for a given database
3. Automatic generation of syntax to be used in conducting a search
4. Translation of search request into syntax of selected database or database family
5. Automatic generation of commands to be used in conducting a search
6. Translation of standardized commands into format appropriate to database or database family
7. Automatic generation of field commands to be used in conducting a search
8. Translation of standardized field commands into format of database or database family
9. Automatic substitution of alternative form of field commands
10. Automatic generation of suggestions for reformulation of search requests in cases where no documents or too many documents were retrieved
11. Automatic generation of list of retrieved documents in the order of their presumed relevance
To the above list, one can add alternative arrangements for formatting and displaying the results of searches. Moreover, one can program the system to repeat the same search periodically to determine whether relevant documents may have been added to the databases under review.
The features of the above list can be present in virtually all possible combinations. All of these features have been discussed above, or in the cited patent. Item No. 9 refers to the automatic provision for variant forms, such as "SMITH, JOHN" instead of "JOHN SMITH".
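The variant-form substitution of Item No. 9 might be sketched as follows for the name example given. The function name `name_variants` and the simple rule of moving the last word to the front are assumptions; the text does not describe how variants are generated.

```python
def name_variants(name):
    """Return the name as entered plus an inverted "LAST, FIRST" form."""
    parts = name.split()
    if len(parts) < 2:
        # A single word has no inverted form.
        return [name]
    inverted = f"{parts[-1]}, {' '.join(parts[:-1])}"
    return [name, inverted]
```

Under this rule, "JOHN SMITH" yields both "JOHN SMITH" and "SMITH, JOHN", so a search could be issued in whichever form a given database expects.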
Thus, the invention should not be deemed limited to the particular embodiments described above. The arrangement of components in Figure 1 is only exemplary; many other combinations of computers, databases, and modems can be used, only some of which have been described explicitly. The embodiments of Figures 2, 3 and 4 can be used together or separately, and, as explained above, some features of each embodiment can be used alone. All such variations should be deemed within the spirit and scope of the following claims.

Claims

What is claimed is:
1. A method of supplying information to a user from databases, comprising the steps of:
a) accepting, from the user, a specification of the subject area in which the user desires to search, and a search request, the search request comprising at least one word for which the user desires to search,
b) associating at least two databases with the specification of subject made by the user,
c) automatically executing the search request for each of the databases identified in step (b), and
d) displaying, to the user, the number of items retrieved, from each database, during each execution of the search request.
2. The method of Claim 1, further comprising the steps of:
a) accepting, from the user, a designation of at least one of the retrieved items, and
b) retrieving, from the respective databases, each item designated by the user, and displaying each item to the user.
3. The method of Claim 2, wherein the retrieving step comprises the steps of establishing connection with a database, executing the selected search request, and downloading information obtained from the database.
4. The method of Claim 3, wherein the searches in each database are performed sequentially, and wherein each search includes the steps of establishing connection with a database, performing a search in that database, downloading the results of the search, and terminating the connection with the database.
5. The method of Claim 1, wherein the searches of step (c) are performed substantially simultaneously.
6. A method of supplying information to a user, from a plurality of databases, comprising the steps of accepting, from the user, a search request, the search request including at least one word for which the user desires to search, determining an area of interest relating to the search request, automatically selecting a set of databases, the set being determined by said area of interest, the databases in said set being related to said area of interest, and automatically executing the search request in each member of said set of databases.
7. The method of Claim 6, wherein the step of determining an area of interest comprises selecting an area of interest from at least one menu.
8. The method of Claim 6, wherein the searches are performed sequentially in each of said databases.
9. The method of Claim 6, further comprising the step of displaying, to the user, the results of the searches performed.
10. The method of Claim 6, wherein the searches are performed substantially simultaneously in each of said databases.
11. A system for supplying information to a user from databases, the user being located at a terminal, the system comprising a computer, the computer being connected through a first modem to the user's terminal, the computer being connected through at least one second modem to an outgoing telecommunications line, the computer being programmed to accept, from the user, a search request, the search request comprising at least one word for which the user desires to search, the computer comprising means for determining an area of interest related to the search request, the computer also comprising means for associating at least two databases with said area of interest, the computer also being programmed to execute automatically the search request for each of said databases, by establishing communication with said databases through the second modem and the telecommunications line, the computer also being programmed to display, to the user, information retrieved from the databases.
12. The system of Claim 11, wherein there are a plurality of second modems and telecommunications lines, each second modem being connected to the computer and to a telecommunications line, and wherein the computer includes means for connecting to a number of said second modems, the number being equal to the number of databases which are to be searched, wherein the computer is capable of conducting the searches in the databases substantially simultaneously.
13. In a system for retrieving information from a database, the system being programmed to accept a search request and a choice of a database from a user, the system being capable of executing the search request in the selected database and displaying the results of the search in that database, each of the databases being associated with a set of commands, each set of commands being used to direct the operation of the searches and the displaying of results of the searches in a database, wherein the command sets for at least two of the databases are not identical, the improvement wherein the system includes means for accepting, from the user, a command from a standardized set, in conjunction with a search request in a selected database, and means for translating the command from said standardized set into a corresponding command for the selected database.
14. The system of Claim 13, wherein the databases contain information arranged in searchable fields, and wherein the command set includes commands for retrieving information from specified fields.
15. A method of obtaining information from any one of a plurality of databases, each database having a command set for conducting the operation of searches in the database and for displaying the results of such searches, the command sets for at least two databases being different, the method comprising the steps of:
a) specifying a database, a search request and at least one command, the search request including at least one word desired to be searched in the database, the command being selected from a set of standardized commands,
b) translating the command selected from the standardized set into a command which can be recognized by the selected database, and
c) executing the search request in the selected database, according to the translated command.
16. The method of Claim 15, wherein the databases contain information arranged in searchable fields, and wherein the standardized command set includes commands for retrieving information from specified fields, and wherein the translating step includes translating a standardized command relating to information in one of said fields into a command which can be recognized by the selected database.
17. In a system for retrieval of information from a database, the database being located on at least two different database families, the system including a computer, a first modem connected to accept commands from a user located at a terminal, and a second modem connected to an outgoing telecommunications line, the computer being capable of establishing connection with a plurality of database families, using the second modem and the telecommunications line, the computer being programmed to accept a search request from the user, the search request comprising at least one word for which the user desires to search, the computer comprising means for selecting a database, the computer being programmed to transmit the search request to a selected database, the improvement wherein the computer is programmed to determine the database family to be used in gaining access to the selected database, and to gain access to said database through the selected database family.
18. The improvement of Claim 17, wherein the computer is programmed to gain access to an alternative database family which contains said database, if the first-selected database family is unavailable.
19. A system for searching multiple databases comprising a computer, means for connecting the computer to a user so that the user can direct a search request to the computer, a plurality of modems, each modem being connected to an outgoing telecommunications line, each modem being connected to the computer, wherein the computer is programmed to determine a number of databases in which to execute the search request, and wherein the computer is programmed to operate said number of modems to execute the search request in each of said databases substantially simultaneously, wherein the number of modems operated by the computer in response to the search request is not greater than the number of databases in which the search request is to be executed.
20. The system of Claim 19, wherein there are a plurality of users, and wherein the computer is programmed to execute searches substantially simultaneously for said users, and wherein the computer is programmed to take command of the number of modems necessary to execute a search for any one of said users, the computer being programmed to take command of a given modem only when said modem is not being operated in a search for another user.
21. A method of retrieving information from a database, the information being supplied to a user, the method comprising the steps of:
a) transmitting a search request to the database, the search request comprising at least one word for which the user desires to search,
b) determining the number of documents retrieved from the database which fulfill the search request,
c) determining whether the number of documents retrieved is zero or whether it is greater than a predetermined value, and
d) displaying to the user, if the number of retrieved documents is zero or greater than said predetermined value, the numbers of documents retrieved by at least one word of the search request, and suggesting to the user that the search can be modified and resubmitted.
22. The method of Claim 21, wherein step (d) comprises the step of identifying whether any of the words of the search request were not found in any document of the database, and alerting the user if such condition is true.
23. The method of Claim 21, wherein the suggesting step is selected from the group consisting of suggesting that the user enter a related word instead of a word of the search request, suggesting that the user correct the spelling of a word of the search request, suggesting that the user delete a word from the search request, and suggesting that the user submit a search which seeks documents in which a word of the search request appears in any portion of a document.
24. In a system for retrieving information from at least one database, the system including a computer which is programmed to accept a search request from a user, the search request including at least one word for which the user desires to search, the computer being programmed to retrieve documents which fulfill the user's search request, the improvement wherein the computer is programmed to determine whether the number of documents retrieved is zero or whether it is greater than a predetermined value, and wherein the computer is programmed to display to the user, if the number of retrieved documents is zero or greater than said predetermined value, the numbers of documents retrieved by at least one word of the search request, and to suggest to the user at least one way in which the search can be modified.
25. The improvement of Claim 24, wherein the system is programmed to display the number of documents containing each word of the search request, and wherein the system is programmed to prompt the user to broaden or narrow portions of the search request according to whether the search retrieved zero documents or too many documents.
26. A method of retrieving information from a database, the information being supplied to a user, the method comprising the steps of:
a) transmitting a search request to the database, the search request comprising at least one word for which the user desires to search,
b) determining the number of documents retrieved from the database which fulfill the search request,
c) ranking the documents retrieved in order of relevance, the ranking step being performed without examining the documents but only by determining the number of documents containing various words of the search request, and
d) displaying, to the user, a predetermined number of the most relevant documents.
27. The method of Claim 26, wherein the documents are divided into a plurality of fields, and wherein the ranking step includes the step of determining the number of documents containing a word of the search request in a given field of the documents, and ranking the documents in order of the presumed importance of each field.
28. The method of Claim 27, wherein the ranking step includes the step of determining the number of words of the search request which appear in the same field of a document, and ranking the documents in order of said number of words.
29. The method of Claim 26, wherein the ranking step includes the step of determining the number of documents which contain combinations of words of the search request, separated by various numbers of words, and ranking the documents such that the most relevant documents are those in which the words of the search request appear closest together.
30. In a system for retrieving information from at least one database, the system including a computer which is programmed to accept a search request from a user, the search request including at least one word for which the user desires to search, the computer being programmed to retrieve documents which fulfill the user's search request, the improvement wherein the computer is programmed to rank the retrieved documents in order of relevance, without examining the documents but only by determining the number of documents containing various words of the search request.
PCT/US1990/000037 1989-01-12 1990-01-09 System and method for retrieving information from a plurality of databases WO1990008360A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US29614689A 1989-01-12 1989-01-12
US296,146 1989-01-12

Publications (1)

Publication Number Publication Date
WO1990008360A1 true WO1990008360A1 (en) 1990-07-26

Family

ID=23140810

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1990/000037 WO1990008360A1 (en) 1989-01-12 1990-01-09 System and method for retrieving information from a plurality of databases

Country Status (2)

Country Link
AU (1) AU5024990A (en)
WO (1) WO1990008360A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4604686A (en) * 1984-01-27 1986-08-05 Martin Marietta Corporation Associative data access method (ADAM) and its means of implementation
US4769772A (en) * 1985-02-28 1988-09-06 Honeywell Bull, Inc. Automated query optimization method using both global and parallel local optimizations for materialization access planning for distributed databases
US4774655A (en) * 1984-10-24 1988-09-27 Telebase Systems, Inc. System for retrieving information from a plurality of remote databases having at least two different languages
US4829423A (en) * 1983-01-28 1989-05-09 Texas Instruments Incorporated Menu-based natural language understanding system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SEARCHING DIALOG THE COMPLETE GUIDE, "Chapter 1 Introduction to Dialog", August 1987. See pages (1-2) and (1-5). *
SEARCHING DIALOG THE COMPLETE GUIDE, "Chapter 8 Dialog Commands". See pages (8-Sort 1) to (8-sort 6). *
SEARCHING DIALOG THE COMPLETE GUIDE, "Chapter 9 Searching Multiple Files", January 1988. See pp. (9-1), (9-4), (9.6), (9-14), (9-15), (9-24), (9-25). *

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU683985B2 (en) * 1991-09-30 1997-11-27 Omron Corporation Fuzzy retrieval apparatus and method, and apparatus for creating membership functions
EP0673135A1 (en) * 1994-03-16 1995-09-20 BRITISH TELECOMMUNICATIONS public limited company Network support system
US5671408A (en) * 1994-03-16 1997-09-23 British Telecommunications Public Limited Company Network support system
US5715443A (en) * 1994-07-25 1998-02-03 Apple Computer, Inc. Method and apparatus for searching for information in a data processing system and for providing scheduled search reports in a summary format
WO1996003703A1 (en) * 1994-07-25 1996-02-08 Apple Computer, Inc. Method and apparatus for searching for information in a data processing system
WO1996003702A1 (en) * 1994-07-25 1996-02-08 Apple Computer, Inc. Method and apparatus for searching for information in a network
US5623652A (en) * 1994-07-25 1997-04-22 Apple Computer, Inc. Method and apparatus for searching for information in a network and for controlling the display of searchable information on display devices in the network
US6161102A (en) * 1994-07-25 2000-12-12 Apple Computer, Inc. Method and apparatus for searching for information in a data processing system and for providing scheduled search reports in a summary format
GB2314178B (en) * 1995-05-17 2000-12-27 Infoseek Corp Document retrieval over networks
WO1997033239A1 (en) * 1996-03-05 1997-09-12 Sofmap Future Design Co., Ltd. Database systems having single-association structures and method for searching data in the database systems
US5842212A (en) * 1996-03-05 1998-11-24 Information Project Group Inc. Data modeling and computer access record memory
US5903890A (en) * 1996-03-05 1999-05-11 Sofmap Future Design, Inc. Database systems having single-association structures
WO1997033241A1 (en) * 1996-03-05 1997-09-12 Information Projects Group, Inc. System and apparatus for loading and retrieving information
WO1997038378A1 (en) * 1996-04-10 1997-10-16 At & T Corp. Method of organizing information retrieved from the internet using knowledge based representation
WO1998026357A1 (en) * 1996-12-09 1998-06-18 Practical Approach Corporation Natural language meta-search system and method
US6078914A (en) * 1996-12-09 2000-06-20 Open Text Corporation Natural language meta-search system and method
US5987452A (en) * 1997-01-22 1999-11-16 At&T Corp Query translation system
EP0855658A1 (en) * 1997-01-22 1998-07-29 AT&T Corp. Query translation system
WO1999049401A1 (en) * 1998-03-24 1999-09-30 Bull S.A. Extensive request server
FR2776789A1 (en) * 1998-03-24 1999-10-01 Bull Sa GENERALIZED REQUEST SERVER
WO2000065486A2 (en) * 1999-04-09 2000-11-02 Sandpiper Software, Inc. A method of mapping semantic context to enable interoperability among disparate sources
WO2000065486A3 (en) * 1999-04-09 2003-12-04 Sandpiper Software Inc A method of mapping semantic context to enable interoperability among disparate sources
WO2001001277A3 (en) * 1999-06-30 2002-06-13 Winstar New Media System and method for conducting and coordinating search queries over information exchange networks and private databases
WO2001001277A2 (en) * 1999-06-30 2001-01-04 Winstar New Media System and method for conducting and coordinating search queries over information exchange networks and private databases
EP1109115A1 (en) * 1999-12-14 2001-06-20 Sun Microsystems, Inc. Merging driver for accessing multiple database sources
US7406697B2 (en) 1999-12-14 2008-07-29 Sun Microsystems, Inc. System and method including a merging driver for accessing multiple data sources
WO2001052111A3 (en) * 2000-01-13 2003-12-24 Interlink Network Resources In System and method for internet broadcast searching
US7000007B1 (en) 2000-01-13 2006-02-14 Valenti Mark E System and method for internet broadcast searching
WO2001052111A2 (en) * 2000-01-13 2001-07-19 Interlink Network Resources, Inc. System and method for internet broadcast searching
WO2001065412A2 (en) * 2000-02-29 2001-09-07 Fact City, Inc. Automatically determining a response to an inquiry using structured information
WO2001065412A3 (en) * 2000-02-29 2004-02-26 Fact City Inc Automatically determining a response to an inquiry using structured information
EP1176521A2 (en) * 2000-07-28 2002-01-30 International Business Machines Corporation System and method for providing decentralised e-commerce
EP1176521A3 (en) * 2000-07-28 2005-06-22 International Business Machines Corporation System and method for providing decentralised e-commerce
GB2378774A (en) * 2001-05-01 2003-02-19 One Stop To Ltd Searching procedures

Also Published As

Publication number Publication date
AU5024990A (en) 1990-08-13

Similar Documents

Publication Publication Date Title
WO1990008360A1 (en) System and method for retrieving information from a plurality of databases
US4774655A (en) System for retrieving information from a plurality of remote databases having at least two different languages
US5671404A (en) System for querying databases automatically
US5634051A (en) Information management system
US6345268B1 (en) Method and system for resolving temporal descriptors of data records in a computer system
EP0901661B1 (en) Method and system for selecting an information item in an information processing system, local station in such a system
US7752218B1 (en) Method and apparatus for providing comprehensive search results in response to user queries entered over a computer network
US5404507A (en) Apparatus and method for finding records in a database by formulating a query using equivalent terms which correspond to terms in the input query
US6625595B1 (en) Method and system for selectively presenting database results in an information retrieval system
Marcus. An experimental comparison of the effectiveness of computers and humans as search intermediaries
US6055531A (en) Down-line transcription system having context sensitive searching capability
US6839704B2 (en) Information storage, retrieval and delivery system and method operable with a computer network
US6499017B1 (en) Method for provisioning communications devices and system for provisioning same
EP0720108A1 (en) Method for storing and retrieving digital data transmissions
WO2000062264A2 (en) Method and system for retrieving data from multiple data sources using a search routing database
US20020120775A1 (en) Browser apparatus, address registering method, browser system, and recording medium
JP2001510607A (en) Intelligent network browser using indexing method based on proliferation concept
US20020046203A1 (en) Method and apparatus for providing ratings of web sites over the internet
JPH10222542A (en) Collation conversion system
US20020138337A1 (en) Question and answering apparatus, question and answering method, and question and answering program
US20110040782A1 (en) Context Sensitive Searching Front End
US5907320A (en) Time-based method of human-computer interaction for controlling storage and retrieval of multimedia information
JP2004192521A (en) Contact center system
JP3833543B2 (en) Electronic form distribution apparatus and electronic form distribution program
WO1995000911A1 (en) Computer-based classified ad system and method

Legal Events

Date Code Title Description
AK Designated states
   Kind code of ref document: A1
   Designated state(s): AU BB BG BR CA DK FI HU JP KP KR LK MC MG MW NO RO SD SU

AL Designated countries for regional patents
   Kind code of ref document: A1
   Designated state(s): AT BE BF BJ CF CG CH CM DE DK ES FR GA GB IT LU ML MR NL SE SN TD TG

NENP Non-entry into the national phase
   Ref country code: CA