US20030225757A1 - Displaying portions of text from multiple documents over multiple database related to a search query in a computer network - Google Patents

Info

Publication number
US20030225757A1
Authority
US
United States
Prior art keywords
query
documents
database
text
databases
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/387,747
Inventor
David Evans
Michael McInerny
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JustSystems Evans Research Inc
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US10/387,747
Assigned to CLARITECH CORPORATION reassignment CLARITECH CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: EVANS, DAVID A., MCINERY, MICHAEL J.
Publication of US20030225757A1
Assigned to CLAIRVOYANCE CORPORATION reassignment CLAIRVOYANCE CORPORATION CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: CLARITECH CORPORATION
Assigned to JUSTSYSTEMS EVANS RESEARCH INC. reassignment JUSTSYSTEMS EVANS RESEARCH INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: CLAIRVOYANCE CORPORATION
Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99934 Query formulation, input preparation, or translation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99935 Query augmenting and refining, e.g. inexact access

Definitions

  • This invention relates in general to computer databases.
  • In particular, this invention relates to locating and generating connections between concepts identified in a source document and data objects distributed throughout multiple databases in a computer network.
  • Hyperlinks are ways of connecting the text of two documents together. Hyperlinks operate on a page image shown to a database user. A phrase or text section on the page image will be highlighted. When a user selects this phrase (clicks on it with a mouse), the user is immediately shown related text from another document.
  • These hyperlinks are hardcoded links between a specific term and a specific set of text within a database or text on another network.
  • The hyperlinks are useful because they allow a user to quickly retrieve documents related to the highlighted phrase without manually constructing and executing different searches.
  • An example of conventional hyperlinks is U.S. Pat. No. 5,603,025 to Tabb, et al.
  • In this patent, a hypertext report writing module is created in which hypertext links are automatically embedded in documents from the database.
  • Also, since hyperlinks are pre-determined relationships between specified terms in databases, it is generally not feasible to categorize many large databases to make predetermined relationships for all items of potential interest.
  • Moreover, conventional hypertext links are normally static. That is, even if there were enough resources to hardcode enough hypertext links to make them useful in a database, the process of hardcoding the links would only occur once. Thus, databases with hardcoded hyperlinks would not be linked to new data. These hyperlinks miss updates in the data. They also miss the addition of new databases to networks.
  • The pre-determined and static nature of the hyperlinks as they currently exist makes them inappropriate for dynamically changing databases and difficult to use in distributed databases for information retrieval on wide-ranging subjects. Accordingly, conventional search techniques have failed to address the need for a process capable of automatically generating connections between texts in different documents across multiple databases. Additionally, conventional search techniques have failed to provide a connection generating technique that can adapt to databases that are modified on a real-time basis.
  • The system of the present invention provides a method and apparatus for displaying portions of text from multiple documents over multiple databases related to a search query.
  • The initial step in this method is to identify a search query. Based on this identification, a search against multiple databases is initiated.
  • In particular, the computer system identifies auxiliary databases, either within a network or between networks, that are likely to contain documents relating to terms in the search query.
  • The databases are then searched to identify those documents relating to the identified query.
  • The various sets of identified documents from multiple databases are then returned and processed to create an ordered ranking for the returned documents. Text portions from the highest ranking documents across the multiple databases are then automatically displayed to the user.
  • FIG. 1 is an illustration of a computer system that operates according to the present invention for displaying text portions from multiple databases.
  • FIG. 2 is a flowchart that illustrates a process according to an embodiment of the present invention for displaying text portions relating to a query from multiple databases.
  • FIG. 3 is a flowchart that illustrates a process according to an embodiment of the present invention for inverting a database.
  • FIG. 4 is an illustration of a listing of text that results from a noun phrase parsing process.
  • FIG. 5 is a flowchart that illustrates a process according to an embodiment of the present invention for scoring subdocuments.
  • FIG. 6 is a flowchart that illustrates a process according to an embodiment of the present invention for sorting.
  • FIG. 1 illustrates a computer system for searching databases.
  • The computer 220 is connected to a display 210, an input system 205 (including, for example, a keyboard and mouse), a memory system 230, and a communications link 280.
  • Normally, the communications link is a simple modem. It could also be a higher-rate direct connection between computers or another device for interconnecting computer systems.
  • The communications link 280 is in turn connected to a network of M other computers, each having its own memory system.
  • The memory system 230 associated with computer 220 has a memory section 240 that contains a target database, and it includes N memory sections that store a series of N auxiliary databases.
  • The target database in memory section 240 stores information that a user is currently interested in searching.
  • The remaining N memory sections store auxiliary databases related to a variety of topics.
  • The M computers attached to communications link 280 each have similar memory sections that store N auxiliary databases.
  • In addition, memory section 250 of memory system 230 stores a list of database addresses and identifiers.
  • In general, the computer system of FIG. 1 operates to display information from a target file or database to a user.
  • In the course of that display, a user will often recognize a specific idea or concept from the displayed information that may or may not be directly relevant to the general information currently being displayed. The user will desire to access or link to information about this specific concept without losing access to the general information currently being displayed.
  • The computer system of FIG. 1 operates to provide links between identified concepts and information contained in multiple databases.
  • The computer system of FIG. 1 provides these links by causing the computer 220 to receive a query and identify databases having information relevant to the query. Once the databases are identified, computer 220 causes them to be searched such that they return documents or passages of documents relevant to the query.
  • The computer 220 then organizes the returned documents or passages thereof and displays at least a portion of the text associated with those documents.
  • FIG. 2 illustrates a process for operating the computer system of FIG. 1 according to the present invention.
  • Initially, a query is identified in Step 10 of FIG. 2. This can be done by highlighting and selecting (through a conventional graphical user interface) a portion of text that the computer is already displaying. The query could also simply be entered into the computer 220 through a keyboard.
  • Once the text of the query has been identified, the text is converted into a search request in step 20 of FIG. 2. Converting the identified query text into a search request involves the conventional steps of parsing the query text into terms and then making use of the terms to form a query. The form of the query will depend on the type of search technique that will be used to search the databases.
  • Most search techniques use Boolean combinations of terms as the query. As a result, these techniques ‘AND’ the query terms together to form a query.
  • Other search techniques make use of vector space analysis. In this case, the list of terms itself forms the query because the vector space algorithm does not use logical operators to form the query.
  • Once a query has been formed, step 30 of FIG. 2 selects the databases that will be searched.
  • The computer system of FIG. 1 includes a memory space 250 that stores information to identify databases (and the types of information they store) or general database search engines. Since general database search engines, such as the Lycos™ engine on the World Wide Web, have their own resources for selecting the particular databases to search for a given query, Step 30 merely transmits a Boolean combination of query terms to these search engines (unless a user opts out of such a selection). For other databases identified in memory space 250 of FIG. 1, a Boolean combination of query terms is compared against the description of the databases listed in memory space 250. As a result of this comparison, a set of auxiliary databases is selected that will be searched against the query.
  • Step 40 begins the search process for the auxiliary databases selected in Step 30.
  • Normally, the target database will not be searched because the user is, presumably, already searching that database for the concepts of interest.
  • However, the target database could also be selected in Step 30 and searched as well.
  • Referring to FIG. 1, the search process is started by transmitting a query to each of the selected auxiliary databases that are associated with computer 220.
  • Computer 220 will also transmit instructions and one or more forms of the search query to the M computers through the communications link 280.
  • The instructions sent by computer 220 could, for example, instruct computer 300 to use the Lycos™ search engine to search databases on the World Wide Web for documents having a Boolean combination of the terms in the search query.
  • The instructions sent by computer 220 could also, for example, instruct computer 400 to use a vector space search technique to search its associated auxiliary database N to retrieve documents related to the list of query terms.
  • The documents retrieved in Step 40 from the auxiliary databases associated with the M computers are returned to computer 220 through communications link 280.
  • Step 50 of FIG. 2 determines a rank order of the documents for display.
  • The processing of step 50 is completely independent of the processing used to retrieve the documents.
  • The retrieved documents, in effect, form an independent database that is analyzed by the computer 220.
  • Various search techniques for retrieving documents across computer networks can be utilized, but all the returned documents are analyzed according to an independent process.
  • The processing of step 50 can be as simple as selecting for display the documents that are returned first.
  • Alternatively, the processing of Step 50 ranks the returned documents according to a hierarchy of the databases in which the documents were located.
  • Still another processing alternative for Step 50 is to perform a vector space analysis on the returned documents. This analysis will rank the returned documents based on their relevance to the query.
  • A vector space analysis computes a similarity score between the terms in the query and each of the returned documents by evaluating the shared and disjoint features of the query terms and a document over an orthogonal space of T terms of the document.
  • FIG. 3 illustrates a process for inverting a database.
  • A document from the database is selected.
  • The document is broken into subdocuments.
  • Each subdocument generally corresponds to a paragraph of the document. Long paragraphs may consist of multiple subdocuments, and several short paragraphs may be included in a single subdocument. The subdocuments all have approximately the same length.
  • A subdocument is selected and parsed.
  • The parsing process is a noun phrase parsing process.
  • Linguistic structure is assigned to sequences of words in a sentence. Those terms, including noun phrases, that have semantic meaning are listed.
  • This parsing process can be implemented by a variety of techniques known in the art, such as the use of lexicons, morphological analyzers or natural language grammar structures.
  • FIG. 4 is an example listing of text parsed for noun phrases. As is evident from the list of FIG. 4, the phrases tagged with a ‘T’ are noun phrases, words tagged with a ‘V’ are verbs, words tagged with an ‘X’ are quantities, words tagged with an ‘A’ are adverbs, and so on.
  • A term list containing noun phrases and their associated subdocument is generated in step 140. All the subdocuments for each document are processed in this way and the list of terms and subdocuments is updated. Finally, all the documents of a database are processed according to steps 132-140. The result of this inversion process is a term list identifying all the terms (specifically noun phrases in this example) of a database and their associated subdocuments.
  • In step 310, the term list of the inverted database is searched to identify all the subdocuments that are associated with each term of the query that was identified in step 10 of FIG. 2.
  • Step 320 computes a partial similarity score (according to the general formula discussed above) for the query term and the subdocument. The computation process repeats for each query term and subdocument.
  • In step 330, the partial scores for each subdocument are added or otherwise combined. As a result, when all the subdocuments have been scored for all the query terms, a subdocument score list is created in which each subdocument has an accumulated score.
  • The subdocument score list contains a number of subdocument entries that are not sorted relative to their scores.
  • The process of step 50 sorts the subdocuments by their score.
  • This sort operation is a modified heap sort on the subdocument score list.
  • A heap sort process is one in which a heap is first created and then the documents with the highest scores are selected off the top of the heap to make the final sort order.
  • The N subdocument scores are in heap form when the root (the highest or lowest score magnitude on the subdocument score list, which is represented by a vector a of N entries) is stored at a[1], the children of a[i] are a[2i] and a[2i+1], and the magnitude of a[i/2] ≥ a[i] for 1 ≤ i/2 ≤ i ≤ N, where i/2 denotes integer division.
  • As a result, a[1] = max(a[i]) for 1 ≤ i ≤ N. That is, the highest subdocument score is in the first position (a[1]) of the heap.
  • Step 50 merely selects this subdocument for further processing by the computer 220.
  • The computer 220 displays the document text associated with this highest ranked subdocument.
  • The computer 220 can also display the text of the entire document associated with this subdocument.
  • The computer 220 also processes in the background (according to step 50 of FIG. 2) the remaining entries in the subdocument score list to reheapify them (i.e., reorganize them back into a heap form after the highest value subdocument has been removed).
  • When the next highest ranked subdocument is sought by computer 220, it can merely be selected off the top of the heap and displayed.
  • The remaining entries in the subdocument list would then be reheapified again.
  • The computer system automatically connects the user to text portions of documents that are specifically related to the query. These text portions are retrieved from databases that do not have any particular structure or coded links in them. Additionally, these links are provided in spite of the fact that the set of returned documents may have been generated by different search techniques from different sources. Moreover, since the returned documents are automatically displayed, the user avoids the necessity of reorganizing the returned documents, which may have been retrieved based on a variety of database search techniques.
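The inversion, scoring, and heap-based selection described above can be illustrated with a minimal Python sketch. It is not the patented implementation: whitespace tokenization stands in for the noun phrase parser, a raw term count stands in for the similarity formula (which is not reproduced in this section), and the standard `heapq` binary heap stands in for the "modified heap sort". All function names and the sample subdocuments are hypothetical.

```python
import heapq
from collections import defaultdict


def invert(subdocuments):
    """Build a term list mapping each term to {subdoc_id: occurrence count}.

    Assumption: lowercase whitespace tokens replace the noun phrase parser.
    """
    term_list = defaultdict(dict)
    for subdoc_id, text in subdocuments.items():
        for term in text.lower().split():
            term_list[term][subdoc_id] = term_list[term].get(subdoc_id, 0) + 1
    return term_list


def score_subdocuments(query_terms, term_list):
    """Accumulate one partial score per (query term, subdocument) pair.

    Assumption: the partial score is the raw term count.
    """
    scores = defaultdict(float)
    for term in query_terms:
        for subdoc_id, count in term_list.get(term.lower(), {}).items():
            scores[subdoc_id] += count
    return dict(scores)


def ranked(scores):
    """Yield (subdoc_id, score) pairs from the highest score down.

    heapq implements a min-heap, so scores are negated to keep the largest
    on top. Each pop restores heap form for the remaining entries.
    """
    heap = [(-score, subdoc_id) for subdoc_id, score in scores.items()]
    heapq.heapify(heap)
    while heap:
        neg_score, subdoc_id = heapq.heappop(heap)
        yield subdoc_id, -neg_score


# Hypothetical subdocuments, keyed by document and paragraph.
subdocs = {
    "d1.p1": "heap sort builds a heap",
    "d1.p2": "vector space analysis ranks documents",
    "d2.p1": "a heap keeps the highest score on top of the heap",
}
term_list = invert(subdocs)
scores = score_subdocuments(["heap", "score"], term_list)
top = next(ranked(scores))  # only the best entry is needed for the first display
```

Because `ranked` is a generator, the first subdocument is available after a single pop, and the rest of the list is only reorganized when the next entry is requested. In this sketch that work happens on demand rather than in the background as the text describes.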

Abstract

The system of the present invention provides a method and apparatus for displaying portions of text from multiple documents over multiple databases related to a search query. The initial step in this method is to identify a search query. Based on this identification, a search against multiple databases is initiated. In particular, the computer system identifies auxiliary databases, either within a network or between networks, that are likely to contain documents relating to terms in the search query. Upon identification of these databases, the databases are then searched to identify those documents relating to the identified query. The various sets of identified documents from multiple databases are then returned and processed to create an ordered ranking for the returned documents. Text portions from the highest ranking documents across the multiple databases are then automatically displayed to the user.

Description

  • CROSS-REFERENCE TO RELATED APPLICATIONS
  • This is a continuation of patent application Ser. No. 09/295,840 filed Apr. 21, 1999, which is a division of patent application Ser. No. 08/900,639 filed Jul. 25, 1997, now issued as U.S. Pat. No. 5,926,808. [0001]
  • FIELD OF THE INVENTION
  • This invention relates in general to computer databases. In particular, this invention relates to locating and generating connections between concepts identified in a source document and data objects distributed throughout multiple databases in a computer network. [0002]
  • BACKGROUND OF THE INVENTION
  • The volume of documents in databases is rapidly expanding. It has been estimated that in excess of 90% of all desired intelligence information is available in documents residing in accessible databases. Additionally, the number and size of computer databases available to computer users are expanding rapidly. This expansion is due both to the availability of multiple databases within a single network and the availability of multiple networks to a single computer. A major concern facing the user of a computer system that has access to multiple databases, both within a network and between networks, is the ability to conveniently locate relevant information. This problem is compounded in computer networks because the user is likely to be unaware of a number of databases across a network that contain relevant information. [0003]
  • Typically, document retrieval from databases involves multiple user-driven searches across many different databases. The problem with this search technique is that it is often cumbersome because it requires significant interaction by the user to access many different databases. To cope with the ever-increasing expansion of databases across networks, recent attempts have been made at automating search processes. These improved systems have employed the generation of hyperlinks. Hyperlinks are ways of connecting the text of two documents together. Hyperlinks operate on a page image shown to a database user. A phrase or text section on the page image will be highlighted. When a user selects this phrase (clicks on it with a mouse), the user is immediately shown related text from another document. These hyperlinks are hardcoded links between a specific term and a specific set of text within a database or text on another network. The hyperlinks are useful because they allow a user to quickly retrieve documents related to the highlighted phrase without manually constructing and executing different searches. An example of conventional hyperlinks is U.S. Pat. No. 5,603,025 to Tabb, et al. In this patent, a hypertext report writing module is created in which hypertext links are automatically embedded in documents from the database. [0004]
  • Although useful, conventional hypertext links are difficult to implement and use because these hypertext links have to be coded into the database itself. This fact renders conventional hypertext links inadequate for general purpose use in a computer network housing large quantities of distributed data. This is because the volume of potential hyperlinks is extremely large and the manual generation of such hardcoded links is, as a result, time consuming and expensive in large text databases. [0005]
  • Also, since hyperlinks are pre-determined relationships between specified terms in databases, it is generally not feasible to categorize many large databases to make predetermined relationships for all items of potential interest. Moreover, conventional hypertext links are normally static. That is, even if there were enough resources to hardcode enough hypertext links to make them useful in a database, the process of hardcoding the links would only occur once. Thus, databases with hardcoded hyperlinks would not be linked to new data. These hyperlinks miss updates in the data. They also miss the addition of new databases to networks. The pre-determined and static nature of the hyperlinks as they currently exist makes them inappropriate for dynamically changing databases and difficult to use in distributed databases for information retrieval on wide ranging subjects. Accordingly, conventional search techniques have failed to address the need for a process capable of automatically generating connections between texts in different documents across multiple databases. Additionally, conventional search techniques have failed to provide a connection generating technique that can adapt to databases that are modified on a real time basis. [0006]
  • OBJECTS OF THE INVENTION
  • It is the object of the present invention to analyze documents in a database system. [0007]
  • It is a further object of the present invention to analyze documents in a database system by making connections between parts of related text in different documents. [0008]
  • It is still a further object of the present invention to analyze documents in a database system by automating the process of connecting related text between different documents over multiple databases. [0009]
  • It is still a further object of the present invention to analyze documents in a database system by automating the process of connecting related text between different documents across multiple computer networks. [0010]
  • SUMMARY OF THE INVENTION
  • The system of the present invention provides a method of and apparatus for displaying portions of text from multiple documents over multiple databases related to a search query. The initial step in this method is to identify a search query. Based on this identification, a search against multiple databases is initiated. In particular, the computer system identifies auxiliary databases either within a network or between networks that are likely to contain documents relating to terms in the search query. Upon identification of these databases, the databases are then searched to identify those documents relating to the identified query. The various sets of identified documents from multiple databases are then returned and processed to create an ordered ranking for the returned documents. Text portions from the highest ranking documents across the multiple databases are then automatically displayed to the user. [0011]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is an illustration of a computer system that operates according to the present invention for displaying text portions from multiple databases. [0012]
  • FIG. 2 is a flowchart that illustrates a process according to an embodiment of the present invention for displaying text portions relating to a query from multiple databases. [0013]
  • FIG. 3 is a flowchart that illustrates a process according to an embodiment of the present invention for inverting a database. [0014]
  • FIG. 4 is an illustration of a listing of text that results from a noun phrase parsing process. [0015]
  • FIG. 5 is a flowchart that illustrates a process according to an embodiment of the present invention for scoring subdocuments. [0016]
  • FIG. 6 is a flowchart that illustrates a process according to an embodiment of the present invention for sorting. [0017]
  • DETAILED DESCRIPTION OF THE INVENTION
  • FIG. 1 illustrates a computer system for searching databases. The computer 220 is connected to a display 210, an input system 205 (including, for example, a keyboard and mouse), a memory system 230, and a communications link 280. Normally, the communications link is a simple modem. It could also be a higher-rate direct connection between computers or another device for interconnecting computer systems. The communications link 280 is in turn connected to a network of M other computers, each having its own memory system. The memory system 230 associated with computer 220 has a memory section 240 that contains a target database, and it includes N memory sections that store a series of N auxiliary databases. The target database in memory section 240 stores information that a user is currently interested in searching. The remaining N memory sections store auxiliary databases related to a variety of topics. The M computers attached to communications link 280 each have similar memory sections that store N auxiliary databases. In addition, memory section 250 of memory system 230 stores a list of database addresses and identifiers. [0018]
  • In general, the computer system of FIG. 1 operates to display information from a target file or database to a user. In the course of that general display of information, a user will often recognize a specific idea or concept from the displayed information that may or may not be directly relevant to the general information currently being displayed. The user will desire to access or link to information about this specific concept without losing access to the general information currently being displayed. The computer system of FIG. 1 operates to provide links between identified concepts and information contained in multiple databases. The computer system of FIG. 1 provides these links by causing the computer 220 to receive a query and identify databases having information relevant to the query. Once the databases are identified, computer 220 causes them to be searched such that they return documents or passages of documents relevant to the query. The computer 220 then organizes the returned documents or passages thereof and displays at least a portion of the text associated with those documents. [0019]
  • Specifically, FIG. 2 illustrates a process for operating the computer system of FIG. 1 according to the present invention. Initially, a query is identified in Step 10 of FIG. 2. This can be done by highlighting and selecting (through a conventional graphical user interface) a portion of text that the computer is already displaying. The query could also simply be entered into the computer 220 through a keyboard. Once the text of the query has been identified, the text is converted into a search request in step 20 of FIG. 2. Converting the identified query text into a search request involves the conventional steps of parsing the query text into terms and then making use of the terms to form a query. The form of the query will depend on the type of search technique that will be used to search the databases. Most search techniques use Boolean combinations of terms as the query. As a result, these techniques ‘AND’ the query terms together to form a query. Other search techniques make use of vector space analysis. In this case, the list of terms itself forms the query because the vector space algorithm does not use logical operators to form the query. [0020]
  • [0021] Once a query has been formed, step 30 of FIG. 2 selects the databases that will be searched. The computer system of FIG. 1 includes a memory space 250 that stores information to identify databases (and the types of information they store) or general database search engines. Since general database search engines, such as the Lycos™ engine on the World Wide Web, have their own resources for selecting the particular databases to search for a given query, step 30 merely transmits a Boolean combination of query terms to these search engines (unless a user opts out of such a selection). For other databases identified in memory space 250 of FIG. 1, a Boolean combination of query terms is compared against the description of the databases listed in memory space 250. As a result of this comparison, a set of auxiliary databases is selected that will be searched against the query.
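A minimal sketch of the selection in step 30, assuming the list in memory space 250 can be modeled as records with a name, an address, and a free-text description. The catalog entries, the `db://` addresses, and the `select_databases` function are all hypothetical; the `require_all` flag switches between an OR and an AND combination of terms, since the text does not fix which Boolean combination is compared.

```python
# Hypothetical stand-in for the database list held in memory space 250.
CATALOG = [
    {"name": "medical_abstracts", "address": "db://medical",
     "description": "clinical medicine drug trials"},
    {"name": "patent_full_text", "address": "db://patents",
     "description": "patent claims inventions"},
    {"name": "news_archive", "address": "db://news",
     "description": "daily news business drug industry"},
]

def select_databases(terms, catalog, require_all=False):
    """Return names of databases whose description matches the query terms."""
    combine = all if require_all else any
    selected = []
    for entry in catalog:
        description_words = set(entry["description"].lower().split())
        if combine(term in description_words for term in terms):
            selected.append(entry["name"])
    return selected

print(select_databases(["drug", "trials"], CATALOG))
print(select_databases(["drug", "trials"], CATALOG, require_all=True))
```

Because the comparison uses only the stored descriptions, the selection happens before any database is contacted.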
  • [0022] Once the set of auxiliary databases is selected in step 30 of FIG. 2, step 40 begins the search process for the auxiliary databases selected in step 30. Normally the target database will not be searched because the user is, presumably, already searching that database for the concepts of interest. However, the target database could also be selected in step 30 and searched as well. Referring to FIG. 1, the search process is started by transmitting a query to each of the selected auxiliary databases that are associated with computer 220. Computer 220 will also transmit instructions and one or more forms of the search query to the M computers through the communications link 280. The instructions sent by computer 220 could, for example, instruct computer 300 to use the Lycos™ search engine to search databases on the World Wide Web for documents having a Boolean combination of the terms in the search query. The instructions sent by computer 220 could also, for example, instruct computer 400 to use a vector space search technique to search its associated auxiliary database N to retrieve documents related to the list of query terms. The documents retrieved in step 40 from the auxiliary databases associated with the M computers are returned to computer 220 through communication link 280.
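The fan-out in step 40 can be pictured with a small sketch in which each remote computer is replaced by a local function. Everything here is an assumption made for illustration: `boolean_engine`, `any_term_engine`, and `dispatch` are hypothetical names, no network transfer is modeled, and the any-term matcher is only a crude stand-in for a vector space search.

```python
def boolean_engine(documents):
    """Hypothetical engine: returns documents containing every query term."""
    def search(terms):
        return [d for d in documents if all(t in d.lower().split() for t in terms)]
    return search

def any_term_engine(documents):
    """Hypothetical stand-in for a vector space engine: any shared term matches."""
    def search(terms):
        return [d for d in documents if any(t in d.lower().split() for t in terms)]
    return search

def dispatch(terms, engines):
    """Send the query to every selected engine and pool what comes back."""
    returned = []
    for name, search in engines.items():
        for document in search(terms):
            returned.append((name, document))  # remember the source database
    return returned

engines = {
    "web": boolean_engine(["heap sort basics", "sorting"]),
    "local": any_term_engine(["sort order", "unrelated text"]),
}
print(dispatch(["heap", "sort"], engines))
```

Tagging each returned document with its source keeps the pooled results usable by later processing, whichever engine produced them.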
  • [0023] Once the documents retrieved from the auxiliary databases have been returned, computer 220 processes them in step 50 of FIG. 2 to determine a rank order of the documents for display. The processing of step 50 is completely independent of the processing used to retrieve the documents. The retrieved documents, in effect, form an independent database that is analyzed by the computer 220. As a result, various search techniques for retrieving documents across computer networks can be utilized, but all the returned documents are analyzed according to an independent process. The processing of step 50 can be as simple as selecting the documents for display that are returned first. Alternatively, the processing of step 50 ranks the order of the returned documents according to a hierarchy of the databases in which the documents were located.
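The database-hierarchy alternative for step 50 can be sketched as a stable sort on a priority table. The table `DATABASE_PRIORITY`, its database names, and the function `rank_by_database` are hypothetical; the text does not say how the hierarchy is stored.

```python
# Hypothetical hierarchy: lower number means the database is preferred.
DATABASE_PRIORITY = {"internal": 0, "partner": 1, "web": 2}

def rank_by_database(returned):
    """Order (database, document) pairs by database rank.

    sorted() is stable, so documents from the same database keep their
    arrival order, which also covers the "returned first" alternative.
    Unknown databases are placed last.
    """
    lowest = len(DATABASE_PRIORITY)
    return sorted(returned, key=lambda item: DATABASE_PRIORITY.get(item[0], lowest))

returned = [("web", "w1"), ("internal", "i1"), ("web", "w2"), ("partner", "p1")]
print(rank_by_database(returned))
```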
  • [0024] Still another processing alternative for step 50 is to perform a vector space analysis on the returned documents. This analysis ranks the returned documents by their relevance to the query. In particular, a vector space analysis computes a similarity score between the query and each of the returned documents by evaluating the shared and disjoint features of the query terms and a document over an orthogonal space of t terms. The score can be computed by the following formula:

$$S(Q_i, D_j) = \frac{Q_i \cdot D_j}{|Q_i| \cdot |D_j|} = \frac{\sum_{k=1}^{t} q_{ik} \, d_{jk}}{\sqrt{\sum_{k=1}^{t} q_{ik}^{2}} \cdot \sqrt{\sum_{k=1}^{t} d_{jk}^{2}}}$$

  • [0025] Where $Q_i$ refers to terms in the query and $D_j$ refers to terms in the document, with $q_{ik}$ and $d_{jk}$ their respective k-th components.
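As an illustration of the similarity score, the sketch below computes a normalized dot product between a query and a document. It assumes raw term counts as the vector components, which the text does not specify; the function name `similarity` is hypothetical.

```python
import math
from collections import Counter

def similarity(query_terms, document_terms):
    """Normalized dot product of two term-count vectors (counts are an assumed weighting)."""
    q = Counter(query_terms)
    d = Counter(document_terms)
    dot = sum(q[term] * d[term] for term in q)  # only shared terms contribute
    q_norm = math.sqrt(sum(w * w for w in q.values()))
    d_norm = math.sqrt(sum(w * w for w in d.values()))
    if q_norm == 0 or d_norm == 0:
        return 0.0  # an empty query or document has no similarity
    return dot / (q_norm * d_norm)

print(similarity(["heap", "sort"], ["heap", "sort", "order"]))
print(similarity(["heap", "sort"], ["database", "query"]))
```

Terms found in only one of the two vectors enlarge a norm without adding to the dot product, which is how disjoint features lower the score.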
  • [0026] In order to score the retrieved documents, the set of retrieved documents is treated as a database and this database is inverted. The inversion step is a technique for creating a listing of all the terms of the database and the portions of the documents associated with those terms. FIG. 3 illustrates a process for inverting a database. In step 132, a document from the database is selected. In step 134, the document is broken into subdocuments. In this process, for example, each subdocument generally corresponds to a paragraph of the document. Long paragraphs may consist of multiple subdocuments and several short paragraphs may be included in a single subdocument. The subdocuments all have approximately the same length.
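One way to picture step 134 is a splitter that cuts long paragraphs and merges short ones toward a target size. This is a sketch under assumptions: the word-count target, the blank-line paragraph delimiter, and the name `split_into_subdocuments` are illustrative choices, not details given in the text.

```python
def split_into_subdocuments(text, target_words=50):
    """Break text into subdocuments of roughly target_words words each."""
    subdocuments, current = [], []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        # A long paragraph is cut into several chunks.
        for start in range(0, len(words), target_words):
            chunk = words[start:start + target_words]
            # Close the current subdocument if this chunk would overfill it.
            if current and len(current) + len(chunk) > target_words:
                subdocuments.append(" ".join(current))
                current = []
            current.extend(chunk)  # short paragraphs accumulate together
            if len(current) >= target_words:
                subdocuments.append(" ".join(current))
                current = []
    if current:
        subdocuments.append(" ".join(current))
    return subdocuments

sample = "a b c\n\nd e\n\nf g h i j k l"
print(split_into_subdocuments(sample, target_words=5))
```

With a target of five words, the two short paragraphs merge into one subdocument and the seven-word paragraph is cut in two.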
  • [0027] In steps 136 and 138 of FIG. 3 respectively, a subdocument is selected and parsed. In this example, the parsing process is a noun phrase parsing process. In this process, linguistic structure is assigned to sequences of words in a sentence. Those terms, including noun phrases, that have semantic meaning are listed. This parsing process can be implemented by a variety of techniques known in the art such as the use of lexicons, morphological analyzers or natural language grammar structures. FIG. 4 is an example listing of text parsed for noun phrases. As is evident from the list of FIG. 4, the phrases tagged with a ‘T’ are noun phrases, words tagged with a ‘V’ are verbs, words tagged with an ‘X’ are quantities, words tagged with an ‘A’ are adverbs and so on.
  • [0028] Once the subdocument has been parsed, a term list containing noun phrases and their associated subdocument is generated in step 140. All the subdocuments for each document are processed in this way and the list of terms and subdocuments is updated. Finally, all the documents of a database are processed according to steps 132-140. The result of this inversion process is a term list identifying all the terms (specifically noun phrases in this example) of a database and their associated subdocuments.
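The inversion of steps 132-140 can be sketched as building a mapping from each term to the subdocuments that contain it. Note the substitution: simple word tokenization is used here in place of the noun phrase parser described above, and the per-subdocument occurrence counts and the name `invert` are assumptions.

```python
import re
from collections import defaultdict

def invert(subdocuments):
    """Build a term list: term -> {subdocument id: occurrence count}.

    subdocuments maps an id, here a (document, position) pair, to its text.
    """
    term_list = defaultdict(dict)
    for sub_id, text in subdocuments.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            term_list[term][sub_id] = term_list[term].get(sub_id, 0) + 1
    return term_list

subdocuments = {
    ("doc1", 0): "heap sort",
    ("doc1", 1): "sort sort order",
}
index = invert(subdocuments)
print(index["sort"])
```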
  • [0029] Once the retrieved document database has been inverted, the subdocuments of that database are scored. FIG. 5 is an illustration of the scoring process. In step 310, the term list of the inverted database is searched to identify all the subdocuments that are associated with each term of the query that was identified in step 10 of FIG. 2. For each of the identified subdocuments, step 320 computes a partial similarity score (according to the general formula discussed above) for the query term and the subdocument. The computation process repeats for each query term and subdocument. In step 330, the partial scores for each subdocument are added or otherwise combined. As a result, when all the subdocuments have been scored for all the query terms, a subdocument score list is created in which each subdocument has an accumulated score.
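The accumulation in steps 310-330 can be sketched as a loop over query terms that adds one partial score per matching subdocument. The term list layout (term to subdocument counts), the use of counts as weights, and the name `score_subdocuments` are assumptions for illustration.

```python
import math
from collections import defaultdict

def score_subdocuments(query_terms, term_list):
    """Accumulate per-term partial scores into a subdocument score list."""
    # Squared length of each subdocument vector, taken over every term in the list.
    squared = defaultdict(float)
    for postings in term_list.values():
        for sub_id, weight in postings.items():
            squared[sub_id] += weight * weight

    query_weights = defaultdict(int)
    for term in query_terms:
        query_weights[term] += 1
    query_norm = math.sqrt(sum(w * w for w in query_weights.values()))

    scores = defaultdict(float)
    for term, q_weight in query_weights.items():
        # Look up the subdocuments associated with this query term.
        for sub_id, d_weight in term_list.get(term, {}).items():
            # Partial score: this term's share of the normalized dot product.
            partial = q_weight * d_weight / (query_norm * math.sqrt(squared[sub_id]))
            scores[sub_id] += partial  # add partial scores together
    return dict(scores)

term_list = {"heap": {"s1": 1}, "sort": {"s1": 1, "s2": 2}, "order": {"s2": 1}}
print(score_subdocuments(["heap", "sort"], term_list))
```

Only subdocuments listed under at least one query term are ever touched, which is the practical benefit of scoring against the inverted term list.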
  • [0030] After step 330 of FIG. 5, the subdocument score list contains a number of subdocument entries that are not sorted relative to their scores. At this point, the process of step 50 sorts the subdocuments by their score. This sort operation is a modified heap sort on the subdocument score list. A heap sort process is a process in which a heap is first created and then the documents with the highest scores are selected off the top of the heap to make the final sort order. FIG. 6 illustrates a general algorithm for a heap sort process. This process is initialized by setting l=(N/2)+1 and r=N, where N is the number of subdocuments in the subdocument score list. Then, the process of FIG. 6 is operated until l=1 or r<N. This process places the N subdocument scores in a heap form. The N subdocument scores are in heap form when the root (highest or lowest score magnitude on the subdocument score list, represented by vector a of length N) is stored at a[1], the children of a[i] are a[2i] and a[2i+1], and the magnitude of a[⌊i/2⌋] ≥ a[i] for 1 ≤ ⌊i/2⌋ < i ≤ N. When the subdocument score list is in a heap form, a[1] = max(a[i]) for 1 ≤ i ≤ N. That is, the highest subdocument score is in the first position (a[1]) of the heap.
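The heap-building phase can be sketched with a textbook sift-down procedure using the same 1-based indexing and the same starting values l=(N/2)+1 and r=N. This is a generic construction offered as an assumption; it is not claimed to reproduce FIG. 6 or the modification mentioned above, and the names `sift` and `build_heap` are hypothetical.

```python
def sift(a, l, r):
    """Sift a[l] down within a[l..r] (1-based; a[0] is unused)."""
    i, j = l, 2 * l
    x = a[l]
    while j <= r:
        if j < r and a[j] < a[j + 1]:
            j += 1  # pick the larger of the two children
        if x >= a[j]:
            break  # parent already dominates both children
        a[i] = a[j]
        i, j = j, 2 * j
    a[i] = x

def build_heap(scores):
    """Arrange scores so every parent is at least as large as its children."""
    a = [None] + list(scores)  # shift to 1-based positions
    n = len(scores)
    l = n // 2 + 1
    while l > 1:
        l -= 1
        sift(a, l, n)
    return a

heap = build_heap([0.2, 0.9, 0.5, 0.7, 0.1])
print(heap[1])  # 0.9
```

Building the heap this way takes time linear in N, after which the highest score is available at position 1 without a full sort.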
  • [0031] Since subdocuments are ranked by score to quickly select the most relevant subdocuments and since the most relevant subdocument is at the top of the heap, the process of step 50 (of FIG. 2) merely selects this subdocument for further processing by the computer 220. In step 60 of FIG. 2, the computer 220 then displays the document text associated with this highest ranked subdocument. The computer 220 can also display the text of the entire document associated with this subdocument. While the computer 220 is displaying the text of the highest ranking subdocument, the computer 220 is also processing in the background (according to step 50 of FIG. 2) the remaining entries in the subdocument score list to reheapify them (i.e., reorganize them back into a heap form after the highest value subdocument has been removed). As a result, when the next highest order subdocument is sought by computer 220, it can be merely selected off the top of the heap and displayed. The remaining entries in the subdocument list would then be reheapified again.
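The select-then-reheapify behavior can be sketched with Python's standard `heapq` module. Since `heapq` implements a min-heap, scores are negated so the highest score surfaces first. The generator `ranked_subdocuments` is a hypothetical name, and the background processing described above is not modeled; the reheapify simply happens inside each pop.

```python
import heapq

def ranked_subdocuments(score_list):
    """Yield (subdocument id, score) pairs from highest to lowest score, one at a time."""
    heap = [(-score, sub_id) for sub_id, score in score_list.items()]
    heapq.heapify(heap)  # build the heap once
    while heap:
        # Popping the top restores the heap property for the remaining entries.
        negated, sub_id = heapq.heappop(heap)
        yield sub_id, -negated

ranking = ranked_subdocuments({"s1": 0.4, "s2": 0.9, "s3": 0.7})
print(next(ranking))  # ('s2', 0.9)
```

The first result is available right after the heap is built, and each later result costs only one pop, so a caller that displays a few top passages never pays for ordering the whole list.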
  • [0032] According to the process illustrated in FIG. 2, once a user has selected a query (through highlighting text or otherwise), the computer system automatically connects the user to text portions of documents that are specifically related to the query. These text portions are retrieved from databases that do not have any particular structure or coded links in them. Additionally, these links are provided in spite of the fact that the set of returned documents may have been generated by different search techniques from different sources. Moreover, since the returned documents are automatically ranked and displayed, the user avoids having to reorganize documents that may have been retrieved by a variety of database search techniques.
  • [0033] While the invention has been particularly described and illustrated with reference to a preferred embodiment, it will be understood by one of skill in the art that changes in the above description or illustrations may be made with respect to formal detail without departing from the spirit and scope of the invention.

Claims (18)

We claim:
1. A method for automatically displaying text from a database related to a query, comprising the steps of:
generating a query on a computer;
prior to communicating with a database, automatically selecting at least one of a plurality of databases related to said query from a list stored in computer memory, wherein said list includes a list of a plurality of databases, corresponding addresses for said databases, and a description of said databases;
automatically searching the selected database(s) for documents related to said query;
organizing documents returned from said search in a relevance order corresponding to the relevance of said returned documents to said query; and
displaying portions of text on said computer, wherein said portions of text are related to said query from a plurality of said returned documents in said relevance order.
2. A method for automatically displaying text from a database related to a query, as in claim 1, wherein:
the step of generating said query comprises selecting a region of text from a document.
3. A method for automatically displaying text from a database related to a query, as in claim 2, wherein:
said document from which said region of text is selected is stored in a database unrelated to said query.
4. A method for automatically displaying text from a database related to a query, as in claim 2, wherein:
the step of organizing documents returned from said search in a relevance order comprises computing a relevance score for said returned documents and rank ordering the returned documents according to said relevance score.
5. A method for automatically displaying text from a database related to a query, as in claim 2, wherein:
the step of automatically searching the selected database(s) comprises comparing document text of said selected database to boolean combinations of keywords.
6. A method for automatically displaying text from a database related to a query, as in claim 5, wherein:
the step of organizing documents returned from said search in a relevance order comprises computing a relevance score for said returned documents and rank ordering the returned documents according to said relevance score.
7. A system for displaying text from a database related to a query, comprising:
a computer coupled to an input/output device for generating a query;
said computer coupled to a disk storage unit, prior to communicating with a database, said computer automatically selects at least one of a plurality of databases related to said query from a list stored in computer memory, wherein said list includes a list of a plurality of databases, corresponding addresses for said databases, and a description of said databases;
said computer automatically searches said selected database(s) for documents related to said query;
said computer organizes documents returned from said search in a relevance order corresponding to the relevance of said returned documents to said query; and
said computer coupled to a display unit for displaying portions of text related to said query from a plurality of said returned documents in said relevance order.
8. A system for displaying text from a database related to a query, as in claim 7, wherein:
said query is generated by selecting a region of text from a document.
9. A system for displaying text from a database related to a query, as in claim 8, wherein:
said document incorporating said selected region of text is stored in a database that is not related to said query.
10. A system for displaying text from a database related to a query, as in claim 8, wherein:
said organization of said returned documents computes a relevance score for said returned documents and rank orders the returned documents according to said relevance score.
11. A system for displaying text from a database related to a query, as in claim 8, wherein:
said automatic searching of the selected database compares document text of said selected database to boolean combinations of keywords.
12. A system for displaying text from a database related to a query, as in claim 11, wherein:
said organization of said returned documents computes a relevance score for said returned documents and rank orders the returned documents according to said relevance score.
13. A computer readable medium bearing sequences of instructions for searching database, said sequences of instructions comprising:
generating a query;
prior to communicating with a database, automatically selecting at least one of a plurality of databases related to said query from a list stored in computer memory, wherein said list includes a list of a plurality of databases, corresponding addresses for said databases, and a description of said databases;
automatically searching the selected database(s) for documents related to said query;
organizing documents returned from said search in a relevance order corresponding to the relevance of said returned documents to said query; and
displaying portions of text related to said query from a plurality of said returned documents in said relevance order.
14. The computer readable medium of claim 13, wherein said sequence of instructions for generating a query includes:
selecting a region of text from a document to generate said query.
15. The computer readable medium of claim 14, wherein said sequence of instructions further comprises:
storing said document incorporating said selected region in a database that is not related to said query.
16. The computer readable medium of claim 14, wherein said sequence of instructions for organizing documents returned in said relevance order includes:
computing a relevance score for said returned documents and ordering the returned documents according to said relevance score.
17. The computer readable medium of claim 14, wherein said sequence of instructions for automatically searching the selected database includes:
comparing document text of said selected databases to boolean combinations of keywords in said query.
18. The computer readable medium of claim 17, wherein said sequence of instructions for organizing documents returned in said relevance order includes:
computing a relevance score for said returned documents and ordering the returned documents according to said relevance score.
US10/387,747 1997-07-25 2003-03-13 Displaying portions of text from multiple documents over multiple database related to a search query in a computer network Abandoned US20030225757A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/387,747 US20030225757A1 (en) 1997-07-25 2003-03-13 Displaying portions of text from multiple documents over multiple database related to a search query in a computer network

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US08/900,639 US5926808A (en) 1997-07-25 1997-07-25 Displaying portions of text from multiple documents over multiple databases related to a search query in a computer network
US29584099A 1999-04-21 1999-04-21
US10/387,747 US20030225757A1 (en) 1997-07-25 2003-03-13 Displaying portions of text from multiple documents over multiple database related to a search query in a computer network

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US29584099A Continuation 1997-07-25 1999-04-21

Publications (1)

Publication Number Publication Date
US20030225757A1 true US20030225757A1 (en) 2003-12-04

Family

ID=25412850

Family Applications (2)

Application Number Title Priority Date Filing Date
US08/900,639 Expired - Fee Related US5926808A (en) 1997-07-25 1997-07-25 Displaying portions of text from multiple documents over multiple databases related to a search query in a computer network
US10/387,747 Abandoned US20030225757A1 (en) 1997-07-25 2003-03-13 Displaying portions of text from multiple documents over multiple database related to a search query in a computer network

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US08/900,639 Expired - Fee Related US5926808A (en) 1997-07-25 1997-07-25 Displaying portions of text from multiple documents over multiple databases related to a search query in a computer network

Country Status (2)

Country Link
US (2) US5926808A (en)
JP (1) JPH11102376A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050210003A1 (en) * 2004-03-17 2005-09-22 Yih-Kuen Tsay Sequence based indexing and retrieval method for text documents
US20060004724A1 (en) * 2004-06-03 2006-01-05 Oki Electric Industry Co., Ltd. Information-processing system, information-processing method and information-processing program
US20070192442A1 (en) * 2001-07-24 2007-08-16 Brightplanet Corporation System and method for efficient control and capture of dynamic database content
US20090064104A1 (en) * 2007-08-31 2009-03-05 Tom Baeyens Method and apparatus for supporting multiple business process languages in BPM
US20090063225A1 (en) * 2007-08-31 2009-03-05 Tom Baeyens Tool for automated transformation of a business process definition into a web application package
US20090070362A1 (en) * 2007-09-12 2009-03-12 Alejandro Guizar BPM system portable across databases
US20090144729A1 (en) * 2007-11-30 2009-06-04 Alejandro Guizar Portable business process deployment model across different application servers
US7908260B1 (en) 2006-12-29 2011-03-15 BrightPlanet Corporation II, Inc. Source editing, internationalization, advanced configuration wizard, and summary page selection for information automation systems
US20120173566A1 (en) * 2010-12-31 2012-07-05 Quora, Inc. Multi-functional navigation bar
US20130097494A1 (en) * 2011-10-17 2013-04-18 Xerox Corporation Method and system for visual cues to facilitate navigation through an ordered set of documents
US8914804B2 (en) 2007-09-12 2014-12-16 Red Hat, Inc. Handling queues associated with web services of business processes
US10816623B2 (en) 2013-05-22 2020-10-27 General Electric Company System and method for reducing acoustic noise level in MR imaging

Families Citing this family (72)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5822720A (en) 1994-02-16 1998-10-13 Sentius Corporation System amd method for linking streams of multimedia data for reference material for display
US6154757A (en) * 1997-01-29 2000-11-28 Krause; Philip R. Electronic text reading environment enhancement method and apparatus
US5926808A (en) * 1997-07-25 1999-07-20 Claritech Corporation Displaying portions of text from multiple documents over multiple databases related to a search query in a computer network
US6278990B1 (en) * 1997-07-25 2001-08-21 Claritech Corporation Sort system for text retrieval
US6353824B1 (en) * 1997-11-18 2002-03-05 Apple Computer, Inc. Method for dynamic presentation of the contents topically rich capsule overviews corresponding to the plurality of documents, resolving co-referentiality in document segments
US6209007B1 (en) * 1997-11-26 2001-03-27 International Business Machines Corporation Web internet screen customizing system
JP3571201B2 (en) * 1997-12-12 2004-09-29 富士通株式会社 Database search device and computer-readable recording medium storing database search program
JP3571515B2 (en) * 1997-12-19 2004-09-29 富士通株式会社 Computer-readable storage medium storing a knowledge collection / storage / retrieval program
US20080028292A1 (en) * 1997-12-22 2008-01-31 Ricoh Company, Ltd. Techniques to facilitate reading of a document
JP4286345B2 (en) * 1998-05-08 2009-06-24 株式会社リコー Search support system and computer-readable recording medium
US7272604B1 (en) * 1999-09-03 2007-09-18 Atle Hedloy Method, system and computer readable medium for addressing handling from an operating system
NO984066L (en) * 1998-09-03 2000-03-06 Arendi As Computer function button
US7496854B2 (en) * 1998-11-10 2009-02-24 Arendi Holding Limited Method, system and computer readable medium for addressing handling from a computer program
US6582475B2 (en) * 1998-09-09 2003-06-24 Ricoh Company Limited Automatic adaptive document printing help system
IL126373A (en) 1998-09-27 2003-06-24 Haim Zvi Melman Apparatus and method for search and retrieval of documents
US7228492B1 (en) * 1999-07-06 2007-06-05 Ricoh Company, Ltd. 2D graph displaying document locations of user-specified concept of interest
US7013300B1 (en) * 1999-08-03 2006-03-14 Taylor David C Locating, filtering, matching macro-context from indexed database for searching context where micro-context relevant to textual input by user
US7219073B1 (en) * 1999-08-03 2007-05-15 Brandnamestores.Com Method for extracting information utilizing a user-context-based search engine
US6775665B1 (en) * 1999-09-30 2004-08-10 Ricoh Co., Ltd. System for treating saved queries as searchable documents in a document management system
US7127500B1 (en) * 1999-11-10 2006-10-24 Oclc Online Computer Library Center, Inc. Retrieval of digital objects by redirection of controlled vocabulary searches
AU7339700A (en) * 1999-11-16 2001-05-30 Searchcraft Corporation Method for searching from a plurality of data sources
KR100362381B1 (en) * 1999-12-27 2002-11-23 한국전자통신연구원 Web filtering system and method using thereof in internet
US7567958B1 (en) 2000-04-04 2009-07-28 Aol, Llc Filtering system for providing personalized information in the absence of negative data
JP2001318948A (en) * 2000-05-09 2001-11-16 Hitachi Ltd Method and device for retrieving document and medium having processing program for the method stored thereon
JP2002024243A (en) * 2000-07-07 2002-01-25 Shimadzu Corp Scientific information browse system and host computer and browsing computer used for the same
US6691107B1 (en) * 2000-07-21 2004-02-10 International Business Machines Corporation Method and system for improving a text search
KR20020010226A (en) * 2000-07-28 2002-02-04 정명수 Internet Anything Response System
KR20000063712A (en) * 2000-07-31 2000-11-06 강원식 Real time question & answer method and it's system using by Internet.
KR20000064067A (en) * 2000-08-18 2000-11-06 서영호 Business Solution on Customer Support Service Response of Web based
KR20020050401A (en) * 2000-12-21 2002-06-27 한문철 Method of answering questions concerning traffic accidents
US7013312B2 (en) * 2001-06-21 2006-03-14 International Business Machines Corporation Web-based strategic client planning system for end-user creation of queries, reports and database updates
US7130861B2 (en) 2001-08-16 2006-10-31 Sentius International Corporation Automated creation and delivery of database content
US8799489B2 (en) * 2002-06-27 2014-08-05 Siebel Systems, Inc. Multi-user system with dynamic data source selection
KR20040031990A (en) * 2002-10-08 2004-04-14 한국과학기술정보연구원 System and Method for finding the original, and Storage media having program source thereof
KR20040042927A (en) * 2002-11-14 2004-05-22 주식회사 드리머 Information searching service method using short message service and thereof
US8095500B2 (en) * 2003-06-13 2012-01-10 Brilliant Digital Entertainment, Inc. Methods and systems for searching content in distributed computing networks
US7729992B2 (en) 2003-06-13 2010-06-01 Brilliant Digital Entertainment, Inc. Monitoring of computer-related resources and associated methods and systems for disbursing compensation
US20060168012A1 (en) * 2004-11-24 2006-07-27 Anthony Rose Method and system for electronic messaging via distributed computing networks
US20070094308A1 (en) * 2004-12-30 2007-04-26 Ncr Corporation Maintaining synchronization among multiple active database systems
US20070094237A1 (en) * 2004-12-30 2007-04-26 Ncr Corporation Multiple active database systems
US20070208753A1 (en) * 2004-12-30 2007-09-06 Ncr Corporation Routing database requests among multiple active database systems
US20060149707A1 (en) * 2004-12-30 2006-07-06 Mitchell Mark A Multiple active database systems
US7567990B2 (en) * 2004-12-30 2009-07-28 Teradata Us, Inc. Transfering database workload among multiple database systems
US20070174349A1 (en) * 2004-12-30 2007-07-26 Ncr Corporation Maintaining consistent state information among multiple active database systems
US8027876B2 (en) 2005-08-08 2011-09-27 Yoogli, Inc. Online advertising valuation apparatus and method
US8429167B2 (en) 2005-08-08 2013-04-23 Google Inc. User-context-based search engine
WO2007032095A1 (en) * 2005-09-16 2007-03-22 Bits Co., Ltd. Document data managing method, managing system, and computer software
US20070067849A1 (en) * 2005-09-21 2007-03-22 Jung Edward K Reviewing electronic communications for possible restricted content
US8214394B2 (en) 2006-03-01 2012-07-03 Oracle International Corporation Propagating user identities in a secure federated search system
US8027982B2 (en) * 2006-03-01 2011-09-27 Oracle International Corporation Self-service sources for secure search
US7941419B2 (en) * 2006-03-01 2011-05-10 Oracle International Corporation Suggested content with attribute parameterization
US8875249B2 (en) * 2006-03-01 2014-10-28 Oracle International Corporation Minimum lifespan credentials for crawling data repositories
US9177124B2 (en) * 2006-03-01 2015-11-03 Oracle International Corporation Flexible authentication framework
US8433712B2 (en) * 2006-03-01 2013-04-30 Oracle International Corporation Link analysis for enterprise environment
US8707451B2 (en) 2006-03-01 2014-04-22 Oracle International Corporation Search hit URL modification for secure application integration
US8868540B2 (en) * 2006-03-01 2014-10-21 Oracle International Corporation Method for suggesting web links and alternate terms for matching search queries
US8332430B2 (en) * 2006-03-01 2012-12-11 Oracle International Corporation Secure search performance improvement
US20070214129A1 (en) * 2006-03-01 2007-09-13 Oracle International Corporation Flexible Authorization Model for Secure Search
US8005816B2 (en) * 2006-03-01 2011-08-23 Oracle International Corporation Auto generation of suggested links in a search system
US7996392B2 (en) 2007-06-27 2011-08-09 Oracle International Corporation Changing ranking algorithms based on customer settings
US8316007B2 (en) * 2007-06-28 2012-11-20 Oracle International Corporation Automatically finding acronyms and synonyms in a corpus
JP5376625B2 (en) * 2008-08-05 2013-12-25 学校法人東京電機大学 Iterative fusion search method in search system
US9092517B2 (en) * 2008-09-23 2015-07-28 Microsoft Technology Licensing, Llc Generating synonyms based on query log data
US8719249B2 (en) * 2009-05-12 2014-05-06 Microsoft Corporation Query classification
US9600566B2 (en) 2010-05-14 2017-03-21 Microsoft Technology Licensing, Llc Identifying entity synonyms
US10032131B2 (en) 2012-06-20 2018-07-24 Microsoft Technology Licensing, Llc Data services for enterprises leveraging search system data assets
US9594831B2 (en) 2012-06-22 2017-03-14 Microsoft Technology Licensing, Llc Targeted disambiguation of named entities
US9229924B2 (en) 2012-08-24 2016-01-05 Microsoft Technology Licensing, Llc Word detection and domain dictionary recommendation
US20150142444A1 (en) * 2013-11-15 2015-05-21 International Business Machines Corporation Audio rendering order for text sources
US9740748B2 (en) 2014-03-19 2017-08-22 International Business Machines Corporation Similarity and ranking of databases based on database metadata
US9230028B1 (en) * 2014-06-18 2016-01-05 Fmr Llc Dynamic search service
CN109376174B (en) * 2018-12-30 2021-04-27 北京奇艺世纪科技有限公司 Method and device for selecting database

Citations (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5050071A (en) * 1988-11-04 1991-09-17 Harris Edward S Text retrieval method for texts created by external application programs
US5263159A (en) * 1989-09-20 1993-11-16 International Business Machines Corporation Information retrieval based on rank-ordered cumulative query scores calculated from weights of all keywords in an inverted index file for minimizing access to a main database
US5265065A (en) * 1991-10-08 1993-11-23 West Publishing Company Method and apparatus for information retrieval from a database by replacing domain specific stemmed phases in a natural language to create a search query
US5280573A (en) * 1989-03-14 1994-01-18 Sharp Kabushiki Kaisha Document processing support system using keywords to retrieve explanatory information linked together by correlative arcs
US5379366A (en) * 1993-01-29 1995-01-03 Noyes; Dallas B. Method for representation of knowledge in a computer as a network database system
US5454105A (en) * 1989-06-14 1995-09-26 Hitachi, Ltd. Document information search method and system
US5465353A (en) * 1994-04-01 1995-11-07 Ricoh Company, Ltd. Image matching and retrieval by multi-access redundant hashing
US5488725A (en) * 1991-10-08 1996-01-30 West Publishing Company System of document representation retrieval by successive iterated probability sampling
US5598557A (en) * 1992-09-22 1997-01-28 Caere Corporation Apparatus and method for retrieving and grouping images representing text files based on the relevance of key words extracted from a selected file to the text files
US5600835A (en) * 1993-08-20 1997-02-04 Canon Inc. Adaptive non-literal text string retrieval
US5603025A (en) * 1994-07-29 1997-02-11 Borland International, Inc. Methods for hypertext reporting in a relational database management system
US5634051A (en) * 1993-10-28 1997-05-27 Teltech Resource Network Corporation Information management system
US5671404A (en) * 1994-03-31 1997-09-23 Martin Lizee System for querying databases automatically
US5675788A (en) * 1995-09-15 1997-10-07 Infonautics Corp. Method and apparatus for generating a composite document on a selected topic from a plurality of information sources
US5717913A (en) * 1995-01-03 1998-02-10 University Of Central Florida Method for detecting and extracting text data using database schemas
US5721906A (en) * 1994-03-24 1998-02-24 Ncr Corporation Multiple repositories of computer resources, transparent to user
US5724571A (en) * 1995-07-07 1998-03-03 Sun Microsystems, Inc. Method and apparatus for generating query responses in a computer-based document retrieval system
US5742816A (en) * 1995-09-15 1998-04-21 Infonautics Corporation Method and apparatus for identifying textual documents and multi-mediafiles corresponding to a search topic
US5748954A (en) * 1995-06-05 1998-05-05 Carnegie Mellon University Method for searching a queued and ranked constructed catalog of files stored on a network
US5761497A (en) * 1993-11-22 1998-06-02 Reed Elsevier, Inc. Associative text search and retrieval system that calculates ranking scores and window scores
US5802515A (en) * 1996-06-11 1998-09-01 Massachusetts Institute Of Technology Randomized query generation and document relevance ranking for robust information retrieval from a database
US5826260A (en) * 1995-12-11 1998-10-20 International Business Machines Corporation Information retrieval system and method for displaying and ordering information based on query element contribution
US5826261A (en) * 1996-05-10 1998-10-20 Spencer; Graham System and method for querying multiple, distributed databases by selective sharing of local relative significance information for terms related to the query
US5848373A (en) * 1994-06-24 1998-12-08 Delorme Publishing Company Computer aided map location system
US5857184A (en) * 1996-05-03 1999-01-05 Walden Media, Inc. Language and method for creating, organizing, and retrieving data from a database
US5859972A (en) * 1996-05-10 1999-01-12 The Board Of Trustees Of The University Of Illinois Multiple server repository and multiple server remote application virtual client computer
US5864845A (en) * 1996-06-28 1999-01-26 Siemens Corporate Research, Inc. Facilitating world wide web searches utilizing a multiple search engine query clustering fusion strategy
US5873107A (en) * 1996-03-29 1999-02-16 Apple Computer, Inc. System for automatically retrieving information relevant to text being authored
US5903890A (en) * 1996-03-05 1999-05-11 Sofmap Future Design, Inc. Database systems having single-association structures
US5907840A (en) * 1997-07-25 1999-05-25 Claritech Corporation Overlapping subdocuments in a vector space search process
US5920854A (en) * 1996-08-14 1999-07-06 Infoseek Corporation Real-time document collection search engine with phrase indexing
US5920856A (en) * 1997-06-09 1999-07-06 Xerox Corporation System for selecting multimedia databases over networks
US5926808A (en) * 1997-07-25 1999-07-20 Claritech Corporation Displaying portions of text from multiple documents over multiple databases related to a search query in a computer network
US5933822A (en) * 1997-07-22 1999-08-03 Microsoft Corporation Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision
US5983221A (en) * 1998-01-13 1999-11-09 Wordstream, Inc. Method and apparatus for improved document searching
US6009442A (en) * 1997-10-08 1999-12-28 Caere Corporation Computer-based document management system
US6012053A (en) * 1997-06-23 2000-01-04 Lycos, Inc. Computer system with user-controlled relevance ranking of search results
US6078914A (en) * 1996-12-09 2000-06-20 Open Text Corporation Natural language meta-search system and method

Patent Citations (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5050071A (en) * 1988-11-04 1991-09-17 Harris Edward S Text retrieval method for texts created by external application programs
US5280573A (en) * 1989-03-14 1994-01-18 Sharp Kabushiki Kaisha Document processing support system using keywords to retrieve explanatory information linked together by correlative arcs
US5454105A (en) * 1989-06-14 1995-09-26 Hitachi, Ltd. Document information search method and system
US5263159A (en) * 1989-09-20 1993-11-16 International Business Machines Corporation Information retrieval based on rank-ordered cumulative query scores calculated from weights of all keywords in an inverted index file for minimizing access to a main database
US5488725A (en) * 1991-10-08 1996-01-30 West Publishing Company System of document representation retrieval by successive iterated probability sampling
US5265065A (en) * 1991-10-08 1993-11-23 West Publishing Company Method and apparatus for information retrieval from a database by replacing domain specific stemmed phases in a natural language to create a search query
US5598557A (en) * 1992-09-22 1997-01-28 Caere Corporation Apparatus and method for retrieving and grouping images representing text files based on the relevance of key words extracted from a selected file to the text files
US5379366A (en) * 1993-01-29 1995-01-03 Noyes; Dallas B. Method for representation of knowledge in a computer as a network database system
US5600835A (en) * 1993-08-20 1997-02-04 Canon Inc. Adaptive non-literal text string retrieval
US5634051A (en) * 1993-10-28 1997-05-27 Teltech Resource Network Corporation Information management system
US5761497A (en) * 1993-11-22 1998-06-02 Reed Elsevier, Inc. Associative text search and retrieval system that calculates ranking scores and window scores
US5721906A (en) * 1994-03-24 1998-02-24 Ncr Corporation Multiple repositories of computer resources, transparent to user
US5671404A (en) * 1994-03-31 1997-09-23 Martin Lizee System for querying databases automatically
US5465353A (en) * 1994-04-01 1995-11-07 Ricoh Company, Ltd. Image matching and retrieval by multi-access redundant hashing
US5848373A (en) * 1994-06-24 1998-12-08 Delorme Publishing Company Computer aided map location system
US5603025A (en) * 1994-07-29 1997-02-11 Borland International, Inc. Methods for hypertext reporting in a relational database management system
US5717913A (en) * 1995-01-03 1998-02-10 University Of Central Florida Method for detecting and extracting text data using database schemas
US5748954A (en) * 1995-06-05 1998-05-05 Carnegie Mellon University Method for searching a queued and ranked constructed catalog of files stored on a network
US5724571A (en) * 1995-07-07 1998-03-03 Sun Microsystems, Inc. Method and apparatus for generating query responses in a computer-based document retrieval system
US5675788A (en) * 1995-09-15 1997-10-07 Infonautics Corp. Method and apparatus for generating a composite document on a selected topic from a plurality of information sources
US5742816A (en) * 1995-09-15 1998-04-21 Infonautics Corporation Method and apparatus for identifying textual documents and multi-mediafiles corresponding to a search topic
US5826260A (en) * 1995-12-11 1998-10-20 International Business Machines Corporation Information retrieval system and method for displaying and ordering information based on query element contribution
US5903890A (en) * 1996-03-05 1999-05-11 Sofmap Future Design, Inc. Database systems having single-association structures
US5873107A (en) * 1996-03-29 1999-02-16 Apple Computer, Inc. System for automatically retrieving information relevant to text being authored
US5857184A (en) * 1996-05-03 1999-01-05 Walden Media, Inc. Language and method for creating, organizing, and retrieving data from a database
US5859972A (en) * 1996-05-10 1999-01-12 The Board Of Trustees Of The University Of Illinois Multiple server repository and multiple server remote application virtual client computer
US5826261A (en) * 1996-05-10 1998-10-20 Spencer; Graham System and method for querying multiple, distributed databases by selective sharing of local relative significance information for terms related to the query
US5802515A (en) * 1996-06-11 1998-09-01 Massachusetts Institute Of Technology Randomized query generation and document relevance ranking for robust information retrieval from a database
US5864845A (en) * 1996-06-28 1999-01-26 Siemens Corporate Research, Inc. Facilitating world wide web searches utilizing a multiple search engine query clustering fusion strategy
US5920854A (en) * 1996-08-14 1999-07-06 Infoseek Corporation Real-time document collection search engine with phrase indexing
US6078914A (en) * 1996-12-09 2000-06-20 Open Text Corporation Natural language meta-search system and method
US5920856A (en) * 1997-06-09 1999-07-06 Xerox Corporation System for selecting multimedia databases over networks
US6012053A (en) * 1997-06-23 2000-01-04 Lycos, Inc. Computer system with user-controlled relevance ranking of search results
US5933822A (en) * 1997-07-22 1999-08-03 Microsoft Corporation Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision
US5907840A (en) * 1997-07-25 1999-05-25 Claritech Corporation Overlapping subdocuments in a vector space search process
US5926808A (en) * 1997-07-25 1999-07-20 Claritech Corporation Displaying portions of text from multiple documents over multiple databases related to a search query in a computer network
US6009442A (en) * 1997-10-08 1999-12-28 Caere Corporation Computer-based document management system
US5983221A (en) * 1998-01-13 1999-11-09 Wordstream, Inc. Method and apparatus for improved document searching

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7676555B2 (en) 2001-07-24 2010-03-09 Brightplanet Corporation System and method for efficient control and capture of dynamic database content
US20070192442A1 (en) * 2001-07-24 2007-08-16 Brightplanet Corporation System and method for efficient control and capture of dynamic database content
US8380735B2 (en) 2001-07-24 2013-02-19 Brightplanet Corporation II, Inc System and method for efficient control and capture of dynamic database content
US20050210003A1 (en) * 2004-03-17 2005-09-22 Yih-Kuen Tsay Sequence based indexing and retrieval method for text documents
US20060004724A1 (en) * 2004-06-03 2006-01-05 Oki Electric Industry Co., Ltd. Information-processing system, information-processing method and information-processing program
US7908260B1 (en) 2006-12-29 2011-03-15 BrightPlanet Corporation II, Inc. Source editing, internationalization, advanced configuration wizard, and summary page selection for information automation systems
US20090063225A1 (en) * 2007-08-31 2009-03-05 Tom Baeyens Tool for automated transformation of a business process definition into a web application package
US20090064104A1 (en) * 2007-08-31 2009-03-05 Tom Baeyens Method and apparatus for supporting multiple business process languages in BPM
US8423955B2 (en) 2007-08-31 2013-04-16 Red Hat, Inc. Method and apparatus for supporting multiple business process languages in BPM
US9058571B2 (en) 2007-08-31 2015-06-16 Red Hat, Inc. Tool for automated transformation of a business process definition into a web application package
US20090070362A1 (en) * 2007-09-12 2009-03-12 Alejandro Guizar BPM system portable across databases
US8825713B2 (en) * 2007-09-12 2014-09-02 Red Hat, Inc. BPM system portable across databases
US8914804B2 (en) 2007-09-12 2014-12-16 Red Hat, Inc. Handling queues associated with web services of business processes
US20090144729A1 (en) * 2007-11-30 2009-06-04 Alejandro Guizar Portable business process deployment model across different application servers
US8954952B2 (en) 2007-11-30 2015-02-10 Red Hat, Inc. Portable business process deployment model across different application servers
US20120173566A1 (en) * 2010-12-31 2012-07-05 Quora, Inc. Multi-functional navigation bar
US20130097494A1 (en) * 2011-10-17 2013-04-18 Xerox Corporation Method and system for visual cues to facilitate navigation through an ordered set of documents
US8881007B2 (en) * 2011-10-17 2014-11-04 Xerox Corporation Method and system for visual cues to facilitate navigation through an ordered set of documents
US10816623B2 (en) 2013-05-22 2020-10-27 General Electric Company System and method for reducing acoustic noise level in MR imaging

Also Published As

Publication number Publication date
US5926808A (en) 1999-07-20
JPH11102376A (en) 1999-04-13

Similar Documents

Publication Publication Date Title
US5926808A (en) Displaying portions of text from multiple documents over multiple databases related to a search query in a computer network
US6523030B1 (en) Sort system for merging database entries
US7587387B2 (en) User interface for facts query engine with snippets from information sources that include query terms and answer terms
US6701310B1 (en) Information search device and information search method using topic-centric query routing
US6205443B1 (en) Overlapping subdocuments in a vector space search process
US6286000B1 (en) Light weight document matcher
US6947920B2 (en) Method and system for response time optimization of data query rankings and retrieval
US7398201B2 (en) Method and system for enhanced data searching
US7725424B1 (en) Use of generalized term frequency scores in information retrieval systems
US6725217B2 (en) Method and system for knowledge repository exploration and visualization
US6564210B1 (en) System and method for searching databases employing user profiles
US6385602B1 (en) Presentation of search results using dynamic categorization
US5920859A (en) Hypertext document retrieval system and method
US8285724B2 (en) System and program for handling anchor text
US7283997B1 (en) System and method for ranking the relevance of documents retrieved by a query
US6446066B1 (en) Method and apparatus using run length encoding to evaluate a database
US20080027918A1 (en) Method of generating a distributed text index for parallel query processing
US6505198B2 (en) Sort system for text retrieval
US20040015485A1 (en) Method and apparatus for improved internet searching
US20020040363A1 (en) Automatic hierarchy based classification
WO2008127263A1 (en) Methods and systems for formulating and executing concept-structured queries of unorganized data
Attardi et al. Theseus: categorization by context
US6473755B2 (en) Overlapping subdocuments in a vector space search process
Yang et al. Dynamic clustering of web search results
Ferragina et al. The anatomy of a Clustering Engine for Web Snippets

Legal Events

Date Code Title Description
AS Assignment Owner name: CLARITECH CORPORATION, PENNSYLVANIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:EVANS, DAVID A.;MCINERY, MICHAEL J.;REEL/FRAME:013867/0230 Effective date: 19970723
AS Assignment Owner name: CLAIRVOYANCE CORPORATION, PENNSYLVANIA Free format text: CHANGE OF NAME;ASSIGNOR:CLARITECH CORPORATION;REEL/FRAME:018446/0708 Effective date: 20000621
AS Assignment Owner name: CLAIRVOYANCE CORPORATION, PENNSYLVANIA Free format text: CHANGE OF NAME;ASSIGNOR:CLARITECH CORPORATION;REEL/FRAME:020493/0569 Effective date: 20000621
AS Assignment Owner name: JUSTSYSTEMS EVANS RESEARCH INC., PENNSYLVANIA Free format text: CHANGE OF NAME;ASSIGNOR:CLAIRVOYANCE CORPORATION;REEL/FRAME:020571/0270 Effective date: 20070316
STCB Information on status: application discontinuation Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION