WO1995002221A1 - Case-based organizing and querying of a database - Google Patents

Case-based organizing and querying of a database

Publication number
WO1995002221A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
objects
database
cluster
hits
Prior art date
Application number
PCT/US1994/007569
Other languages
French (fr)
Inventor
Bradley P. Allen
David J. Lee
Roger D. Carasso
John R. Perry
Original Assignee
Inference Corporation
Priority date
Filing date
Publication date
Application filed by Inference Corporation
Priority to AU73236/94A
Publication of WO1995002221A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Definitions

  • This invention relates to case-based organizing and querying of a database.
  • Prior art methods of retrieving information generally require preparation of a query, in which objects to be searched for are described in some formal manner. This imposes additional effort on the searcher, and generally also requires that the searcher be familiar with the subject matter to be searched, with the organization and indexing of the database, and with a formal query language. Accordingly, it would be advantageous for the searcher to be able to describe the query in a natural and relatively informal or unstructured manner, such as a description in a natural language.
  • In one aspect, the response may be organized by quality of match. In another aspect, the response may be organized into clusters of related objects.
  • the invention provides a system for case-based organizing and querying of a database.
  • the database may comprise a set of objects, such as a set of documents including text.
  • the database may be organized by examining each object and associating that object with a set of property values, such as (in the case of text documents) a set of keywords or other indicators of content.
  • a document may be associated with those words which appear more frequently in the document than in the database at large, or which appear in early text of the document, or which appear in a title.
  • the system may be responsive to a query by associating the query with a similar set of property values and performing case-based matching or other fuzzy associative matching on the objects of the database for objects which are similar.
  • the query may be natural-language text and may be associated with keywords or other indicators of its content.
  • the system may present matched objects in response to the query, may respond to iterative refinement of the query (in similar manner to iterative case-based methods shown in those co-pending applications which have been incorporated by reference), and may order matched objects by quality of match.
  • the system may also examine the collection of matched objects and organize them for presentation; for example, the system may group matched objects into clusters of objects which have similar properties, which relate to similar content, or which have similar likelihood to be of relevance to the query or of interest to an operator posing the query.
  • the system may respond to the result of organizing matched objects for presentation with suggestions for iterative refinement of the query.
  • the system may therefore be capable of producing improved recall and precision over prior art techniques.
  • Figure 1 shows a block diagram of a database explorer and filter system.
  • Figure 2 shows a data flow diagram of a method of filtering documents.
  • Figure 3 shows a data flow diagram of a method of processing queries.
  • Figure 4 shows a data flow diagram of a method of processing hit tables.
  • Figure 5 shows a process flow diagram of a method of clustering hit tables.
  • Figure 6 shows an example explorer user interface screen as viewed by an operator.
  • Figure 7 shows a second example explorer user interface screen, as viewed by an operator, in which clusters are displayed.
  • Figure 8 shows an example explorer user interface screen, as viewed by an operator, in which settings may be set by the operator.
  • Appendix A shows a table of parts of speech and a set of lexical rules for the English language, which may be used for the tag-and-segment-text process 202 or the tag-and-segment-text process 301 in a preferred embodiment.
  • Appendix B shows an output of a test run of an example filter when applied to a portion of an example multimedia encyclopedia used as a database, available as "Microsoft Encarta" from Microsoft Corporation of Redmond, Washington.
  • the invention may operate in conjunction with a computing system, including a processor and a memory, generally configured as is well known in the art; the memory may include primary memory for stored programs and for data and secondary memory for extensive storage of large numbers of objects.
  • the memory may comprise a sizable database of objects, as is well known in the art of databases, and such objects may comprise various types of computing and data-storage structures.
  • the database may be a relational database, an unstructured collection of objects, or some other database format.
  • Such other types of objects may include source code, object code, binary values, numeric values, text or other symbolic values, representations of sound and/or picture signals or other signals, multimedia, data structures for rule-based or case-based systems, artificial neural networks, linked data structures such as linked lists, mathematical structures such as equations, polynomials, matrices or tensors, and other data types known in at least one of the many fields of computing.
  • Figure 1 shows a block diagram of a database explorer and filter system.
  • a system 101 for case-based organizing and querying of a database 102 may comprise a filter 103, for organizing the database 102 so as to be responsive to a query 104, an explorer 105, for selecting a set of objects 106 in the database 102 which are responsive to that query 104, and an object file system 107, for accessing the database 102.
  • the database 102 may generally be of a type which is known in the art, such as a collection of text objects supported by Cairo Milestone 4 running under the Windows NT system version 297, available from Microsoft Corporation of Redmond, Washington, and may be accessed in conjunction with the object file system 107 of that product.
  • the filter 103 may operate at an initialization time, such as when the processor is first started or before the first query 104 is presented to the explorer 105.
  • the filter 103 may also operate in an incremental mode, e.g., by updating its organization of the database 102 periodically, such as upon the passage of a fixed period of time, when a fixed number of objects 106 are changed or added to the database 102, when the operation of the explorer 105 is degraded below some predetermined level, when triggered by an operator 108 in conjunction with a user interface 109 (e.g., when a query is presented, by a specific command to do so, or as a side effect of another operation), or otherwise as determined by the database 102 or an external manager.
  • the filter 103 may examine each of the objects 106 (or some predetermined subset of objects 106) in the database 102 and associate each object 106 it examines (or some predetermined subset of those objects 106) with a set of properties.
  • those properties may be keywords or phrases which are found in the object 106, but may also comprise other property values, such as the language the text is written in, the length of the text, or the reading level or other measure associated with the text (including measures of complexity, detail, redundancy, writing style, "fog", or other known measures of text, e.g., known in the art of grammar checking and correction).
  • the objects 106 with their properties may be treated as a set of cases to be matched by a CBR engine 110 (operating with the object file system 107) with a test case generated from the query 104.
  • Each case may generally comprise an object 106 plus the properties that object 106 was associated with, e.g., key words and phrases found in that object.
  • these properties may include a lexicon of words and noun phrases found in the object 106, including at least some of these words labelled as a set of "header words" or "relevant words".
  • the explorer 105 may generally operate at a question time, such as when one or more queries 104 is presented to the explorer 105.
  • the explorer 105 may be invoked by the operator 108 in conjunction with the user interface 109, which user interface 109 may allow the operator to trigger operation of the explorer 105 and to present one or more queries 104 to the explorer 105.
  • the user interface 109 may be one such as the user interface presented by the Windows NT system referred to herein.
  • the operator 108 may be a human being, but those of ordinary skill will recognize, after perusal of the application, that the operator 108 may comprise a network connection, an external management program, or an AI program.
  • the explorer 105 may generate a response 111 including a set of matching cases (i.e., objects 106 with their properties), which may be presented to the operator 108 by means of the user interface 109, such as the user interface presented by the Windows NT system referred to herein, augmented by features described herein.
  • the filter 103 and the explorer 105 may operate in conjunction with the object file system 107 (and in particular the CBR engine 110 thereof), which may respond to a set of properties formed into a vector query 112 directed at the database 102, and may return a hit table 113 of those objects 106 in the database 102 which have the indicated properties.
  • the CBR engine 110 may use case-based matching and other techniques such as those shown in those co-pending applications which have been incorporated by reference.
  • Figure 2 shows a data flow diagram of a method of filtering documents.
  • a document 201 (an object 106 which comprises text, such as a pure text document or a text document formatted for a word-processing program) may be input to the filter 103 for examination.
  • the filter 103 may process the text by a tag-and-segment-text process 202, which may lexically analyze the document 201, e.g., by means of a known lexical analysis technique.
  • the tag-and-segment-text process 202 may extract a set of single terms 203 and generate a set of header words 204 found in the document 201.
  • the header words 204 may comprise those words which occur in an initial part of the object 106, or in a title, subject line, topical paragraph, or abstract.
  • the header words 204 may comprise the first three things mentioned in the document 201.
  • the tag-and-segment-text process 202 may also tag words in the document 201 with their parts of speech and parse them into a set of sentences 205.
  • the sentences 205 may be input to an extract-noun-phrases process 206, which may further lexically analyze the document 201, e.g., by means of a known lexical analysis technique, to extract a set of noun phrases 207 and generate a lexicon 208 thereof.
  • the tag-and-segment-text process 202 may use a grammar of the English language, but other natural languages, and even formal specification languages such as programming languages, would also be suitable.
  • the tag-and-segment-text process 202 may also recognize and generate a set of proper nouns 209.
  • the set of proper nouns 209 may be determined by known rules, e.g., that proper nouns generally comprise strings of words each starting with an upper-case letter, or by reference to a dictionary of known proper names.
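The capitalization rule described above can be illustrated with a short sketch. This is only one possible reading, not the filter's actual implementation: the regular expression, the function name `find_proper_nouns`, the `known_names` dictionary parameter, and the treatment of a lone capitalized word at the start of a sentence are all assumptions made for illustration.

```python
import re

# Assumed rule: a run of one or more words that each start with an
# upper-case letter is a proper-noun candidate.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-zA-Z]*(?:\s+[A-Z][a-zA-Z]*)*")


def find_proper_nouns(sentence, known_names=frozenset()):
    """Return capitalized word runs found in one sentence.

    A single capitalized word at the very start of the sentence is
    skipped unless it appears in `known_names` (a hypothetical
    dictionary of known proper names), because sentence-initial
    capitalization is ambiguous.
    """
    candidates = []
    for match in CAPITALIZED_RUN.finditer(sentence):
        phrase = match.group(0)
        is_lone_initial_word = match.start() == 0 and " " not in phrase
        if is_lone_initial_word and phrase not in known_names:
            continue
        candidates.append(phrase)
    return candidates


print(find_proper_nouns("The lamp was invented by Thomas Edison in Menlo Park."))
```

A dictionary lookup and a capitalization rule complement each other here: the rule finds multi-word names cheaply, while the dictionary resolves the sentence-initial case that the rule alone cannot decide.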
  • the set of proper nouns 209 may be input, along with at least some of the single terms 203, to a determine-relevant-words process 210, which may extract a set of relevant words 211.
  • the set of relevant words 211 may be determined with reference to the frequency of those words in the object 106 (with respect to the entire text found in the object 106) and with reference to the frequency of those words in the database 102, with respect to the text corpus of the database 102.
  • the ratio for each word (frequency in the object 106) divided by (frequency in the database 102) may be computed, and the set of relevant words 211 may comprise those words whose relative frequency exceeds a threshold, e.g., a predetermined threshold such as a 1:1 ratio.
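The ratio test above can be sketched as follows. This is a minimal illustration that assumes documents and the corpus are already tokenized into lists of lower-case words; the function names and the choice to skip words absent from the corpus are assumptions, not details given in the description.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def relevant_words(doc_tokens, corpus_tokens, threshold=1.0):
    """Return words whose (frequency in object) / (frequency in database)
    exceeds `threshold`, together with that ratio."""
    doc_freq = relative_frequencies(doc_tokens)
    corpus_freq = relative_frequencies(corpus_tokens)
    relevant = {}
    for word, f_doc in doc_freq.items():
        f_corpus = corpus_freq.get(word)
        if not f_corpus:
            # Assumption: a word missing from the corpus statistics is skipped.
            continue
        ratio = f_doc / f_corpus
        if ratio > threshold:
            relevant[word] = ratio
    return relevant


doc = ["edison", "lamp", "the", "the"]
corpus = ["the"] * 6 + ["edison", "lamp", "car", "road"]
print(sorted(relevant_words(doc, corpus)))
```

With the default 1:1 threshold, a common word such as "the" is excluded because it is no more frequent in the object than in the database at large.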
  • the filter 103 is described herein for a specific set of properties of the text which may be extracted. However, it would be clear to those of ordinary skill, after perusal of this application, that extraction of other properties could be readily accomplished, and is within the scope and spirit of the invention. Such other properties could include the language the text is written in (or for English-language text, the number of foreign words used), the length of the text, or the reading level or other measure associated with the text (including measures of complexity, detail, redundancy, writing style, "fog", or other known measures of text, e.g., known in the art of grammar checking and correction).
  • the extract-noun-phrases process 206 and the determine-relevant-words process 210 may proceed in parallel, e.g., by execution on multiple processors or by multiple tasks or threads in a multitasking or multithreaded environment.
  • the filter 103 may mark each object 106 with the properties it determines (or alternatively may create a separate object 106 relating each documentary object 106 to its properties), so that the object 106 and its properties may be treated as a case in a case-base.
  • the set of cases may be matched to a test case by a CBR engine 110, using techniques like those described in copending applications (1) Serial No. 07/664,561, filed March 4, 1991 in the name of inventors Bradley P. Allen and S. Daniel Lee, titled "CASE-BASED REASONING SYSTEM"; (2) Serial No. 07/869,935, filed April 15, 1992 in the name of inventor Bradley P.
  • Figure 3 shows a data flow diagram of a method of processing queries.
  • the query 104 entered in free text by the operator 108, may be input to the explorer 105 for examination.
  • the explorer 105 may process the text by a tag-and-segment-text process 301, which may lexically analyze the document 201, e.g., by means of a known lexical analysis technique, similarly to the tag-and-segment-text process 202 of the filter 103.
  • the tag-and-segment-text process 301 may extract a set of single terms 302, similarly to the tag-and-segment-text process 202 and the set of single terms 203 of the filter 103.
  • the tag-and-segment-text process 301 may also tag words in the document 201 with their parts of speech and parse them into a set of sentences 303, similarly to the tag-and-segment-text process 202 and the sentences 205 of the filter 103.
  • the sentences 303 may be input to an extract-noun-phrases process 304, which may further lexically analyze the document 201, e.g., by means of a known lexical analysis technique, to extract a set of noun phrases 305, similarly to the extract-noun-phrases process 206 and the noun phrases 207 of the filter 103.
  • the tag-and-segment-text process 301 may also recognize and generate a set of proper nouns 306, similarly to the tag-and-segment-text process 202 and the proper nouns 209 of the filter 103.
  • the noun phrases 305, single terms 302, and proper nouns 306, a rank threshold 307, and a set of selected subtopics 308 (subtopics selected by the operator 108 to refine the query 104) may be input to a generate-query process 309, which may generate a set of query terms 310 and a query parse tree 311.
  • the tag-and-segment-text process 301, the extract-noun-phrases process 304, and the generate-query process 309 may proceed as asynchronously as possible, e.g., by execution on multiple processors or by multiple tasks or threads in a multitasking or multithreaded environment.
  • the query terms 310 and the query parse tree 311 may be input to the CBR engine 110 in the object file system 107, which may perform case-based matching or other fuzzy associative matching on the objects 106 in the database 102 for objects which are similar to the query 104, as described by the query terms 310 and the query parse tree 311, and which have a match quality at least as good as the rank threshold 307. (As noted with regard to the user interface 109, the selected subtopics 308 are added to the text of the query 104.)
  • the object file system 107 may generate the hit table 113 of matched objects 106.
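The matching step can be illustrated with a small stand-in. The CBR engine's actual matching method is described in the co-pending applications and is not reproduced here; the sketch below substitutes cosine similarity over binary term sets, scaled to 0 through 100, purely as an assumption. The names `rank` and `hit_table` and the sample cases are hypothetical.

```python
import math


def rank(query_terms, object_terms):
    """Cosine similarity between two binary term sets, scaled to 0..100.

    This is a stand-in for the CBR engine's match quality, not the
    engine's real scoring function.
    """
    q, o = set(query_terms), set(object_terms)
    if not q or not o:
        return 0.0
    return 100.0 * len(q & o) / math.sqrt(len(q) * len(o))


def hit_table(query_terms, cases, rank_threshold):
    """Return (rank, object name) pairs at or above the rank threshold,
    best match first."""
    hits = [(rank(query_terms, props), name) for name, props in cases.items()]
    hits = [(r, name) for r, name in hits if r >= rank_threshold]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits


cases = {
    "edison": ["light", "bulb", "inventor"],
    "candle": ["light", "wax"],
    "car": ["engine", "wheel"],
}
query = ["light", "bulb", "inventor"]
for r, name in hit_table(query, cases, rank_threshold=40):
    print(f"{r:6.1f}  {name}")
```

Unlike an exact match, this kind of scoring lets an object that shares only some of the query's properties still appear in the hit table, provided its rank clears the threshold.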
  • Figure 4 shows a data flow diagram of a method of processing hit tables.
  • the hit table 113 and the relevant words 211 may be input to a cluster hits process 401, which (if clustering is enabled) collects the matched objects 106 into clusters, and may output a set of clusters 402 in response.
  • Each cluster 402 may comprise a set of objects 106, selected for collective closeness with regard to all objects 106 in the hit table 113.
  • the cluster hits process 401 is further described with regard to figure 5.
  • the hit table 113, the relevant words 211, and the lexicon 208 may be input to a first generate-topics (from relevant words) process 403, while the lexicon 208 and the query terms 310 may be input to a second generate-topics (from query words) process 403. Together the two generate-topics processes 403 may output a set of topics 404 and subtopics 405.
  • the generate-topics process 403 may examine the lexicon 208 of noun phrases 207 with a rule-based inference engine (not shown).
  • One example of a rule-based inference engine is the ART-IM system, available from Inference Corporation in El Segundo, California.
  • the inference engine may detect particular patterns in the noun phrases 207 which indicate semantic relations between the words in those noun phrases 207. For example, the noun phrase
  • the generate-topics process 403 may thus construct a phrase lattice, showing each noun phrase 207 as being inclusive of (above), included in (below), or incommensurate with (neither above nor below) each other noun phrase 207.
  • the generate-topics (from relevant words) process 403 may restrict the phrase lattice to those noun phrases 207 which include relevant words 211 of the objects 106 in the hit table 113.
  • the second generate-topics (from query words) process 403 may operate in similar manner as the first generate-topics (from relevant words) process 403 and may restrict the phrase lattice to those noun phrases 305 which include relevant words 211 of the query.
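The three-way relation in the phrase lattice can be sketched with a deliberately simple stand-in. The description relies on inference-engine rules to detect semantic relations, which are not reproduced here; the sketch instead assumes that a phrase is "above" another when its words form a proper subset of the other's words (so a shorter, more general phrase includes a longer, more specific one). That subset rule, and the names `relation`, `build_lattice`, and `restrict`, are assumptions for illustration only.

```python
def words(phrase):
    """Lower-cased set of words in a noun phrase."""
    return set(phrase.lower().split())


def relation(a, b):
    """Classify phrase `a` relative to phrase `b`.

    Assumed rule: `a` is 'above' `b` when every word of `a` also occurs
    in `b` (a is the more general phrase), 'below' in the reverse case,
    and 'incommensurate' otherwise.
    """
    wa, wb = words(a), words(b)
    if wa == wb:
        return "equal"
    if wa < wb:
        return "above"
    if wa > wb:
        return "below"
    return "incommensurate"


def build_lattice(phrases):
    """Relation for every ordered pair of distinct phrases."""
    return {(a, b): relation(a, b) for a in phrases for b in phrases if a != b}


def restrict(phrases, relevant_words):
    """Keep only phrases containing at least one relevant word."""
    wanted = {w.lower() for w in relevant_words}
    return [p for p in phrases if wanted & words(p)]


phrases = ["light bulb", "electric light bulb", "steam engine"]
kept = restrict(phrases, ["bulb"])
print(build_lattice(kept))
```

Restricting the phrases before building the lattice mirrors the description: topics derived from hits use the relevant words of the matched objects, while topics derived from the query use the relevant words of the query.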
  • Figure 5 shows a process flow diagram of a method of clustering hit tables.
  • the cluster hits process 401 may operate by means of a genetic algorithm, in which an initial configuration and a set of genetic operators are specified, and the set of solutions is formed by simulation of random "evolution" of a population of possible solutions, using the method of steady-state reproduction without duplicates.
  • Genetic algorithms are well known in the art, and are described in further detail in "Foundations of Genetic Algorithms", ed. Gregory J.E. Rawlins (Morgan Kaufmann Publishers: San Mateo, California 1991). It would be clear to those of ordinary skill in the art that the parameters of the genetic algorithm, and even the type of genetic algorithm performed could be varied substantially and still remain within the scope and spirit of the invention.
  • a number of clusters 402 is selected.
  • the number of clusters 402 may vary from a known minimum to a known maximum, settable by the operator 108.
  • the genetic algorithm of the following steps is repeated for each permissible number of clusters 402, and the best solution adopted.
  • At an initiate-clusters step 502, a set of possible clusters 402 is selected; this is a single "gene". A random population of genes is selected. Each cluster 402 is represented by the centroid of the objects 106 which would comprise that cluster 402. Thus, when a solution of clusters 402 is selected, each object 106 is assigned to the cluster 402 which it best matches.
  • the genetic algorithm of the following steps is repeated for a known period of time, settable by the operator 108.
  • the best available solution i.e., the gene with the best quality
  • Each object 106 is assigned to the cluster 402 to which it is the closest.
  • all genes in the population are evaluated for quality, and the gene with the least quality is removed.
  • the statistical measure "category utility" is computed; i.e., the utility of each cluster 402 in distinguishing an object 106 in one cluster 402 from an object 106 in another cluster 402.
  • Although matching for clusters 402 is performed using relevant words 211, it would be clear to those of ordinary skill, after perusal of this application, that other properties of the objects 106 could be used as well, such as the read/write date of the object 106, and that doing so would be within the scope and spirit of the invention.
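The evaluation of one gene can be sketched as two steps: assign each object to its closest centroid, then score the resulting partition with category utility. The sketch assumes that each object is represented as a binary vector over a vocabulary of relevant words and that closeness is squared Euclidean distance; both are assumptions, as are the function names. The category utility formula used is the standard one for nominal attributes, CU = (1/K) Σ_k P(C_k) [Σ_i Σ_v P(A_i = v | C_k)² − Σ_i Σ_v P(A_i = v)²].

```python
def to_vector(word_set, vocab):
    """Binary vector: 1.0 where the object has the relevant word."""
    return [1.0 if w in word_set else 0.0 for w in vocab]


def assign(vectors, centroids):
    """Index of the closest centroid for each vector."""
    def dist2(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return [
        min(range(len(centroids)), key=lambda k: dist2(vec, centroids[k]))
        for vec in vectors
    ]


def category_utility(vectors, labels, n_clusters):
    """Category utility of a partition over binary attributes."""
    dims = len(vectors[0])

    def squared_probability_sum(rows):
        total = 0.0
        for i in range(dims):
            p_present = sum(r[i] for r in rows) / len(rows)
            total += p_present ** 2 + (1.0 - p_present) ** 2
        return total

    baseline = squared_probability_sum(vectors)
    utility = 0.0
    for k in range(n_clusters):
        members = [v for v, label in zip(vectors, labels) if label == k]
        if members:  # an empty cluster contributes nothing
            weight = len(members) / len(vectors)
            utility += weight * (squared_probability_sum(members) - baseline)
    return utility / n_clusters


vocab = ["bulb", "lamp", "engine", "wheel"]
docs = [{"bulb", "lamp"}, {"bulb", "lamp"}, {"engine", "wheel"}, {"engine", "wheel"}]
vectors = [to_vector(d, vocab) for d in docs]
gene = [[1, 1, 0, 0], [0, 0, 1, 1]]  # one candidate solution: two centroids
labels = assign(vectors, gene)
print(labels, category_utility(vectors, labels, len(gene)))
```

In a steady-state scheme, a score like this would be computed for every gene in the population so that the lowest-scoring gene can be removed.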
  • At a genetic-operator step 504, one of three operators is selected and employed to create a new gene: (1) Mutation-1. The new gene is randomly created. (2) Mutation-2. An existing gene is copied, except that one of its clusters 402 is mutated by replacing it with a randomly created cluster 402. (3) Crossover. Two genes have their n-tuples of clusters 402 paired off and one cluster 402 is selected at random from each pair to form the new gene. Alternatively, a new gene is created by selecting N clusters 402 at random from the 2N clusters 402 specified by the two old genes.
USER INTERFACE
  • Figure 6 shows an example explorer user interface screen as viewed by an operator. While the invention is described primarily with regard to a specific user interface, it would be clear to those of ordinary skill in the art that another user interface of equal or greater flexibility would be suitable, and would be within the scope and spirit of the invention.
  • the user interface 109 may be combined with a user interface for a generalized file system exploration program, such as in the Windows NT system referred to herein.
  • the user interface 109 may comprise a query window 601 in which the operator may enter the query 104 in free text, and a results window 602 in which the system 101 may display a set of matched objects 106 found in response to the query 104.
  • the operator 108 may enter the query 104 in the query window 601.
  • the query 104 is input to the explorer 105, which processes it as described herein, and generates the vector query 112.
  • the vector query 112 is input to the object file system 107, and generates the hit table 113 of matched objects 106.
  • the hit table 113 is input to the user interface 109, which displays the matched objects 106.
  • the operator may select a displayed matched object 106 to view its contents.
  • the user interface 109, the explorer 105, and the object file system 107 may operate as asynchronously as possible.
  • the object file system 107 may search the database 102 for matched objects 106 independently, once it has sufficient information from the explorer 105; the user interface 109 may display matched objects 106 from the hit table 113 as they are generated by the object file system 107.
  • the operator 108 has entered the query 104 "who invented the light bulb?" in a content field 603 of the query window 601, and the system 101 has responded with a set of matched objects 106 in the results window 602.
  • the matched objects are displayed one per line, in columns labelled "rank", "query", "header", and "relevant words".
  • a rank field 604 displays the quality of match for each displayed matched object 106.
  • the system 101 may order the matched objects 106 by rank. This may occur as the normal procedure, or at the request of the operator 108, e.g., by means of a "sort" command 605 in the query window 601.
  • the rank field 604 may also be color-coded by value.
  • a query field 606 displays the relevant words of the query which are most related to the displayed matched object 106.
  • a header field 607 displays the header words 204 of the displayed matched object 106.
  • a relevant words field 608 displays the most common relevant words 211 of the displayed matched object 106.
  • a topics field 609 of the query window 601 displays suggested topics for refinement of the query 104 which the system 101 has identified.
  • the operator 108 may select a topic in the topics field 609, and the system will display a subtopics window 610 (overlaid on the query window 601 and the results window 602) showing the subtopics which the system 101 has identified for that topic.
  • the operator 108 may refine the query 104 in response to the matched objects 106, and the explorer 105 may attempt to match objects 106 using the query 104 as refined. This may occur at the request of the operator 108, e.g., by means of a "refresh" command 611 in the query window 601.
  • the operator 108 may select one or more subtopics 405 to refine the query 104. To do so, the operator 108 may identify (e.g., by pointing to with a pointing device such as a mouse) one or more subtopics 405 in the subtopics window 610. The selected subtopics 308 may be "added" to the query 104 and the explorer 105 may attempt to match objects 106 using the query 104 as refined.
  • the operator 108 may also select one or more relevant words 211 to refine the query 104. To do so, the operator 108 may identify (e.g. by pointing to) the relevant words field 608 for a particular matched object 106 and "drag" that relevant words field 608 to the content field 603; the system 101 will display a relevance feedback window 612 (overlaid on the query window 601 and the results window 602) showing the relevant words 211 for that matched object 106.
  • the operator 108 may select one or more relevant words 211 to refine the query 104. To do so, the operator 108 may identify (e.g., by pointing to) one or more relevant words 211 in the relevance feedback window 612. The selected relevant words 211 may be "added" to the query 104 and the explorer 105 may attempt to match objects 106 using the query 104 as refined.
  • the query 104 as refined (like the original query 104) is presented as a vector query 112 to the CBR engine 110.
  • When selected subtopics 308 or relevant words 211 are "added" to the query, they are properties which the CBR engine 110 must match to objects 106, as described for methods of iterative refinement of case-based matching shown in those co-pending applications which have been incorporated by reference. (Thus, the CBR engine 110 must match to objects 106 as if the operator 108 had answered a query refining question in a case-based system.)
  • a query 104 as refined may be further refined, allowing the operator to iteratively refine the query 104 until desired objects 106 are located.
  • Figure 7 shows a second example explorer user interface screen, as viewed by an operator, in which clusters are displayed.
  • the operator 108 may select a "cluster" command (figure 6) or "uncluster" (figure 7) command 701 in the query window 601, and the system 101 will display a set of clusters 402, each a set of related matched objects 106, in place of displaying matched objects 106 themselves.
  • the operator has selected the "cluster" command 701 for the same query 104 as in the example of figure 6.
  • an expand field 702 displays whether the cluster 402 can be expanded (shown by a "+" symbol) to display individual matched objects 106, or can be collapsed (shown by a "-" symbol) to display a single identifier for the cluster 402.
  • the rank field 703 displays the best rank for all matched objects 106 in the cluster 402.
  • the system 101 may order the clusters 402 by this rank field 703. This may occur as the normal procedure, or at the request of the operator 108, e.g., by means of the "sort" command 605 in the query window 601.
  • this rank field 703 may also be color-coded by value.
  • the relevant words field 608 displays the most common relevant words 211 in the cluster 402.
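The per-cluster display values described above (best rank of any member, most common relevant words) can be sketched as follows. The data layout, the function names, and the cutoff of three displayed words are assumptions for illustration; the description does not say how many words the field shows or how ties are broken.

```python
from collections import Counter


def summarize_cluster(hits, top_n=3):
    """Summarize one cluster.

    `hits` is a list of (rank, relevant_words) pairs, one per matched
    object in the cluster. Returns the best rank and the most common
    relevant words (ties broken alphabetically, an assumption).
    """
    best_rank = max(rank for rank, _ in hits)
    counts = Counter(w for _, ws in hits for w in set(ws))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return best_rank, [word for word, _ in ordered[:top_n]]


def order_clusters(clusters):
    """Cluster names sorted by best member rank, highest first."""
    return sorted(clusters, key=lambda name: -summarize_cluster(clusters[name])[0])


clusters = {
    "lighting": [(88, ["lamp", "bulb"])],
    "inventors": [(95, ["edison", "bulb"]), (82, ["edison", "patent"])],
}
for name in order_clusters(clusters):
    print(name, summarize_cluster(clusters[name]))
```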
  • the operator 108 may also choose to cluster all objects 106 in a specific set, e.g., a specific directory in the object file system 107.
  • the operator 108 may restrict the scope of the explorer 105 to a specific directory and issue the "cluster" command 701; the system 101 will display the objects 106 in that directory in clusters 402.
  • Figure 8 shows an example explorer user interface screen, as viewed by an operator, in which settings may be set by the operator.
  • the operator 108 may select settings appropriate for the system 101.
  • the operator 108 may select a "properties" command 801 in the query window 601 (figure 6), and the system 101 will display a properties window 802 with a set of property values 803 which may be set.
  • a "minimum rank of returned hits" property 804 is a threshold value for including matched objects 106; matched objects 106 whose rank falls below this value are not displayed in the results window 602 and are not used in further processing.
  • the rank of a matched object 106 is calculated by the CBR engine 110. In the example, this value is set to 80.
  • a "maximum clustered hits" property 805 is a maximum number of matched objects 106 which are included in a single cluster 402. Those matched objects 106 not included in clusters 402 are placed in a special cluster 402 labelled "Other". In the example, this value is set to 400.
  • a "clustering time" property 806 is the elapsed real time devoted to clustering. In the example, this value is set to 2500 milliseconds.
  • a "minimum number of clusters" property 807 is the lower bound for the number of clusters 402 generated. In the example, this value is set to 2 clusters.
  • a "maximum number of clusters" property 808 is the upper bound for the number of clusters 402 generated. In the example, this value is set to 8 clusters. The system 101 attempts to generate a number of clusters 402 between the minimum and maximum number selected.
  • a "maximum topics" property 809 is the maximum number of topics displayed in the topics field 609 in the query window 601. In the example, this value is set to 7 topics.
  • a "maximum subtopics" property 810 is the maximum number of subtopics displayed in the subtopics window 610. In the example, this value is set to 250 subtopics.
  • a "do/don't cluster" property 811 sets whether or not clustering is performed. In the example, this value is set to YES.
  • a "do/don't generate query topics" property 812 sets whether or not topics and subtopics are generated in response to query terms 310. In the example, this value is set to YES.
  • a "do/don't generate salient topics" property 813 sets whether or not topics and subtopics are generated in response to relevant words 211. In the example, this value is set to YES.
  • a "boolean/vector query" property 814 sets whether the object file system 107 performs a boolean query or a vector query in response to the explorer 105. In the example, this value is set to vector queries.
  • a boolean query would have boolean connectors (e.g., "AND", "OR") coupling the query terms 310, so that the query 104 would not be as flexibly matched. Search using boolean queries is well known in the art.
  • Appendix A shows a table of parts of speech and a set of lexical rules for the English language, which may be used for the tag-and-segment-text process or the tag-and-segment-text process in a preferred embodiment.
  • Appendix B shows an output of a test run of an example filter when applied to a portion of an example multimedia encyclopedia used as a database, available as "Microsoft Encarta” from Microsoft Corporation of Redmond, Washington.
  • LDOCE is basically a dictionary of British English, so we found a lot of words we weren't familiar with, as well as a lot of double entries to account for American spellings (e.g. color and colour).
  • the lexical categories we were able to extract out of LDOCE and WordNet were limited to nouns, verbs, adjectives, adverbs, conjunctions, determiners, predeterminers, prepositions, pronouns, and phrases. Since we don't use a phrasal lexicon, we threw the phrases away.
  • noun-phrase -> determiner noun-phrase (e.g. "The person")
  • noun-phrase -> quantifier noun-phrase (e.g. "Three people")
  • noun-phrase -> adverb noun-phrase (e.g. "maddeningly fluffy clouds")
  • noun-phrase -> noun-phrase relative-clause (e.g. "The car that hit me")
  • noun-phrase -> noun-phrase [, noun-phrase]* [,] or noun-phrase (e.g. "England, France, or Germany")
  • the Find Taxonomic Relations process uses ART-IM rules to capture patterns of words which indicate taxonomic relationships between the words. For example, it detects patterns like:
  • NP such as (NP,)* {(and | or)} NP
  • NP {,} including (NP,)* {(and | or)} NP
  • Clustering file afl. txt Non-empty clusters 5 Clusters : 5 I Hits Vals Seed, Value: Count
  • Marijuana Mixture, Leave, drugs, alcohols, syndromes, psycho Passes: 334, best pass.- 158, best score: 0.307, worst score: 0.132 Cluster 0, has 15 hits: '(OTHER), bloods, vitaminS, tissues, poisonS, suga
  • Thermometer, Instrument, Measure Wine, Beverage, Juice Wood, Substance, Trunk Cluster 1 has 22 hits: 'alcohol I, acid:7, ethyl:7, liquid: , examples, chemi Acetaldehyde, Volatile, Liquid Antifreeze, Chemica1, Substance Azeotropic Mixture, Solution, Ratio Butyl Alcohol, Chemical, Formula Cannizzaro, Stanislao, Italian Disease, Medicine, Health Ester, Chemistry, Compound Ether, Chemistry, Ethyl Fermentation, Chemical, Change Formaldehyde, Compound, Carbon Glycerin, Glycerol, C3h8o3 Gum, Substance, Plant Iodine, Element, Symbol Lipid, Group, Substance Salicylic Acid, White, Solid Solution, Chemistry, Mixture Tannin, Acid, Name Turpentine, Name, Semifluid Vinegar, Condiment, Preservative Wax, Name, Ester Whiskey, Liquor, Mash Zym
  • Vodka, Beverage, Known Cluster 3 has 6 hits: 'fuel:5, alcohols, methanolS, combustions, coals, en
  • Rocket, Term, Propulsion Cluster 4 has 4 hits: 'drugS, alcohols, syndromes, psychoactive drugs:2, ma
  • Cluster 0 has 9 hits: '(OTHER), plants, united statesS, seeds, gardenings,
  • Rhizome Stem, Organ.
  • Ray, Radiation, Wavelength Cluster 2 has 3 hits: 'lampS, glassS, neonS, arcS, bulbS, argonS, lights
  • Neon Lamp, Glass, Bulb Cluster 3 has 5 hits: 'bulb:5, liliaceae:4, herb , lily:3, pistilS, heights.
  • Tuberose, Herb, Polianth Cluster 4 has 6 hits: 'temperature:4, atmospheres, points, humidityS, bulb
  • Cluster 0 has 4 hits: '(OTHER), century:2'
  • Velzquez, Diego, Soldier Cluster 2 has 5 hits: 'spanish:4, island:3, spain:2, de:2, Christopher columbu
  • Cluster 1 has 5 hits: 'mind:5, philosophe , philosophy:3, matters, universe
  • Clustering file israel.txt Non-empty clusters: 4 Clusters: 4 II Hits Vals Seed, Value:Count
  • Cluster 0 has 22 hits: '(OTHER), governments, war:4, centuryS, french revolut Achille Lauro, Italian, Cruise Anti-semi ism, Social, Agitation Asia, Continent, Island Assyria, Ashur, Ashshur Bahai, Persian, Glory Buber, Martin, Religious Cabala, Hebrew, tradition Crusade, Expedition, Undertaken Eschatology, Discourse, Last Espionage, Collection, Information Iran, Islamic Republic, Republic Jewish Art, Architect c Jew Jewish Music, Religic o , Music Nationalalism, History, Movement Portuguese Literature, Literature, Portuguese Refugee, Person, Country Romania, Republic, Europe Saudi Arabia, Monarchy, Southwest Asia
  • Clustering file marx.txt Non-empty clusters: 6 Clusters: 6 ⁇ Hits Vals Seed, Value:Count
  • Marx Brothers, 20th-century, Comedian Cluster 4 has 4 hits: 'capitalists, class:3, appreciation:2, communist:2, firmly
  • Marx, Karl, German Cluster 5 has 6 hits: 'social 3, marx:3, labor:2, world war ii:2, german:2, ce
  • Clustering file muslim.txt Non-empty clusters: 4 Clusters: 4 if Hits Vals Seed, Value:Count
  • Cluster 0 has 41 hits: '(OTHER), arab:7, bc:5, ibn:4, indian:4, india:4, islam Alfonso Viii, King, Castile Arabia, Desert, Peninsula Arabic Literature, Literature, People Archaeology, Greek, Archaio Averros, Arabic, Abu
  • Cluster 0 has 50 hits: "(OTHER), church:12, henry:8, king:7, english:6, roman:6
  • Tyndall John, Physicist Ultrasonics, Branch, Physic Ventriloquism, Art, Sound Violin, Instrument, Member Viscount Melville Sound, Arm, Arctic Ocean Voiceprint Identification, Method, Person Warner Brothers, Motion, Picture Xylophone, Greek, Xylon Cluster 2, has 8 hits: 'sound:6, long:3, letter:3, sign:2, atlantic ocean:2, mi Animal Behavior The, Behavior, Animal C, English, Romance-language Diacritic Mark, Sign, Mark Island Sound, Body, Salt Letter, Vowel, Engli-
  • Cluster 0 has 6 hits: '(OTHER), electron:2, beam:2, tube:2, television- ⁇ ' Baseball, Game, Skill Cathode-ray Tube, El*- : , Tube
  • Warfare,. Use, Force Cluster 1 has 11 hits: 'strike:10, united states:3, presidents, injunctions,
  • Cluster 0 has 2 hits: '(OTHER), states'

Abstract

A system for case-based organizing and querying of a database (102). The database (102) may comprise a set of objects (106), such as text documents. The database (102) may be organized by examining each object (106) and associating that object (106) with a set of property values, such as keywords. A document may be associated with those words which appear more frequently in the document than in the database (102) at large, or which appear in the early text of the document, or which appear in the title. The system may be responsive to a query (104) by associating the query with a similar set of property values and performing case-based matching on the objects (106) of the database (102) for similar objects (106). The query (104) may be natural-language text and may be associated with keywords. The system may present matched objects in response to the query (104), may respond to iterative refinement of the query and may order matched objects by quality of match. The system may also respond to the result of organizing matched objects for presentation with suggestions for iterative refinement of the query (104).

Description

CASE-BASED ORGANIZING AND QUERYING OF A DATABASE
1. Field of the Invention
This invention relates to case-based organizing and querying of a database.
2. Description of Related Art
As storage capability grows for computing devices, many databases have become larger, and large databases have become more common. One problem which has become apparent in the art is the difficulty of retrieving information from large databases when the location of that desired information is not already known. For example, a search for information in a large library may be hampered by the size of the library, because of the large number of items which must be examined. This can be exacerbated if the information searched for is not well-described by the searcher, if the searcher is unfamiliar with that subject matter, or if the information searched for is not well indexed.
Large databases of objects may sometimes be generated without the original intent to organize them into a database. For example, newspaper articles may generally be written without the consideration that they may be collected into a single database for later search. When they eventually are collected into a database, the effort required to organize those objects into a database for information retrieval can be formidable. It would be advantageous to provide a system in which a large amount of information may be collected into a database without having to expend a comparable amount of effort on organization and indexing, e.g., where such organization and indexing can be done by an automated process.
Prior art methods of retrieving information generally require preparation of a query, in which objects to be searched for are described in some formal manner. This imposes additional effort on the searcher, and generally also requires that the searcher be familiar with the subject matter to be searched, with the organization and indexing of the database, and with a formal query language. Accordingly, it would be advantageous for the searcher to be able to describe the query in a natural and relatively informal or unstructured manner, such as a description in a natural language.
Work with case-based systems has shown that incremental refinement of problem descriptions can be valuable in improving an automated system's recall (ability to retrieve objects which are related to the query) and precision (ability to rule out objects which are not related to the query). It would be advantageous to be able to incrementally refine the query after a response. But when the query itself is unstructured, the original response may provide so much information that valuable material is lost in the size of the response. Accordingly, it would be advantageous to provide suggestions for incremental refinement. In one aspect of the invention, the response may be organized by quality of match. In another aspect, the response may be organized into clusters of related objects.
SUMMARY OF THE INVENTION
The invention provides a system for case-based organizing and querying of a database. The database may comprise a set of objects, such as a set of documents including text. In a preferred embodiment, the database may be organized by examining each object and associating that object with a set of property values, such as (in the case of text documents) a set of keywords or other indicators of content. For example, a document may be associated with those words which appear more frequently in the document than in the database at large, or which appear in early text of the document, or which appear in a title. The system may be responsive to a query by associating the query with a similar set of property values and performing case-based matching or other fuzzy associative matching on the objects of the database for objects which are similar. In a preferred embodiment, the query may be natural-language text and may be associated with keywords or other indicators of its content.
In a preferred embodiment, the system may present matched objects in response to the query, may respond to iterative refinement of the query (in similar manner to iterative case-based methods shown in those co-pending applications which have been incorporated by reference) , and may order matched objects by quality of match. The system may also examine the collection of matched objects and organize them for presentation; for example, the system may group matched objects into clusters of objects which have similar properties, which relate to similar content, or which have similar likelihood to be of relevance to the query or of interest to an operator posing the query. The system may respond to the result of organizing matched objects for presentation with suggestions for iterative refinement of the query.
The system may therefore be capable of producing improved recall and precision over prior art techniques.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows a block diagram of a database explorer and filter system.
Figure 2 shows a data flow diagram of a method of filtering documents.
Figure 3 shows a data flow diagram of a method of processing queries.
Figure 4 shows a data flow diagram of a method of processing hit tables.
Figure 5 shows a process flow diagram of a method of clustering hit tables.
Figure 6 shows an example explorer user interface screen as viewed by an operator.
Figure 7 shows a second example explorer user interface screen, as viewed by an operator, in which clusters are displayed.
Figure 8 shows an example explorer user interface screen, as viewed by an operator, in which settings may be set by the operator.
Appendix A shows a table of parts of speech and a set of lexical rules for the English language, which may be used for the tag-and-segment-text process or the tag-and-segment-text process in a preferred embodiment.
Appendix B shows an output of a test run of an example filter when applied to a portion of an example multimedia encyclopedia used as a database, available as "Microsoft Encarta" from Microsoft Corporation of Redmond, Washington.
DESCRIPTION OF THE PREFERRED EMBODIMENT
An embodiment of this invention may be used together with inventions which are disclosed in a copending application titled "AUTONOMOUS LEARNING AND REASONING AGENT", application Serial No. 07/ 869,926, filed April 15, 1992 in the name of Bradley P. Allen, hereby incorporated by reference as if fully set forth herein.
In a preferred embodiment, the invention may operate in conjunction with a computing system, including a processor and a memory, generally configured as is well known in the art; the memory may include primary memory for stored programs and for data and secondary memory for extensive storage of large numbers of objects. Preferably, the memory may comprise a sizable database of objects, as is well known in the art of databases, and such objects may comprise various types of computing and data-storage structures. However, no particular structure is required for the database itself; the database may be a relational database, an unstructured collection of objects, or some other database format.
Although the invention is disclosed herein primarily with respect to textual objects, it would be clear to those of ordinary skill in the art, after perusal of the application, that extension of the concepts disclosed to other types of objects is within the scope and spirit of the invention, and would not require undue experimentation. Such other types of objects may include source code, object code, binary values, numeric values, text or other symbolic values, representations of sound and/or picture signals or other signals, multimedia, data structures for rule-based or case-based systems, artificial neural networks, linked data structures such as linked lists, mathematical structures such as equations, polynomials, matrices or tensors, and other data types known in at least one of the many fields of computing. Although when the invention is applied to textual objects, appearance of a text string in an object is considered pertinent, when the invention is applied to other types of objects, other measures of closeness or pertinence, such as numerical closeness, would be workable, and are within the scope and spirit of the invention.
FILTER AND EXPLORER SYSTEM
Figure 1 shows a block diagram of a database explorer and filter system.
In a preferred embodiment, a system 101 for case-based organizing and querying of a database 102 may comprise a filter 103, for organizing the database 102 so as to be responsive to a query 104, an explorer 105, for selecting a set of objects 106 in the database 102 which are responsive to that query 104, and an object file system 107, for accessing the database 102. In a preferred embodiment, the database 102 may generally be of a type which is known in the art, such as a collection of text objects supported by Cairo Milestone 4 running under the Windows NT system version 297, available from Microsoft Corporation of Redmond, Washington, and may be accessed in conjunction with the object file system 107 of that product.
The filter 103 may operate at an initialization time, such as when the processor is first started or before the first query 104 is presented to the explorer 105. The filter 103 may also operate in an incremental mode, e.g., by updating its organization of the database 102 periodically, such as upon the passage of a fixed period of time, when a fixed number of objects 106 are changed or added to the database 102, when the operation of the explorer 105 is degraded below some predetermined level, when triggered by an operator 108 in conjunction with a user interface 109 (e.g., when a query is presented, by a specific command to do so, or as a side effect of another operation) , or otherwise as determined by the database 102 or an external manager.
The filter 103 may examine each of the objects 106 (or some predetermined subset of objects 106) in the database 102 and associate each object 106 it examines (or some predetermined subset of those objects 106) with a set of properties. For a textual database 102 as primarily described herein, those properties may be keywords or phrases which are found in the object 106, but may also comprise other property values, such as the language the text is written in, the length of the text, or the reading level or other measure associated with the text (including measures of complexity, detail, redundancy, writing style, "fog", or other known measures of text, e.g., known in the art of grammar checking and correction) .
The objects 106 with their properties may be treated as a set of cases to be matched by a CBR engine 110 (operating with the object file system 107) with a test case generated from the query 104. Each case may generally comprise an object 106 plus the properties that object 106 was associated with, e.g., key words and phrases found in that object. In a preferred embodiment, these properties may include a lexicon of words and noun phrases found in the object 106, including at least some of these words labelled as a set of "header words" or "relevant words".
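As an illustration only, a case of the kind described above might be held in a small record that pairs an object identifier with the property values attached by the filter 103. The class name `Case`, its field names, and the `properties` helper are assumptions made for this sketch; the application does not prescribe a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """Hypothetical record: one object 106 plus the properties the filter attached."""
    object_id: str
    header_words: set = field(default_factory=set)    # words from title or opening text
    relevant_words: set = field(default_factory=set)  # words unusually frequent in the object
    noun_phrases: set = field(default_factory=set)    # entries of the object's lexicon

    def properties(self) -> set:
        """Pool all property values into one set for matching."""
        return self.header_words | self.relevant_words | self.noun_phrases


example_case = Case(
    object_id="doc-001",
    header_words={"kangaroo", "marsupial"},
    relevant_words={"pouch", "marsupial"},
    noun_phrases={"other marsupials"},
)
```

Pooling the three sets is one possible design; a matcher could equally weight header words and relevant words differently.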
The explorer 105 may generally operate at a question time, such as when one or more queries 104 is presented to the explorer 105. In a preferred embodiment, the explorer 105 may be invoked by the operator 108 in conjunction with the user interface 109, which user interface 109 may allow the operator to trigger operation of the explorer 105 and to present one or more queries 104 to the explorer 105. In a preferred embodiment, the user interface 109 may be one such as the user interface presented by the Windows NT system referred to herein. In a preferred embodiment, the operator 108 may be a human being, but those of ordinary skill will recognize, after perusal of the application, that the operator 108 may comprise a network connection, an external management program, or an AI program.
In a preferred embodiment, the explorer 105 may generate a response 111 including a set of matching cases (i.e., objects 106 with their properties), which may be presented to the operator 108 by means of the user interface 109, such as the user interface presented by the Windows NT system referred to herein, augmented by features described herein. The filter 103 and the explorer 105 may operate in conjunction with the object file system 107 (and in particular the CBR engine 110 thereof), which may respond to a set of properties formed into a vector query 112 directed at the database 102, and may return a hit table 113 of those objects 106 in the database 102 which have the indicated properties. In a preferred embodiment, the CBR engine 110 may use case-based matching and other techniques such as those shown in those co-pending applications which have been incorporated by reference.
FILTERING DOCUMENTS
Figure 2 shows a data flow diagram of a method of filtering documents.
In a preferred embodiment, a document 201 (an object 106 which comprises text, such as a pure text document or a text document formatted for a word-processing program) may be input to the filter 103 for examination. The filter 103 may process the text by a tag-and-segment-text process 202, which may lexically analyze the document 201, e.g., by means of a known lexical analysis technique.
The tag-and-segment-text process 202 may extract a set of single terms 203 and generate a set of header words 204 found in the document 201. The header words 204 may comprise those words which occur in an initial part of the object 106, or in a title, subject line, topical paragraph, or abstract. In a preferred embodiment, the header words 204 may comprise the first three things mentioned in the document 201.
The tag-and-segment-text process 202 may also tag words in the document 201 with their parts of speech and parse them into a set of sentences 205. The sentences 205 may be input to an extract-noun-phrases process 206, which may further lexically analyze the document 201, e.g., by means of a known lexical analysis technique, to extract a set of noun phrases 207 and generate a lexicon 208 thereof. In a preferred embodiment, the tag-and-segment-text process 202 may use a grammar of the English language, but other natural languages, and even formal specification languages such as programming languages, would also be suitable.
The tag-and-segment-text process 202 may also recognize and generate a set of proper nouns 209. In a preferred embodiment, the set of proper nouns 209 may be determined by known rules, e.g., that proper nouns generally comprise strings of words each starting with an upper-case letter, or by reference to a dictionary of known proper names. The set of proper nouns 209 may be input, along with at least some of the single terms 203, to a determine-relevant-words process 210, which may extract a set of relevant words 211.
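The capitalization rule mentioned above can be sketched as follows. This is a minimal illustration, not the process 202 itself: the regular expression, the sentence splitter, the function name `find_proper_nouns`, and the handling of sentence-initial words are all assumptions of the sketch.

```python
import re

# Assumed pattern: one or more consecutive words, each starting with an upper-case letter.
_CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def find_proper_nouns(text, known_names=frozenset()):
    """Collect capitalized word runs; consult a dictionary for ambiguous cases."""
    found = set()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in _CAPITALIZED_RUN.finditer(sentence):
            phrase = match.group(0)
            # A lone capitalized word at the start of a sentence may be an ordinary
            # word, so keep it only if the dictionary of known names lists it.
            if match.start() == 0 and " " not in phrase and phrase not in known_names:
                continue
            found.add(phrase)
    return found
```

For example, `find_proper_nouns("The system is sold by Inference Corporation.")` keeps "Inference Corporation" and drops the sentence-initial "The". A multi-word run that begins a sentence is kept whole, which is a known limitation of this simple rule.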
The set of relevant words 211 may be determined with reference to the frequency of those words in the object 106 (with respect to the entire text found in the object 106) and with reference to the frequency of those words in the database 102, with respect to the text corpus of the database 102. In a preferred embodiment, the ratio for each word (frequency in the object 106) divided by (frequency in the database 102) may be computed, and the set of relevant words 211 may comprise those words whose relative frequency exceeds a threshold, e.g., a predetermined threshold such as a 1:1 ratio. However, it would be clear to those of ordinary skill, after perusal of this application, that other measures (e.g., statistical measures) relating to frequency could be used to determine relevant words, such as clustering of relevant words in paragraphs, correlation with other relevant words, or relative frequency of word pairs or n-tuples, and that such other measures are within the scope and spirit of the invention.
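The frequency-ratio test described above can be sketched as follows, assuming the text has already been split into word tokens. The function names and the treatment of words absent from the corpus sample are assumptions of this sketch; the 1:1 default threshold is the example value given above.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def relevant_words(object_tokens, database_tokens, threshold=1.0):
    """Words whose frequency in the object exceeds `threshold` times their database frequency."""
    in_object = relative_frequencies(object_tokens)
    in_database = relative_frequencies(database_tokens)
    relevant = set()
    for word, object_freq in in_object.items():
        database_freq = in_database.get(word)
        if not database_freq:
            # Assumption: a word missing from the database sample counts as relevant.
            relevant.add(word)
        elif object_freq / database_freq > threshold:
            relevant.add(word)
    return relevant
```

With a document "the kangaroo is a marsupial the kangaroo" inside a larger corpus dominated by common words, "kangaroo" and "marsupial" exceed the 1:1 ratio while "the", "is", and "a" do not.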
The filter 103 is described herein for a specific set of properties of the text which may be extracted. However, it would be clear to those of ordinary skill, after perusal of this application, that extraction of other properties could be readily accomplished, and is within the scope and spirit of the invention. Such other properties could include the language the text is written in (or for English-language text, the number of foreign words used), the length of the text, or the reading level or other measure associated with the text (including measures of complexity, detail, redundancy, writing style, "fog", or other known measures of text, e.g., known in the art of grammar checking and correction). In a preferred embodiment, the extract-noun-phrases process 206 and the determine-relevant-words process 210 may proceed in parallel, e.g., by execution on multiple processors or by multiple tasks or threads in a multitasking or multithreaded environment.
The filter 103 may mark each object 106 with the properties it determines (or alternatively may create a separate object 106 relating each documentary object 106 to its properties) , so that the object 106 and its properties may be treated as a case in a case-base. In a preferred embodiment, the set of cases may be matched to a test case by a CBR engine 110, using techniques like those described in copending applications (1) Serial No. 07/ 664,561, filed March 4, 1991 in the name of inventors Bradley P. Allen and S. Daniel Lee, titled "CASE-BASED REASONING SYSTEM"; (2) Serial No. 07/ 869,935, filed April 15, 1992 in the name of inventor Bradley P. Allen, titled "MACHINE LEARNING WITH A RELATIONAL DATABASE"; and (3) Serial No. 07/ 869,926, filed April 15, 1992 in the name of Bradley P. Allen, titled "AUTONOMOUS LEARNING AND REASONING AGENT"; each of which is hereby incorporated by reference as if fully set forth herein, or other case-based reasoning techniques which may be known in the art.
PROCESSING QUERIES
Figure 3 shows a data flow diagram of a method of processing queries.
In a preferred embodiment, the query 104, entered in free text by the operator 108, may be input to the explorer 105 for examination. The explorer 105 may process the text by a tag-and-segment-text process 301, which may lexically analyze the document 201, e.g., by means of a known lexical analysis technique, similarly to the tag-and-segment-text process 202 of the filter 103.
The tag-and-segment-text process 301 may extract a set of single terms 302, similarly to the tag-and-segment-text process 202 and the set of single terms 203 of the filter 103.
The tag-and-segment-text process 301 may also tag words in the document 201 with their parts of speech and parse them into a set of sentences 303, similarly to the tag-and-segment-text process 202 and the sentences 205 of the filter 103. The sentences 303 may be input to an extract-noun-phrases process 304, which may further lexically analyze the document 201, e.g., by means of a known lexical analysis technique, to extract a set of noun phrases 305, similarly to the extract-noun-phrases process 206 and the noun phrases 207 of the filter 103.
The tag-and-segment-text process 301 may also recognize and generate a set of proper nouns 306, similarly to the tag-and-segment-text process 202 and the proper nouns 209 of the filter 103. The noun phrases 305, single terms 302, and proper nouns 306, a rank threshold 307, and a set of selected subtopics 308 (subtopics selected by the operator 108 to refine the query 104) may be input to a generate-query process 309, which may generate a set of query terms 310 and a query parse tree 311.
In a preferred embodiment, the tag-and-segment-text process 301, the extract-noun-phrases process 304, and the generate-query process 309 may proceed as asynchronously as possible, e.g., by execution on multiple processors or by multiple tasks or threads in a multitasking or multithreaded environment.
The query terms 310 and the query parse tree 311 may be input to the CBR engine 110 in the object file system 107, which may perform case-based matching or other fuzzy associative matching on the objects 106 in the database 102 for objects which are similar to the query 104, as described by the query terms 310 and the query parse tree 311, and which have a match quality at least as good as the rank threshold 307. (As noted with regard to the user interface 109, the selected subtopics 308 are added to the text of the query 104.) The object file system 107 may generate the hit table 113 of matched objects 106.
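The role of the rank threshold 307 in building the hit table 113 can be sketched as follows. The scoring rule here (the percentage of query terms found among a case's properties, on an assumed 0-to-100 scale) is only a stand-in chosen for illustration; the matching actually performed by the CBR engine 110 is described in the applications incorporated by reference. The function names are likewise assumptions.

```python
def rank(query_terms, case_properties):
    """Stand-in match quality: percentage of query terms present in the case."""
    if not query_terms:
        return 0.0
    return 100.0 * len(query_terms & case_properties) / len(query_terms)


def build_hit_table(query_terms, cases, rank_threshold=80.0):
    """Return (object_id, rank) pairs at or above the threshold, best match first.

    `cases` maps an object identifier to the set of its property values.
    """
    hits = []
    for object_id, properties in cases.items():
        score = rank(query_terms, properties)
        if score >= rank_threshold:
            hits.append((object_id, score))
    # Order by descending rank, then by identifier for a stable listing.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits
```

Ordering by rank corresponds to presenting matched objects by quality of match; lowering `rank_threshold` trades precision for recall.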
PROCESSING HIT TABLES
Figure 4 shows a data flow diagram of a method of processing hit tables.
The hit table 113 and the relevant words 211 may be input to a cluster hits process 401, which (if clustering is enabled) collects the matched objects 106 into clusters, and may output a set of clusters 402 in response. Each cluster 402 may comprise a set of objects 106, selected for collective closeness with regard to all objects 106 in the hit table 113. The cluster hits process 401 is further described with regard to figure 5.
The hit table 113, the relevant words 211, and the lexicon 208 may be input to a first generate-topics (from relevant words) process 403, while the lexicon 208 and the query terms 310 may be input to a second generate-topics (from query words) process 403. Together the two generate-topics processes 403 may output a set of topics 404 and subtopics 405.
In a preferred embodiment, the generate-topics process 403 may examine the lexicon 208 of noun phrases 207 with a rule-based inference engine (not shown). (One such inference engine is the ART-IM system, available from Inference Corporation in El Segundo, California.) The inference engine may detect particular patterns in the noun phrases 207 which indicate semantic relations between the words in those noun phrases 207. For example, the noun phrase
"kangaroos, wallabies, and other marsupials"
would be detected and would generate the relations
kangaroo IS-A marsupial
wallaby IS-A marsupial
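The "and other" pattern in this example can be approximated with a regular expression, as sketched below. This is not the ART-IM rule set of the preferred embodiment; the pattern, the naive singularizer, and the function names are assumptions introduced only to show how such a pattern yields IS-A relations.

```python
import re

# Assumed pattern: "<member>, <member>, and other <parent>" (also accepts "or other").
_AND_OTHER = re.compile(r"^(?P<members>.+?),?\s+(?:and|or)\s+other\s+(?P<parent>.+)$")


def singular(phrase):
    """Very naive singularizer, sufficient for regular English plurals only."""
    if phrase.endswith("ies"):
        return phrase[:-3] + "y"
    if phrase.endswith("s") and not phrase.endswith("ss"):
        return phrase[:-1]
    return phrase


def is_a_relations(noun_phrase):
    """Return (member, parent) pairs found in one noun phrase, or an empty list."""
    match = _AND_OTHER.match(noun_phrase.strip().lower())
    if match is None:
        return []
    parent = singular(match.group("parent").strip())
    members = [part.strip() for part in match.group("members").split(",") if part.strip()]
    return [(singular(member), parent) for member in members]
```

Applied to the noun phrase above, the sketch produces the pairs (kangaroo, marsupial) and (wallaby, marsupial). Irregular plurals would need a real morphological lexicon.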
The generate-topics process 403 may thus construct a phrase lattice, showing each noun phrase 207 as being inclusive of (above), included in (below), or incommensurate with (neither above nor below) each other noun phrase 207.
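One way to read the lattice ordering is through the transitive closure of detected IS-A relations, as in the sketch below. Treating IS-A pairs as the only source of the ordering is an assumption of this sketch, as are the function names; the application does not spell out how the lattice is stored.

```python
def ancestors(term, is_a_pairs):
    """All terms reachable from `term` by following (child, parent) pairs upward."""
    seen = set()
    pending = [term]
    while pending:
        current = pending.pop()
        for child, parent in is_a_pairs:
            if child == current and parent not in seen:
                seen.add(parent)
                pending.append(parent)
    return seen


def compare(a, b, is_a_pairs):
    """Position of phrase `a` relative to phrase `b` in the lattice."""
    if a in ancestors(b, is_a_pairs):
        return "above"           # a is inclusive of b
    if b in ancestors(a, is_a_pairs):
        return "below"           # a is included in b
    return "incommensurate"      # neither above nor below
```

Given the relations kangaroo IS-A marsupial, wallaby IS-A marsupial, and (hypothetically) marsupial IS-A mammal, "mammal" is above "kangaroo", while "kangaroo" and "wallaby" are incommensurate.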
The generate-topics (from relevant words) process 403 may restrict the phrase lattice to those noun phrases 207 which include relevant words 211 of the objects 106 in the hit table 113. In a preferred embodiment, the second generate-topics (from query words) process 403 may operate in similar manner as the first generate-topics (from relevant words) process 403 and may restrict the phrase lattice to those noun phrases 305 which include relevant words 211 of the query.
Figure 5 shows a process flow diagram of a method of clustering hit tables.
The cluster hits process 401 may operate by means of a genetic algorithm, in which an initial configuration and a set of genetic operators are specified, and the set of solutions is formed by simulation of random "evolution" of a population of possible solutions, using the method of steady-state reproduction without duplicates. Genetic algorithms are well known in the art, and are described in further detail in "Foundations of Genetic Algorithms", ed. Gregory J.E. Rawlins (Morgan Kaufmann Publishers: San Mateo, California 1991). It would be clear to those of ordinary skill in the art that the parameters of the genetic algorithm, and even the type of genetic algorithm performed could be varied substantially and still remain within the scope and spirit of the invention.
In a cluster-count step 501, a number of clusters 402 is selected. The number of clusters 402 may vary from a known minimum to a known maximum, settable by the operator 108. The genetic algorithm of the following steps is repeated for each permissible number of clusters 402, and the best solution adopted.
In an initiate-clusters step 502, a set of possible clusters 402 is selected; this is a single "gene". A random population of genes is selected. Each cluster 402 is represented by the centroid of the objects 106 which would comprise that cluster 402. Thus, when a solution of clusters 402 is selected, each object 106 is assigned to the cluster 402 which it best matches.
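The centroid representation and the assignment of each object to its best-matching cluster can be sketched as follows, assuming each object is reduced to its set of relevant words. The weighting scheme (the share of member objects containing a word) and the match score are assumptions of this sketch, not measures stated in the application.

```python
def centroid(members):
    """Word weights for a cluster: the fraction of member objects containing each word."""
    counts = {}
    for words in members:
        for word in words:
            counts[word] = counts.get(word, 0) + 1
    return {word: count / len(members) for word, count in counts.items()}


def match(words, cluster_centroid):
    """Average centroid weight of an object's words (0.0 when nothing is shared)."""
    if not words:
        return 0.0
    return sum(cluster_centroid.get(word, 0.0) for word in words) / len(words)


def assign(objects, centroids):
    """Map each object identifier to the index of the centroid it best matches."""
    assignment = {}
    for object_id, words in objects.items():
        # Ties go to the lowest-numbered cluster.
        assignment[object_id] = max(
            range(len(centroids)), key=lambda i: match(words, centroids[i])
        )
    return assignment
```

A gene in the sense used above would then be a list of such centroids, and evaluating it amounts to calling `assign` and scoring the resulting grouping.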
After the initiate-clusters step 502, the genetic algorithm of the following steps is repeated for a known period of time, settable by the operator 108. When that time expires, the best available solution (i.e., the gene with the best quality) is selected as the solution and specifies the set of clusters 402. Each object 106 is assigned to the cluster 402 to which it is the closest.
In an evaluation step 503, all genes in the population are evaluated for quality, and the gene with the least quality is removed. In a preferred embodiment, the statistical measure "category utility" is computed; i.e., the utility of each cluster 402 in distinguishing an object 106 in one cluster 402 from an object 106 in another cluster 402. Thus, if the centroid of a cluster 402 has high quality of match for several objects 106, those objects are reasonably clustered together.
Although in a preferred embodiment, matching for clusters 402 is performed using relevant words 211, it would be clear to those of ordinary skill, after perusal of this application, that other properties of the objects 106 could be used as well, such as the read/write date of the object 106, and that doing so would be within the scope and spirit of the invention.
In a genetic-operator step 504, one of three operators is selected and employed to create a new gene:
(1) Mutation-1. The new gene is randomly created.
(2) Mutation-2. An existing gene is copied, except that one of its clusters 402 is mutated by replacing it with a randomly created cluster 402.
(3) Crossover. Two genes have their n-tuples of clusters 402 paired off and one cluster 402 is selected at random from each pair to form the new gene. Alternatively, a new gene is created by selecting N clusters 402 at random from the 2N clusters 402 specified by the two old genes.
USER INTERFACE
Figure 6 shows an example explorer user interface screen as viewed by an operator. While the invention is described primarily with regard to a specific user interface, it would be clear to those of ordinary skill in the art that another user interface of equal or greater flexibility would be suitable, and would be within the scope and spirit of the invention.
In a preferred embodiment, the user interface 109 may be combined with a user interface for a generalized file system exploration program, such as in the Windows NT system referred to herein. The user interface 109 may comprise a query window 601 in which the operator may enter the query 104 in free text, and a results window 602 in which the system 101 may display a set of matched objects 106 found in response to the query 104.
In a preferred embodiment, the operator 108 may enter the query 104 in the query window 601. The query 104 is input to the explorer 105, which processes it as described herein, and generates the vector query 112. The vector query 112 is input to the object file system 107, and generates the hit table 113 of matched objects 106. The hit table 113 is input to the user interface 109, which displays the matched objects 106. The operator may select a displayed matched object 106 to view its contents. In a preferred embodiment, the user interface 109, the explorer 105, and the object file system 107 may operate as asynchronously as possible. Accordingly, the object file system 107 may search the database 102 for matched objects 106 independently, once it has sufficient information from the explorer 105; the user interface 109 may display matched objects 106 from the hit table 113 as they are generated by the object file system 107.
In the example, the operator 108 has entered the query 104 "who invented the light bulb?" in a content field 603 of the query window 601, and the system 101 has responded with a set of matched objects 106 in the results window 602. The matched objects are displayed one per line, in columns labelled "rank", "query", "header", and "relevant words".
In the example, a rank field 604 displays the quality of match for each displayed matched object 106. In a preferred embodiment, the system 101 may order the matched objects 106 by rank. This may occur as the normal procedure, or at the request of the operator 108, e.g., by means of a "sort" command 605 in the query window 601. In a preferred embodiment, the rank field 604 may also be color-coded by value.
In the example, a query field 606 displays the relevant words of the query which are most related to the displayed matched object 106. In the example, a header field 607 displays the header words 204 of the displayed matched object 106.
In the example, a relevant words field 608 displays the most common relevant words 211 of the displayed matched object 106.
In the example, a topics field 609 of the query window 601 displays suggested topics for refinement of the query 104 which the system 101 has identified. In a preferred embodiment, the operator 108 may select a topic in the topics field 609, and the system will display a subtopics window 610 (overlaid on the query window 601 and the results window 602) showing the subtopics which the system 101 has identified for that topic.
QUERY REFINEMENT
The operator 108 may refine the query 104 in response to the matched objects 106, and the explorer 105 may attempt to match objects 106 using the query 104 as refined. This may occur at the request of the operator 108, e.g., by means of a "refresh" command 611 in the query window 601.
In a preferred embodiment, the operator 108 may select one or more subtopics 405 to refine the query 104. To do so, the operator 108 may identify (e.g., by pointing to with a pointing device such as a mouse) one or more subtopics 405 in the subtopics window 610. The selected subtopics 308 may be "added" to the query 104 and the explorer 105 may attempt to match objects 106 using the query 104 as refined.
In a preferred embodiment, the operator 108 may also select one or more relevant words 211 to refine the query 104. To do so, the operator 108 may identify (e.g. by pointing to) the relevant words field 608 for a particular matched object 106 and "drag" that relevant words field 608 to the content field 603; the system 101 will display a relevance feedback window 612 (overlaid on the query window 601 and the results window 602) showing the relevant words 211 for that matched object 106.
In a preferred embodiment, the operator 108 may select one or more relevant words 211 to refine the query 104. To do so, the operator 108 may identify (e.g., by pointing to) one or more relevant words 211 in the relevance feedback window 612. The selected relevant words 211 may be "added" to the query 104 and the explorer 105 may attempt to match objects 106 using the query 104 as refined.
The query 104 as refined (like the original query 104) is presented as a vector query 104 to the CBR engine 110. When selected subtopics 308 or relevant words 211 are "added" to the query, they are properties which the CBR engine 110 must match to objects 106, as described for methods of iterative refinement of case-based matching shown in those co-pending applications which have been incorporated by reference. (Thus, the CBR engine 110 must match to objects 106 as if the operator 108 had answered a query refining question in a case-based system.) A query 104 as refined may be further refined, allowing the operator to iteratively refine the query 104 until desired objects 106 are located.
VIEWING CLUSTERS
Figure 7 shows a second example explorer user interface screen, as viewed by an operator, in which clusters are displayed.
The operator 108 may select a "cluster" command (figure 6) or "uncluster" (figure 7) command 701 in the query window 601, and the system 101 will display a set of clusters 402, each a set of related matched objects 106, in place of displaying matched objects 106 themselves. In the example, the operator has selected the "cluster" command 701 for the same query 104 as in the example of figure 6.
In the example, an expand field 702 displays whether the cluster 402 can be expanded (shown by a "+" symbol) to display individual matched objects 106, or can be collapsed (shown by a "-" symbol) to display a single identifier for the cluster 402.
In the example, the rank field 703 displays the best rank for all matched objects 106 in the cluster 402. In a preferred embodiment, the system 101 may order the clusters 402 by this rank field 703. This may occur as the normal procedure, or at the request of the operator 108, e.g., by means of the "sort" command 605 in the query window 601. In a preferred embodiment, this rank field 703 may also be color-coded by value.
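The ranking of clusters just described can be sketched briefly. The sketch assumes that each matched object is a (name, rank) pair and that a higher rank is a better match; both the data shape and the function names are hypothetical.

```python
def cluster_rank(cluster):
    """Rank of a cluster: the best (highest) rank among its matched objects."""
    return max(rank for _name, rank in cluster)


def order_clusters(clusters):
    """Order clusters by their best-ranked member, best cluster first."""
    return sorted(clusters, key=cluster_rank, reverse=True)


clusters = [
    [("Bulb", 81), ("Edison, Thomas Alva", 95)],
    [("Electric Lighting", 99)],
    [("Hygrometer", 85)],
]
for cluster in order_clusters(clusters):
    print(cluster_rank(cluster), [name for name, _rank in cluster])
```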
In the example, the relevant words field 608 displays the most common relevant words 211 in the cluster 402.
Other fields and windows remain similar to the example of figure 6.
The operator 108 may also choose to cluster all objects 106 in a specific set, e.g., a specific directory in the object file system 107. In a preferred embodiment, the operator 108 may restrict the scope of the explorer 105 to a specific directory and issue the "cluster" command 701; the system 101 will display the objects 106 in that directory in clusters 402.
SETTING PARAMETERS
Figure 8 shows an example explorer user interface screen, as viewed by an operator, in which settings may be set by the operator.
In a preferred embodiment, the operator 108 may select settings appropriate for the system 101. The operator 108 may select a "properties" command 801 in the query window 601 (figure 6), and the system 101 will display a properties window 802 with a set of property values 803 which may be set.
A "minimum rank of returned hits" property 804 is a threshold value for including matched objects 106; matched objects 106 whose rank falls below this value are not displayed in the results window 602 and are not used in further processing. The rank of a matched object 106 is calculated by the CBR engine 110. In the example, this value is set to 80.
A "maximum clustered hits" property 805 is a maximum number of matched objects 106 which are included in a single cluster 402. Those matched objects 106 not included in clusters 402 are placed in a special cluster 402 labelled "Other". In the example, this value is set to 400.
A "clustering time" property 806 is the elapsed real time devoted to clustering. In the example, this value is set to 2500 milliseconds.
A "minimum number of clusters" property 807 is the lower bound for the number of clusters 402 generated. In the example, this value is set to 2 clusters.
A "maximum number of clusters" property 808 is the upper bound for the number of clusters 402 generated. In the example, this value is set to 8 clusters. The system 101 attempts to generate a number of clusters 402 between the minimum and maximum number selected.
A "maximum topics" property 809 is the maximum number of topics displayed in the topics field 609 in the query window 601. In the example, this value is set to 7 topics.
A "maximum subtopics" property 810 is the maximum number of subtopics displayed in the subtopics window 610. In the example, this value is set to 250 subtopics.
A "do/don't cluster" property 811 sets whether or not clustering is performed. In the example, this value is set to YES.
A "do/don't generate query topics" property 812 sets whether or not topics and subtopics are generated in response to query terms 310. In the example, this value is set to YES.
A "do/don't generate salient topics" property 813 sets whether or not topics and subtopics are generated in response to relevant words 211. In the example, this value is set to YES.
A "boolean/vector query" property 814 sets whether the object file system 107 performs a boolean query or a vector query in response to the explorer 105. In the example, this value is set to vector queries. A boolean query would have boolean connectors (e.g., "AND", "OR") coupling the query terms 310, so that the query 104 would not be as flexibly matched. Search using boolean queries is well known in the art.
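The property values of the example can be gathered into a single settings record, as in the following sketch. The class name, field names, and helper functions are hypothetical; only the default values are taken from the example of figure 8. The sketch also assumes that a hit is a (name, rank) pair.

```python
from dataclasses import dataclass


@dataclass
class ExplorerSettings:
    """Hypothetical container for the property values 803 of the example."""
    min_rank: int = 80                    # property 804
    max_clustered_hits: int = 400         # property 805
    clustering_time_ms: int = 2500        # property 806
    min_clusters: int = 2                 # property 807
    max_clusters: int = 8                 # property 808
    max_topics: int = 7                   # property 809
    max_subtopics: int = 250              # property 810
    do_cluster: bool = True               # property 811
    generate_query_topics: bool = True    # property 812
    generate_salient_topics: bool = True  # property 813
    vector_query: bool = True             # property 814 (False: boolean query)


def displayed_hits(hits, settings):
    """Drop hits whose rank falls below the minimum rank of returned hits."""
    return [(name, rank) for name, rank in hits if rank >= settings.min_rank]


def cluster_counts(settings):
    """Permissible numbers of clusters, from the minimum to the maximum."""
    return list(range(settings.min_clusters, settings.max_clusters + 1))


settings = ExplorerSettings()
print(displayed_hits([("Edison, Thomas Alva", 95), ("Gopher", 42)], settings))
print(cluster_counts(settings))
```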
APPENDICES
Appendix A shows a table of parts of speech and a set of lexical rules for the English language, which may be used for the tag-and-segment-text process or the tag-and-segment-text process in a preferred embodiment.
Appendix B shows an output of a test run of an example filter when applied to a portion of an example multimedia encyclopedia used as a database, available as "Microsoft Encarta" from Microsoft Corporation of Redmond, Washington.
Alternative Embodiments
While preferred embodiments are disclosed herein, many variations are possible which remain within the concept and scope of the invention, and these variations would become clear to one of ordinary skill in the art after perusal of the specification, drawings and claims herein.
APPENDIX A
LEX2.TXT
Number of original entries from LDOCE and WordNet:
2466 lines of the form: Ability: skill, faculty, aptitude
11624 total terms on the right (downward relationships)
Terms never have their parents as children (no loops)
Parts of speech represented:
A   - Adjective              strong, vivid, real
ADV - Adverb                 weakly, dimly, very
AUX - Auxiliary Verb         can, shall, will
AXN - AUX not                can't, won't, doesn't
BE  - be                     is, are, be, was
BTH - PQT/Double Conj.       both
CLN - Colon
CMA - Comma
CON - Connective             and, or, but
CRD - Cardinal               three, 3.14, twenty-two
D   - Determiner             the, a, that
DAT - Date &/or Time         friday, 3:00, Christmas
DDC - D/Double Conj.         either, neither
DO  - Do (aux)               do, did, does
ENS - End Of Sentence        ?, !
ETC - "And Others"           ..., etc., et al.
GEN - Genitive               his, her, their
HAV - Have (aux)             have, had, has, having
IJ  - Interjection           Oh, shucks, well
INF - Infinitive marker      to
N   - Noun                   frog, pride, year
NEG - Negation               not
ORD - Ordinal                first, 2nd, last
P   - Preposition            by, around, with, from
PA  - Open Paren             (, [, {, <
PD  - Post Determiner        many, several, next
PN  - Proper Noun            Zippy, Brad Allen
PQL - Pre-Qualifier          quite, rather, such
PQT - Pre-Quantifier         nary, many, half, all
PRN - Pronoun                him, she, we
PRT - Participial Verb       running, thinking
QA  - Quantifier/Article     that, this
QL  - Qualifier              some, many, every
QLP - Post-Qualifier         enough, 'nuff, indeed
QN  - Quantified Noun        everybody, nothing
REN - Close Paren            ), ], }, >
RP  - Relative Pronoun       that, which
SOS - Start of Sentence
V   - Verb (inf or past)     eat, voted, surf
WHD - Wh-Determiner          what, which
WHQ - Wh-Qualifier           who, why
XT  - Existential Term       it, there
Total number of phrase recognition rules:
5 for the filter:
CRD|GEN|N|ORD, N, ~N
GEN, PRT
ADV|CRD|GEN|N|ORD, A|CRD|ORD, N, ~N
ADV|CRD|GEN|N|ORD, A|CRD|ORD, A|CRD|N|ORD, N, ~N
CRD|N|ORD, CON, A|CRD|N|ORD, N, ~N
Additional 10 for the Explorer (original 5 used as well):
Figure imgf000033_0001
N, RP, AUX|AXN|COP|DO|HAV, P|PRT|V, N|PN
note: ~X means not X or nothing at all (end of sentence)
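One way to read the rule notation is sketched below. The sketch assumes that a rule is a comma-separated sequence of slots applied left to right to consecutive part-of-speech tags, that "|" separates alternative tags within a slot, and that "~N" matches any tag other than N as well as the end of the sentence. The functions `parse_rule`, `slot_matches`, and `match_at` are hypothetical and are not taken from the filter or the Explorer.

```python
def parse_rule(text):
    """Split a rule such as 'CRD|GEN|N|ORD, N, ~N' into slots of alternatives."""
    return [slot.strip().split("|") for slot in text.split(",")]


def slot_matches(slot, tag):
    """Check one slot against one tag; tag is None past the end of the sentence."""
    for alternative in slot:
        if alternative.startswith("~"):
            # Negated alternative: anything but this tag, including no tag at all.
            if tag != alternative[1:]:
                return True
        elif tag == alternative:
            return True
    return False


def match_at(rule, tags, start):
    """Return True if every slot of the rule matches the tags beginning at start."""
    for offset, slot in enumerate(rule):
        index = start + offset
        tag = tags[index] if index < len(tags) else None
        if not slot_matches(slot, tag):
            return False
    return True


rule = parse_rule("CRD|GEN|N|ORD, N, ~N")
print(match_at(rule, ["N", "N", "V"], 0))  # noun, noun, then a non-noun
print(match_at(rule, ["N", "N", "N"], 0))  # third tag is a noun, so no match here
```

Under this reading the trailing "~N" slot keeps a rule from firing in the middle of a longer run of nouns, so that only the last two nouns of the run are matched.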
Total number of automatically acquired lexicon entries:
For Encarta, including base LDOCE/Wordnet entries:
184904 unique words / base phrases
51623 parents involved in 445025 relationships
151850 children involved in 445025 relationships
Average number of terms per automatically acquired phrase:
445025 / 51623 = 8.6
445025 / 151850 = 2.9
Average number of children phrases from original LDOCE entries:
11624 / 2466 = 4.7
NOTE from Perry:
You asked how many things we got out of WordNet and LDOCE. The number that David responded was the number of taxonyms we extracted from those two sources (mostly WordNet). If you were asking the number of words we extracted, it was initially in the neighborhood of 85,000. The current number of tagged words in the lexicon is 25915.
There are some additional phrase lattice rules that David didn't mention, since they are currently stubbed out. They involve noun phrases where a prepositional phrase or relative clause attaches to the right of a noun:
Queen of England
girl from Ipanema
man who hit Dave Adam
car that didn't stop
The reason why we don't use them is because of the right attachment.
Our current representation in the phrase lattice file is:
base-word, ext1, ext2, ... , extn
where ext1 through extn all attach to the LEFT of base-word. Bear in mind, of course, that unstubbing the code and fixing the reps of this file will add this form of phrase lattice entry, but it will also increase the size of the phrase lattice file (perhaps double it).
LDOCE is basically a dictionary of British English, so we found a lot of words we weren't familiar with, as well as a lot of double entries to account for American spellings (e.g. color and colour). The lexical categories we were able to extract out of LDOCE and WordNet were limited to nouns, verbs, adjectives, adverbs, conjunctions, determiners, predeterminers, prepositions, pronouns, and phrases. Since we don't use a phrasal lexicon, we threw the phrases away.
All other categories of words (including the different categories of verbs: do, be, have, participial) were hand tagged. This tagging was greatly aided by two books: DeRose's Dissertation and the book by Kucera and Francis. The past tenses for all verbs were also done by hand, which was something of a waste as most of them (the regular ones) were eventually thrown away, once we implemented rules that tag based on word endings.
Figure imgf000035_0001
Figure imgf000035_0002
The following are the current set of rules used for determining noun phrases:
1. noun-phrase -> proper-noun (e.g. "Elvis")
2. noun-phrase -> pronoun (e.g. "he")
3. noun-phrase -> noun (e.g. "cars")
4. noun-phrase -> gerund (e.g. "running")
5. noun-phrase -> determiner noun-phrase (e.g. "The person")
6. noun-phrase -> quantifier noun-phrase (e.g. "Three people")
7. noun-phrase -> adjective noun-phrase (e.g. "fluffy clouds")
8. noun-phrase -> adverb noun-phrase (e.g. "maddeningly fluffy clouds")
9. noun-phrase -> noun noun-phrase (e.g. "printer ribbons")
10. noun-phrase -> noun-phrase relative-clause (e.g. "The car that hit me")
11. noun-phrase -> noun-phrase prepositional-phrase (e.g. "The person with the most toys")
12. noun-phrase -> noun-phrase that sentence (e.g. "The candidate that I will vote for")
13. noun-phrase -> noun-phrase [, noun-phrase]* [,] and noun-phrase (e.g. "Larry, Moe and Curly")
14. noun-phrase -> noun-phrase [, noun-phrase]* [,] or noun-phrase (e.g. "England, France, or Germany")
15. noun-phrase -> comparative noun-phrase than noun-phrase (e.g. "more tea than China")
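Rules 1 through 9 are right-recursive: a noun phrase is a head word, optionally preceded by any number of modifiers. A minimal recognizer for just those nine rules can be sketched as follows. It assumes the input has already been tagged, uses descriptive tag names chosen only for this illustration, and leaves out rules 10 through 15, which need relative clauses, prepositional phrases, and conjunctions.

```python
# Rules 1-4: a single head word is a noun phrase.
HEADS = {"proper-noun", "pronoun", "noun", "gerund"}
# Rules 5-9: these categories may precede a noun phrase.
MODIFIERS = {"determiner", "quantifier", "adjective", "adverb", "noun"}


def is_noun_phrase(tags):
    """Return True if the tag sequence is derivable from rules 1-9 alone."""
    if not tags:
        return False
    if len(tags) == 1:
        return tags[0] in HEADS
    return tags[0] in MODIFIERS and is_noun_phrase(tags[1:])


print(is_noun_phrase(["adverb", "adjective", "noun"]))  # "maddeningly fluffy clouds"
print(is_noun_phrase(["noun", "noun"]))                 # "printer ribbons"
print(is_noun_phrase(["noun", "determiner"]))           # not a noun phrase
```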
The Find Taxonomic Relations process (process 2.2 in figure 4) uses ART-IM rules to capture patterns of words which indicate taxonomic relationships between the words. For example, it detects patterns like:
"... kangaroos, wallabies, and other marsupials ..."
From this particular phrase, one could reasonably extract the relations
IS_A(kangaroo,marsupial) and IS_A(wallaby,marsupial)
Other patterns which detect this type of relation, extracted from [14], are:
1. NP such as {NP,}* {(and | or)} NP
2. such NP as {NP,}* {(and | or)} NP
3. NP {, NP}* {,} and other NP
4. NP {, NP}* {,} or other NP
5. NP {,} including {NP,}* {(and | or)} NP
6. NP {,} especially {NP,}* {(and | or)} NP
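The "and other" pattern can be approximated with a regular expression, as in the following sketch. It is a simplification and not the ART-IM rules themselves: every NP is assumed to be a single word, and plurals are reduced with a crude suffix rule. The names `extract_is_a` and `singular` are hypothetical.

```python
import re

# Single-word approximation of: NP {, NP}* {,} and other NP
AND_OTHER = re.compile(r"((?:\w+, )*\w+,?) and other (\w+)")


def singular(word):
    """Crude plural stripping, adequate only for this illustration."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s"):
        return word[:-1]
    return word


def extract_is_a(text):
    """Return (child, parent) pairs for each 'X, Y, and other Z' match."""
    relations = []
    for match in AND_OTHER.finditer(text):
        parent = singular(match.group(2))
        children = re.split(r",\s*", match.group(1).rstrip(","))
        relations.extend((singular(child), parent) for child in children)
    return relations


print(extract_is_a("... kangaroos, wallabies, and other marsupials ..."))
```

Applied to the example phrase, the sketch yields the two IS_A pairs given above, (kangaroo, marsupial) and (wallaby, marsupial).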
APPENDIX B
Mar 16 17:39 1993 test.log Emacs buffer Page 1
Clustering file afl.txt
Non-empty clusters: 5 Clusters: 5
# Hits Vals Seed, Value:Count
0 1 0 NONE
1 2 0 Reuther, Walter Philip, Labor, labor:2, presidents, wage:2
2 2 0 Railroad Labor Organizations, Brotherhood, Union, united statesS
3 7 0 Hillman, Sidney, Labor, labor:7, afl:7, union:4, american federat
4 2 0 Kirkland, Lane, Labor, directors
Passes: 1029, best pass: 830, best score: 0.955, worst score: 0.170
Cluster 0, has 1 hits: ''
  Football, Type, United States
Cluster 1, has 2 hits: 'labor:2, presidents, wage:2'
  Meany, George, Labor
  Reuther, Walter Philip, Labor
Cluster 2, has 2 hits: 'united statesS, unionS, managements'
  Railroad Labor Organizations, Brotherhood, Union
  Teamsters Union, Full, International Brotherhood
Cluster 3, has 7 hits: 'labor:7, afl:7, union:4, american federation^, cio:3,
  American Federation, Labor, Congress
  Gomper, Samuel, Labor
  Green, William, Labor
  Hillman, Sidney, Labor
  Knight, Labor, Union
  Lewi, John L, Labor
  Strike, Labor, Relation
Cluster 4, has 2 hits: 'directors'
  Kirkland, Lane, Labor
  Rozelle, Pete, Full
Clustering file alcohol.txt
Non-empty clusters: 5 Clusters: 5
# Hits Vals Seed, Value:Count
0 15 0 (OTHER), blood , vitamins, tissues, poisons, sugar metabolis
1 22 0 Antifreeze, Chemical, Substance, alcoholSI, acid:7, ethyl:7, li
2 10 0 Vodka, Beverage, Known, alcohol:9, percent:5, beverages, use:3,
3 6 0 Gasohol, Blend, Part, fuel:5, alcohols, methanolS, combustion
4 4 0 Marijuana, Mixture, Leave, drugs, alcohols, syndromes, psycho
Passes: 334, best pass: 158, best score: 0.307, worst score: 0.132
Cluster 0, has 15 hits: '(OTHER), bloods, vitaminS, tissues, poisonS, suga
  Birth Defects, Disorder, Structure
  Cancer, Medicine, Growth
  Corn, Maize, Cereal
  Crop Farming, Cultivation, Plant
  First Aid, Emergency, Measure
  Fungi, Group, Organism
  Liver, Organ, Vertebrate
  Nutrition, Human, Science
  Paint, Varnish, Liquid
  Pennsylvania, Full, Commonwealth
  Poison, Substance, Produce
  Sugar, Term, Number
  Thermometer, Instrument, Measure
  Wine, Beverage, Juice
  Wood, Substance, Trunk
Cluster 1, has 22 hits: 'alcohol I, acid:7, ethyl:7, liquid: , examples, chemi
  Acetaldehyde, Volatile, Liquid
  Antifreeze, Chemical, Substance
  Azeotropic Mixture, Solution, Ratio
  Butyl Alcohol, Chemical, Formula
  Cannizzaro, Stanislao, Italian
  Disease, Medicine, Health
  Ester, Chemistry, Compound
  Ether, Chemistry, Ethyl
  Fermentation, Chemical, Change
  Formaldehyde, Compound, Carbon
  Glycerin, Glycerol, C3h8o3
  Gum, Substance, Plant
  Iodine, Element, Symbol
  Lipid, Group, Substance
  Salicylic Acid, White, Solid
  Solution, Chemistry, Mixture
  Tannin, Acid, Name
  Turpentine, Name, Semifluid
  Vinegar, Condiment, Preservative
  Wax, Name, Ester
  Whiskey, Liquor, Mash
  Zymology, Zymurgy, Biochemistry
Cluster 2, has 10 hits: 'alcohol:9, percent:5, beverages, use:3, liquor , dist
  Beer, Term, Beverage
  Cider, Sweet, Juice
  Cosmetic, Term, Preparation
  Distillation, Process, Liquid
  Distilled Liquors, Beverage, Alcohol
  Gin, Liquor, Grain
  Liqueur, Beverage, Spirit
  Police, Agency, Community
  Prohibition, Ban, Manufacture
  Vodka, Beverage, Known
Cluster 3, has 6 hits: 'fuel:5, alcohols, methanolS, combustions, coals, en
  Alcohol, Arabic, Al-kuhul
  Automobile, Greek, Auto
  Combustion, Process, Oxidation
  Energy Supply, World, Resource
  Gasohol, Blend, Part
  Rocket, Term, Propulsion
Cluster 4, has 4 hits: 'drugS, alcohols, syndromes, psychoactive drugs:2, ma
  Alcoholism, Illness, Ingestion
  Drug Dependence, State, Compulsion
  Marijuana, Mixture, Leave
  Psychoactive Drugs, Chemical, Substance
Clustering file bulb.txt
Non-empty clusters: 5 Clusters: 5
# Hits Vals Seed, Value:Count
0 9 0 (OTHER), plants, united statesS, seeds, gardenings, flowerS
1 10 0 Radiometer, Instrument, Intensity, bulb:7, light:4, tuber:3, stem:
2 3 0 Electric Lighting, Illumination, Mean, lamp:3, glassS, neonS, ar
3 5 0 Autumn Crocus, Name, Herb, bulb:5, liliaceae:4, herb:3, lilyS, pi
4 6 0 Hygrometer, Type, Instrument, temperature:4, atmosphere , points
Passes: 598, best pass: 333, best score: 0.491, worst score: 0.208
Cluster 0, has 9 hits: '(OTHER), plants, united statesS, seeds, gardenings,
  Disease, Plant, Deviation
  Gardening, Cultivation, Plant
  Garlic, Name, Herb
  Genetics, Study, Trait
  Gopher, French, Gauffre
  Horticulture, Latin, Hortu
  Peanut Worm, Name, Small
  Spice, Flavoring, Part
  Technology, Term, Process
Cluster 1, has 10 hits: 'bulb:7, light:4, tuber:3, stem , rhizomeS, electrons
  Bulb, Mass, Leave
  Edison, Township, Middlesex County
  Edison, Thomas Alva, Inventor
  Onion, Name, Herb
  Photoelectric Cell, Phototube, Electron
  Photography, Technique, Permanent
  Radiometer, Instrument, Intensity
  Rhizome, Stem, Organ
  Tuber, Stem, Plant
  Ray, Radiation, Wavelength
Cluster 2, has 3 hits: 'lamp:3, glassS, neonS, arcS, bulbS, argonS, lights
  Argon, Element, Symbol
  Electric Lighting, Illumination, Mean
  Neon Lamp, Glass, Bulb
Cluster 3, has 5 hits: 'bulb:5, liliaceae:4, herb:3, lily:3, pistilS, heights
  Autumn Crocus, Name, Herb
  Hyacinth, Plant, Genu
  Soap Plant, Amole, Native
  Star-of-bethlehem, Name, Herb
  Tuberose, Herb, Polianth
Cluster 4, has 6 hits: 'temperature:4, atmospheres, points, humidityS, bulb
  Blood Pressure, Pressure, Blood
  Humidity, Moisture, Content
  Hygrometer, Type, Instrument
  Meteorology, Study, Atmosphere
  Thermometer, Instrument, Measure
  Vapor, Physic, Term
Clustering file columbus.txt
Non-empty clusters: 7 Clusters: 7
# Hits Vals Seed, Value:Count
0 4 0 (OTHER), century:2
1 4 0 Pinzn, Name, Family, expedition:3, voyage:2, hispaniola:2, pinta:2
2 5 0 Puerto Rico, Commonwealth, Spanish Estado Libre Asociado, Spanish:
3 2 0 Samana Cay, Island, Bahama, atlantic ocean:2, landfall:2, san sal
4 6 0 Mississippi, East South Central, U.S., state:5, river:3, city:3,
5 5 0 Santiago, Dominican Republic, Name, cacao:3, city:3, caribbean:2,
6 4 0 South America, Continent, Asia, death valley:2, south:2, slavery:
Passes: 614, best pass: 65, best score: 0.520, worst score: 0.189
Cluster 0, has 4 hits: '(OTHER), century:2'
  American Literature, Literature, English
  Coin, Geography, City
  Europe, Continent, World
  Knight, Columbu, Organization
Cluster 1, has 4 hits: 'expedition:3, voyage:2, hispaniola:2, pinta:2, ship:2'
  Columbu, Christopher, Italian Cristoforo Colombo
  Pinzn, Name, Family
  Ship, Type, Construction
  Velzquez, Diego, Soldier
Cluster 2, has 5 hits: 'spanish:4, island:3, spain:2, de:2, christopher columbu
  Bobadilla, Francisco, De
  Cuba, Island, West Indies
  Dsirade, Island, West Indies
  Ferdinand V, The Catholic, King
  Puerto Rico, Commonwealth, Spanish Estado Libre Asociado
Cluster 3, has 2 hits: 'atlantic ocean:2, landfall:2, san salvador:2, island:2,
  Samana Cay, Island, Bahama
  San Salvador, Island, Watling Island
Cluster 4, has 6 hits: 'state:5, river:3, city:3, american civil war:2, ohio:2,
  Columbu, Georgia, City
  Columbu, Mississippi, City
  Columbu, Ohio, City
  Georgia, State, South Atlantic
  Mississippi, East South Central, U.S.
  Ohio, East North Central, U.S.
Cluster 5, has 5 hits: 'cacao:3, city:3, caribbean:2, dominican:2, santiago:2,
  Columbu, Indiana, City
  Santiago, Dominican Republic, Name
  Santo Domingo, Trujillo, City
  Spanish Town, City, Jamaica
  Tobago, Republic, Commonwealth
Cluster 6, has 4 hits: 'death valley:2, south:2, slavery:2, brazil:2, continen
  Black, America, Immigration
  North America, Continent, Canada
  South America, Continent, Asia
  United States, America, Republic
Clustering file dualism.txt
Non-empty clusters: 5 Clusters: 5
# Hits Vals Seed, Value:Count
0 2 0 NONE
1 5 0 Dualism, Philosophy, Theory, mind:5, philosophers, philosophy ,
2 3 0 Devil, Hebrew, Belief, evil:3, god:3, goods, humanS, middle age
3 3 0 Paulician, Church, History, dualisms, sects, bogomilsS, old te
4 2 0 Docetism, Christian, Heresy, doctrine:2, human:2
Passes: 1050, best pass: 312, best score: 1.003, worst score: 0.397
Cluster 0, has 2 hits: ''
  Austria, German, sterreich
  Zoroastrianism, Religion, Persia
Cluster 1, has 5 hits: 'mind:5, philosophe , philosophy:3, matters, universe
  Dualism, Philosophy, Theory
  Metaphysics, Branch, Philosophy
  Monism, Greek, Mono
  Occasionalism, Term, System
  Philosophy, Greek, Philosophia
Cluster 2, has 3 hits: 'evil:3, god:3, good:2, human:2, middle agesS, middle e
  Albigens, Follower, Single
  Devil, Hebrew, Belief
  Evil, Wrong, Harm
Cluster 3, has 3 hits: 'dualisms, sects, bogomilsS, old testaments, century
  Basilide, Teacher, Alexandria
  Bogomils, Member, Sect
  Paulician, Church, History
Cluster 4, has 2 hits: 'doctrine:2, human:2'
  Docetism, Christian, Heresy
  Neoplatonism, Designation, Doctrine
Clustering file infant.txt
Non-empty clusters: 7 Clusters: 7
# Hits Vals Seed, Value:Count
0 4 0 NONE
1 3 0 Gesell, Arnold Lucius, Psychologist, infants, developments
2 2 0 Incubator, Apparatu, Chamber, growths
3 2 0 Pregnancy, Childbirth, Term, births, pregnancyS, infants, chi
4 2 0 Hondura, Republic, Central America, countryS, 1980s:2
5 3 0 Baptism, Greek, Baptein, rite:2, baptisms
6 2 0 Japan, Japanese Dai, Great, manchuria:2, government:2, party:2
Passes: 835, best pass: [illegible], best score: 0.795, worst score: [illegible]
Cluster 0, has 4 hits: ''
  Free Trade, Interchange, Frontier
  Human, Name, Individual
  Perception, Process, Stimulation
  Scotland, Division, Kingdom
Cluster 1, has 3 hits: 'infants, developments'
  Gesell, Arnold Lucius, Psychologist
  Infancy, Period, Birth
  Sudden Infant Death Syndrome, Sid, Death
Cluster 2, has 2 hits: 'growths'
  Incubator, Apparatu, Chamber
  Population, Term, Human
Cluster 3, has 2 hits: 'birthS, pregnancy:2, infants, childbirth:2, women:2'
  Obstetrics, Branch, Medicine
  Pregnancy, Childbirth, Term
Cluster 4, has 2 hits: 'country:2, 1980s:2'
  Hondura, Republic, Central America
  Sierra Leone, Nation, Africa
Cluster 5, has 3 hits: 'rite:2, baptisms'
  Baptism, Greek, Baptein
  Circumcision, Removal, Part
  Mennonite, Religious, Group
Cluster 6, has 2 hits: 'manchuria:2, government:2, party:2'
  China, Chinese Zhonghua Renmin Gongheguo, People Republic
  Japan, Japanese Dai, Great
Clustering file israel.txt
Non-empty clusters: 4 Clusters: 4
# Hits Vals Seed, Value:Count
0 22 0 (OTHER), governments, war:4, centuryS, french revolutions, coun
1 66 0 Judah, Old Testament, Name, israel:64, judahSO, old testamentSO,
2 39 0 Nasser, Gamal Abdel, Egyptian, israel:32, arab:26, israeliSO, pal
3 11 0 Song, Solomon, Book, book:10, old testaments, israelS, chap:5, b Passes: 127, best pass:_117, best score: 0.213, worst score: 0.083
Cluster 0, has 22 hits: '(OTHER), governments, war:4, centuryS, french revolut Achille Lauro, Italian, Cruise Anti-semi ism, Social, Agitation Asia, Continent, Island Assyria, Ashur, Ashshur Bahai, Persian, Glory Buber, Martin, Religious Cabala, Hebrew, Tradition Crusade, Expedition, Undertaken Eschatology, Discourse, Last Espionage, Collection, Information Iran, Islamic Republic, Republic Jewish Art, Architect c Jew Jewish Music, Religic o , Music Nationalism, History, Movement Portuguese Literature, Literature, Portuguese Refugee, Person, Country Romania, Republic, Europe Saudi Arabia, Monarchy, Southwest Asia
Union, Soviet Socialist Republics, Russian Soyuz Sovyetskikh Sotsialisticheski United Nations, Organization, Nation-state United States, America, Republic Woman Suffrage, Right, Women Cluster 1, has 66 hits: 'israel:64, judahSO, old testamentSO, king:18, bc:12, Abner, Old Testament, Cousin Ahab, King, Israel Amaziah, Hebrew, King Ammonite, People, Region Amo, Book, Old Testament Angel, Greek, Aggelo Apostle, Greek, Apostolo Ashqelon, Town, Palestine Balaam, Old Testament, Prophet Kokhba, Simon, Name Bene Israel, Community, Jew Ben-zvi, Itzhak, Second Bethlehem, Jordan, Hebrew Bible, Holy Bible, Book Carmel, Mount, Mountan Diaspora, Greek, Dispersion David, King, Be Edom, Old Testament, Times Elat, Eilat, City
Mar 16 17:39 1993 test.log Emacs buffer Page 7
Elia, Century, Be
Elisha, Old Testament, See
Ephraim, Hebrew, Old Testament
Esdraelon, Plain, Jezreel
Ezekiel, Book, Old Testament
Falasha, Sect, Ethiopia
Galilee, Galil, Circle
Gideon, Hebrew, Hewer
Habima Theater, Former, Name
Hebron, City, Israeli-occupied Jordan
Herzog, Chaim, President
High Priest, Hierarchy, Head
Hoion, City, Israel
Israel, Kingdom, Hebrew
Jacob, Old Testament, Patriarch
Joash, Name, King
Jehoshaphat, Hebrew, Jehovah
Jehu, Hebrew, Jehovah
Jeremiah, Book, Old Testament Jeroboam I, Old Testa- -.r. See Jeroboam Ii, King, Israel Jew, Usage, Hebrews Jezebel, Tyrian, Princess Jonathan, Old Testament Books, Samuel Judah, Old Testament, Name Judaism, Culture, Jew Justification, Theology, Way King, Book, Old Testament Lost Tribes, History, Tribe Manasseh, Son, Old Testament Meir, Golda, Israeli Michael, Hebrew, God Moab, Country, Hill
National Jewish Welfare Board, National, Agency Negeb, Region, Middle East Philistine, Inhabitant, Region Putnam, Israel, Soldier Ramat Gan, City, Central Rehoboam, King, Judah Samuel, Book, Old Testament Saul, King, Israel Sharon, Plain, Israel She a, Hebrew, Word Solomon, King, Israel Tiberia, Lake, Sea Weizmann, Chai , Long-time Zangwill, Israel, English Cluster 2, lias 39 hits: 'israelS2, arab:26, israeliSO, palestine:ll, egypt:ll, Husein, King, Jordan Acre, Akko, Seaport Agnon, Slimuel Yosef, Israeli Amman, Rabbah Ammon, Philadelphia Arab League, Name, League Arafat, Yasir, Palestinian Aren, Moshe, Israeli Menachem, Israeli, Prime
Mar 16 17:39 1993 test.log Emacs buffer Page
Ben-gurion, David, Israeli
Damascu, Arabic Dimashq, Ash-sham
Dayan, Moshe, Israeli
Egypt, Arab Republic, United Arab Republic
Gaza, Arabic Ghazze, City
Golan Heights, Region, Syria
Haifa, City, Seaport
Hebrew Literature, Literature, Jew
Iraq, Irak, Republic
Israel, Republic, Middle East
Jerusalem, Arabic, Al-qud
Jordan, River, Middle East
Jordan, Hashemite Kingdom, Arabic
Kibbutz, Village, Far
Lebanon, Arabic Lubnan, Republic
Libya, Full, Socialist People Libyan Arab Jamahiriyah
Middle East, Region, Geography
Nasser, Gamal Abdel, Egyptian
Palestine, Region, Extent
Palestine Liberation Organization, Plo, Body
Sadat, Egyptian, Military
Six-day War, Conflict, June
Suez Canal, Waterway, Running
Syria, Arabic Suriyah, Al-arabiyah
Tel Aviv-jaffa, Tel Aviv-yafo, City
Terrorism, International, Use
Tunisia, Republic, Africa
West Bank, Area, West
Yom Kippur War, Conflict, Israel
Zionism, Movement, People
Zionist Organization, America, Zoa
Cluster 3, has 11 hits: 'book:10, old testament:9, israel:9, chap:5, be:5, proph
Dead Sea Scrolls, Collection, Hebrew
Hosea, Book, Old Testament
Isaiah, Book, Old Testament
Joshua, Book, Old Testament
Judge, Book, Old Testament
Micah, Book, Old Testament
Number, Book, Old Testament
Obadiah, Book, Old Testament
Song, Solomon, Book
Wisdom, Solomon, Book
Zechariah, Book, Old Testament
Clustering file marx.txt
Non-empty clusters: 6 Clusters: 6
# Hits Vals Seed, Value:Count
0 2 0 (OTHER), german:2, germany:2, east:2, baltic sea:2
1 3 0 Hegel, G, W, philosopher:3, philosophy:2
2 4 0 Bolshevism, Doctrine, Theory, communist:4, lenin:4, revolutions,
3 4 0 Marx Brothers, 20th-century, Comedian, marx:4, socialisms, engels
4 4 0 Communist Manifesto, German Manifest, Partei, capitalists, class.-
5 6 0 Ideology, System, Concept, social:3, marx:3, labor:2, world war ii
Passes: 722, best pass: 675, best score: 0.663, worst score: 0.248
Cluster 0, has 2 hits: '(OTHER), german:2, germany:2, east:2, baltic sea:2'
Germany, Country, Europe
Germany, German Democratic Republic, Gdr
Cluster 1, has 3 hits: 'philosopher:3, philosophy:2'
Hegel, G, W
Philosophy, Greek, Philosophia
Political Theory, SuL . ion, Science
Cluster 2, has 4 hits: 'communist:4, lenin:4, revolutions, communism:2, govern
Bolshevism, Doctrine, Theory
Communism, Concept, System
International, Name, Socialist
Socialism, Doctrine, Movement
Cluster 3, has 4 hits: 'marx:4, socialisms, engels:2'
Bernstein, Eduard, German Social Democratic
Economics, Science, Production
Engels, Friedrich, German
Marx Brothers, 20th-century, Comedian
Cluster 4, has 4 hits: 'capitalists, class:3, capitalism:2, communist:2, bourg
Bourgeoisie, Resident, European
Capitalism, System, Individual
Communist Manifesto, German Manifest, Partei
Marx, Karl, German
Cluster 5, has 6 hits: 'social:3, marx:3, labor:2, world war ii:2, german:2, ce
Ideology, System, Concept
Karl-marx-stadt, Former, Name
Kautsky, Karl Johann, German Marxist
Lassalle, Ferdinand, German
Sociology, Science, Deal
Wage, Theory, Labor
Clustering file muslim.txt
Non-empty clusters: 4 Clusters: 4
# Hits Vals Seed, Value:Count
0 41 0 (OTHER), arab:7, bc:5, ibn:4, indian:4, india:4, islam:4
1 20 0 Philippine, Republic, Pacific Ocean, 1980s:17, country:8, governm
2 40 0 Kashgar, Kashi, Kaxgar, muslim:38, india:8, muhammad:7, Jerusalem
3 11 0 Mathematics, Study, Relationship, century:11, art:3, france:3, ar
Passes: 146, best pass: 47, best score: 0.210, worst score: 0.124
Cluster 0, has 41 hits: '(OTHER), arab:7, bc:5, ibn:4, indian:4, india:4, islam
Alfonso Viii, King, Castile
Arabia, Desert, Peninsula
Arabic Literature, Literature, People
Archaeology, Greek, Archaio
Averros, Arabic, Abu
Black Muslims, Religious, Organization
Borneo, Island, World
Chess, Game, Skill
Christianity, World, Religion
Chronology, Science, Division
Concubinage, Term, World
Costume, Clothing, People
Demon, Usage, Spirit
Egypt, Arab Republic, United Arab Republic
Gandhi, Mohandas Karamchand, Mahatma Gandhi
Ghana, Kingdom, West African
Hegira, Hejira, Arabic
Iraq, Irak, Republic
Jacobite Church, Christian, Group
Java, Island, Malay Archipelago
Jew, Usage, Hebrews
Jordan, Hashemite Kingdom, Arabic
Judaism, Culture, Jew
Karbala, City, Iraq
Mahdi, Arabic, Mahdiy
Medina, Medinat-en-nabi, City
Middle East, Region, Geography
Nehru, Indian, Nationalist
Orthodox Church, Major, Branch
Philosophy, Greek, Philosophia
Pottery, Clay, Firing
Punjab, Region, River
Saudi Arabia, Monarchy, Southwest Asia
Shiite, Arabic, Partisan
Sikhs, Follower, Religion
Sudan, Republic, Africa
Trigonometry, Branch, Mathematics
Tobago, Republic, Commonwealth
Tunisia, Republic, Africa
Turkey, Republic, Turkish Trkiye Cumhuriyeti
Vijayanagar, Kingdom, India
Cluster 1, has 20 hits: '1980s:17, country:8, government:7, Spanish:5, arab:4, s
Afghanistan, Persian Afghnistn, Republic
Bangladesh, Full, People Republic
Berber, Name, Language
Cameroon, Republic, Africa
Chad, Republic, Central
Ethiopia, Abyssinia, Republic
Gambia, Republic, Commonwealth
Gibraltar, Dependency, Promontory
Indonesia, Republic, Island
Iran, Islamic Republic, Republic
Israel, Republic, Middle East
Kenya, Republic, Africa
Libya, Full, Socialist People Libyan Arab Jamahiriyah
Morocco, Arabic, Al-mamlakah
Nigeria, Federal Republic, Republic
Pakistan, Islamic Republic, Republic
Philippine, Republic, Pacific Ocean
Republic, Europe, Portion
Spain, Spanish Espaa, Monarchy
Syria, Arabic Suriyah, Al-arabiyah
Cluster 2, has 40 hits: 'muslim:38, india:8, muhammad:7, Jerusalem:5, delhi:4, p
Fakhruddin Ali, Fifth, President
Algeria, French Algrie, Popular Republic
Allah, Name, Supreme Being
Almeida, Francisco, De
Almoravid, Berber, Dynasty
Asia, Continent, Island
Babism, Religion, Offshoot
Balewa, Sir Abubakar Tafawa, Minister
Region, Part, Subcontinent
Caliphate, Office, Realm
Crusade, Expedition, Undertaken
Delhi, Old Delhi, City
Delhi Sultanate, Muslim, State
Dervish, Turkish, Darvsh
Fakir, Arabic, Faqir
Farabi, Tarkhan, Al-farabi
Gansu, Kansu, Province
Ghazali, Name, Abu Hamid Muhammad
India, Republic, Hindi Bharat
Sir Muhammad, Pakistani, Philosopher
Islam, World, Religion
Islamic Music, Vocal, Art
Jammu, Kashmir, Known
Jerusalem, Arabic, Al-qud
Jinnah, Muhammad Ali, Leader
Kashgar, Kashi, Kaxgar
Kharijite, Arabic, Kharawrij
Lebanon, Arabic Lubnan, Republic
Malaysia, Monarchy, Commonwealth
Malcolm X, Leader, Omaha
Mufti, Title, Lawyer
Palestine, Region, Extent
Pilgrim, Place, Intent
Relic, Usage, Body
Roger I, Norman, Conqueror
Saladin, Leader, Jerusalem
Shivaji Bhonsle, Founder, India Maratha State
Tughluq, Muhammad, Sultan
Tuni, Tune, City
Umar, Al-hajj, West African
Cluster 3, has 11 hits: 'century:11, art:3, france:3, architecture:2, sculpture:
Africa, Continent, Island
Europe, Continent, World
France, French Rpublique Franaise, Republic
Gypsy, People, Heritage
History, Historiography, Sense
Indian Art, Architecture, Art
Indian Literature, Literature, Language
Islamic Art, Architecture, Art
Library, Repository, Form
Mathematics, Study, Relationship
Portraiture, Representation, Art
Clustering file pope.txt
Non-empty clusters: 3 Clusters: 3
# Hits Vals Seed, Value:Count
0 50 0 (OTHER), church:12, henry:8, king:7, english:6, roman:6, governme
1 138 0 Benedict Xiv, Pope, Moderation, pope:138, church:28, rome:26, cou
2 12 0 Angelico, Fra, Italian, florence:10, medici:5, florentine:4, domini
Passes: 86, best pass: 34, best score: 0.149, worst score: 0.082
Cluster 0, has 50 hits: "(OTHER), church:12, henry:8, king:7, english:6, roman:6
Aquina, Saint Thomas, Angelic Doctor
Borgia, Cesare, Italian
Bruno, Saint, Carthusian
Bulgaria, Full, People Republic
Canon Law, Greek, Kanon
Carpini, Giovanni, De
Carroll, John, American Roman Catholic
Christianity, World, Religion
Church, England, Anglican Church
Civil War, Conflict, United States
Conrad Iii, King, Germany
Corsica, French Corse, Island
Counter Reformation, Movement, Roman Catholic
Couplet, Poetry, Term
Cranmer, Thoma, Archbishop
Cyril, Methodiu, Saint
Demarcation, Line, Boundary
Duns Scotus, John, Theologian
Easter, Festival, Resurrection
England, Latin Anglia, Portion
English Literature, Literature, England
Erigena, John Scotus, Scholar
Este, Italian, Family
Europe, Continent, World
Felix V, Last, Antipope
Ferdinand I, Naple, King
Feuillant, French, Organizations-one
Finland, Finnish Suomi, Republic
Fisher, Saint John, English Christian
France, French Rpublique Franaise, Republic
Gardiner, Stephen, English
Germany, Country, Europe
Henry Viii, King, England
Henry Iv, France, Bourbon
Holy Roman Empire, Entity, Europe
Hungary, Hungarian Magyarorszg, Republic
Ireland, Geography, Island
Italian Italia, Republic, Europe
Knight, Saint John, Jerusalem
Lincoln, Abraham, President
Loyola, Saint Ignatius, Spanish Inigo
Lutheranism, Protestant, Denomination
Mary, Virgin Mary, Mother
Mendelssohn, Mos, German
Middle Ages, Period, European
Modernism, Theology, Philosophy
Neri, Saint Philip, Italian
Orthodox Church, Major, Branch
Poland, Republic, Polska Rzeczpospolita
Pole, Reginald, English Roman Catholic
Cluster 1, has 138 hits: 'pope:138, church:28, rome:26, council:23, papacy:23,
Adrian I, Pope, Power
Adrian Iv, Pope, Englishman
Adrian Vi, Pope, Dutchman
Alexander Iii, Pope, Authority
Alexander Vi, Pope, Worldliness
Algardi, Alessandro, Italian
Antonelli, Giacomo, Italian
Arnold, Brescia, 1100-c
Augustinian, Order, Roman Catholic
Bacon, Roger, English Scholastic
Basel, Council, Middle Ages
Bembo, Pietro, Italian
Benedict Viii, Pope, Reformer
Benedict Ix, Pope, 1032- 4
Benedict Xiii, Antipope, Avignon
Benedict Xiv, Pope, Moderation
Benedict Xv, Pope, Church
Bernard, Clairvaux, Saint
Bonaventure, Saint, Theologian
Boniface, Saint, English Benedictine
Boniface Viii, Pope, Power
Boniface Ix, Pope, Papal States
Bossuet, Jacques Bnigne, French Roman Catholic
Bull, Letter, Document
Bull Run, Battle, Manassa
Callistu, Calixtus I, Saint
Callistus Ii, Calixtus Ii, Pope
Callistus Iii, Calixtus Iii, Pope
Canonization, Roman Catholic, Church
Canossa, Village, Reggio
Cardinal, Title, Latin
Catherine, Aragn, Queen
Catherine, Siena, Saint
Cedar Mountain, Battle, Military
Celestine V, Saint, Pope
Celestine Iii, Pope, Born Giacinto Bobo
Censorship, Supervision, Control
Chalcedon, Council, Emperor
Charlemagne, Latin Carolus Magnus, Charle
Charles V, Holy Roman Empire, Holy Roman
Church, State, Relationship
Clement V, Pope, Avignon
Clement Vi, Pope, Church
Clement Vii, Pope, Pontificate
Clement Vii, Antipope, Great Schism
Clement Viii, Last, Pope
Clement Xiv, Pope, Jesuit
Conciliar Theory, Doctrine, Superiority
Conclave, Latin, Cum
Constance, Council, City
Coptic Church, Christian, Church
Council, Assembly, Doctrine
Crusade, Expedition, Undertaken
Damasus I, Saint, Pope
Damian, Saint Peter, Doctor
Doctor, Church, Christian
Dllinger, Johann Joseph Ignaz, Von
Ecumenical Movement, Movement, Cooperation
Edmund, Abingdon, Saint
Elector, German Imperial, German Kurfrsten
Eugene Iii, Pope, Cistercian
Eugene Iv, Pope, Dispute
Formosu, Pope, Trial
Franciscan, Order, Friars Minor
Frederick I, Holy Roman Empire, Frederick Barbarossa
Frederick Ii, Holy Roman Empire, Holy Roman
Gallicanism, History, Combination
Gregory I, Saint, Pope
Gregory Ii, Saint, Pope
Gregory Vii, Saint, Pope
Gregory Ix, Pope, Inquisition
Gregory Xi, Pope, Return
Guiscard, Robert, Norman
Henry Ii, Holy Roman Empire, Henry The Saint
Henry Iv, Holy Roman Empire, Holy Roman
Henry V, Holy Roman Empire, German
Hippolytu, Rome, Saint
Honorius I, Pope, Heretic
Infallibility, Theology, Doctrine
Innocent Iii, Pope, Pop
Innocent Iv, Pope, Dominion
Innocent Xi, Pope, King Louis Xiv
Inquisition, Institution, Papacy
Interdict, Roman Catholic, Church
Investiture Controversy, Dispute, Church
Jesuit, Society, Jesu
Joan, Pope, Female
John Ii, Pope, Born Mercurius
John Viii, Pope, Ablest
John Xii, Pope, Boy Pope
John Xxi, Pope, Pontiff
John Xxii, Pope, Second
John Xxiii, Antipope, Born Baldassare Cossa
John Xxiii, Pope, Era
John, John Lackland, King
John Paul I, Pope, Born Albino Luciani
John Paul Ii, Pope, N -. lian
Jubilee, Jew, Sabbatical
Julius Ii, Pope, Reign
Kulturkampf, German, Culture
Langton, Stephen, English
Lateran Councils, Council, Roman Catholic
Lateran Treaty, Designation, Agreement
Leo Iii, Saint, Pope
Leo Ix, Saint, Pope
Leo X, Pope, Renaissance
Leo Xiii, Pope, Modern
Louis Iv, German, Ludwig Iv
Lyon, Council, Church
Martin I, Saint, Pope
Martin Iv, Pope, Born Simon
Martin V, Pope, Election
Molino, De, Spanish Roman Catholic
Nicholas Iii, Pope, Papal States
Nichola, Cusa, German
Occam, William, 1285-1349
Otto Iii, Holy Roman, Emperor
Otto Iv, Otto, Brunswick
Papacy, Office, Pope
Papal States, Church, Pontifical States
Paschal Ii, Pope, Reign
Paul V, Pope, Born Camillo Borghese
Paul Vi, Pope, Second Vatican Council
Pepin, Short, Mayor
Peter Pence, Offering, Pope
Philip Iv, France, The Fair
Photiu, 820-91, Patriarch
Pico Della Mirandola, Giovanni, Conte
Pius Ii, Pope, Writer
Pius Iv, Pope, Conclusion
Pius V, Saint, Pope
Pius Vi, Pope, Reign
Pius Vii, Pope, Napoleon
Pius Ix, Pope, Pontificate
Pius X, Saint, Pope
Pius Xi, Pope, Path
Pius Xii, Pope, World War Ii
Pope, Latin, Papa
Cluster 2, has 12 hits: 'florence:10, medici:5, florentine:4, dominican:3, chur
Alberti, Leon Battista, Italian
Albertus Magnus, Saint, Albert
Angelico, Fra, Italian
Cellini, Benvenuto, Florentine
Dante Alighieri, Italian, Poet
Dominican, Friars Preachers, Member
Ferrara-florence, Council, Basel-ferrara-florence
Florence, Italian Firenze, Florentia
Guicciardini, Francesco, Italian
Leonardo, Da, Vinci
Medici, Lorenzo, De
Michelangelo, Creator, History
Clustering file sound.txt
Non-empty clusters: 5 Clusters: 5
# Hits Vals Seed, Value:Count
0 68 0 (OTHER), music:10, american civil war:6, state:6, bass:5, century:
1 57 0 Mach Number, Aerodynamics, Mechanic, sound:51, instruments, pitch
2 8 0 Letter, Vowel, English, sound:6, long:3, letter:3, sign:2, atlanti
3 19 0 Linguistics, Study, Language, language:14, english:9, speech:6, so
4 11 0 Vowel, English, Alphabet, sound:11, alphabet:9, letter:9, hierogly
Passes: 103, best pass: 74, best score: 0.173, worst score: 0.072
Cluster 0, has 68 hits: '(OTHER), music:10, american civil war:6, state:6, bass:
Amati, Family, Italian
American Indian Languages, Language, People
American Indians, People, America
Audiovisual Education, Planning, Preparation
Band, Ensemble, Brass
Transaction, Service, Consumer
Bird, Name, Member
Bremerton, City, Kitsap County
British Columbia, Province, Canada
Bronx, Borough, New York City
Building Construction, Procedure, Erection
Circulatory System, Anatomy, Physiology
Communication, Method, Receiving
Connecticut, New England, United States
Copyright, Body, Right
Currency, Economics, Term
Deep-sea Exploration, Investigation, Chemical
Bass, Member, Violin
Drama, Dramatic Arts, Form
Edison, Thomas Alva, Inventor
Encyclopedia, Encyclopaedia, Greek
Firework, Device, Material
Floor, Floor Coverings, Ceiling
Folk Dance, Dance, Member
Folk Music, Music, Performance
Frequency, Term, Science
Golden Globe Awards, Motion, Picture
Harmony, Music, Combination
Harpsichord, Italian, Cembalo
Insect, Name, Animal
Jazz, Type, Music
Jet Propulsion, Thrus.., parting
Mississippi, East South Central, U.S.
Motion Picture Arts, Science, Academy
Music, Vocal, Part
Music, Western, Europe
Musical Form, Arrangement, Element
Mystic, Village, Stonington
Navigation, Science, Position
Haven, City, New Haven County
North Carolina, South Atlantic, U.S.
Ocean, Oceanography, Body
Orchestra, Ensemble, Instrument
Orchestration, Art, Musical
Philosophy, Greek, Philosophia
Pianoforte, Keyboard, Musical
Social Dance, Term, Dance
Radio, System, Communication
Rhode Island, Full, State
Scale, Music, Italian
Scott, Robert Falcon, Officer
Seattle, City, Seat
Seward Peninsula, Peninsula, Alaska
Snake, Reptile, Name
Sonata, Italian, Sonare
Tacoma, City, Seat
Telephone, Communication, Instrument
Television, Tv, Transmission
Theater Production, Mean, Form
United States, America, Republic
Valdez, City, Alaska
Video Recording, Process, Recording
Viol, Instrument, Century
Washington, State, U.S.
Wave Motion, Physic, Mechanism
Whale, Mammal, Order
Yachting, Operation, Boat
Zither, Instrument, String
Cluster 1, has 57 hits: 'sound:51, instruments, pitch:7, string:5, recordings
Acoustics, Greek, Akouein
Aerodynamics, Branch, Mechanic
Airplane, Craft, Action
Albemarle Sound, Inlet, Atlantic Ocean
Bell, Instrument, Percussion
Chaplin, Charlie, Name
Clair, Ren, Name
Digital Audio Tape, Dat, Tape
De Forest, Lee, Inventor
Doppler Effect, Physic, Variation
Ear, Organ, Hearing
Edmond, City, Snohomish County
Electronic Music,
Exxon Valdez, Oil
Falkland Islands,
Fluid Mechanics, Science, Action
Grunt, Name, Fish
Guitar, Instrument, Lute
Harmonic, Vibration, Primary
Harp, Instrument, Run
Hearing, Main, Sense
Hearing Aid, Device, Sound
Mach Number, Aerodynamics, Mechanic
Microphone, Device, Energy
Midi, Acronym, Musical Instrument Digital Interface
Motion Picture, Sequence, Photograph
Motion Pictures, History, Development
Music, Movement, Sound
Musical Instruments, Tool, Scope
Noise, Physic, Signal
Oboe, Wind, Instrument
Organ, Instrument, Air
Petroleum, Oil, Bituminou
Phonograph, Known, Player
Physic, Science, Constituent
Prince William Sound, Inlet, Gulf
Propeller, Device, Force
Puget Sound, Arm, Pacific Ocean
Radiometer, Instrument, Intensity
Reflection, Physic, Phenomenon
Singing, Use, Voice
Sonar, Acronym, Sound Navigation And Ranging
Sound, Phenomenon, Sense
Determination, Depth, Body
Sound Recording, Reproduction, Conversion
Supersonics, Branch, Physic
Synthesizer, Computer, Peripheral
Tone, Music, Sound
Transformer, Device, Coil
Tyndall, John, Physicist
Ultrasonics, Branch, Physic
Ventriloquism, Art, Sound
Violin, Instrument, Member
Viscount Melville Sound, Arm, Arctic Ocean
Voiceprint Identification, Method, Person
Warner Brothers, Motion, Picture
Xylophone, Greek, Xylon
Cluster 2, has 8 hits: 'sound:6, long:3, letter:3, sign:2, atlantic ocean:2, mi
Animal Behavior The, Behavior, Animal
C, English, Romance-language
Diacritic Mark, Sign, Mark
Island Sound, Body, Salt
Letter, Vowel, English
Pamlico Sound, Inlet, Atlantic Ocean
Rhyme, Likeness, Sound
, Letter, English
Cluster 3, has 19 hits: 'language:14, english:9, speech:6, sound:6, word:5, spok
American English, English, Spoken
Celtic Languages, Indo-european, Family
Chinese Language, Language, Chinese
Cuneiform, Latin, Cuneu
Deafness, Inability, Definition
English Language, Medium, Communication
English Literature, Literature, England
Etymology, Branch, Linguistics
Grammar, Branch, Linguistics
Greek Language, Language, People
Hieroglyph, Character, System
Japanese Language, Language, Spoken
Language, Communication, Being
Linguistics, Study, Language
Phonetics, Branch, Linguistics
Poetry, Form, Expression
Semantics, Greek, Semantiko
Versification, Art, Verse
Writing, Method, Intercommunication
Cluster 4, has 11 hits: 'sound:11, alphabet:9, letter:9, hieroglyph:8, english:7
Vowel, English, Alphabet
Alphabet, Alpha, Beta
F, Letter, Consonant
K, Letter, English
L, Letter, English
M, Letter, English
Q, Letter, English
R, Letter, English
U, 21st, Letter
X, Letter, English
Y, Letter, English
Clustering file strike.txt
Non-empty clusters: 4 Clusters: 4
# Hits Vals Seed, Value:Count
0 6 0 (OTHER), electron:2, beam:2, tube:2, television:2
1 11 0 Gary, City, Lake County, strike:10, united states:3, presidents,
2 10 0 National Labor Relations Act, Nlra, Law, labor:9, strike:8, union
3 15 0 Poland, Republic, Polska Rzeczpospolita, government:11, 1980s:8,
Passes: 453, best pass: 208, best score: 0.445, worst score: 0.154
Cluster 0, has 6 hits: '(OTHER), electron:2, beam:2, tube:2, television:2'
Baseball, Game, Skill
Cathode-ray Tube, El*- : , Tube
Napoleon I, Emperor, French
Russia, History, Empire
Television, Tv, Transmission
Warfare, Use, Force
Cluster 1, has 11 hits: 'strike:10, united states:3, presidents, injunctions,
Chartism, Reform, Movement
Coolidge, John, Calvin
Defense Systems, Defense, Country
Deb, Eugene Victor, American Socialist
Dollfuss, Engelbert, Chancellor
Fault, Geology, Line
Gary, City, Lake County
Homestead Strike, Labor, Strike
Pullman Strike, See, Deb
Sound, Phenomenon, Sense
Ueberroth, Peter Victor, Sport
Cluster 2, has 10 hits: 'labor:9, strike:8, union:7, labor-management relations
Cleveland, Grover, 22d
Industrial Workers, World, Former
International Ladies, Garment Workers, Union
Knight, Labor, Union
Labor Relations, Transaction, Determination
Lockout, Labor, Relation
National Labor Relations Act, Nlra, Law
Labor, Relation, Practice
Strike, Labor, Relation
Trade Unions, United States, Labor
Cluster 3, has 15 hits: 'government:11, 1980s:8, war:6, country:4, soviet:3, pa
Colombia, Republic, South America
France, French Rpublique Franaise, Republic
Ghana, Country, Africa
Britain, United Kingdom, Great Britain
Illinoi, East North Central, U.S.
Italian Italia, Republic, Europe
Japan, Japanese Dai, Great
Northern Ireland, Part, United Kingdom
Poland, Republic, Polska Rzeczpospolita
Russian Revolution, Event, Russia
Spain, Spanish Espaa, Monarchy
Sweden, Konungariket Sverige, Kingdom
Union, Soviet Socialist Republics, Russian Soyuz Sovyetskikh Sotsialisticheski
United States, America, Republic
World War Ii, Military, Conflict
Clustering file utah.txt
Non-empty clusters: 5 Clusters: 5
# Hits Vals Seed, Value:Count
0 2 0 (OTHER), stateS
1 3 0 Utah, University, Institution, Utah:3
2 9 0 City, Davis County, Utah, city:8, Utah:8, mormon:5, state:4, name:
3 3 0 Mormonism, World, Religion, mormonism:3, polygamy:3, Smith:3, morm
4 7 0 Green, River, Utah, utah:6, colorado:5, mi:4, km:4, river:2, yampa
Passes: 764, best pass: 515, best score: 0.652, worst score: 0.147
Cluster 0, has 2 hits: '(OTHER), states'
United States, America, Republic
State, U.S., North
Cluster 1, has 3 hits: 'utah:3'
Bushnell, Nolan Kay, Founder-chairman
Orem, City, Utah County
Utah, University, Institution
Cluster 2, has 9 hits: 'city:8, utah:8, mormon:5, state:4, name:3, lake:3, salt
City, Davis County, Utah
Deseret, State, Name
Logan, City, Seat
Murray, City, Salt Lake County
Nevada, State, U.S.
Provo, City, Seat
Salt Lake City, City, Capital
Utah, State, U.S.
Utah Lake, Freshwater, Lake
Cluster 3, has 3 hits: 'mormonism:3, polygamy:3, smith:3, mormon:3, church , ki
Mormonism, World, Religion
Smith, Joseph, Religious
Brigham, Religious, Leader
Cluster 4, has 7 hits: 'utah:6, colorado:5, mi:4, km:4, river:2, yampa:2, uteS,
Colorado, State, United States
Colorado, River, North America
Salt Lake, Body, Salt
Green, River, Utah
Hovenweep National Monument, Colorado, Utah
Uinta Mountains, Range, Mountain
Ute, North American Indian, Tribe

Claims

CLAIMS
I claim:
1. A system for case-based organizing and querying of a database, said database having a set of objects, said system comprising means for organizing said database, by examining each object in said database and associating that object with a first set of property values; means responsive to a query, by associating said query with a second set of property values and performing matching on the objects of the database for objects which are similar.
2. A system as in claim 1, wherein said objects comprise text.
3. A system as in claim 1, wherein said first set of property values comprise keywords or other indicators of content.
4. A system as in claim 1, wherein said first set of property values comprise those words which appear more frequently in the document than in the database at large.
5. A system as in claim 1, wherein said first set of property values comprise those words which appear in a predetermined section of text of the object.
6. A system as in claim 1, wherein said first set of property values comprise those words which appear in a title of the object.
7. A system as in claim 1, wherein said matching is case-based matching or other fuzzy associative matching.
8. A system as in claim 1, wherein said query comprises text.
9. A system as in claim 1, wherein said means responsive to a query associates said query with keywords or other indicators of its content.
10. A system as in claim 1, comprising means for presenting a set of matched objects in response to said query.
11. A system as in claim 1, comprising means responsive to refinement of said query.
12. A system as in claim 1, comprising means responsive to iterative refinement of said query.
13. A system as in claim 12, wherein said means responsive to iterative refinement uses a case-based technique.
14. A system as in claim 1, comprising means for ordering said set of matched objects in response to quality of match.
15. A system as in claim 1, comprising means for organizing said set of matched objects.
16. A system as in claim 15, wherein said means for organizing comprises means for grouping said set of matched objects into a set of clusters.
17. A system as in claim 15, wherein said means for organizing comprises means for grouping said set of matched objects into a set of clusters of objects which have similar properties, which relate to similar content, which have similar likelihood to be of relevance to the query, or which have similar likelihood to be of interest to an operator posing the query.
18. A system as in claim 15, comprising means for generating suggestions for iterative refinement of said query.
19. A system as in claim 18, wherein said means for generating is responsive to a result of organizing matched objects.
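For illustration only, the organizing and querying steps recited in claims 1, 4 and 14 can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function names (`tokenize`, `property_values`, `organize`, `query`), the use of relative word frequency to pick property values, and the use of Jaccard overlap as the similarity measure are all hypothetical choices made for the example. The claims themselves do not fix a particular frequency test or matching score.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def property_values(doc_tokens, corpus_counts, corpus_total, top_n=10):
    """One reading of claim 4: keep words whose relative frequency in
    the object exceeds their relative frequency in the whole database."""
    local = Counter(doc_tokens)
    n = len(doc_tokens)
    scored = []
    for word, count in local.items():
        local_rate = count / n
        global_rate = corpus_counts[word] / corpus_total
        if local_rate > global_rate:
            scored.append((local_rate / global_rate, word))
    # Highest ratio first; ties broken alphabetically for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return {word for _, word in scored[:top_n]}


def organize(database, top_n=10):
    """Claim 1, organizing step: associate each object (id -> text)
    with a first set of property values."""
    tokens = {oid: tokenize(text) for oid, text in database.items()}
    corpus_counts = Counter(w for toks in tokens.values() for w in toks)
    corpus_total = sum(corpus_counts.values())
    return {
        oid: property_values(toks, corpus_counts, corpus_total, top_n)
        for oid, toks in tokens.items()
    }


def query(index, text):
    """Claims 1 and 14, query step: associate the query with a second
    set of property values, match it against each object, and order the
    hits by quality of match (Jaccard overlap is an assumption here)."""
    wanted = set(tokenize(text))
    hits = []
    for oid, values in index.items():
        overlap = len(wanted & values)
        if overlap:
            hits.append((overlap / len(wanted | values), oid))
    hits.sort(key=lambda t: (-t[0], t[1]))
    return hits
```

A caller would build the index once with `organize({"a": "...", "b": "..."})` and then call `query(index, "pope council")` for each query; objects sharing no property value with the query are omitted from the result. Grouping the hits into clusters (claims 16 and 17) is deliberately left out of this sketch.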
PCT/US1994/007569 1993-07-07 1994-07-05 Case-based organizing and querying of a database WO1995002221A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU73236/94A AU7323694A (en) 1993-07-07 1994-07-05 Case-based organizing and querying of a database

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US8830793A 1993-07-07 1993-07-07
US08/088,307 1993-07-07

Publications (1)

Publication Number Publication Date
WO1995002221A1 true WO1995002221A1 (en) 1995-01-19

Family

ID=22210607

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1994/007569 WO1995002221A1 (en) 1993-07-07 1994-07-05 Case-based organizing and querying of a database

Country Status (2)

Country Link
AU (1) AU7323694A (en)
WO (1) WO1995002221A1 (en)

Cited By (143)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6263333B1 (en) 1998-10-22 2001-07-17 International Business Machines Corporation Method for searching non-tokenized text and tokenized text for matches against a keyword data structure
US6336029B1 (en) 1996-12-02 2002-01-01 Chi Fai Ho Method and system for providing information in response to questions
US6498921B1 (en) 1999-09-01 2002-12-24 Chi Fai Ho Method and system to answer a natural-language question
US6571240B1 (en) 2000-02-02 2003-05-27 Chi Fai Ho Information processing for searching categorizing information in a document based on a categorization hierarchy and extracted phrases
EP1260919A3 (en) * 2001-05-22 2004-10-20 ICMS Group n.v. A method of storing, retrieving and viewing data
CN1320481C (en) * 2004-11-22 2007-06-06 北京北大方正技术研究院有限公司 Method for conducting title and text logic connection for newspaper pages
US7702541B2 (en) 2000-08-01 2010-04-20 Yahoo! Inc. Targeted e-commerce system
US7809663B1 (en) 2006-05-22 2010-10-05 Convergys Cmg Utah, Inc. System and method for supporting the utilization of machine language
US7996398B2 (en) 1998-07-15 2011-08-09 A9.Com, Inc. Identifying related search terms based on search behaviors of users
US8379830B1 (en) 2006-05-22 2013-02-19 Convergys Customer Management Delaware Llc System and method for automated customer service with contingent live interaction
US8862252B2 (en) 2009-01-30 2014-10-14 Apple Inc. Audio user interface for displayless electronic device
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
US8898568B2 (en) 2008-09-09 2014-11-25 Apple Inc. Audio user interface
US8935167B2 (en) 2012-09-25 2015-01-13 Apple Inc. Exemplar-based latent perceptual modeling for automatic speech recognition
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US8977584B2 (en) 2010-01-25 2015-03-10 Newvaluexchange Global Ai Llp Apparatuses, methods and systems for a digital conversation management platform
US8996376B2 (en) 2008-04-05 2015-03-31 Apple Inc. Intelligent text-to-speech conversion
US9053089B2 (en) 2007-10-02 2015-06-09 Apple Inc. Part-of-speech tagging using latent analogy
US9075783B2 (en) 2010-09-27 2015-07-07 Apple Inc. Electronic device with text error correction based on voice recognition data
US9190062B2 (en) 2010-02-25 2015-11-17 Apple Inc. User profiling for voice input processing
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US9280610B2 (en) 2012-05-14 2016-03-08 Apple Inc. Crowd sourcing information to fulfill user requests
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US9311043B2 (en) 2010-01-13 2016-04-12 Apple Inc. Adaptive audio feedback system and method
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9361886B2 (en) 2008-02-22 2016-06-07 Apple Inc. Providing text input using speech data and non-speech data
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US9412392B2 (en) 2008-10-02 2016-08-09 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9431006B2 (en) 2009-07-02 2016-08-30 Apple Inc. Methods and apparatuses for automatic speech recognition
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9501741B2 (en) 2005-09-08 2016-11-22 Apple Inc. Method and apparatus for building an intelligent automated assistant
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US9547647B2 (en) 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9691383B2 (en) 2008-09-05 2017-06-27 Apple Inc. Multi-tiered voice feedback in an electronic device
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9721563B2 (en) 2012-06-08 2017-08-01 Apple Inc. Name recognition system
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9733821B2 (en) 2013-03-14 2017-08-15 Apple Inc. Voice control to diagnose inadvertent activation of accessibility features
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9946706B2 (en) 2008-06-07 2018-04-17 Apple Inc. Automatic language identification for dynamic text processing
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US9977779B2 (en) 2013-03-14 2018-05-22 Apple Inc. Automatic supplementation of word correction dictionaries
US10019994B2 (en) 2012-06-08 2018-07-10 Apple Inc. Systems and methods for recognizing textual identifiers within a plurality of words
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10078487B2 (en) 2013-03-15 2018-09-18 Apple Inc. Context-sensitive handling of interruptions
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10255566B2 (en) 2011-06-03 2019-04-09 Apple Inc. Generating and processing task items that represent tasks to perform
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US10296160B2 (en) 2013-12-06 2019-05-21 Apple Inc. Method for extracting salient dialog usage from live data
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10515147B2 (en) 2010-12-22 2019-12-24 Apple Inc. Using statistical language models for contextual lookup
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10540976B2 (en) 2009-06-05 2020-01-21 Apple Inc. Contextual voice commands
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US10572476B2 (en) 2013-03-14 2020-02-25 Apple Inc. Refining a search based on schedule items
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US10642574B2 (en) 2013-03-14 2020-05-05 Apple Inc. Device, method, and graphical user interface for outputting captions
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10672399B2 (en) 2011-06-03 2020-06-02 Apple Inc. Switching between text data and audio data based on a mapping
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10748529B1 (en) 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
CN112749207A (en) * 2020-12-29 2021-05-04 大连海事大学 Deep sea emergency disposal auxiliary decision making system based on case reasoning
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US11151899B2 (en) 2013-03-15 2021-10-19 Apple Inc. User training by intelligent digital assistant
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5062074A (en) * 1986-12-04 1991-10-29 Tnet, Inc. Information retrieval system and method
US5099426A (en) * 1989-01-19 1992-03-24 International Business Machines Corporation Method for use of morphological information to cross reference keywords used for information retrieval
US5201048A (en) * 1988-12-01 1993-04-06 Axxess Technologies, Inc. High speed computer system for search and retrieval of data within text and record oriented files
US5303361A (en) * 1989-01-18 1994-04-12 Lotus Development Corporation Search and retrieval system

Cited By (201)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6336029B1 (en) 1996-12-02 2002-01-01 Chi Fai Ho Method and system for providing information in response to questions
US6480698B2 (en) 1996-12-02 2002-11-12 Chi Fai Ho Learning method and system based on questioning
US6501937B1 (en) 1996-12-02 2002-12-31 Chi Fai Ho Learning method and system based on questioning
US7996398B2 (en) 1998-07-15 2011-08-09 A9.Com, Inc. Identifying related search terms based on search behaviors of users
US6263333B1 (en) 1998-10-22 2001-07-17 International Business Machines Corporation Method for searching non-tokenized text and tokenized text for matches against a keyword data structure
US6498921B1 (en) 1999-09-01 2002-12-24 Chi Fai Ho Method and system to answer a natural-language question
US6571240B1 (en) 2000-02-02 2003-05-27 Chi Fai Ho Information processing for searching categorizing information in a document based on a categorization hierarchy and extracted phrases
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US7702541B2 (en) 2000-08-01 2010-04-20 Yahoo! Inc. Targeted e-commerce system
US7209914B2 (en) 2001-05-22 2007-04-24 Icms Group N.V. Method of storing, retrieving and viewing data
EP1260919A3 (en) * 2001-05-22 2004-10-20 ICMS Group n.v. A method of storing, retrieving and viewing data
CN1320481C (en) * 2004-11-22 2007-06-06 北京北大方正技术研究院有限公司 Method for conducting title and text logic connection for newspaper pages
US9501741B2 (en) 2005-09-08 2016-11-22 Apple Inc. Method and apparatus for building an intelligent automated assistant
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US7809663B1 (en) 2006-05-22 2010-10-05 Convergys Cmg Utah, Inc. System and method for supporting the utilization of machine language
US8379830B1 (en) 2006-05-22 2013-02-19 Convergys Customer Management Delaware Llc System and method for automated customer service with contingent live interaction
US9549065B1 (en) 2006-05-22 2017-01-17 Convergys Customer Management Delaware Llc System and method for automated customer service with contingent live interaction
US8930191B2 (en) 2006-09-08 2015-01-06 Apple Inc. Paraphrasing of user requests and results by automated digital assistant
US9117447B2 (en) 2006-09-08 2015-08-25 Apple Inc. Using event alert text as input to an automated assistant
US8942986B2 (en) 2006-09-08 2015-01-27 Apple Inc. Determining user intent based on ontologies of domains
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US9053089B2 (en) 2007-10-02 2015-06-09 Apple Inc. Part-of-speech tagging using latent analogy
US10381016B2 (en) 2008-01-03 2019-08-13 Apple Inc. Methods and apparatus for altering audio output signals
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US9361886B2 (en) 2008-02-22 2016-06-07 Apple Inc. Providing text input using speech data and non-speech data
US9865248B2 (en) 2008-04-05 2018-01-09 Apple Inc. Intelligent text-to-speech conversion
US8996376B2 (en) 2008-04-05 2015-03-31 Apple Inc. Intelligent text-to-speech conversion
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9946706B2 (en) 2008-06-07 2018-04-17 Apple Inc. Automatic language identification for dynamic text processing
US10108612B2 (en) 2008-07-31 2018-10-23 Apple Inc. Mobile device having human language translation capability with positional feedback
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US9691383B2 (en) 2008-09-05 2017-06-27 Apple Inc. Multi-tiered voice feedback in an electronic device
US8898568B2 (en) 2008-09-09 2014-11-25 Apple Inc. Audio user interface
US11900936B2 (en) 2008-10-02 2024-02-13 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US9412392B2 (en) 2008-10-02 2016-08-09 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US10643611B2 (en) 2008-10-02 2020-05-05 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US8862252B2 (en) 2009-01-30 2014-10-14 Apple Inc. Audio user interface for displayless electronic device
US10795541B2 (en) 2009-06-05 2020-10-06 Apple Inc. Intelligent organization of tasks items
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10475446B2 (en) 2009-06-05 2019-11-12 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US11080012B2 (en) 2009-06-05 2021-08-03 Apple Inc. Interface for a virtual digital assistant
US10540976B2 (en) 2009-06-05 2020-01-21 Apple Inc. Contextual voice commands
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US9431006B2 (en) 2009-07-02 2016-08-30 Apple Inc. Methods and apparatuses for automatic speech recognition
US9311043B2 (en) 2010-01-13 2016-04-12 Apple Inc. Adaptive audio feedback system and method
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US9548050B2 (en) 2010-01-18 2017-01-17 Apple Inc. Intelligent automated assistant
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US8903716B2 (en) 2010-01-18 2014-12-02 Apple Inc. Personalized vocabulary for digital assistant
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10706841B2 (en) 2010-01-18 2020-07-07 Apple Inc. Task flow identification based on user intent
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US9431028B2 (en) 2010-01-25 2016-08-30 Newvaluexchange Ltd Apparatuses, methods and systems for a digital conversation management platform
US8977584B2 (en) 2010-01-25 2015-03-10 Newvaluexchange Global Ai Llp Apparatuses, methods and systems for a digital conversation management platform
US9424861B2 (en) 2010-01-25 2016-08-23 Newvaluexchange Ltd Apparatuses, methods and systems for a digital conversation management platform
US9424862B2 (en) 2010-01-25 2016-08-23 Newvaluexchange Ltd Apparatuses, methods and systems for a digital conversation management platform
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US9190062B2 (en) 2010-02-25 2015-11-17 Apple Inc. User profiling for voice input processing
US9075783B2 (en) 2010-09-27 2015-07-07 Apple Inc. Electronic device with text error correction based on voice recognition data
US10515147B2 (en) 2010-12-22 2019-12-24 Apple Inc. Using statistical language models for contextual lookup
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US10102359B2 (en) 2011-03-21 2018-10-16 Apple Inc. Device access using voice authentication
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US10255566B2 (en) 2011-06-03 2019-04-09 Apple Inc. Generating and processing task items that represent tasks to perform
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10672399B2 (en) 2011-06-03 2020-06-02 Apple Inc. Switching between text data and audio data based on a mapping
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9280610B2 (en) 2012-05-14 2016-03-08 Apple Inc. Crowd sourcing information to fulfill user requests
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US10019994B2 (en) 2012-06-08 2018-07-10 Apple Inc. Systems and methods for recognizing textual identifiers within a plurality of words
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US9721563B2 (en) 2012-06-08 2017-08-01 Apple Inc. Name recognition system
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9547647B2 (en) 2012-09-19 2017-01-17 Apple Inc. Voice-based media searching
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US8935167B2 (en) 2012-09-25 2015-01-13 Apple Inc. Exemplar-based latent perceptual modeling for automatic speech recognition
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US10642574B2 (en) 2013-03-14 2020-05-05 Apple Inc. Device, method, and graphical user interface for outputting captions
US9977779B2 (en) 2013-03-14 2018-05-22 Apple Inc. Automatic supplementation of word correction dictionaries
US10572476B2 (en) 2013-03-14 2020-02-25 Apple Inc. Refining a search based on schedule items
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US9733821B2 (en) 2013-03-14 2017-08-15 Apple Inc. Voice control to diagnose inadvertent activation of accessibility features
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US10078487B2 (en) 2013-03-15 2018-09-18 Apple Inc. Context-sensitive handling of interruptions
US10748529B1 (en) 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
US11151899B2 (en) 2013-03-15 2021-10-19 Apple Inc. User training by intelligent digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9966060B2 (en) 2013-06-07 2018-05-08 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US10657961B2 (en) 2013-06-08 2020-05-19 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US10296160B2 (en) 2013-12-06 2019-05-21 Apple Inc. Method for extracting salient dialog usage from live data
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US10169329B2 (en) 2014-05-30 2019-01-01 Apple Inc. Exemplar-based natural language processing
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US10083690B2 (en) 2014-05-30 2018-09-25 Apple Inc. Better resolution when referencing to concepts
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US10497365B2 (en) 2014-05-30 2019-12-03 Apple Inc. Multi-command single utterance input method
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10904611B2 (en) 2014-06-30 2021-01-26 Apple Inc. Intelligent automated assistant for TV user interactions
US9668024B2 (en) 2014-06-30 2017-05-30 Apple Inc. Intelligent automated assistant for TV user interactions
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10431204B2 (en) 2014-09-11 2019-10-01 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9986419B2 (en) 2014-09-30 2018-05-29 Apple Inc. Social reminders
US11556230B2 (en) 2014-12-02 2023-01-17 Apple Inc. Data detection
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US10311871B2 (en) 2015-03-08 2019-06-04 Apple Inc. Competing devices responding to voice triggers
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US11069347B2 (en) 2016-06-08 2021-07-20 Apple Inc. Intelligent automated assistant for media exploration
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
CN112749207A (en) * 2020-12-29 2021-05-04 大连海事大学 Deep sea emergency disposal auxiliary decision making system based on case reasoning
CN112749207B (en) * 2020-12-29 2023-06-02 大连海事大学 Case reasoning-based deep sea emergency treatment auxiliary decision-making system

Also Published As

Publication number Publication date
AU7323694A (en) 1995-02-06

Similar Documents

Publication Publication Date Title
WO1995002221A1 (en) Case-based organizing and querying of a database
Green The Greek & Latin Roots of English
Jackson et al. Words, meaning and vocabulary: An introduction to modern English lexicology
Louth et al. Genesis 1-11
Corbett Gender
US6487545B1 (en) Methods and apparatus for classifying terminology utilizing a knowledge catalog
Tuzzi et al. What is Elena Ferrante? A comparative analysis of a secretive bestselling Italian writer
US6199034B1 (en) Methods and apparatus for determining theme for discourse
Treffers-Daller Grammatical collocations and verb-particle constructions in Brussels French: A corpus-linguistic approach to transfer
Broughton Essential Library of Congress subject headings
Jenstad et al. Shakespeare's language in digital media: old words, new tools
Benor et al. A Research Agenda for Comparative Jewish Linguistic Studies
Verheul et al. Using word vector models to trace conceptual change over time and space in historical newspapers, 1840–1914
Mansour et al. Investigating explicitation in literary translation from English into Arabic
Brewer The use of literary quotations in the Oxford English Dictionary
Menon What’s in a name? William Jones, ‘philological empiricism’ and botanical knowledge making in eighteenth-century India
Jukes A grammar of Makasar: A language of South Sulawesi, Indonesia
Moon Words, frequencies, and texts (particularly Conrad): A stratified approach
Häberl Hebraisms in Mandaic
Li et al. Sacrifice to the wind gods in late Shang China – religious, paleographic, linguistic and philological analyses: An integrated approach
Salkova et al. The genitive case as a grammatical preference of modern press
Boye et al. A Stylistic Reading of Representation of Illegal Migration Tendencies among Gambians in Baaba Sillah’s Péñcum Taakusaan and Juka Fatou Jabang’s The Phoenix
Backfish Writing the right words from left to right: Septuagint translation of wordplay in the Fourth Book of the Psalter
Khabtagaeva The Sartul Buryat Dialect: A Preliminary Analysis
Geraghty The Cape Gooseberry and its Many Fijian and Pacific Names

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AM AT AU BB BG BR BY CA CH CN CZ DE DK ES FI GB GE HU JP KE KG KP KR KZ LK LT LU LV MD MG MN MW NL NO NZ PL PT RO RU SD SE SI SK TJ TT UA US UZ VN

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): KE MW SD AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: CA

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642